Data Profile Report
Generated on November 24, 2025 at 09:54 PM
Overview
This dataset captures granular student activity data from the OpenStax Tutor learning platform, documenting every step within assigned tasks including exercise responses, reading interactions, completion timestamps, and grading information. It enables analysis of student learning behaviors, assignment effectiveness, and educational outcomes.
Unit of Analysis
- Task Step - Each row represents a single step within a student's assignment (e.g., one exercise question, one reading section, one video). A complete assignment contains multiple steps, and each student's assignment generates separate step records.
Scope & Filters
- Temporal: Tasks created within the specified date range (time1 to time2)
- Course Type: Production courses only (excludes preview courses; test courses optional via qtest parameter)
- Task Types: Includes homework (0), reading (1), external (2), event (3), practice (4), chapter practice (5), page practice (6), mixed practice (7), and concept coach (9)
- Task Plans: Excludes preview assignment templates
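The scope rules above can be expressed as a small row filter. The following is a minimal sketch, not the export's actual query: the numeric codes come from the Task Types bullet, while the field names (`task_created_at`, `task_type`, `is_preview`, `is_test`), the inclusive date bounds, and the `include_test` flag (standing in for the qtest parameter) are assumptions made for illustration.

```python
from datetime import datetime
from enum import IntEnum


class TaskType(IntEnum):
    """Task type codes as listed in the Scope & Filters section."""
    HOMEWORK = 0
    READING = 1
    EXTERNAL = 2
    EVENT = 3
    PRACTICE = 4
    CHAPTER_PRACTICE = 5
    PAGE_PRACTICE = 6
    MIXED_PRACTICE = 7
    CONCEPT_COACH = 9  # code 8 does not appear in the report


IN_SCOPE_CODES = {member.value for member in TaskType}


def in_scope(row, time1, time2, include_test=False):
    """Return True if a row (a dict with hypothetical keys) passes the filters."""
    if not (time1 <= row["task_created_at"] <= time2):  # inclusive bounds assumed
        return False
    if row["is_preview"]:
        return False
    if row["is_test"] and not include_test:
        return False
    return row["task_type"] in IN_SCOPE_CODES


example = {
    "task_created_at": datetime(2021, 9, 1),
    "task_type": 1,
    "is_preview": False,
    "is_test": False,
}
print(in_scope(example, datetime(2021, 1, 1), datetime(2021, 12, 31)))  # True
```

Because code 8 is absent from the list above, the sketch treats it as out of scope.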
Key Dimensions
Student Context
- Anonymized student identifiers (research_identifier)
- Course and period enrollment information
- Student demographics (name fields for authorized use)
Assignment Structure
- Task hierarchy:
- Assignment metadata (title, type, creation date)
- Content ecosystem version tracking
Step Content
- Polymorphic step types (exercises, readings, videos, interactives)
- Content references (pages, exercises) with book location
- Exercise-specific data (questions, answers, tags)
Student Responses (for exercise steps)
- Answer submissions and correctness
- Free-response text and grading
- Response validation and feedback
- Attempt tracking
Temporal Data
- Step creation timestamps
- First and last completion times
- Assignment lifecycle tracking
Content Metadata
- Learning objective (LO) and skill tags
- Book and page references with hierarchical location
- Exercise difficulty and classification tags
- Textbook ecosystem versioning
Primary Use Cases
- Learning Analytics: Student engagement patterns, time-on-task analysis
- Content Effectiveness: Exercise difficulty calibration, content performance
- Educational Research: Learning behavior studies, intervention analysis
- Instructor Insights: Assignment completion rates, common misconceptions
- Adaptive Learning: Personalization based on performance patterns
Data Granularity
- Finest grain: Individual step within an assignment (e.g., question 3 of homework 5)
- Aggregation potential: Roll up to assignment level, student level, course level, or content level
- Temporal resolution: Precise timestamps for creation and completion events
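As an illustration of rolling step rows up to the assignment level, the sketch below computes a completion rate per student and task. `research_identifier` is named earlier in this report; `task_id` and `first_completed_at` are assumed column names, and treating a step as complete when its first completion time is present is also an assumption.

```python
from collections import defaultdict


def completion_by_task(steps):
    """Roll step rows up to one completion rate per (student, task) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [completed steps, total steps]
    for step in steps:
        key = (step["research_identifier"], step["task_id"])
        counts[key][1] += 1
        if step["first_completed_at"] is not None:
            counts[key][0] += 1
    return {key: done / total for key, (done, total) in counts.items()}


steps = [
    {"research_identifier": "r315a4f9d", "task_id": 1, "first_completed_at": "2021-10-03 18:44:10"},
    {"research_identifier": "r315a4f9d", "task_id": 1, "first_completed_at": None},
    {"research_identifier": "rbfd1701b", "task_id": 1, "first_completed_at": "2021-10-04 09:00:00"},
]
print(completion_by_task(steps))
# {('r315a4f9d', 1): 0.5, ('rbfd1701b', 1): 1.0}
```

Changing the grouping key (for example to the student identifier alone) gives the student-level roll-up in the same way.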
Coverage
All student activity on non-preview assignments within the specified date range, including:
- Complete and incomplete assignments
- All step types (exercises, readings, videos, etc.)
- Graded and ungraded work
- Core and supplemental content
Technical Notes
- Generated from the PostgreSQL production database via Parquet file exports
- Optimized for large-scale data processing (millions of rows)
- Preserves referential integrity across related tables
- Left joins preserve steps without exercises/pages (NULL values expected)
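The effect of the left joins noted above can be reproduced on a toy schema. This sketch uses Python's built-in SQLite module as a stand-in for PostgreSQL, and the table and column names (`steps`, `exercises`, `exercise_id`, `title`) are hypothetical. It only shows why non-exercise steps survive the join with NULL in the exercise columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE steps (id INTEGER PRIMARY KEY, step_type TEXT, exercise_id INTEGER);
    CREATE TABLE exercises (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO exercises VALUES (10, 'Q1');
    INSERT INTO steps VALUES (1, 'Exercise', 10);
    INSERT INTO steps VALUES (2, 'Reading', NULL);
    """
)

# LEFT JOIN keeps every step row; steps with no matching exercise get NULL.
rows = conn.execute(
    """
    SELECT s.id, s.step_type, e.title
    FROM steps AS s
    LEFT JOIN exercises AS e ON e.id = s.exercise_id
    ORDER BY s.id
    """
).fetchall()
print(rows)  # [(1, 'Exercise', 'Q1'), (2, 'Reading', None)]
conn.close()
```

An inner join on the same tables would drop the reading step entirely, which is why partially null exercise columns are expected in this dataset.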
Executive Dashboard
🟢 Data Quality Grade: A (92/100)
| Metric | Value | Status |
|---|---|---|
| Total Rows | 27,996,816 | ✅ |
| Total Columns | 95 | ✅ |
| Completeness | 80.3% | ⚠️ |
| Columns with Issues | 1 | ⚠️ |
| Processing Time | 90.4s | ✅ |
🔍 Data Quality Alerts
❌ Critical Issues
These issues require immediate attention:
- dropped_at: 98.0% null values (sparse data)
- school_district_school_id: 96.2% null values (sparse data)
- withdrawn_at: 93.7% null values (sparse data)
- title_exercise: Column is completely empty (100% null)
- grader_points: 99.9% null values (sparse data)
- grader_comments: 99.9% null values (sparse data)
- last_graded_at: 99.9% null values (sparse data)
- free_response_grade: 99.9% null values (sparse data)
- published_comments: 99.9% null values (sparse data)
⚠️ Warnings
Review these columns for potential issues:
- fragment_index: 67.2% null values (partial data)
- role_type: Only 1 unique value in 27,996,816 rows
- description: 52.4% null values (partial data)
- updated_by_instructor_at: 63.1% null values (partial data)
- title_exercise: Only 0 unique values in 27,996,816 rows
- free_response: 58.7% null values (partial data)
- answer_id: 52.0% null values (partial data)
- response_validation: 60.1% null values (partial data)
- book_location: 337 unique categories (consider if this should be text/identifier)
- course_name: 920 unique categories (consider if this should be text/identifier)
- city: 313 unique categories (consider if this should be text/identifier)
- title_1_school: Only 1 unique value in 27,996,816 rows
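The alert lists above look like the output of simple threshold rules. The sketch below is one way such rules might be written. The cut-offs (more than 90% null for critical, more than 50% null for a warning) are assumptions inferred from the values listed, not documented settings of the profiling tool, and the rule for high-cardinality categories is left out because its threshold cannot be inferred from this report.

```python
def quality_alerts(column, null_pct, unique_count, total_rows):
    """Return (severity, message) pairs for one column under assumed thresholds."""
    alerts = []
    if null_pct >= 100.0:
        alerts.append(("critical", f"{column}: Column is completely empty (100% null)"))
    elif null_pct > 90.0:  # assumed critical cut-off
        alerts.append(("critical", f"{column}: {null_pct:.1f}% null values (sparse data)"))
    elif null_pct > 50.0:  # assumed warning cut-off
        alerts.append(("warning", f"{column}: {null_pct:.1f}% null values (partial data)"))
    if unique_count <= 1:
        noun = "value" if unique_count == 1 else "values"
        alerts.append(
            ("warning", f"{column}: Only {unique_count} unique {noun} in {total_rows:,} rows")
        )
    return alerts


for name, nulls, uniques in [("dropped_at", 98.0, 1516), ("title_exercise", 100.0, 0)]:
    for severity, message in quality_alerts(name, nulls, uniques, 27_996_816):
        print(severity, "-", message)
```

Under these assumed rules a completely empty column such as title_exercise triggers both a critical alert and a uniqueness warning, which matches its appearance in both lists.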
📊 Column Profiles
Columns are organized by data type for easier navigation.
🔑 Identifier (28 columns)
Unique identifiers (IDs, keys, UUIDs)
| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| | INTEGER | identifier | 100.0% | 27,996,816 | 100.0% | 28927409, 26352572, 40486573 | |
| | INTEGER | identifier | 100.0% | 1,325,391 | 4.7% | 1495846, 1640791, 2061782 | |
| | INTEGER | identifier | 100.0% | 24,886,892 | 88.9% | 14705499, 4923798, 39417216 | |
| | INTEGER | identifier | 98.1% | 28,019 | 0.1% | 214112, 199873, 128501 | Foreign key linking an individual student's task instance back to the instructor's original task plan (assignment template). This connects a student's specific assignment to the main plan that generated it. |
| | DOUBLE | identifier | 100.0% | 1,368 | 0.0% | 15534.0, 8142.0, 12789.0 | Unique identifier for teacher-created course on OpenStax Tutor |
| | INTEGER | identifier | 100.0% | 54,138 | 0.2% | 70579, 191533, 109791 | Unique identifier for a user's role within the system. This is the primary key that links users to their various roles (student, teacher, etc.) and connects them to their activities across courses. |
| | INTEGER | identifier | 100.0% | 1,368 | 0.0% | 3889, 10667, 16879 | |
| | VARCHAR | identifier | 100.0% | 54,138 | 0.2% | 27741d9b-20a1-49a8-baf1-b96111f89b8d, 49fae85e-53e5-485b-8f5e-4a9e5533bfaa, c1b9e7e4-e0f6-434c-bdb2-7950932f68b3 | Foreign key identifying which course this record belongs to |
| | INTEGER | identifier | 100.0% | 1,725 | 0.0% | 11951, 23255, 32965 | Course-period id in which the student was enrolled |
| | INTEGER | identifier | ⚠️ 3.8% | 16 | 0.0% | 1291, 1297, 1102 | Identifier for the school district for a specific OpenStax course |
| | INTEGER | identifier | 98.1% | 1,368 | 0.0% | 8538, 7788, 8563 | Unique identifier for the task plan that generated this task. A task plan is the instructor's assignment template/blueprint that defines what content should be assigned, to which students, and when. |
| | VARCHAR | identifier | 98.1% | 28,019 | 0.1% | 4325eaf3-dfd9-4cf9-a3e6-1452d8c51b0c, 1b6417f6-6628-4ddf-9eb1-4b53f2297b8a, 4427bca9-16bb-4b7b-ad6e-58721fd76f29 | |
| | INTEGER | identifier | 98.1% | 2,686 | 0.0% | 25525, 10008, 9658 | |
| | INTEGER | identifier | 59.4% | 139,456 | 0.8% | 544259, 585359, 333322 | |
| | VARCHAR | identifier | ⚠️ 48.0% | 185,631 | 1.4% | 393425, 316849, 371854 | Index of the correct response |
| | VARCHAR | identifier | 59.1% | 55,496 | 0.3% | 368103, 381186, 596529 | Index of the correct response |
| | VARCHAR | identifier | 59.4% | 57,355 | 0.3% | 94083, 96743, 147007 | Internal numeric identifier for questions |
| | VARCHAR | identifier | 59.4% | 16,642,469 | 100.0% | 3aa054c8-24fb-4ea3-8acc-645d75a569c2, d5c46ded-fdfd-469e-b534-4eedbda9c7cb, efc2674c-fad7-48e5-afc9-31ca35d26b4f | Internal unique identifier for exercises |
| | VARCHAR | identifier | 100.0% | 1,725 | 0.0% | a85fcd18-de0d-4ba0-b6ee-024b189f158e, 1c4c4018-16a8-4695-b826-0254d8a74a8b, 350c48f6-8063-4193-a750-5d1af43bf2c5 | Internal unique identifier for course periods created on OpenStax Tutor |
| | VARCHAR | identifier | 99.2% | 1,360 | 0.0% | fe9c820c-1816-4564-b46c-6a3f3cdee2cd, dfd8ce33-f709-4f85-bea0-8ac27a7e1f1f, 44b48e09-cb5a-4dee-85ed-3053b29caba7 | Internal unique identifier for courses created on OpenStax Tutor |
📑 Categorical (24 columns)
Categorical variables with limited distinct values
| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| | VARCHAR | categorical | 100.0% | 6 | 0.0% | Reading, Placeholder, Video, Interactive, Exercise ... (6 total levels) | Type of activity for the assignment step |
| | VARCHAR | categorical | 100.0% | 3 | 0.0% | core, personalized, spaced_practice | Type of intervention/formative assessment |
📈 Continuous (5 columns)
Continuous numeric variables (decimals, floats)
| Column | Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description |
|---|---|---|---|---|---|---|---|---|---|
| | DOUBLE | continuous | ⚠️ 0.1% | 59 | 0 | 12 | 1.66 | 1.0, 2.0 | Instructor/Teaching Assistant assigned points for each assignment step but may not be published |
| | DOUBLE | continuous | ⚠️ 0.1% | 54 | 0 | 12 | 1.7 | | Grader points when published |
🔢 Discrete (4 columns)
Discrete numeric variables (integers, counts)
| Column | Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description |
|---|---|---|---|---|---|---|---|---|---|
☑️ Boolean (6 columns)
Boolean variables (true/false)
| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| | BOOLEAN | boolean | 100.0% | 2 | 0.0% | true, false | Any steps that an instructor selected as core, as well as questions at the end of reading, that all students enrolled in the course had to complete |
| | BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | The course/assignment is for preview |
| | BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | It is a test course |
| | BOOLEAN | boolean | 77.9% | 2 | 0.0% | true, false | If it is a college course |
| | BOOLEAN | boolean | 98.1% | 1 | 0.0% | false | If assignment is for preview |
| | BOOLEAN | boolean | 59.4% | 2 | 0.0% | false, true | If the question is part of a multi-part |
📅 Datetime (22 columns)
Date and time columns
| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| | VARCHAR | datetime | 68.5% | 19,187,762 | 100.0% | 2021-10-03 18:44:10.275544, 2021-09-08 03:03:32.871312, 2019-02-20 20:40:24.434679 | |
| | VARCHAR | datetime | 68.5% | 19,187,770 | 100.0% | 2018-10-26 01:27:23.855325, 2019-04-15 19:07:12.109664, 2018-11-06 06:24:48.643782 | |
| | VARCHAR | datetime | 100.0% | 986,532 | 3.5% | 2023-03-24 00:54:36.306526, 2022-01-23 21:24:00.93717, 2021-01-25 18:17:16.641405 | |
| | VARCHAR | datetime | 100.0% | 19,663,182 | 70.2% | 2021-09-29 02:12:37.022564, 2019-01-24 22:05:25.188974, 2021-10-28 02:10:51.110241 | |
| | TIMESTAMP WITH TIME ZONE | datetime | 100.0% | 621,940 | 2.2% | 2022-09-15 03:30:52.468544+00, 2021-10-28 01:42:06.328193+00, 2019-09-24 17:36:19.947157+00 | |
| | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-02 13:04:16.631878, 2021-02-05 17:23:39.246093, 2021-01-25 08:04:16.53745 | |
| | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-08-30 22:53:30.280348, 2021-12-16 16:25:56.025706, 2019-05-22 13:58:35.870204 | |
| | VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2019-09-02 19:31:17.349562, 2021-07-05 15:13:11.590168, 2020-01-25 21:40:24.847115 | |
| | VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2021-05-13 18:57:40.071454, 2020-07-06 23:58:56.751613, 2023-08-25 17:22:29.451434 | When the OpenStax account profile was updated |
| | VARCHAR | datetime | ⚠️ 2.0% | 1,516 | 0.3% | 2019-02-05 20:55:08.849286, 2019-11-07 21:24:36.504898, 2022-09-19 15:38:11.75522 | Student dropping out of course |
| | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-06-16 15:37:58.362784, 2018-08-14 14:48:47.963283, 2019-01-09 04:18:07.495438 | When the student profile was created |
| | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-07 21:02:38.41562, 2022-01-10 18:56:04.076358, 2022-01-04 04:36:40.124625 | |
| | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2018-09-26 15:02:52.172843, 2022-02-01 02:39:28.265445, 2020-09-05 01:42:50.605016 | To update an assignment |
| | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2021-08-26 19:07:06.782938, 2022-01-06 22:46:57.852721, 2020-06-17 01:12:30.329702 | |
| | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2022-03-22 16:11:59.587938, 2020-05-28 04:14:44.986028, 2023-04-21 12:53:12.472326 | |
| | VARCHAR | datetime | 98.1% | 21,887 | 0.1% | 2020-06-24 12:38:37.784231, 2018-03-27 14:16:59.144611, 2018-09-07 22:28:32.240547 | |
| | VARCHAR | datetime | ⚠️ 6.3% | 1,460 | 0.1% | 2022-09-06 22:58:18.844438, 2022-04-08 21:33:45.875596, 2018-09-13 03:23:18.767757 | When was the assignment withdrawn by the instructor |
| | VARCHAR | datetime | 98.1% | 20,616 | 0.1% | 2022-07-08 21:50:16, 2020-06-26 04:42:38.040295, 2019-05-06 16:49:56.640985 | |
| | VARCHAR | datetime | ⚠️ 36.9% | 12,932 | 0.1% | 2022-02-19 00:13:13.459835, 2021-08-10 18:47:47.197136, 2022-11-30 14:59:24.258577 | When was the assignment updated by instructor |
| | VARCHAR | datetime | 59.4% | 3,072,812 | 18.5% | 2021-09-22 16:43:37.548762, 2021-06-28 12:49:04.415223, 2021-09-20 16:41:09.351951 | When was the exercise created |
| | VARCHAR | datetime | 59.4% | 13,924,518 | 83.7% | 2020-10-06 13:42:37.260637, 2018-10-22 15:59:01.459424, 2020-11-04 21:47:31.200636 | When was the exercise updated |
| | VARCHAR | datetime | ⚠️ 0.1% | 37,905 | 100.0% | 2022-03-11 17:25:49.697624 | When the assignment step was last graded |
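Most datetime columns in this table are stored as VARCHAR, and the sample values vary in shape: some have six fractional digits, some have five (for example 21:24:00.93717), some have none, and the one TIMESTAMP WITH TIME ZONE column carries a +00 suffix. The helper below is a minimal sketch of a parser that tolerates these shapes. `parse_timestamp` is a hypothetical name, and only the formats visible in the samples are handled. Values without an offset are returned as naive datetimes because the report does not state their time zone.

```python
import re
from datetime import datetime, timezone

# Date, time, optional 1-6 fractional digits, optional "+00" suffix.
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,6}))?(\+00)?$"
)


def parse_timestamp(text):
    """Parse one timestamp string in the shapes seen in the sample values."""
    match = _TIMESTAMP.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    base, fraction, utc_suffix = match.groups()
    parsed = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    if fraction:
        # Right-pad so that ".93717" means 937170 microseconds.
        parsed = parsed.replace(microsecond=int(fraction.ljust(6, "0")))
    if utc_suffix:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


print(parse_timestamp("2022-01-23 21:24:00.93717"))      # 2022-01-23 21:24:00.937170
print(parse_timestamp("2022-07-08 21:50:16"))            # 2022-07-08 21:50:16
print(parse_timestamp("2022-09-15 03:30:52.468544+00"))  # 2022-09-15 03:30:52.468544+00:00
```

Rejecting unrecognized shapes with an error, rather than returning None, keeps unexpected formats visible when the function is applied to a full column.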
📝 Text (15 columns)
Free-form text columns
| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| | VARCHAR | text | 100.0% | 13,544 | 0.1% | module 1, 2.6, 2.7 reading, Homework 5 | Assignment title |
| | VARCHAR | text | 100.0% | 54,138 | 0.2% | r315a4f9d, rbfd1701b, r51bcbae0 | Anonymized student ID |
📈 Summary Statistics
Dataset Overview
| Attribute | Value |
|---|---|
| Total Rows | 27,996,816 |
| Total Columns | 95 |
| Total Cells | 2,939,665,680 |
| Profiling Time | 90.41 seconds |
| Profiling Speed | 32,515,040 cells/second |
Column Types Distribution
| Data Type | Count | Percentage |
|---|---|---|
| VARCHAR | 64 | 61.0% |
| INTEGER | 25 | 23.8% |
| DOUBLE | 8 | 7.6% |
| BOOLEAN | 6 | 5.7% |
| TIMESTAMP WITH TIME ZONE | 1 | 1.0% |
| VARCHAR[] | 1 | 1.0% |
Variable Types Distribution
| Variable Type | Count | Percentage |
|---|---|---|
| Identifier | 28 | 26.7% |
| Categorical | 24 | 22.9% |
| Datetime | 22 | 21.0% |
| Text | 15 | 14.3% |
| Boolean | 6 | 5.7% |
| Continuous | 5 | 4.8% |
| Discrete | 4 | 3.8% |
| Empty | 1 | 1.0% |
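The totals in this section can be cross-checked with simple arithmetic. The sketch below uses only figures copied from the tables above. It shows that Total Cells divided by Total Rows gives 105, and that both distribution tables also sum to 105, whereas Total Columns is reported as 95. The report does not explain the difference, so the sketch makes no assumption about which figure is authoritative.

```python
total_rows = 27_996_816
total_cells = 2_939_665_680
reported_columns = 95

# Columns implied by the Total Cells figure.
implied_columns, remainder = divmod(total_cells, total_rows)
print(implied_columns, remainder)  # 105 0

data_type_counts = {
    "VARCHAR": 64, "INTEGER": 25, "DOUBLE": 8, "BOOLEAN": 6,
    "TIMESTAMP WITH TIME ZONE": 1, "VARCHAR[]": 1,
}
variable_type_counts = {
    "Identifier": 28, "Categorical": 24, "Datetime": 22, "Text": 15,
    "Boolean": 6, "Continuous": 5, "Discrete": 4, "Empty": 1,
}
print(sum(data_type_counts.values()))      # 105
print(sum(variable_type_counts.values()))  # 105

# The percentages in the tables are shares of 105, not of 95.
print(f"{100 * data_type_counts['VARCHAR'] / implied_columns:.1f}%")  # 61.0%
print(implied_columns - reported_columns)  # 10
```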
Data Completeness
| Completeness Level | Column Count | Status |
|---|---|---|
| Complete (0% nulls) | 39 | ✅ |
| Mostly Complete (1-10% nulls) | 33 | ✅ |
| Partial (11-50% nulls) | 16 | ⚠️ |
| Sparse (51-90% nulls) | 6 | ❌ |
| Mostly Empty (>90% nulls) | 11 | ❌ |
Overall Data Completeness: 80.3%
Cardinality Analysis
Cardinality indicates the uniqueness of values in each column.
| Cardinality Level | Column Count | Description |
|---|---|---|
| Very High (>95% unique) | 5 | Likely identifiers |
| High (50-95% unique) | 4 | High variability |
| Medium (10-50% unique) | 4 | Moderate variability |
| Low (<10% unique) | 91 | Categorical/Boolean |
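The levels in the table above can be written as a small classifier. In this sketch the cardinality percentage is computed against non-null values, following the Glossary definition. The function names and the handling of the exact 10% and 50% boundaries are assumptions, since the table's ranges overlap at those points.

```python
def cardinality_percent(unique_count, non_null_count):
    """Unique values as a percentage of non-null values."""
    if non_null_count == 0:
        return None  # undefined for a completely empty column
    return 100.0 * unique_count / non_null_count


def cardinality_level(percent):
    """Map a cardinality percentage to the level names used in the table."""
    if percent is None:
        return None
    if percent > 95:
        return "Very High"
    if percent >= 50:  # boundary handling assumed
        return "High"
    if percent >= 10:  # boundary handling assumed
        return "Medium"
    return "Low"


# Two fully populated identifier columns from the Column Profiles section.
for unique_count in (1_325_391, 24_886_892):
    pct = cardinality_percent(unique_count, 27_996_816)
    print(f"{pct:.1f}% -> {cardinality_level(pct)}")
# 4.7% -> Low
# 88.9% -> High
```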
📖 Glossary
| Term | Definition |
|---|---|
| Cardinality | The number of unique values in a column relative to total non-null values. High cardinality means many unique values. |
| Completeness | Percentage of non-null values in a column. Higher is better. |
| Data Type | The technical storage type (e.g., INTEGER, VARCHAR, BOOLEAN). |
| Identifier | A column containing unique values that identify records (e.g., ID, UUID). |
| Missing Values | Null, empty, or placeholder values (NA, null, empty string). |
| Null Percentage | The proportion of null/missing values in a column. |
| Sample Values | Example values from the column to illustrate its contents. |
| Variable Type | The semantic meaning of the column (categorical, continuous, etc.). |
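To make the Completeness and Missing Values definitions concrete, the following sketch computes completeness for a list of raw values. The placeholder set mirrors the examples in the glossary (NA, null, empty string). Exact, case-sensitive matching after trimming whitespace is an assumption, and the profiling tool may treat placeholders differently.

```python
MISSING_PLACEHOLDERS = {"", "NA", "null"}  # taken from the glossary examples


def is_missing(value):
    """True for None and for strings that match a placeholder after trimming."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() in MISSING_PLACEHOLDERS


def completeness(values):
    """Percentage of non-missing values; 0.0 for an empty input."""
    values = list(values)
    if not values:
        return 0.0
    present = sum(1 for value in values if not is_missing(value))
    return 100.0 * present / len(values)


print(completeness(["core", None, "", "NA", "personalized"]))  # 40.0
```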