Data Profile Report

Generated on November 24, 2025 at 09:54 PM

Overview

This dataset captures granular student activity data from the OpenStax Tutor learning platform, documenting every step within assigned tasks including exercise responses, reading interactions, completion timestamps, and grading information. It enables analysis of student learning behaviors, assignment effectiveness, and educational outcomes.

Unit of Analysis

  • Task Step - Each row represents a single step within a student's assignment (e.g., one exercise question, one reading section, one video). A complete assignment contains multiple steps, and each student's assignment generates separate step records.

Scope & Filters

  • Temporal: Tasks created within the specified date range (time1 to time2)
  • Course Type: Production courses only (excludes preview courses; test courses optional via qtest parameter)
  • Task Types: Includes homework (0), reading (1), external (2), event (3), practice (4), chapter practice (5), page practice (6), mixed practice (7), and concept coach (9)
  • Task Plans: Excludes preview assignment templates
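
The filters above can be sketched as a row-level predicate. This is a minimal illustration, not the export query itself: the `task_type_code` key and the `include_test` flag (standing in for the qtest parameter) are hypothetical names, and the date bounds are assumed to be inclusive.

```python
from datetime import datetime

# Task type codes as listed in the Scope & Filters section.
TASK_TYPES = {
    0: "homework", 1: "reading", 2: "external", 3: "event", 4: "practice",
    5: "chapter practice", 6: "page practice", 7: "mixed practice",
    9: "concept coach",
}

def in_scope(row, time1, time2, include_test=False):
    """Return True if a step row passes the documented scope filters.

    `row` is a dict; `task_type_code` and `include_test` are assumed names.
    """
    # Preview courses and preview plans are always excluded.
    # is_preview_plan may be None when the step has no task plan.
    if row["is_preview"] or row.get("is_preview_plan") is True:
        return False
    # Test courses are excluded unless explicitly requested.
    if row["is_test"] and not include_test:
        return False
    if row["task_type_code"] not in TASK_TYPES:
        return False
    # Assumption: both ends of the date range are inclusive.
    return time1 <= row["created_at_task"] <= time2

example_row = {
    "created_at_task": datetime(2022, 9, 15, 3, 30, 52),
    "is_preview": False,
    "is_test": False,
    "is_preview_plan": None,
    "task_type_code": 0,
}
window = (datetime(2022, 1, 1), datetime(2022, 12, 31))
```

Calling `in_scope(example_row, *window)` keeps the row; flipping `is_test` to `True` drops it unless `include_test=True` is passed.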

Key Dimensions

Student Context

  • Anonymized student identifiers (research_identifier)
  • Course and period enrollment information
  • Student demographics (name fields for authorized use)

Assignment Structure

  • Task hierarchy:
    Plan → Task → Steps
  • Assignment metadata (title, type, creation date)
  • Content ecosystem version tracking
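
As a rough sketch of the Plan → Task → Steps hierarchy, flat step rows can be nested using the key columns from the identifier table (`tasks_task_plan_id`, `tasks_task_id`, `id`). The ID values below are made up for illustration, and steps whose task has no plan are assumed to carry a NULL (`None`) plan ID.

```python
from collections import defaultdict

def build_hierarchy(rows):
    """Nest flat step rows as {plan_id: {task_id: [step_id, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row["tasks_task_plan_id"]][row["tasks_task_id"]].append(row["id"])
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {plan: dict(tasks) for plan, tasks in tree.items()}

# Illustrative rows only; real IDs come from the export.
rows = [
    {"id": 1, "tasks_task_id": 10, "tasks_task_plan_id": 100},
    {"id": 2, "tasks_task_id": 10, "tasks_task_plan_id": 100},
    {"id": 3, "tasks_task_id": 11, "tasks_task_plan_id": 100},
    {"id": 4, "tasks_task_id": 12, "tasks_task_plan_id": None},  # task without a plan
]
hierarchy = build_hierarchy(rows)
```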

Step Content

  • Polymorphic step types (exercises, readings, videos, interactives)
  • Content references (pages, exercises) with book location
  • Exercise-specific data (questions, answers, tags)

Student Responses (for exercise steps)

  • Answer submissions and correctness
  • Free-response text and grading
  • Response validation and feedback
  • Attempt tracking

Temporal Data

  • Step creation timestamps
  • First and last completion times
  • Assignment lifecycle tracking

Content Metadata

  • Learning objective (LO) and skill tags
  • Book and page references with hierarchical location
  • Exercise difficulty and classification tags
  • Textbook ecosystem versioning

Primary Use Cases

  • Learning Analytics: Student engagement patterns, time-on-task analysis
  • Content Effectiveness: Exercise difficulty calibration, content performance
  • Educational Research: Learning behavior studies, intervention analysis
  • Instructor Insights: Assignment completion rates, common misconceptions
  • Adaptive Learning: Personalization based on performance patterns

Data Granularity

  • Finest grain: Individual step within an assignment (e.g., question 3 of homework 5)
  • Aggregation potential: Roll up to assignment level, student level, course level, or content level
  • Temporal resolution: Precise timestamps for creation and completion events
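
The roll-ups described above can be illustrated with a small grouping helper. This sketch treats a step as completed when `first_completed_at` is non-null, which is an assumption about how completion is encoded; the sample rows are invented.

```python
from collections import defaultdict

def completion_rate_by(rows, key):
    """Share of steps with a non-null first_completed_at, grouped by `key`."""
    done = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[key]] += 1
        if row["first_completed_at"] is not None:
            done[row[key]] += 1
    return {group: done[group] / total[group] for group in total}

# Invented step rows: two students, three assignments.
steps = [
    {"research_identifier": "rA", "tasks_task_id": 1, "first_completed_at": "2021-10-03 18:44:10"},
    {"research_identifier": "rA", "tasks_task_id": 1, "first_completed_at": None},
    {"research_identifier": "rA", "tasks_task_id": 2, "first_completed_at": "2021-10-04 09:00:00"},
    {"research_identifier": "rB", "tasks_task_id": 3, "first_completed_at": None},
]

by_assignment = completion_rate_by(steps, "tasks_task_id")
by_student = completion_rate_by(steps, "research_identifier")
```

The same helper rolls up to course or content level by passing `course_id` or `exercise_id` as the key.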

Coverage

All student activity on non-preview assignments within the specified date range, including:

  • Complete and incomplete assignments
  • All step types (exercises, readings, videos, etc.)
  • Graded and ungraded work
  • Core and supplemental content

Technical Notes

  • Generated from the PostgreSQL production database via Parquet file exports
  • Optimized for large-scale data processing (millions of rows)
  • Preserves referential integrity across related tables
  • Left joins preserve steps without exercises/pages (NULL values expected)
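
Because the left joins leave exercise columns NULL on non-exercise steps, downstream code should separate those rows before scoring. The sketch below assumes that `answer_id` holds the student's selected answer and that comparing it with `correct_answer_id` indicates correctness; the column descriptions in this report do not confirm that, so treat it as illustrative only.

```python
def summarize_exercise_steps(rows):
    """Count steps with/without a joined exercise and score answered exercise steps."""
    with_exercise = [r for r in rows if r["exercise_id"] is not None]
    # Only answered exercise steps can be scored; unanswered ones keep answer_id = None.
    answered = [r for r in with_exercise if r["answer_id"] is not None]
    correct = sum(r["answer_id"] == r["correct_answer_id"] for r in answered)
    return {
        "exercise_steps": len(with_exercise),
        "non_exercise_steps": len(rows) - len(with_exercise),
        "answered": len(answered),
        "accuracy": correct / len(answered) if answered else None,
    }

# Invented rows: a reading step, an unanswered exercise, and two answered exercises.
sample = [
    {"exercise_id": None, "answer_id": None, "correct_answer_id": None},
    {"exercise_id": 544259, "answer_id": None, "correct_answer_id": "368103"},
    {"exercise_id": 585359, "answer_id": "381186", "correct_answer_id": "381186"},
    {"exercise_id": 333322, "answer_id": "393425", "correct_answer_id": "596529"},
]
summary = summarize_exercise_steps(sample)
```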

Executive Dashboard

🟢 Data Quality Grade: A (92/100)

| Metric | Value | Status |
|---|---|---|
| Total Rows | 27,996,816 | |
| Total Columns | 95 | |
| Completeness | 80.3% | ⚠️ |
| Columns with Issues | 1 | ⚠️ |
| Processing Time | 90.4s | |

🔍 Data Quality Alerts

❌ Critical Issues

These issues require immediate attention:

  • dropped_at: 98.0% null values (sparse data)
  • school_district_school_id: 96.2% null values (sparse data)
  • withdrawn_at: 93.7% null values (sparse data)
  • title_exercise: Column is completely empty (100% null)
  • grader_points: 99.9% null values (sparse data)
  • grader_comments: 99.9% null values (sparse data)
  • last_graded_at: 99.9% null values (sparse data)
  • free_response_grade: 99.9% null values (sparse data)
  • published_comments: 99.9% null values (sparse data)

⚠️ Warnings

Review these columns for potential issues:

  • fragment_index: 67.2% null values (partial data)
  • role_type: Only 1 unique value in 27,996,816 rows
  • description: 52.4% null values (partial data)
  • updated_by_instructor_at: 63.1% null values (partial data)
  • title_exercise: Only 0 unique values in 27,996,816 rows
  • free_response: 58.7% null values (partial data)
  • answer_id: 52.0% null values (partial data)
  • response_validation: 60.1% null values (partial data)
  • book_location: 337 unique categories (consider if this should be text/identifier)
  • course_name: 920 unique categories (consider if this should be text/identifier)
  • city: 313 unique categories (consider if this should be text/identifier)
  • title_1_school: Only 1 unique value in 27,996,816 rows

📊 Column Profiles

Columns are organized by data type for easier navigation.

🔑 Identifier (28 columns)

Unique identifiers (IDs, keys, UUIDs)

| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| id | INTEGER | identifier | 100.0% | 27,996,816 | 100.0% | 28927409, 26352572, 40486573 | |
| tasks_task_id | INTEGER | identifier | 100.0% | 1,325,391 | 4.7% | 1495846, 1640791, 2061782 | |
| tasked_id | INTEGER | identifier | 100.0% | 24,886,892 | 88.9% | 14705499, 4923798, 39417216 | |
| tasks_task_plan_id | INTEGER | identifier | 98.1% | 28,019 | 0.1% | 214112, 199873, 128501 | Foreign key linking an individual student's task instance back to the instructor's original task plan (assignment template). This connects a student's specific assignment to the main plan that generated it. |
| course_id | DOUBLE | identifier | 100.0% | 1,368 | 0.0% | 15534.0, 8142.0, 12789.0 | Unique identifier for teacher-created course on OpenStax Tutor |
| entity_role_id | INTEGER | identifier | 100.0% | 54,138 | 0.2% | 70579, 191533, 109791 | Unique identifier for a user's role within the system. This is the primary key that links users to their various roles (student, teacher, etc.) and connects them to their activities across courses. |
| course_profile_course_id_student | INTEGER | identifier | 100.0% | 1,368 | 0.0% | 3889, 10667, 16879 | |
| uuid | VARCHAR | identifier | 100.0% | 54,138 | 0.2% | 27741d9b-20a1-49a8-baf1-b96111f89b8d, 49fae85e-53e5-485b-8f5e-4a9e5533bfaa, c1b9e7e4-e0f6-434c-bdb2-7950932f68b3 | Foreign key identifying which course this record belongs to |
| period_id | INTEGER | identifier | 100.0% | 1,725 | 0.0% | 11951, 23255, 32965 | Course-period id in which the student was enrolled |
| school_district_school_id | INTEGER | identifier | ⚠️ 3.8% | 16 | 0.0% | 1291, 1297, 1102 | Identifier for the school district for a specific OpenStax course |
| plan_id | INTEGER | identifier | 98.1% | 1,368 | 0.0% | 8538, 7788, 8563 | Unique identifier for the task plan that generated this task. A task plan is the instructor's assignment template/blueprint that defines what content should be assigned, to which students, and when. |
| publish_job_uuid | VARCHAR | identifier | 98.1% | 28,019 | 0.1% | 4325eaf3-dfd9-4cf9-a3e6-1452d8c51b0c, 1b6417f6-6628-4ddf-9eb1-4b53f2297b8a, 4427bca9-16bb-4b7b-ad6e-58721fd76f29 | |
| tasks_grading_template_id | INTEGER | identifier | 98.1% | 2,686 | 0.0% | 25525, 10008, 9658 | |
| exercise_id | INTEGER | identifier | 59.4% | 139,456 | 0.8% | 544259, 585359, 333322 | |
| answer_id | VARCHAR | identifier | ⚠️ 48.0% | 185,631 | 1.4% | 393425, 316849, 371854 | Index of the correct response |
| correct_answer_id | VARCHAR | identifier | 59.1% | 55,496 | 0.3% | 368103, 381186, 596529 | Index of the correct response |
| question_id | VARCHAR | identifier | 59.4% | 57,355 | 0.3% | 94083, 96743, 147007 | Internal numeric identifier for questions |
| uuid_exercise | VARCHAR | identifier | 59.4% | 16,642,469 | 100.0% | 3aa054c8-24fb-4ea3-8acc-645d75a569c2, d5c46ded-fdfd-469e-b534-4eedbda9c7cb, efc2674c-fad7-48e5-afc9-31ca35d26b4f | Internal unique identifier for exercises |
| period_uuid | VARCHAR | identifier | 100.0% | 1,725 | 0.0% | a85fcd18-de0d-4ba0-b6ee-024b189f158e, 1c4c4018-16a8-4695-b826-0254d8a74a8b, 350c48f6-8063-4193-a750-5d1af43bf2c5 | Internal unique identifier for course periods created on OpenStax Tutor |
| course_uuid | VARCHAR | identifier | 99.2% | 1,360 | 0.0% | fe9c820c-1816-4564-b46c-6a3f3cdee2cd, dfd8ce33-f709-4f85-bea0-8ac27a7e1f1f, 44b48e09-cb5a-4dee-85ed-3053b29caba7 | Internal unique identifier for courses created on OpenStax Tutor |

📑 Categorical (24 columns)

Categorical variables with limited distinct values

| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| tasked_type | VARCHAR | categorical | 100.0% | 6 | 0.0% | Reading, Placeholder, Video, Interactive, Exercise ... (6 total levels) | Type of activity for the assignment step |
| group_type | VARCHAR | categorical | 100.0% | 3 | 0.0% | core, personalized, spaced_practice | Type of intervention/formative assessment |
| labels | VARCHAR | categorical | 100.0% | 9 | 0.0% | ["review"], [], ["phet-explorations"], ["example"], ["interactive"] ... (9 total levels) | Type of activity as indicated by the book processing instructions |
| fragment_index | INTEGER | categorical | ⚠️ 32.8% | 21 | 0.0% | 2, 3, 4, 8, 7 ... (21 total levels) | Tutor splits the assigned page into fragments; the index denotes which fragment the step is associated with |
| task_type | VARCHAR | categorical | 100.0% | 5 | 0.0% | homework, page_practice, practice_worst_topics, reading | The type of assignment |
| term | INTEGER | categorical | 100.0% | 4 | 0.0% | 4, 3, 5, 2 | Academic term: 2 = Spring, 3 = Summer, 4 = Fall, 5 = Winter |
| year | INTEGER | categorical | 100.0% | 7 | 0.0% | 2021, 2023, 2024, 2022, 2019 ... (7 total levels) | Calendar year |
| wrq_count | INTEGER | categorical | 98.1% | 26 | 0.0% | 0, 12, 7, 5, 6 ... (26 total levels) | Written response question count |
| question_index | INTEGER | categorical | 59.4% | 8 | 0.0% | 2, 0, 1 ... (8 total levels) | Multi-part question index; few questions of this type existed |
| attempt_number | INTEGER | categorical | 59.4% | 9 | 0.0% | 0, 2, 1 ... (9 total levels) | How many times a student responded to an assignment step |
| book_location | VARCHAR | categorical | 96.6% | 337 | 0.0% | [44,1], [4,4], [29,1], [29,4], [27,9] ... (337 total levels) | Chapter and section of the OpenStax textbook |
| book_title | VARCHAR | categorical | 96.6% | 11 | 0.0% | College Physics for AP® Courses, Anatomy and Physiology, U.S. History, Biology for AP® Courses, Biology ... (11 total levels) | Name of the book |
| book_version | VARCHAR | categorical | 96.6% | 52 | 0.0% | 8.12, 15.1, 6.202, 7.3, 4.2 ... (52 total levels) | Version of the book |
| step_type | VARCHAR | categorical | 100.0% | 6 | 0.0% | exercise, reading, interactive, placeholder, video ... (6 total levels) | Type of the part of the assignment |
| course_start_date | VARCHAR | categorical | 99.2% | 26 | 0.0% | 1/1/18, 5/1/18, 1/1/23, 4/1/23, 1/1/24 ... (26 total levels) | When the course starts |
| book_name | VARCHAR | categorical | 99.2% | 9 | 0.0% | AP Physics, US History, College Physics (Algebra), Psychology, Biology ... (9 total levels) | Book name |
| course_name | VARCHAR | categorical | 99.2% | 920 | 0.0% | Elements of Physics I, Introductory Physics II, Biology 182, PHYS 1112 Introductory Physics 2, Elements of Physics II section 11 ... (920 total levels) | Instructor-assigned name of the course |
| school_type | VARCHAR | categorical | 99.2% | 6 | 0.0% | K-12 School, High School, College/University (4), Technical/Community College (2), Home School ... (6 total levels) | Type of institution; comes from our business data and is verified to the best of our knowledge |
| city | VARCHAR | categorical | 99.2% | 313 | 0.0% | Glenside, Chalmette, Espanola, Bluffton, San Francisco ... (313 total levels) | City of use |
| state_province | VARCHAR | categorical | 99.1% | 51 | 0.0% | California, North Carolina, Iowa, Wisconsin, Kentucky ... (51 total levels) | State of use |
| country | VARCHAR | categorical | 99.2% | 2 | 0.0% | United States, United States Territory | Country of use |
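
Two of the categorical columns are encoded and need decoding before analysis: `term` uses integer codes and `book_location` stores a bracketed list as text. A minimal sketch, assuming `book_location` values are always JSON-style integer lists like the samples shown:

```python
import json

# Term codes as given in the `term` column description.
TERM_LABELS = {2: "Spring", 3: "Summer", 4: "Fall", 5: "Winter"}

def parse_book_location(value):
    """Turn a string such as '[44,1]' into a tuple like (44, 1).

    Returns None for NULL values (about 3.4% of rows per the profile).
    """
    if value is None:
        return None
    return tuple(json.loads(value))

def describe_term(term, year):
    """Combine the term code and calendar year into a readable label."""
    return f"{TERM_LABELS.get(term, 'Unknown')} {year}"

location = parse_book_location("[44,1]")
label = describe_term(4, 2021)
```

Parsed tuples sort in chapter-then-section order, which the raw strings do not (for example, "[4,4]" sorts after "[29,1]" as text).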

📈 Continuous (5 columns)

Continuous numeric variables (decimals, floats)

| Column | Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description |
|---|---|---|---|---|---|---|---|---|---|
| grader_points | DOUBLE | continuous | ⚠️ 0.1% | 59 | 0 | 12 | 1.66 | 1.0, 2.0 | Instructor/Teaching Assistant assigned points for each assignment step but may not be published |
| free_response_grade | DOUBLE | continuous | ⚠️ 0.1% | 54 | 0 | 12 | 1.7 | | Grader points when published |

🔢 Discrete (4 columns)

Discrete numeric variables (integers, counts)

| Column | Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description |
|---|---|---|---|---|---|---|---|---|---|
| role_type | INTEGER | discrete | 100.0% | 1 | 3 | 3 | 3 | 3 | Number that corresponds to 2 = instructor, 3 = students, 4 = teacher viewing as student |
| ungraded_step_count | INTEGER | discrete | 98.1% | 93 | 0 | 341 | 0.64 | 13, 38, 0 | Number of assignment steps that are left to be manually graded |
| gradable_step_count | INTEGER | discrete | 98.1% | 178 | 0 | 727 | 2.06 | 32, 23, 64 | Total number of assignment steps that are eligible to be manually graded |

☑️ Boolean (6 columns)

Boolean variables (true/false)

| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| is_core | BOOLEAN | boolean | 100.0% | 2 | 0.0% | true, false | Any steps that an instructor selected as core, as well as questions at the end of reading, that all students enrolled in the course had to complete |
| is_preview | BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | The course/assignment is for preview |
| is_test | BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | It is a test course |
| is_college | BOOLEAN | boolean | 77.9% | 2 | 0.0% | true, false | If it is a college course |
| is_preview_plan | BOOLEAN | boolean | 98.1% | 1 | 0.0% | false | If assignment is for preview |
| is_in_multipart | BOOLEAN | boolean | 59.4% | 2 | 0.0% | false, true | If the question is part of a multi-part |

📅 Datetime (22 columns)

Date and time columns

| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| first_completed_at | VARCHAR | datetime | 68.5% | 19,187,762 | 100.0% | 2021-10-03 18:44:10.275544, 2021-09-08 03:03:32.871312, 2019-02-20 20:40:24.434679 | |
| last_completed_at | VARCHAR | datetime | 68.5% | 19,187,770 | 100.0% | 2018-10-26 01:27:23.855325, 2019-04-15 19:07:12.109664, 2018-11-06 06:24:48.643782 | |
| created_at | VARCHAR | datetime | 100.0% | 986,532 | 3.5% | 2023-03-24 00:54:36.306526, 2022-01-23 21:24:00.93717, 2021-01-25 18:17:16.641405 | |
| updated_at | VARCHAR | datetime | 100.0% | 19,663,182 | 70.2% | 2021-09-29 02:12:37.022564, 2019-01-24 22:05:25.188974, 2021-10-28 02:10:51.110241 | |
| created_at_task | TIMESTAMP WITH TIME ZONE | datetime | 100.0% | 621,940 | 2.2% | 2022-09-15 03:30:52.468544+00, 2021-10-28 01:42:06.328193+00, 2019-09-24 17:36:19.947157+00 | |
| created_at_role | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-02 13:04:16.631878, 2021-02-05 17:23:39.246093, 2021-01-25 08:04:16.53745 | |
| updated_at_role | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-08-30 22:53:30.280348, 2021-12-16 16:25:56.025706, 2019-05-22 13:58:35.870204 | |
| created_at_profile | VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2019-09-02 19:31:17.349562, 2021-07-05 15:13:11.590168, 2020-01-25 21:40:24.847115 | |
| updated_at_profile | VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2021-05-13 18:57:40.071454, 2020-07-06 23:58:56.751613, 2023-08-25 17:22:29.451434 | When the OpenStax account profile was updated |
| dropped_at | VARCHAR | datetime | ⚠️ 2.0% | 1,516 | 0.3% | 2019-02-05 20:55:08.849286, 2019-11-07 21:24:36.504898, 2022-09-19 15:38:11.75522 | Student dropping out of course |
| created_at_student | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-06-16 15:37:58.362784, 2018-08-14 14:48:47.963283, 2019-01-09 04:18:07.495438 | When the student profile was created |
| updated_at_student | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-07 21:02:38.41562, 2022-01-10 18:56:04.076358, 2022-01-04 04:36:40.124625 | |
| publish_last_requested_at | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2018-09-26 15:02:52.172843, 2022-02-01 02:39:28.265445, 2020-09-05 01:42:50.605016 | To update an assignment |
| first_published_at | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2021-08-26 19:07:06.782938, 2022-01-06 22:46:57.852721, 2020-06-17 01:12:30.329702 | |
| created_at_plan | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2022-03-22 16:11:59.587938, 2020-05-28 04:14:44.986028, 2023-04-21 12:53:12.472326 | |
| updated_at_plan | VARCHAR | datetime | 98.1% | 21,887 | 0.1% | 2020-06-24 12:38:37.784231, 2018-03-27 14:16:59.144611, 2018-09-07 22:28:32.240547 | |
| withdrawn_at | VARCHAR | datetime | ⚠️ 6.3% | 1,460 | 0.1% | 2022-09-06 22:58:18.844438, 2022-04-08 21:33:45.875596, 2018-09-13 03:23:18.767757 | When was the assignment withdrawn by the instructor |
| last_published_at | VARCHAR | datetime | 98.1% | 20,616 | 0.1% | 2022-07-08 21:50:16, 2020-06-26 04:42:38.040295, 2019-05-06 16:49:56.640985 | |
| updated_by_instructor_at | VARCHAR | datetime | ⚠️ 36.9% | 12,932 | 0.1% | 2022-02-19 00:13:13.459835, 2021-08-10 18:47:47.197136, 2022-11-30 14:59:24.258577 | When was the assignment updated by instructor |
| created_at_exercise | VARCHAR | datetime | 59.4% | 3,072,812 | 18.5% | 2021-09-22 16:43:37.548762, 2021-06-28 12:49:04.415223, 2021-09-20 16:41:09.351951 | When was the exercise created |
| updated_at_exercise | VARCHAR | datetime | 59.4% | 13,924,518 | 83.7% | 2020-10-06 13:42:37.260637, 2018-10-22 15:59:01.459424, 2020-11-04 21:47:31.200636 | When was the exercise updated |
| last_graded_at | VARCHAR | datetime | ⚠️ 0.1% | 37,905 | 100.0% | 2022-03-11 17:25:49.697624 | When the assignment step was last graded |
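
Most datetime columns are stored as VARCHAR, and the samples show three shapes: with fractional seconds (sometimes fewer than six digits), without fractional seconds, and with a `+00` offset. The sketch below parses those three shapes with the standard library; it assumes `+00` is the only offset that occurs and leaves the other values timezone-naive, since the report does not state their time zone.

```python
from datetime import datetime, timezone

def parse_timestamp(value):
    """Parse the timestamp strings shown in the datetime samples.

    Returns None for NULL, an aware datetime for '+00' values,
    and a naive datetime otherwise.
    """
    if value is None:
        return None
    text = value.strip()
    tz = None
    if text.endswith("+00"):  # assumption: UTC is the only offset present
        text, tz = text[:-3], timezone.utc
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=tz)

short_fraction = parse_timestamp("2022-01-23 21:24:00.93717")
no_fraction = parse_timestamp("2022-07-08 21:50:16")
with_offset = parse_timestamp("2022-09-15 03:30:52.468544+00")
```

Once parsed, values such as `first_completed_at` and `last_completed_at` can be subtracted to obtain a `timedelta`.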

📝 Text (15 columns)

Free-form text columns

| Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|---|
| title | VARCHAR | text | 100.0% | 13,544 | 0.1% | module 1, 2.6, 2.7 reading, Homework 5 | Assignment title |
| research_identifier | VARCHAR | text | 100.0% | 54,138 | 0.2% | r315a4f9d, rbfd1701b, r51bcbae0 | Anonymized student ID |
| title_plan | VARCHAR | text | 98.1% | 13,511 | 0.1% | 1D Kinematics (2.1 - 2.8) second chance, HW 2, Ch 5 Applications of Newton's Law | Assignment title |
| description | VARCHAR | text | ⚠️ 47.6% | 2,680 | 0.0% | The Immune System, Reading assignments are to be completed prior t..., Complete these questions to check your understa... | Assignment instructions |
| settings | VARCHAR | text | 98.1% | 17,037 | 0.1% | {"page_ids":["76887","76888","76889"]}, {"exercises":[{"id":"500581","points":[1]},{"id..., {"page_ids":["76964","76965","76966","76967","7... | Instructor selections while creating assignment |
| url | VARCHAR | text | 59.0% | 54,418 | 0.3% | https://exercises.openstax.org/exercises/16045@2, https://exercises.openstax.org/exercises/315@5, https://exercises.openstax.org/exercise/12113@7 | Exercise URL to retrieve question details from |
| free_response | VARCHAR | text | ⚠️ 41.3% | 4,872,271 | 42.1% | When I push down on a table while I am on a sca..., UAG CAA CAU GAC AUC CUA UUU, 6.22*10^5 | Student response to the question |
| response_validation | VARCHAR | text | ⚠️ 39.9% | 7,996,818 | 71.7% | {"attempts": [{"nudge": "Give it another shot",..., {"attempts": [{"nudge": "Take another chance", ..., {"attempts": [{"nudge": "Try again", "valid": t... | Free response input quality evaluation |
| answer_ids | VARCHAR | text | 59.4% | 159,661 | 1.0% | {300880,300881,300882,300883}, {224567,224568,224566,224565}, {299536,299537,299538,299539} | Multiple-choice response options index |
| grader_comments | VARCHAR | text | ⚠️ 0.1% | 6,078 | 16.0% | No samples available | Grader feedback to free-response grades |
| published_comments | VARCHAR | text | ⚠️ 0.1% | 4,575 | 17.1% | No samples available | Grader feedback shared with students |
| page_url | VARCHAR | text | 96.6% | 8,421 | 0.0% | https://archive.cnx.org/contents/8d50a0af-948b-..., https://archive.cnx.org/contents/405335a3-7cff-..., https://openstax.org/apps/archive/20210713.2056... | Link to textbook content, not valid |
| tags_array | VARCHAR[] | text | 59.4% | 72,124 | 0.4% | blooms:2, dok:2, k12phys-ch01-s02-lo03 | Array of all metadata tags associated with the exercise |
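
Several text columns hold structured values serialized as strings. The sketch below unpacks three of them using only the complete sample values shown above: `answer_ids` (brace-delimited list), `url` (exercise number and the number after `@`, assumed here to be a version), and `settings` (JSON). Quoted or nested elements inside `answer_ids` are not handled.

```python
import json
import re

def parse_answer_ids(value):
    """'{300880,300881}' -> [300880, 300881]; NULL or '{}' -> []."""
    if value is None:
        return []
    inner = value.strip().strip("{}")
    return [int(part) for part in inner.split(",")] if inner else []

# Matches both '/exercises/16045@2' and '/exercise/12113@7', as in the samples.
EXERCISE_URL = re.compile(r"/exercises?/(\d+)@(\d+)$")

def parse_exercise_url(url):
    """Return (exercise_number, version) or None if the URL does not match."""
    match = EXERCISE_URL.search(url or "")
    return (int(match.group(1)), int(match.group(2))) if match else None

options = parse_answer_ids("{300880,300881,300882,300883}")
exercise = parse_exercise_url("https://exercises.openstax.org/exercise/12113@7")
page_ids = json.loads('{"page_ids":["76887","76888","76889"]}')["page_ids"]
```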


📈 Summary Statistics

Dataset Overview

| Attribute | Value |
|---|---|
| Total Rows | 27,996,816 |
| Total Columns | 95 |
| Total Cells | 2,939,665,680 |
| Profiling Time | 90.41 seconds |
| Profiling Speed | 32,515,040 cells/second |

Column Types Distribution

| Data Type | Count | Percentage |
|---|---|---|
| VARCHAR | 64 | 61.0% |
| INTEGER | 25 | 23.8% |
| DOUBLE | 8 | 7.6% |
| BOOLEAN | 6 | 5.7% |
| TIMESTAMP WITH TIME ZONE | 1 | 1.0% |
| VARCHAR[] | 1 | 1.0% |

Variable Types Distribution

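| Variable Type | Count | Percentage |
|---|---|---|
| Identifier | 28 | 26.7% |
| Categorical | 24 | 22.9% |
| Datetime | 22 | 21.0% |
| Text | 15 | 14.3% |
| Boolean | 6 | 5.7% |
| Continuous | 5 | 4.8% |
| Discrete | 4 | 3.8% |
| Empty | 1 | 1.0% |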
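
The summary figures above can be cross-checked with a few lines of arithmetic. All inputs are copied from this report. The check suggests the cell count and both distribution tables correspond to 105 columns rather than the 95 listed under Total Columns; the report does not explain the difference, so the interpretation here is tentative.

```python
total_rows = 27_996_816
total_cells = 2_939_665_680
profiling_seconds = 90.41

# Counts copied from the two distribution tables.
data_type_counts = [64, 25, 8, 6, 1, 1]
variable_type_counts = [28, 24, 22, 15, 6, 5, 4, 1]

# Columns implied by the cell count (cells = rows x columns).
implied_columns, remainder = divmod(total_cells, total_rows)

# Throughput implied by the rounded profiling time.
cells_per_second = total_cells / profiling_seconds

print(implied_columns, remainder)
print(sum(data_type_counts), sum(variable_type_counts))
print(round(cells_per_second))
```

The small gap between the computed throughput and the reported 32,515,040 cells/second is consistent with the profiling time being rounded to two decimals.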

Data Completeness

| Completeness Level | Column Count | Status |
|---|---|---|
| Complete (0% nulls) | 39 | |
| Mostly Complete (1-10% nulls) | 33 | |
| Partial (11-50% nulls) | 16 | ⚠️ |
| Sparse (51-90% nulls) | 6 | |
| Mostly Empty (>90% nulls) | 11 | |

Overall Data Completeness: 80.3%
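
The completeness levels in the table can be expressed as a simple binning function. The table leaves fractional boundaries unspecified (for example, 0.5% or 10.4% nulls), so this sketch assumes each threshold is an inclusive upper bound; the profiler's actual rule may differ.

```python
def completeness_level(null_pct):
    """Map a column's null percentage (0-100) to a completeness level label."""
    if not 0 <= null_pct <= 100:
        raise ValueError("null_pct must be between 0 and 100")
    if null_pct == 0:
        return "Complete"
    if null_pct <= 10:  # assumed inclusive upper bounds from here on
        return "Mostly Complete"
    if null_pct <= 50:
        return "Partial"
    if null_pct <= 90:
        return "Sparse"
    return "Mostly Empty"

# Null percentages taken from the alerts and column profiles in this report.
levels = {
    "id": completeness_level(0.0),
    "tasks_task_plan_id": completeness_level(1.9),
    "exercise_id": completeness_level(40.6),
    "description": completeness_level(52.4),
    "dropped_at": completeness_level(98.0),
}
```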

Cardinality Analysis

Cardinality indicates the uniqueness of values in each column.

| Cardinality Level | Column Count | Description |
|---|---|---|
| Very High (>95% unique) | 5 | Likely identifiers |
| High (50-95% unique) | 4 | High variability |
| Medium (10-50% unique) | 4 | Moderate variability |
| Low (<10% unique) | 91 | Categorical/Boolean |
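
Cardinality, as used in the column profiles, is the unique-value count divided by the non-null row count. The sketch below reproduces two reported percentages from their counts and assigns a level; how the profiler treats values exactly on a boundary, or a column with no non-null rows, is an assumption.

```python
def cardinality_pct(unique_values, non_null_rows):
    """Unique values as a percentage of non-null rows; None if the column is empty."""
    if non_null_rows == 0:
        return None  # assumption: fully empty columns get no cardinality
    return 100.0 * unique_values / non_null_rows

def cardinality_level(pct):
    """Map a cardinality percentage to the levels in the table above."""
    if pct > 95:
        return "Very High"
    if pct >= 50:  # boundary handling is assumed
        return "High"
    if pct >= 10:
        return "Medium"
    return "Low"

total_rows = 27_996_816  # both columns below are 100% complete
task_pct = cardinality_pct(1_325_391, total_rows)     # tasks_task_id
tasked_pct = cardinality_pct(24_886_892, total_rows)  # tasked_id
```

Rounded to one decimal, these give 4.7% and 88.9%, matching the identifier table.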

📖 Glossary

| Term | Definition |
|---|---|
| Cardinality | The number of unique values in a column relative to total non-null values. High cardinality means many unique values. |
| Completeness | Percentage of non-null values in a column. Higher is better. |
| Data Type | The technical storage type (e.g., INTEGER, VARCHAR, BOOLEAN). |
| Identifier | A column containing unique values that identify records (e.g., ID, UUID). |
| Missing Values | Null, empty, or placeholder values (NA, null, empty string). |
| Null Percentage | The proportion of null/missing values in a column. |
| Sample Values | Example values from the column to illustrate its contents. |
| Variable Type | The semantic meaning of the column (categorical, continuous, etc.). |

Generated by Data Profiler v5.2.3
Report Date: 2025-11-24 21:54:53
Status: ✅ All columns profiled successfully