Data Profile Report

Generated on November 24, 2025 at 09:54 PM

Overview

This dataset captures granular student activity data from the OpenStax Tutor learning platform, documenting every step within assigned tasks including exercise responses, reading interactions, completion timestamps, and grading information. It enables analysis of student learning behaviors, assignment effectiveness, and educational outcomes.

Unit of Analysis

  • Task Step - Each row represents a single step within a student's assignment (e.g., one exercise question, one reading section, one video). A complete assignment contains multiple steps, and each student's assignment generates separate step records.

Scope & Filters

  • Temporal: Tasks created within the specified date range (time1 to time2)
  • Course Type: Production courses only (excludes preview courses; test courses optional via qtest parameter)
  • Task Types: Includes homework (0), reading (1), external (2), event (3), practice (4), chapter practice (5), page practice (6), mixed practice (7), and concept coach (9)
  • Task Plans: Excludes preview assignment templates
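
The scope rules above can be sketched as a small row filter. This is an illustrative sketch only: the export stores task_type as text, so the numeric `task_type_code` key is a hypothetical field, and `in_scope` is not part of the export pipeline.

```python
# Task type codes as listed under Scope & Filters.
TASK_TYPE_NAMES = {
    0: "homework", 1: "reading", 2: "external", 3: "event", 4: "practice",
    5: "chapter practice", 6: "page practice", 7: "mixed practice",
    9: "concept coach",
}


def in_scope(row, include_test=False):
    """Return True when a row (a plain dict) passes the scope rules above.

    `task_type_code` is a hypothetical key holding the numeric task type.
    """
    if row.get("is_preview") or row.get("is_preview_plan"):
        return False  # preview courses and preview plans are excluded
    if row.get("is_test") and not include_test:
        return False  # test courses are optional
    return row.get("task_type_code") in TASK_TYPE_NAMES


# Made-up rows for illustration.
rows = [
    {"task_type_code": 0, "is_preview": False, "is_test": False},
    {"task_type_code": 8, "is_preview": False, "is_test": False},  # code 8 is not listed
    {"task_type_code": 1, "is_preview": True, "is_test": False},
    {"task_type_code": 4, "is_preview": False, "is_test": True},
]
kept = [row for row in rows if in_scope(row)]
```

Passing `include_test=True` mirrors the optional inclusion of test courses; the date-range filter is left out because the report does not give concrete bounds.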

Key Dimensions

Student Context

  • Anonymized student identifiers (research_identifier)
  • Course and period enrollment information
  • Student demographics (name fields for authorized use)

Assignment Structure

  • Task hierarchy:
    Plan → Task → Steps
  • Assignment metadata (title, type, creation date)
  • Content ecosystem version tracking
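
As a minimal sketch of the Plan → Task → Steps hierarchy, the rows can be nested using the identifier columns tasks_task_plan_id, tasks_task_id, and tasked_id. The ids below are made up, and `build_hierarchy` is a hypothetical helper.

```python
from collections import defaultdict


def build_hierarchy(rows):
    """Nest step ids under their task, and tasks under their plan."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        plan_id = row.get("tasks_task_plan_id")  # can be NULL (98.1% complete)
        tree[plan_id][row["tasks_task_id"]].append(row["tasked_id"])
    # Convert to plain dicts so missing keys raise instead of being created.
    return {plan_id: dict(tasks) for plan_id, tasks in tree.items()}


# Made-up step rows for illustration.
steps = [
    {"tasks_task_plan_id": 10, "tasks_task_id": 100, "tasked_id": 1},
    {"tasks_task_plan_id": 10, "tasks_task_id": 100, "tasked_id": 2},
    {"tasks_task_plan_id": 10, "tasks_task_id": 101, "tasked_id": 3},
    {"tasks_task_plan_id": None, "tasks_task_id": 102, "tasked_id": 4},
]
hierarchy = build_hierarchy(steps)
```

Tasks without a plan end up under the `None` key, which keeps them visible instead of silently dropping them.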

Step Content

  • Polymorphic step types (exercises, readings, videos, interactives)
  • Content references (pages, exercises) with book location
  • Exercise-specific data (questions, answers, tags)

Student Responses (for exercise steps)

  • Answer submissions and correctness
  • Free-response text and grading
  • Response validation and feedback
  • Attempt tracking

Temporal Data

  • Step creation timestamps
  • First and last completion times
  • Assignment lifecycle tracking

Content Metadata

  • Learning objective (LO) and skill tags
  • Book and page references with hierarchical location
  • Exercise difficulty and classification tags
  • Textbook ecosystem versioning

Primary Use Cases

  • Learning Analytics: Student engagement patterns, time-on-task analysis
  • Content Effectiveness: Exercise difficulty calibration, content performance
  • Educational Research: Learning behavior studies, intervention analysis
  • Instructor Insights: Assignment completion rates, common misconceptions
  • Adaptive Learning: Personalization based on performance patterns

Data Granularity

  • Finest grain: Individual step within an assignment (e.g., question 3 of homework 5)
  • Aggregation potential: Roll up to assignment level, student level, course level, or content level
  • Temporal resolution: Precise timestamps for creation and completion events
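
A roll-up from step level to assignment level could look like the sketch below. It assumes a step counts as completed when first_completed_at is not NULL, which is an interpretation rather than something the report states, and it uses made-up rows.

```python
from collections import defaultdict


def completion_by_task(rows):
    """Return {tasks_task_id: share of steps with a first completion time}."""
    counts = defaultdict(lambda: [0, 0])  # task id -> [completed, total]
    for row in rows:
        entry = counts[row["tasks_task_id"]]
        entry[1] += 1
        if row.get("first_completed_at") is not None:
            entry[0] += 1
    return {task: done / total for task, (done, total) in counts.items()}


# Made-up step rows for illustration.
steps = [
    {"tasks_task_id": 100, "first_completed_at": "2021-10-03 18:44:10.275544"},
    {"tasks_task_id": 100, "first_completed_at": None},
    {"tasks_task_id": 101, "first_completed_at": "2021-09-08 03:03:32.871312"},
]
rates = completion_by_task(steps)
```

The same pattern applies to student, course, or content level by swapping the grouping key.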

Coverage

All student activity on non-preview assignments within the specified date range, including:

  • Complete and incomplete assignments
  • All step types (exercises, readings, videos, etc.)
  • Graded and ungraded work
  • Core and supplemental content

Technical Notes

  • Generated from PostgreSQL production database via parquet file exports
  • Optimized for large-scale data processing (millions of rows)
  • Preserves referential integrity across related tables
  • Left joins preserve steps without exercises/pages (NULL values expected)
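
Because non-exercise steps carry NULLs in exercise columns, it can be more informative to measure those columns against exercise steps only. The sketch below assumes a row is an exercise step when exercise_id is not NULL; `exercise_field_coverage` is a hypothetical helper and the rows are made up.

```python
def exercise_field_coverage(rows, field):
    """Share of exercise steps (exercise_id not NULL) where `field` is populated."""
    exercise_rows = [row for row in rows if row.get("exercise_id") is not None]
    if not exercise_rows:
        return None  # no exercise steps, so the share is undefined
    filled = sum(1 for row in exercise_rows if row.get(field) is not None)
    return filled / len(exercise_rows)


# Made-up rows: two exercise steps and one reading step.
steps = [
    {"exercise_id": 544259, "answer_id": "393425"},
    {"exercise_id": 585359, "answer_id": None},
    {"exercise_id": None, "answer_id": None},
]
coverage = exercise_field_coverage(steps, "answer_id")
```

If answer_id is only ever populated on exercise steps (an assumption), its 48.0% overall completeness against 59.4% for exercise_id would correspond to roughly 81% of exercise steps.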

Executive Dashboard

🟢 Data Quality Grade: A (92/100)

Metric | Value | Status
Total Rows | 27,996,816 |
Total Columns | 95 |
Completeness | 80.3% | ⚠️
Columns with Issues | 1 | ⚠️
Processing Time | 90.4s |

🔍 Data Quality Alerts

❌ Critical Issues

These issues require immediate attention:

  • dropped_at: 98.0% null values (sparse data)
  • school_district_school_id: 96.2% null values (sparse data)
  • withdrawn_at: 93.7% null values (sparse data)
  • title_exercise: Column is completely empty (100% null)
  • grader_points: 99.9% null values (sparse data)
  • grader_comments: 99.9% null values (sparse data)
  • last_graded_at: 99.9% null values (sparse data)
  • free_response_grade: 99.9% null values (sparse data)
  • published_comments: 99.9% null values (sparse data)

⚠️ Warnings

Review these columns for potential issues:

  • fragment_index: 67.2% null values (partial data)
  • role_type: Only 1 unique value in 27,996,816 rows
  • description: 52.4% null values (partial data)
  • updated_by_instructor_at: 63.1% null values (partial data)
  • title_exercise: Only 0 unique values in 27,996,816 rows
  • free_response: 58.7% null values (partial data)
  • answer_id: 52.0% null values (partial data)
  • response_validation: 60.1% null values (partial data)
  • book_location: 337 unique categories (consider if this should be text/identifier)
  • course_name: 920 unique categories (consider if this should be text/identifier)
  • city: 313 unique categories (consider if this should be text/identifier)
  • title_1_school: Only 1 unique value in 27,996,816 rows

📊 Column Profiles

Columns are organized by data type for easier navigation.

🔑 Identifier (28 columns)

Unique identifiers (IDs, keys, UUIDs)

Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description
tasks_task_id | INTEGER | identifier | 100.0% | 1,325,391 | 4.7% | 1495846, 1640791, 2061782 | Assignment identifier for a specific student
tasked_id | INTEGER | identifier | 100.0% | 24,886,892 | 88.9% | 14705499, 4923798, 39417216 | Unique id for a specific step for a specific assignment and specific student
tasks_task_plan_id | INTEGER | identifier | 98.1% | 28,019 | 0.1% | 214112, 199873, 128501 | Foreign key linking an individual student's task instance back to the instructor's original task plan (assignment template). This connects a student's specific assignment to the main plan that generated it.
course_id | DOUBLE | identifier | 100.0% | 1,368 | 0.0% | 15534.0, 8142.0, 12789.0 | Unique identifier for teacher-created course on OpenStax Tutor
entity_role_id | INTEGER | identifier | 100.0% | 54,138 | 0.2% | 70579, 191533, 109791 | Unique identifier for a user's role within the system. This is the primary key that links users to their various roles (student, teacher, etc.) and connects them to their activities across courses.
course_profile_course_id_student | INTEGER | identifier | 100.0% | 1,368 | 0.0% | 3889, 10667, 16879 | Course section id from course_membership_students linking the learner’s enrollment to course_profile_courses
uuid | VARCHAR | identifier | 100.0% | 54,138 | 0.2% | 27741d9b-20a1-49a8-baf1-b96111f89b8d, 49fae85e-53e5-485b-8f5e-4a9e5533bfaa, c1b9e7e4-e0f6-434c-bdb2-7950932f68b3 | Foreign key identifying which course this record belongs to
period_id | INTEGER | identifier | 100.0% | 1,725 | 0.0% | 11951, 23255, 32965 | Course-period id in which the student was enrolled
school_district_school_id | INTEGER | identifier | ⚠️ 3.8% | 16 | 0.0% | 1291, 1297, 1102 | Identifier for the school district for a specific OpenStax course
plan_id | INTEGER | identifier | 98.1% | 1,368 | 0.0% | 8538, 7788, 8563 | Unique identifier for the task plan that generated this task. A task plan is the instructor's assignment template/blueprint that defines what content should be assigned, to which students, and when.
tasks_grading_template_id | INTEGER | identifier | 98.1% | 2,686 | 0.0% | 25525, 10008, 9658 | Teacher created grading template
exercise_id | INTEGER | identifier | 59.4% | 139,456 | 0.8% | 544259, 585359, 333322 | tasks_tasked_exercises.content_exercise_id, the canonical question record in content_exercises used for the step
answer_id | VARCHAR | identifier | ⚠️ 48.0% | 185,631 | 1.4% | 393425, 316849, 371854 | Index of the correct response
correct_answer_id | VARCHAR | identifier | 59.1% | 55,496 | 0.3% | 368103, 381186, 596529 | Index of the correct response
question_id | VARCHAR | identifier | 59.4% | 57,355 | 0.3% | 94083, 96743, 147007 | Internal numeric identifier for questions
uuid_exercise | VARCHAR | identifier | 59.4% | 16,642,469 | 100.0% | 3aa054c8-24fb-4ea3-8acc-645d75a569c2, d5c46ded-fdfd-469e-b534-4eedbda9c7cb, efc2674c-fad7-48e5-afc9-31ca35d26b4f | Internal unique identifier for exercises
period_uuid | VARCHAR | identifier | 100.0% | 1,725 | 0.0% | a85fcd18-de0d-4ba0-b6ee-024b189f158e, 1c4c4018-16a8-4695-b826-0254d8a74a8b, 350c48f6-8063-4193-a750-5d1af43bf2c5 | Internal unique identifier for course periods created on OpenStax Tutor
course_uuid | VARCHAR | identifier | 99.2% | 1,360 | 0.0% | fe9c820c-1816-4564-b46c-6a3f3cdee2cd, dfd8ce33-f709-4f85-bea0-8ac27a7e1f1f, 44b48e09-cb5a-4dee-85ed-3053b29caba7 | Internal unique identifier for courses created on OpenStax Tutor

📑 Categorical (24 columns)

Categorical variables with limited distinct values

Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description
tasked_type | VARCHAR | categorical | 100.0% | 6 | 0.0% | Reading, Placeholder, Video, Interactive, Exercise ... (6 total levels) | Type of activity for the assignment step
group_type | VARCHAR | categorical | 100.0% | 3 | 0.0% | core, personalized, spaced_practice | Type of intervention/formative assessment
labels | VARCHAR | categorical | 100.0% | 9 | 0.0% | ["review"], [], ["phet-explorations"], ["example"], ["interactive"] ... (9 total levels) | Type of activity as indicated by the book processing instructions
fragment_index | INTEGER | categorical | ⚠️ 32.8% | 21 | 0.0% | 2, 3, 4, 8, 7 ... (21 total levels) | Tutor splits the assigned page into fragments; the index denotes which fragment the step is associated with
task_type | VARCHAR | categorical | 100.0% | 5 | 0.0% | homework, page_practice, practice_worst_topics, reading | The type of assignment
term | INTEGER | categorical | 100.0% | 4 | 0.0% | 4, 3, 5, 2 | Academic term: 2 = Spring, 3 = Summer, 4 = Fall, 5 = Winter
year | INTEGER | categorical | 100.0% | 7 | 0.0% | 2021, 2023, 2024, 2022, 2019 ... (7 total levels) | Calendar year
wrq_count | INTEGER | categorical | 98.1% | 26 | 0.0% | 0, 12, 7, 5, 6 ... (26 total levels) | Written response question count
question_index | INTEGER | categorical | 59.4% | 8 | 0.0% | 2, 0, 1 ... (8 total levels) | Multi-part question index; few questions of this type existed
attempt_number | INTEGER | categorical | 59.4% | 9 | 0.0% | 0, 2, 1 ... (9 total levels) | How many times a student responded to an assignment step. NA corresponds to reading assignments; 0 is the default when an exercise is assigned but not completed. The attempt number increases by 1 each time the student attempts the exercise.
book_location | VARCHAR | categorical | 96.6% | 337 | 0.0% | [44,1], [4,4], [29,1], [29,4], [27,9] ... (337 total levels) | Chapter, section of OpenStax textbook
book_title | VARCHAR | categorical | 96.6% | 11 | 0.0% | College Physics for AP® Courses, Anatomy and Physiology, U.S. History, Biology for AP® Courses, Biology ... (11 total levels) | Name of the book
book_version | VARCHAR | categorical | 96.6% | 52 | 0.0% | 8.12, 15.1, 6.202, 7.3, 4.2 ... (52 total levels) | Version of the book
step_type | VARCHAR | categorical | 100.0% | 6 | 0.0% | exercise, reading, interactive, placeholder, video ... (6 total levels) | Type of the part of the assignment
course_start_date | VARCHAR | categorical | 99.2% | 26 | 0.0% | 1/1/18, 5/1/18, 1/1/23, 4/1/23, 1/1/24 ... (26 total levels) | When the course starts
book_name | VARCHAR | categorical | 99.2% | 9 | 0.0% | AP Physics, US History, College Physics (Algebra), Psychology, Biology ... (9 total levels) | Book name
course_name | VARCHAR | categorical | 99.2% | 920 | 0.0% | Elements of Physics I, Introductory Physics II, Biology 182, PHYS 1112 Introductory Physics 2, Elements of Physics II section 11 ... (920 total levels) | Instructor-assigned name of course
school_type | VARCHAR | categorical | 99.2% | 6 | 0.0% | K-12 School, High School, College/University (4), Technical/Community College (2), Home School ... (6 total levels) | Type of institution; values come from our business data and are verified to the best of our knowledge
city | VARCHAR | categorical | 99.2% | 313 | 0.0% | Glenside, Chalmette, Espanola, Bluffton, San Francisco ... (313 total levels) | City of use
state_province | VARCHAR | categorical | 99.1% | 51 | 0.0% | California, North Carolina, Iowa, Wisconsin, Kentucky ... (51 total levels) | State of use
country | VARCHAR | categorical | 99.2% | 2 | 0.0% | United States, United States Territory | Country of use
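
Several categorical columns are encoded and need decoding before analysis. The sketch below shows one way to decode term, book_location, and attempt_number using the meanings given in the descriptions above. The helper names are hypothetical, the status labels are invented for illustration, and book_location is assumed to always be a JSON-style list such as [44,1].

```python
import json

# Term codes as given in the `term` description.
TERM_NAMES = {2: "Spring", 3: "Summer", 4: "Fall", 5: "Winter"}


def parse_book_location(value):
    """Turn a string such as "[44,1]" into a tuple such as (44, 1)."""
    if value is None:
        return None
    return tuple(json.loads(value))


def attempt_status(attempt_number):
    """Label a step using the attempt_number semantics described above."""
    if attempt_number is None:
        return "no attempts tracked"      # NA, e.g. reading steps
    if attempt_number == 0:
        return "assigned, not completed"  # default for an unanswered exercise
    return "attempted"


location = parse_book_location("[44,1]")
```

Returning a tuple makes locations sortable in reading order, for example `sorted([(29, 4), (4, 4), (29, 1)])`.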

📈 Continuous (5 columns)

Continuous numeric variables (decimals, floats)

Column | Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description
grader_points | DOUBLE | continuous | ⚠️ 0.1% | 59 | 0 | 12 | 1.66 | 1.0, 2.0 | Instructor/Teaching Assistant assigned points for each assignment step but may not be published
free_response_grade | DOUBLE | continuous | ⚠️ 0.1% | 54 | 0 | 12 | 1.7 | | Grader points when published

🔢 Discrete (4 columns)

Discrete numeric variables (integers, counts)

Column | Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description
role_type | INTEGER | discrete | 100.0% | 1 | 3 | 3 | 3 | 3 | Number that corresponds to 2 = instructor, 3 = students, 4 = teacher viewing as student
ungraded_step_count | INTEGER | discrete | 98.1% | 93 | 0 | 341 | 0.64 | 13, 38, 0 | Number of assignment steps that are left to be manually graded
gradable_step_count | INTEGER | discrete | 98.1% | 178 | 0 | 727 | 2.06 | 32, 23, 64 | Total number of assignment steps that are eligible to be manually graded

☑️ Boolean (6 columns)

Boolean variables (true/false)

Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description
is_core | BOOLEAN | boolean | 100.0% | 2 | 0.0% | true, false | Any steps that an instructor selected as core, as well as questions at the end of reading, that all students enrolled in the course had to complete
is_preview | BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | The course/assignment is for preview
is_test | BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | It is a test course
is_college | BOOLEAN | boolean | 77.9% | 2 | 0.0% | true, false | If it is a college course
is_preview_plan | BOOLEAN | boolean | 98.1% | 1 | 0.0% | false | If assignment is for preview
is_in_multipart | BOOLEAN | boolean | 59.4% | 2 | 0.0% | false, true | If the question is part of a multi-part

📅 Datetime (22 columns)

Date and time columns

Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description
first_completed_at | VARCHAR | datetime | 68.5% | 19,187,762 | 100.0% | 2021-10-03 18:44:10.275544, 2021-09-08 03:03:32.871312, 2019-02-20 20:40:24.434679 | Timestamp from tasks_task_steps.first_completed_at capturing when the step was first finished by the student
last_completed_at | VARCHAR | datetime | 68.5% | 19,187,770 | 100.0% | 2018-10-26 01:27:23.855325, 2019-04-15 19:07:12.109664, 2018-11-06 06:24:48.643782 | tasks_task_steps.last_completed_at, i.e., the most recent completion time for the step
created_at | VARCHAR | datetime | 100.0% | 986,532 | 3.5% | 2023-03-24 00:54:36.306526, 2022-01-23 21:24:00.93717, 2021-01-25 18:17:16.641405 | Row creation time for the step record (tasks_task_steps.created_at)
updated_at | VARCHAR | datetime | 100.0% | 19,663,182 | 70.2% | 2021-09-29 02:12:37.022564, 2019-01-24 22:05:25.188974, 2021-10-28 02:10:51.110241 | Last update timestamp for the step row (tasks_task_steps.updated_at)
created_at_task | TIMESTAMP WITH TIME ZONE | datetime | 100.0% | 621,940 | 2.2% | 2022-09-15 03:30:52.468544+00, 2021-10-28 01:42:06.328193+00, 2019-09-24 17:36:19.947157+00 | Parent assignment creation time from tasks_tasks.created_at
created_at_role | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-02 13:04:16.631878, 2021-02-05 17:23:39.246093, 2021-01-25 08:04:16.53745 | Timestamp when the learner’s entity_roles record was inserted
updated_at_role | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-08-30 22:53:30.280348, 2021-12-16 16:25:56.025706, 2019-05-22 13:58:35.870204 | Last update time for that entity_roles row
created_at_profile | VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2019-09-02 19:31:17.349562, 2021-07-05 15:13:11.590168, 2020-01-25 21:40:24.847115 | Creation time for the linked OpenStax account in user_profiles.created_at
updated_at_profile | VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2021-05-13 18:57:40.071454, 2020-07-06 23:58:56.751613, 2023-08-25 17:22:29.451434 | When the OpenStax account profile was updated
dropped_at | VARCHAR | datetime | ⚠️ 2.0% | 1,516 | 0.3% | 2019-02-05 20:55:08.849286, 2019-11-07 21:24:36.504898, 2022-09-19 15:38:11.75522 | Student dropping out of course
created_at_student | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-06-16 15:37:58.362784, 2018-08-14 14:48:47.963283, 2019-01-09 04:18:07.495438 | Timestamp when the student’s enrollment row was created in course_membership_students
updated_at_student | VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-07 21:02:38.41562, 2022-01-10 18:56:04.076358, 2022-01-04 04:36:40.124625 | Last modification time for that course_membership_students record
publish_last_requested_at | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2018-09-26 15:02:52.172843, 2022-02-01 02:39:28.265445, 2020-09-05 01:42:50.605016 | When an instructor last requested to publish the task plan (tasks_task_plans.publish_last_requested_at)
first_published_at | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2021-08-26 19:07:06.782938, 2022-01-06 22:46:57.852721, 2020-06-17 01:12:30.329702 | First time the plan was successfully published (tasks_task_plans.first_published_at)
created_at_plan | VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2022-03-22 16:11:59.587938, 2020-05-28 04:14:44.986028, 2023-04-21 12:53:12.472326 | Plan creation timestamp from tasks_task_plans.created_at
updated_at_plan | VARCHAR | datetime | 98.1% | 21,887 | 0.1% | 2020-06-24 12:38:37.784231, 2018-03-27 14:16:59.144611, 2018-09-07 22:28:32.240547 | Most recent update to the plan metadata (tasks_task_plans.updated_at)
withdrawn_at | VARCHAR | datetime | ⚠️ 6.3% | 1,460 | 0.1% | 2022-09-06 22:58:18.844438, 2022-04-08 21:33:45.875596, 2018-09-13 03:23:18.767757 | When was the assignment withdrawn by the instructor
last_published_at | VARCHAR | datetime | 98.1% | 20,616 | 0.1% | 2022-07-08 21:50:16, 2020-06-26 04:42:38.040295, 2019-05-06 16:49:56.640985 | Timestamp for the latest publish action on the task plan (tasks_task_plans.last_published_at)
updated_by_instructor_at | VARCHAR | datetime | ⚠️ 36.9% | 12,932 | 0.1% | 2022-02-19 00:13:13.459835, 2021-08-10 18:47:47.197136, 2022-11-30 14:59:24.258577 | When was the assignment updated by instructor
created_at_exercise | VARCHAR | datetime | 59.4% | 3,072,812 | 18.5% | 2021-09-22 16:43:37.548762, 2021-06-28 12:49:04.415223, 2021-09-20 16:41:09.351951 | When was the exercise created
updated_at_exercise | VARCHAR | datetime | 59.4% | 13,924,518 | 83.7% | 2020-10-06 13:42:37.260637, 2018-10-22 15:59:01.459424, 2020-11-04 21:47:31.200636 | When was the exercise updated
last_graded_at | VARCHAR | datetime | ⚠️ 0.1% | 37,905 | 100.0% | 2022-03-11 17:25:49.697624 | When the assignment step was last graded
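
Most datetime columns are stored as VARCHAR, and the sample values show three shapes: fractional seconds of varying length, no fractional seconds at all, and a bare "+00" offset on created_at_task. The sketch below is one way to parse all three with the standard library. `parse_timestamp` is a hypothetical helper, and it makes no assumption about the time zone of values that carry no offset.

```python
import re
from datetime import datetime

# Matches a trailing offset such as "+00", "+0000", or "+00:00".
_OFFSET = re.compile(r"([+-]\d{2})(?::?(\d{2}))?$")


def parse_timestamp(value):
    """Parse the timestamp shapes seen in the sample values; None stays None."""
    if value is None:
        return None
    date_part, time_part = value.strip().split(" ", 1)
    fmt = "%Y-%m-%d %H:%M:%S"
    match = _OFFSET.search(time_part)
    if match:
        # strptime's %z needs hours and minutes, so pad "+00" to "+0000".
        hours, minutes = match.group(1), match.group(2) or "00"
        time_part = time_part[: match.start()] + hours + minutes
    if "." in time_part:
        fmt += ".%f"  # %f accepts one to six digits
    if match:
        fmt += "%z"
    return datetime.strptime(f"{date_part} {time_part}", fmt)


first = parse_timestamp("2018-10-26 01:27:23.855325")
last = parse_timestamp("2018-10-26 01:29:23.855325")  # made-up later time
seconds_between = (last - first).total_seconds()
```

Values with an offset parse to timezone-aware datetimes and the rest to naive ones, so the two kinds should not be subtracted from each other without first deciding how to treat the naive values.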

📝 Text (15 columns)

Free-form text columns

Column | Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description
title | VARCHAR | text | 100.0% | 13,544 | 0.1% | module 1, 2.6, 2.7 reading, Homework 5 | Assignment title
research_identifier | VARCHAR | text | 100.0% | 54,138 | 0.2% | r315a4f9d, rbfd1701b, r51bcbae0 | Anonymized student ID
title_plan | VARCHAR | text | 98.1% | 13,511 | 0.1% | 1D Kinematics (2.1 - 2.8) second chance, HW 2, Ch 5 Applications of Newton's Law | Assignment title
description | VARCHAR | text | ⚠️ 47.6% | 2,680 | 0.0% | The Immune System, Reading assignments are to be completed prior t..., Complete these questions to check your understa... | Assignment instructions
settings | VARCHAR | text | 98.1% | 17,037 | 0.1% | {"page_ids":["76887","76888","76889"]}, {"exercises":[{"id":"500581","points":[1]},{"id..., {"page_ids":["76964","76965","76966","76967","7... | Instructor selections while creating assignment
url | VARCHAR | text | 59.0% | 54,418 | 0.3% | https://exercises.openstax.org/exercises/16045@2, https://exercises.openstax.org/exercises/315@5, https://exercises.openstax.org/exercise/12113@7 | Exercise URL to retrieve question details from
free_response | VARCHAR | text | ⚠️ 41.3% | 4,872,271 | 42.1% | When I push down on a table while I am on a sca..., UAG CAA CAU GAC AUC CUA UUU, 6.22*10^5 | Student response to the question
response_validation | VARCHAR | text | ⚠️ 39.9% | 7,996,818 | 71.7% | {"attempts": [{"nudge": "Give it another shot",..., {"attempts": [{"nudge": "Take another chance", ..., {"attempts": [{"nudge": "Try again", "valid": t... | Free response input quality evaluation
answer_ids | VARCHAR | text | 59.4% | 159,661 | 1.0% | {300880,300881,300882,300883}, {224567,224568,224566,224565}, {299536,299537,299538,299539} | Multiple-choice response options index
grader_comments | VARCHAR | text | ⚠️ 0.1% | 6,078 | 16.0% | No samples available | Grader feedback to free-response grades
published_comments | VARCHAR | text | ⚠️ 0.1% | 4,575 | 17.1% | No samples available | Grader feedback shared with students
page_url | VARCHAR | text | 96.6% | 8,421 | 0.0% | https://archive.cnx.org/contents/8d50a0af-948b-..., https://archive.cnx.org/contents/405335a3-7cff-..., https://openstax.org/apps/archive/20210713.2056... | Link to textbook content, not valid
tags_array | VARCHAR[] | text | 59.4% | 72,124 | 0.4% | blooms:2, dok:2, k12phys-ch01-s02-lo03 | Array of all metadata tags associated with the exercise
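
Some text columns hold structured values. The sketch below parses two of them from the sample values: answer_ids, which looks like a brace-delimited list of integers, and url, which appears to end in a number and a version separated by "@". Reading the url suffix as exercise number and version is an assumption, and both helper names are hypothetical.

```python
import re


def parse_answer_ids(value):
    """Parse a string such as "{300880,300881}" into [300880, 300881]."""
    if value is None:
        return []
    inner = value.strip().strip("{}")
    return [int(item) for item in inner.split(",") if item.strip()]


# The samples use both ".../exercises/<n>@<v>" and ".../exercise/<n>@<v>".
_EXERCISE_URL = re.compile(r"/exercises?/(\d+)@(\d+)$")


def parse_exercise_url(url):
    """Return (number, version) from an exercise URL, or None if it does not match."""
    match = _EXERCISE_URL.search(url or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


options = parse_answer_ids("{300880,300881,300882,300883}")
exercise = parse_exercise_url("https://exercises.openstax.org/exercises/16045@2")
```

The settings column can be read with `json.loads` when the value is complete, as in `{"page_ids":["76887","76888","76889"]}`.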

📈 Summary Statistics

Dataset Overview

Attribute | Value
Total Rows | 27,996,816
Total Columns | 95
Total Cells | 2,939,665,680
Profiling Time | 90.41 seconds
Profiling Speed | 32,515,040 cells/second
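
The overview figures can be related to each other with simple arithmetic, assuming Total Cells is rows times profiled columns and Profiling Speed is cells divided by time. The snippet below is a sanity-check sketch using only the numbers reported above.

```python
total_rows = 27_996_816
total_cells = 2_939_665_680
profiling_seconds = 90.41

# Columns implied by the cell count, assuming cells = rows * columns.
columns_implied, remainder = divmod(total_cells, total_rows)

# Throughput implied by the rounded profiling time.
cells_per_second = total_cells / profiling_seconds

print(columns_implied, remainder)
print(round(cells_per_second))
```

The division is exact and gives 105 columns, which equals the sum of the counts in the Column Types Distribution table rather than the 95 shown under Total Columns. The report does not explain the difference. The computed throughput lands close to the reported speed, with the small gap plausibly due to the profiling time being rounded.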

Column Types Distribution

Data Type | Count | Percentage
VARCHAR | 64 | 61.0%
INTEGER | 25 | 23.8%
DOUBLE | 8 | 7.6%
BOOLEAN | 6 | 5.7%
TIMESTAMP WITH TIME ZONE | 1 | 1.0%
VARCHAR[] | 1 | 1.0%

Variable Types Distribution

Variable Type | Count | Percentage
Identifier | 28 | 26.7%
Categorical | 24 | 22.9%
Datetime | 22 | 21.0%
Text | 15 | 14.3%
Boolean | 6 | 5.7%
Continuous | 5 | 4.8%
Discrete | 4 | 3.8%
Empty | 1 | 1.0%

Data Completeness

Completeness Level | Column Count | Status
Complete (0% nulls) | 39 |
Mostly Complete (1-10% nulls) | 33 |
Partial (11-50% nulls) | 16 | ⚠️
Sparse (51-90% nulls) | 6 |
Mostly Empty (>90% nulls) | 11 |

Overall Data Completeness: 80.3%
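
The completeness levels above can be expressed as a small classifier. The report gives whole-number ranges, so how it treats values between them (for example 10.5% nulls) is not stated. The boundaries in this sketch are an assumption, and `completeness_level` is a hypothetical helper.

```python
def completeness_level(null_pct):
    """Map a column's null percentage to the level names used in the table above."""
    if null_pct == 0:
        return "Complete"
    if null_pct <= 10:
        return "Mostly Complete"
    if null_pct <= 50:
        return "Partial"
    if null_pct <= 90:
        return "Sparse"
    return "Mostly Empty"


# Null percentages taken from the Data Quality Alerts section.
alerts = {"dropped_at": 98.0, "fragment_index": 67.2, "answer_id": 52.0}
levels = {column: completeness_level(pct) for column, pct in alerts.items()}
```

Under these boundaries dropped_at is "Mostly Empty", while fragment_index and answer_id are "Sparse".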

Cardinality Analysis

Cardinality indicates the uniqueness of values in each column.

Cardinality Level | Column Count | Description
Very High (>95% unique) | 5 | Likely identifiers
High (50-95% unique) | 4 | High variability
Medium (10-50% unique) | 4 | Moderate variability
Low (<10% unique) | 91 | Categorical/Boolean
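
Cardinality, as defined in the glossary, is the number of unique values relative to non-null values. A minimal sketch of that calculation and of the level thresholds above follows. Treating the shared boundaries (exactly 10% or 50%) as belonging to the higher level is an assumption, and both function names are hypothetical.

```python
def cardinality_pct(values):
    """Unique non-null values as a percentage of all non-null values."""
    non_null = [value for value in values if value is not None]
    if not non_null:
        return None  # undefined for a completely empty column
    return 100 * len(set(non_null)) / len(non_null)


def cardinality_level(pct):
    """Map a cardinality percentage to the level names used in the table above."""
    if pct > 95:
        return "Very High"
    if pct >= 50:
        return "High"
    if pct >= 10:
        return "Medium"
    return "Low"


sample = ["core", "personalized", "core", None]  # made-up column values
pct = cardinality_pct(sample)  # 2 unique out of 3 non-null values
level = cardinality_level(pct)
```

As a cross-check against the report, exercise_id has 139,456 unique values over roughly 59.4% of 27,996,816 rows, which works out to about 0.8%, the figure shown in the identifier table.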

📖 Glossary

Term | Definition
Cardinality | The number of unique values in a column relative to total non-null values. High cardinality means many unique values.
Completeness | Percentage of non-null values in a column. Higher is better.
Data Type | The technical storage type (e.g., INTEGER, VARCHAR, BOOLEAN).
Identifier | A column containing unique values that identify records (e.g., ID, UUID).
Missing Values | Null, empty, or placeholder values (NA, null, empty string).
Null Percentage | The proportion of null/missing values in a column.
Sample Values | Example values from the column to illustrate its contents.
Variable Type | The semantic meaning of the column (categorical, continuous, etc.).

Generated by Data Profiler v5.2.3
Report Date: 2025-11-24 21:54:53
Status: ✅ All columns profiled successfully