Data Profile Report
Generated on November 24, 2025 at 09:54 PM
Overview
This dataset captures granular student activity data from the OpenStax Tutor learning platform, documenting every step within assigned tasks including exercise responses, reading interactions, completion timestamps, and grading information. It enables analysis of student learning behaviors, assignment effectiveness, and educational outcomes.
Unit of Analysis
- Task Step - Each row represents a single step within a student's assignment (e.g., one exercise question, one reading section, one video). A complete assignment contains multiple steps, and each student's assignment generates separate step records.
Scope & Filters
- Temporal: Tasks created within the specified date range (time1 to time2)
- Course Type: Production courses only (excludes preview courses; test courses optional via qtest parameter)
- Task Types: Includes homework (0), reading (1), external (2), event (3), practice (4), chapter practice (5), page practice (6), mixed practice (7), and concept coach (9)
- Task Plans: Excludes preview assignment templates
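When decoding raw exports, the task type codes listed above can be kept in a small lookup table. The sketch below is illustrative only: the names `TaskType` and `task_type_label` are hypothetical, and only the code-to-label pairs come from the list above. Code 8 does not appear in that list, so it is left unmapped here.

```python
from enum import IntEnum
from typing import Optional


class TaskType(IntEnum):
    """Task type codes as listed under Scope & Filters (8 is not listed)."""
    HOMEWORK = 0
    READING = 1
    EXTERNAL = 2
    EVENT = 3
    PRACTICE = 4
    CHAPTER_PRACTICE = 5
    PAGE_PRACTICE = 6
    MIXED_PRACTICE = 7
    CONCEPT_COACH = 9


def task_type_label(code: int) -> Optional[str]:
    """Return a lowercase label for a known code, or None for unknown codes."""
    try:
        return TaskType(code).name.lower()
    except ValueError:
        return None


print(task_type_label(5))
```

Returning `None` for unknown codes, rather than raising, keeps a bulk decode from failing on a value the list does not document.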
Key Dimensions
Student Context
- Anonymized student identifiers (research_identifier)
- Course and period enrollment information
- Student demographics (name fields for authorized use)
Assignment Structure
- Task hierarchy:
- Assignment metadata (title, type, creation date)
- Content ecosystem version tracking
Step Content
- Polymorphic step types (exercises, readings, videos, interactives)
- Content references (pages, exercises) with book location
- Exercise-specific data (questions, answers, tags)
Student Responses (for exercise steps)
- Answer submissions and correctness
- Free-response text and grading
- Response validation and feedback
- Attempt tracking
Temporal Data
- Step creation timestamps
- First and last completion times
- Assignment lifecycle tracking
Content Metadata
- Learning objective (LO) and skill tags
- Book and page references with hierarchical location
- Exercise difficulty and classification tags
- Textbook ecosystem versioning
Primary Use Cases
- Learning Analytics: Student engagement patterns, time-on-task analysis
- Content Effectiveness: Exercise difficulty calibration, content performance
- Educational Research: Learning behavior studies, intervention analysis
- Instructor Insights: Assignment completion rates, common misconceptions
- Adaptive Learning: Personalization based on performance patterns
Data Granularity
- Finest grain: Individual step within an assignment (e.g., question 3 of homework 5)
- Aggregation potential: Roll up to assignment level, student level, course level, or content level
- Temporal resolution: Precise timestamps for creation and completion events
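The roll-up from step level to assignment level described above can be sketched as follows. The field names (`task_id`, `step_id`, `completed`) and the sample rows are hypothetical stand-ins, not the actual schema; the point is only that each step row contributes to exactly one assignment-level aggregate.

```python
from collections import defaultdict

# Hypothetical step-level rows; field names are illustrative, not the real schema.
steps = [
    {"task_id": 1, "step_id": 10, "completed": True},
    {"task_id": 1, "step_id": 11, "completed": False},
    {"task_id": 2, "step_id": 20, "completed": True},
    {"task_id": 2, "step_id": 21, "completed": True},
]


def completion_by_task(rows):
    """Roll step rows up to one completion rate per assignment (task)."""
    totals = defaultdict(lambda: [0, 0])  # task_id -> [completed, total]
    for row in rows:
        counts = totals[row["task_id"]]
        counts[0] += 1 if row["completed"] else 0
        counts[1] += 1
    return {task: done / total for task, (done, total) in totals.items()}


rates = completion_by_task(steps)
print(rates)
```

The same pattern extends to student, course, or content level by changing the grouping key.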
Coverage
All student activity on non-preview assignments within the specified date range, including:
- Complete and incomplete assignments
- All step types (exercises, readings, videos, etc.)
- Graded and ungraded work
- Core and supplemental content
Technical Notes
- Generated from the PostgreSQL production database via Parquet file exports
- Optimized for large-scale data processing (millions of rows)
- Preserves referential integrity across related tables
- Left joins preserve steps without exercises/pages (NULL values expected)
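The effect of the left joins noted above can be illustrated with a tiny in-memory example. This sketch uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL source, and the table names, column names, and URL are hypothetical. It shows a reading step surviving the join with a NULL exercise field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE steps (step_id INTEGER PRIMARY KEY, step_type TEXT, exercise_id INTEGER);
    CREATE TABLE exercises (exercise_id INTEGER PRIMARY KEY, url TEXT);
    INSERT INTO steps VALUES (1, 'exercise', 100), (2, 'reading', NULL);
    INSERT INTO exercises VALUES (100, 'https://example.org/exercises/100');
    """
)

# LEFT JOIN keeps every step; steps without a matching exercise get NULL (None).
rows = conn.execute(
    """
    SELECT s.step_id, s.step_type, e.url
    FROM steps AS s
    LEFT JOIN exercises AS e ON e.exercise_id = s.exercise_id
    ORDER BY s.step_id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

An inner join would drop the reading step entirely, which is why NULL values in exercise-related columns are expected rather than a sign of data loss.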
Executive Dashboard
🟢 Data Quality Grade: A (92/100)
| Metric | Value | Status |
|---|---|---|
| Total Rows | 27,996,816 | ✅ |
| Total Columns | 95 | ✅ |
| Completeness | 80.3% | ⚠️ |
| Columns with Issues | 1 | ⚠️ |
| Processing Time | 90.4s | ✅ |
🔍 Data Quality Alerts
❌ Critical Issues
These issues require immediate attention:
- dropped_at: 98.0% null values (sparse data)
- school_district_school_id: 96.2% null values (sparse data)
- withdrawn_at: 93.7% null values (sparse data)
- title_exercise: Column is completely empty (100% null)
- grader_points: 99.9% null values (sparse data)
- grader_comments: 99.9% null values (sparse data)
- last_graded_at: 99.9% null values (sparse data)
- free_response_grade: 99.9% null values (sparse data)
- published_comments: 99.9% null values (sparse data)
⚠️ Warnings
Review these columns for potential issues:
- fragment_index: 67.2% null values (partial data)
- role_type: Only 1 unique value in 27,996,816 rows
- description: 52.4% null values (partial data)
- updated_by_instructor_at: 63.1% null values (partial data)
- title_exercise: Only 0 unique values in 27,996,816 rows
- free_response: 58.7% null values (partial data)
- answer_id: 52.0% null values (partial data)
- response_validation: 60.1% null values (partial data)
- book_location: 337 unique categories (consider if this should be text/identifier)
- course_name: 920 unique categories (consider if this should be text/identifier)
- city: 313 unique categories (consider if this should be text/identifier)
- title_1_school: Only 1 unique value in 27,996,816 rows
📊 Column Profiles
Columns are organized by data type for easier navigation.
🔑 Identifier (28 columns)
Unique identifiers (IDs, keys, UUIDs)
| Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|
| INTEGER | identifier | 100.0% | 1,325,391 | 4.7% | 1495846, 1640791, 2061782 | Assignment identifier for a specific student | |
| INTEGER | identifier | 100.0% | 24,886,892 | 88.9% | 14705499, 4923798, 39417216 | Unique id for a specific step for a specific assignment and specific student | |
| INTEGER | identifier | 98.1% | 28,019 | 0.1% | 214112, 199873, 128501 | Foreign key linking an individual student's task instance back to the instructor's original task plan (assignment template). This connects a student's specific assignment to the main plan that generated it. | |
| DOUBLE | identifier | 100.0% | 1,368 | 0.0% | 15534.0, 8142.0, 12789.0 | Unique identifier for teacher-created course on OpenStax Tutor | |
| INTEGER | identifier | 100.0% | 54,138 | 0.2% | 70579, 191533, 109791 | Unique identifier for a user's role within the system. This is the primary key that links users to their various roles (student, teacher, etc.) and connects them to their activities across courses. | |
| INTEGER | identifier | 100.0% | 1,368 | 0.0% | 3889, 10667, 16879 | Course section id from linking the learner’s enrollment to | |
| VARCHAR | identifier | 100.0% | 54,138 | 0.2% | 27741d9b-20a1-49a8-baf1-b96111f89b8d, 49fae85e-53e5-485b-8f5e-4a9e5533bfaa, c1b9e7e4-e0f6-434c-bdb2-7950932f68b3 | Foreign key identifying which course this record belongs to | |
| INTEGER | identifier | 100.0% | 1,725 | 0.0% | 11951, 23255, 32965 | Course-period id in which the student was enrolled | |
| INTEGER | identifier | ⚠️ 3.8% | 16 | 0.0% | 1291, 1297, 1102 | Identifier for the school district for a specific OpenStax course | |
| INTEGER | identifier | 98.1% | 1,368 | 0.0% | 8538, 7788, 8563 | Unique identifier for the task plan that generated this task. A task plan is the instructor's assignment template/blueprint that defines what content should be assigned, to which students, and when. | |
| INTEGER | identifier | 98.1% | 2,686 | 0.0% | 25525, 10008, 9658 | Teacher created grading template | |
| INTEGER | identifier | 59.4% | 139,456 | 0.8% | 544259, 585359, 333322 | , the canonical question record in used for the step | |
| VARCHAR | identifier | ⚠️ 48.0% | 185,631 | 1.4% | 393425, 316849, 371854 | Index of the correct response | |
| VARCHAR | identifier | 59.1% | 55,496 | 0.3% | 368103, 381186, 596529 | Index of the correct response | |
| VARCHAR | identifier | 59.4% | 57,355 | 0.3% | 94083, 96743, 147007 | Internal numeric identifier for questions | |
| VARCHAR | identifier | 59.4% | 16,642,469 | 100.0% | 3aa054c8-24fb-4ea3-8acc-645d75a569c2, d5c46ded-fdfd-469e-b534-4eedbda9c7cb, efc2674c-fad7-48e5-afc9-31ca35d26b4f | Internal unique identifier for exercises | |
| VARCHAR | identifier | 100.0% | 1,725 | 0.0% | a85fcd18-de0d-4ba0-b6ee-024b189f158e, 1c4c4018-16a8-4695-b826-0254d8a74a8b, 350c48f6-8063-4193-a750-5d1af43bf2c5 | Internal unique identifier for course periods created on OpenStax Tutor | |
| VARCHAR | identifier | 99.2% | 1,360 | 0.0% | fe9c820c-1816-4564-b46c-6a3f3cdee2cd, dfd8ce33-f709-4f85-bea0-8ac27a7e1f1f, 44b48e09-cb5a-4dee-85ed-3053b29caba7 | Internal unique identifier for courses created on OpenStax Tutor |
📑 Categorical (24 columns)
Categorical variables with limited distinct values
| Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|
| VARCHAR | categorical | 100.0% | 6 | 0.0% | Reading, Placeholder, Video, Interactive, Exercise ... (6 total levels) | Type of activity for the assignment step | |
| VARCHAR | categorical | 100.0% | 3 | 0.0% | core, personalized, spaced_practice | Type of intervention/formative assessment | |
| VARCHAR | categorical | 100.0% | 9 | 0.0% | ["review"], [], ["phet-explorations"], ["example"], ["interactive"] ... (9 total levels) | Type of activity as indicated by the book processing instructions | |
| INTEGER | categorical | ⚠️ 32.8% | 21 | 0.0% | 2, 3, 4, 8, 7 ... (21 total levels) | Tutor assignment splits the page into fragments and the index denotes which fragment the step is associated with | |
| VARCHAR | categorical | 100.0% | 5 | 0.0% | homework, page_practice, practice_worst_topics, reading | The type of assignment | |
| INTEGER | categorical | 100.0% | 4 | 0.0% | 4, 3, 5, 2 | Academic semester, 2 = Spring, 3 = Summer, 4 = Fall, 5 = Winter | |
| INTEGER | categorical | 100.0% | 7 | 0.0% | 2021, 2023, 2024, 2022, 2019 ... (7 total levels) | Calendar year | |
| INTEGER | categorical | 98.1% | 26 | 0.0% | 0, 12, 7, 5, 6 ... (26 total levels) | Written response question count | |
| INTEGER | categorical | 59.4% | 8 | 0.0% | 2, 0, 1 ... (8 total levels) | Multi-part question index, few questions of this type existed | |
| INTEGER | categorical | 59.4% | 9 | 0.0% | 0, 2, 1 ... (9 total levels) | How many times a student responds to an assignment step. corresponds to reading assignments; 0 is the default when an exercise is assigned but not completed. Each time the student attempts the exercise, the attempt number increases by 1. | |
| VARCHAR | categorical | 96.6% | 337 | 0.0% | [44,1], [4,4], [29,1], [29,4], [27,9] ... (337 total levels) | Chapter, section of OpenStax textbook | |
| VARCHAR | categorical | 96.6% | 11 | 0.0% | College Physics for AP® Courses, Anatomy and Physiology, U.S. History, Biology for AP® Courses, Biology ... (11 total levels) | Name of the book | |
| VARCHAR | categorical | 96.6% | 52 | 0.0% | 8.12, 15.1, 6.202, 7.3, 4.2 ... (52 total levels) | Version of the book | |
| VARCHAR | categorical | 100.0% | 6 | 0.0% | exercise, reading, interactive, placeholder, video ... (6 total levels) | Type of the part of the assignment | |
| VARCHAR | categorical | 99.2% | 26 | 0.0% | 1/1/18, 5/1/18, 1/1/23, 4/1/23, 1/1/24 ... (26 total levels) | When the course starts | |
| VARCHAR | categorical | 99.2% | 9 | 0.0% | AP Physics, US History, College Physics (Algebra), Psychology, Biology ... (9 total levels) | Book Name | |
| VARCHAR | categorical | 99.2% | 920 | 0.0% | Elements of Physics I, Introductory Physics II, Biology 182, PHYS 1112 Introductory Physics 2, Elements of Physics II section 11 ... (920 total levels) | Instructor-assigned name of course | |
| VARCHAR | categorical | 99.2% | 6 | 0.0% | K-12 School, High School, College/University (4), Technical/Community College (2), Home School ... (6 total levels) | Type of institution; values come from our business data and are verified to the best of our knowledge | |
| VARCHAR | categorical | 99.2% | 313 | 0.0% | Glenside, Chalmette, Espanola, Bluffton, San Francisco ... (313 total levels) | City of use | |
| VARCHAR | categorical | 99.1% | 51 | 0.0% | California, North Carolina, Iowa, Wisconsin, Kentucky ... (51 total levels) | State of use | |
| VARCHAR | categorical | 99.2% | 2 | 0.0% | United States, United States Territory | Country of use |
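
Two of the categorical columns above are encoded values that usually need decoding before analysis: the academic semester code (2 = Spring, 3 = Summer, 4 = Fall, 5 = Winter) and the chapter/section pair shown in samples such as `[44,1]`. The sketch below assumes the location strings are JSON-style lists, which is inferred from the sample values only; the names `TERM_LABELS` and `parse_book_location` are hypothetical.

```python
import json

# Semester codes as documented in the table above.
TERM_LABELS = {2: "Spring", 3: "Summer", 4: "Fall", 5: "Winter"}


def parse_book_location(raw):
    """Parse a string like '[44,1]' into a tuple of ints; None for missing values."""
    if not raw:
        return None
    return tuple(int(part) for part in json.loads(raw))


print(TERM_LABELS[4])
print(parse_book_location("[44,1]"))
```

Returning a tuple rather than fixed chapter and section variables avoids failing on locations whose length differs from the samples shown.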
📈 Continuous (5 columns)
Continuous numeric variables (decimals, floats)
| Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description |
|---|---|---|---|---|---|---|---|---|
| DOUBLE | continuous | ⚠️ 0.1% | 59 | 0 | 12 | 1.66 | 1.0, 2.0 | Instructor/Teaching Assistant assigned points for each assignment step but may not be published | |
| DOUBLE | continuous | ⚠️ 0.1% | 54 | 0 | 12 | 1.7 | | Grader points when published |
🔢 Discrete (4 columns)
Discrete numeric variables (integers, counts)
| Data Type | Variable Type | Completeness | Unique Values | Min | Max | Mean | Sample Values | Description |
|---|---|---|---|---|---|---|---|---|
| INTEGER | discrete | 100.0% | 1 | 3 | 3 | 3 | 3 | Numeric role code: 2 = instructor, 3 = student, 4 = teacher viewing as student | |
| INTEGER | discrete | 98.1% | 93 | 0 | 341 | 0.64 | 13, 38, 0 | Number of assignment steps that are left to be manually graded | |
| INTEGER | discrete | 98.1% | 178 | 0 | 727 | 2.06 | 32, 23, 64 | Total number of assignment steps that are eligible to be manually graded |
☑️ Boolean (6 columns)
Boolean variables (true/false)
| Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|
| BOOLEAN | boolean | 100.0% | 2 | 0.0% | true, false | Any steps that an instructor selected as core, as well as questions at the end of reading, that all students enrolled in the course had to complete | |
| BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | The course/assignment is for preview | |
| BOOLEAN | boolean | 100.0% | 1 | 0.0% | false | It is a test course | |
| BOOLEAN | boolean | 77.9% | 2 | 0.0% | true, false | If it is a college course | |
| BOOLEAN | boolean | 98.1% | 1 | 0.0% | false | If assignment is for preview | |
| BOOLEAN | boolean | 59.4% | 2 | 0.0% | false, true | Whether the question is part of a multi-part question |
📅 Datetime (22 columns)
Date and time columns
| Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|
| VARCHAR | datetime | 68.5% | 19,187,762 | 100.0% | 2021-10-03 18:44:10.275544, 2021-09-08 03:03:32.871312, 2019-02-20 20:40:24.434679 | Timestamp from capturing when the step was first finished by the student | |
| VARCHAR | datetime | 68.5% | 19,187,770 | 100.0% | 2018-10-26 01:27:23.855325, 2019-04-15 19:07:12.109664, 2018-11-06 06:24:48.643782 | , i.e., the most recent completion time for the step | |
| VARCHAR | datetime | 100.0% | 986,532 | 3.5% | 2023-03-24 00:54:36.306526, 2022-01-23 21:24:00.93717, 2021-01-25 18:17:16.641405 | Row creation time for the step record ( ) | |
| VARCHAR | datetime | 100.0% | 19,663,182 | 70.2% | 2021-09-29 02:12:37.022564, 2019-01-24 22:05:25.188974, 2021-10-28 02:10:51.110241 | Last update timestamp for the step row ( ) | |
| TIMESTAMP WITH TIME ZONE | datetime | 100.0% | 621,940 | 2.2% | 2022-09-15 03:30:52.468544+00, 2021-10-28 01:42:06.328193+00, 2019-09-24 17:36:19.947157+00 | Parent assignment creation time from | |
| VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-02 13:04:16.631878, 2021-02-05 17:23:39.246093, 2021-01-25 08:04:16.53745 | Timestamp when the learner’s record was inserted | |
| VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-08-30 22:53:30.280348, 2021-12-16 16:25:56.025706, 2019-05-22 13:58:35.870204 | Last update time for that row | |
| VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2019-09-02 19:31:17.349562, 2021-07-05 15:13:11.590168, 2020-01-25 21:40:24.847115 | Creation time for the linked OpenStax account in | |
| VARCHAR | datetime | 100.0% | 47,121 | 0.2% | 2021-05-13 18:57:40.071454, 2020-07-06 23:58:56.751613, 2023-08-25 17:22:29.451434 | When the OpenStax account profile was updated | |
| VARCHAR | datetime | ⚠️ 2.0% | 1,516 | 0.3% | 2019-02-05 20:55:08.849286, 2019-11-07 21:24:36.504898, 2022-09-19 15:38:11.75522 | When the student dropped the course | |
| VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2021-06-16 15:37:58.362784, 2018-08-14 14:48:47.963283, 2019-01-09 04:18:07.495438 | Timestamp when the student’s enrollment row was created in | |
| VARCHAR | datetime | 100.0% | 54,138 | 0.2% | 2020-02-07 21:02:38.41562, 2022-01-10 18:56:04.076358, 2022-01-04 04:36:40.124625 | Last modification time for that record | |
| VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2018-09-26 15:02:52.172843, 2022-02-01 02:39:28.265445, 2020-09-05 01:42:50.605016 | When an instructor last requested to publish the task plan ( ) | |
| VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2021-08-26 19:07:06.782938, 2022-01-06 22:46:57.852721, 2020-06-17 01:12:30.329702 | First time the plan was successfully published ( ) | |
| VARCHAR | datetime | 98.1% | 28,019 | 0.1% | 2022-03-22 16:11:59.587938, 2020-05-28 04:14:44.986028, 2023-04-21 12:53:12.472326 | Plan creation timestamp from | |
| VARCHAR | datetime | 98.1% | 21,887 | 0.1% | 2020-06-24 12:38:37.784231, 2018-03-27 14:16:59.144611, 2018-09-07 22:28:32.240547 | Most recent update to the plan metadata ( ) | |
| VARCHAR | datetime | ⚠️ 6.3% | 1,460 | 0.1% | 2022-09-06 22:58:18.844438, 2022-04-08 21:33:45.875596, 2018-09-13 03:23:18.767757 | When the assignment was withdrawn by the instructor | |
| VARCHAR | datetime | 98.1% | 20,616 | 0.1% | 2022-07-08 21:50:16, 2020-06-26 04:42:38.040295, 2019-05-06 16:49:56.640985 | Timestamp for the latest publish action on the task plan ( ) | |
| VARCHAR | datetime | ⚠️ 36.9% | 12,932 | 0.1% | 2022-02-19 00:13:13.459835, 2021-08-10 18:47:47.197136, 2022-11-30 14:59:24.258577 | When the assignment was updated by the instructor | |
| VARCHAR | datetime | 59.4% | 3,072,812 | 18.5% | 2021-09-22 16:43:37.548762, 2021-06-28 12:49:04.415223, 2021-09-20 16:41:09.351951 | When the exercise was created | |
| VARCHAR | datetime | 59.4% | 13,924,518 | 83.7% | 2020-10-06 13:42:37.260637, 2018-10-22 15:59:01.459424, 2020-11-04 21:47:31.200636 | When the exercise was updated | |
| VARCHAR | datetime | ⚠️ 0.1% | 37,905 | 100.0% | 2022-03-11 17:25:49.697624 | When the assignment step was last graded |
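
Most datetime columns above are stored as VARCHAR, and the sample values show varying fractional-second precision (for example `21:24:00.93717` and `21:50:16`), plus one column carrying a `+00` offset. A minimal parsing sketch that tolerates these variants is given below. The function name `parse_timestamp` is hypothetical, and the handling covers only the shapes visible in the samples, so other formats in the full data would need extra cases.

```python
from datetime import datetime, timezone


def parse_timestamp(raw):
    """Parse 'YYYY-MM-DD HH:MM:SS[.ffffff][+00]' strings; None for missing values."""
    if raw is None or raw.strip() == "":
        return None
    text = raw.strip()
    tz = None
    if text.endswith("+00"):
        # Only the UTC offset seen in the samples is handled here.
        text = text[:-3]
        tz = timezone.utc
    if "." in text:
        head, frac = text.split(".", 1)
        # Right-pad the fraction to 6 digits so '.93717' means 937170 microseconds.
        text = f"{head}.{(frac + '000000')[:6]}"
        fmt = "%Y-%m-%d %H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%d %H:%M:%S"
    parsed = datetime.strptime(text, fmt)
    return parsed.replace(tzinfo=tz) if tz is not None else parsed


print(parse_timestamp("2022-01-23 21:24:00.93717"))
```

Values without an offset are returned as naive datetimes, since the report does not state which time zone the VARCHAR timestamps use.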
📝 Text (15 columns)
Free-form text columns
| Data Type | Variable Type | Completeness | Unique Values | Cardinality | Sample Values | Description |
|---|---|---|---|---|---|---|
| VARCHAR | text | 100.0% | 13,544 | 0.1% | module 1, 2.6, 2.7 reading, Homework 5 | Assignment title | |
| VARCHAR | text | 100.0% | 54,138 | 0.2% | r315a4f9d, rbfd1701b, r51bcbae0 | Anonymized student ID | |
| VARCHAR | text | 98.1% | 13,511 | 0.1% | 1D Kinematics (2.1 - 2.8) second chance, HW 2, Ch 5 Applications of Newton's Law | Assignment title | |
| VARCHAR | text | ⚠️ 47.6% | 2,680 | 0.0% | The Immune System, Reading assignments are to be completed prior t..., Complete these questions to check your understa... | Assignment instructions | |
| VARCHAR | text | 98.1% | 17,037 | 0.1% | {"page_ids":["76887","76888","76889"]}, {"exercises":[{"id":"500581","points":[1]},{"id..., {"page_ids":["76964","76965","76966","76967","7... | Instructor selections while creating assignment | |
| VARCHAR | text | 59.0% | 54,418 | 0.3% | https://exercises.openstax.org/exercises/16045@2, https://exercises.openstax.org/exercises/315@5, https://exercises.openstax.org/exercise/12113@7 | Exercise URL to retrieve question details from | |
| VARCHAR | text | ⚠️ 41.3% | 4,872,271 | 42.1% | When I push down on a table while I am on a sca..., UAG CAA CAU GAC AUC CUA UUU, 6.22*10^5 | Student response to the question | |
| VARCHAR | text | ⚠️ 39.9% | 7,996,818 | 71.7% | {"attempts": [{"nudge": "Give it another shot",..., {"attempts": [{"nudge": "Take another chance", ..., {"attempts": [{"nudge": "Try again", "valid": t... | Free response input quality evaluation | |
| VARCHAR | text | 59.4% | 159,661 | 1.0% | {300880,300881,300882,300883}, {224567,224568,224566,224565}, {299536,299537,299538,299539} | Multiple-choice response options index | |
| VARCHAR | text | ⚠️ 0.1% | 6,078 | 16.0% | No samples available | Grader feedback to free-response grades | |
| VARCHAR | text | ⚠️ 0.1% | 4,575 | 17.1% | No samples available | Grader feedback shared with students | |
| VARCHAR | text | 96.6% | 8,421 | 0.0% | https://archive.cnx.org/contents/8d50a0af-948b-..., https://archive.cnx.org/contents/405335a3-7cff-..., https://openstax.org/apps/archive/20210713.2056... | Link to textbook content, not valid | |
| VARCHAR[] | text | 59.4% | 72,124 | 0.4% | blooms:2, dok:2, k12phys-ch01-s02-lo03 | Array of all metadata tags associated with the exercise |
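
Several text columns above hold structured values serialized as strings, such as the brace-delimited answer option list (`{300880,300881,300882,300883}`) and exercise tags (`blooms:2`, `dok:2`, `k12phys-ch01-s02-lo03`). The sketch below shows one way to unpack them. The helper names are hypothetical, and the parsing rules are inferred from the sample values alone, so quoted or nested elements are not handled.

```python
def parse_id_array(raw):
    """Parse a brace-delimited list such as '{300880,300881}' into a list of ints."""
    if raw is None:
        return []
    inner = raw.strip().strip("{}")
    return [int(part) for part in inner.split(",") if part.strip()]


def split_tag(tag):
    """Split 'blooms:2' into ('blooms', '2'); tags without a colon get value None."""
    key, sep, value = tag.partition(":")
    return (key, value) if sep else (tag, None)


print(parse_id_array("{300880,300881,300882,300883}"))
print([split_tag(t) for t in ["blooms:2", "dok:2", "k12phys-ch01-s02-lo03"]])
```

Tags without a colon, such as the learning objective identifier in the sample, are kept whole rather than split on hyphens, because the report does not define their internal structure.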
📈 Summary Statistics
Dataset Overview
| Attribute | Value |
|---|---|
| Total Rows | 27,996,816 |
| Total Columns | 95 |
| Total Cells | 2,939,665,680 |
| Profiling Time | 90.41 seconds |
| Profiling Speed | 32,515,040 cells/second |
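
The figures in this table can be cross-checked with simple arithmetic: total cells should equal rows times columns, and profiling speed should equal cells divided by time. The sketch below uses only the values reported above. Dividing the cell count by the row count gives 105 columns, which agrees with the sums of the distribution tables in this section but not with the reported Total Columns value of 95; the report does not explain the difference.

```python
total_rows = 27_996_816
total_cells = 2_939_665_680
profiling_seconds = 90.41
reported_speed = 32_515_040

# Column count implied by the reported cell and row totals.
implied_columns, remainder = divmod(total_cells, total_rows)

# Speed recomputed from the rounded profiling time, so a small drift is expected.
cells_per_second = total_cells / profiling_seconds
relative_drift = abs(cells_per_second - reported_speed) / reported_speed

print(implied_columns, remainder)
print(f"{relative_drift:.6%}")
```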
Column Types Distribution
| Data Type | Count | Percentage |
|---|---|---|
| VARCHAR | 64 | 61.0% |
| INTEGER | 25 | 23.8% |
| DOUBLE | 8 | 7.6% |
| BOOLEAN | 6 | 5.7% |
| TIMESTAMP WITH TIME ZONE | 1 | 1.0% |
| VARCHAR[] | 1 | 1.0% |
Variable Types Distribution
| Variable Type | Count | Percentage |
|---|---|---|
| Identifier | 28 | 26.7% |
| Categorical | 24 | 22.9% |
| Datetime | 22 | 21.0% |
| Text | 15 | 14.3% |
| Boolean | 6 | 5.7% |
| Continuous | 5 | 4.8% |
| Discrete | 4 | 3.8% |
| Empty | 1 | 1.0% |
Data Completeness
| Completeness Level | Column Count | Status |
|---|---|---|
| Complete (0% nulls) | 39 | ✅ |
| Mostly Complete (1-10% nulls) | 33 | ✅ |
| Partial (11-50% nulls) | 16 | ⚠️ |
| Sparse (51-90% nulls) | 6 | ❌ |
| Mostly Empty (>90% nulls) | 11 | ❌ |
Overall Data Completeness: 80.3%
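The completeness levels in the table above can be expressed as a small classifier over a column's null percentage. This is a sketch under one assumption: the table does not say how values that fall between the listed ranges (for example 0.5% or 10.5% nulls) are assigned, so the function below treats each upper bound as inclusive. The function name is hypothetical.

```python
def completeness_level(null_pct: float) -> str:
    """Map a null percentage (0-100) to the completeness levels in the table above."""
    if not 0.0 <= null_pct <= 100.0:
        raise ValueError("null_pct must be between 0 and 100")
    if null_pct == 0:
        return "Complete"
    if null_pct <= 10:  # assumption: upper bounds are inclusive
        return "Mostly Complete"
    if null_pct <= 50:
        return "Partial"
    if null_pct <= 90:
        return "Sparse"
    return "Mostly Empty"


print(completeness_level(67.2))
```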
Cardinality Analysis
Cardinality indicates the uniqueness of values in each column.
| Cardinality Level | Column Count | Description |
|---|---|---|
| Very High (>95% unique) | 5 | Likely identifiers |
| High (50-95% unique) | 4 | High variability |
| Medium (10-50% unique) | 4 | Moderate variability |
| Low (<10% unique) | 91 | Categorical/Boolean |
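
Per the Glossary, cardinality is the number of unique values relative to the non-null values in a column. The sketch below recomputes it for one profiled column (48.0% complete, 185,631 unique values, reported cardinality 1.4%) and maps the result to the levels in the table above. The function names are hypothetical, the non-null count is approximated from the rounded completeness figure, and the handling of exact boundary values (10%, 50%, 95%) is an assumption.

```python
def cardinality_pct(unique_values: int, non_null_rows: float) -> float:
    """Unique values as a percentage of non-null rows (0.0 if the column is empty)."""
    if non_null_rows <= 0:
        return 0.0
    return 100.0 * unique_values / non_null_rows


def cardinality_level(pct: float) -> str:
    """Map a cardinality percentage to the levels in the table above."""
    if pct > 95:
        return "Very High"
    if pct >= 50:
        return "High"
    if pct >= 10:
        return "Medium"
    return "Low"


total_rows = 27_996_816
non_null = total_rows * 0.48  # approximated from the rounded 48.0% completeness
pct = cardinality_pct(185_631, non_null)
print(round(pct, 1), cardinality_level(pct))
```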
📖 Glossary
| Term | Definition |
|---|---|
| Cardinality | The number of unique values in a column relative to total non-null values. High cardinality means many unique values. |
| Completeness | Percentage of non-null values in a column. Higher is better. |
| Data Type | The technical storage type (e.g., INTEGER, VARCHAR, BOOLEAN). |
| Identifier | A column containing unique values that identify records (e.g., ID, UUID). |
| Missing Values | Null, empty, or placeholder values (NA, null, empty string). |
| Null Percentage | The proportion of null/missing values in a column. |
| Sample Values | Example values from the column to illustrate its contents. |
| Variable Type | The semantic meaning of the column (categorical, continuous, etc.). |