Member org info
Digital Learning Platform (DLP) Overview
DLP name - ASSISTments
DLP headquarters location - MA, United States
DLP description
| ASSISTments is a web-based platform that delivers K–12 mathematics practice integrated with formative assessment. Its purpose is to support student learning by providing immediate feedback, hints, and opportunities for revision while helping teachers monitor progress. The platform’s goals are to improve instructional effectiveness, personalize learning, and generate large-scale data that advance research on teaching and learning. |
|---|
DLP data set description - “big picture” overview of the data available (max 75 words)
| ASSISTments provides multimodal, multi-year, integrated datasets that combine content, assignment, and student interaction data. Together they capture how students learn, how well they perform, and the context in which learning occurs in K–12 math practice across several Common Core–aligned curricula. The data include problem content, assignment information, and detailed student interaction logs documenting submissions, actions, and system events across instructional activities. |
|---|
Platform type - Select yes to all that apply
- Learning Management System (LMS) - No
- Student Information System (SIS) - No
- Online digital curriculum - Yes
- Online educational activity/practice platform - Yes
- Graded activity - Yes
- Game-based learning system - No
- Math learning platform - Yes
- Reading platform - No
- Other subject-specific - No
- Special education system - No
- Assessment system - Yes
- Administrative system - No
- Classroom management - No
- Other administrative data - No
Setting Types - Select yes to all that apply
- Formal classroom - Yes
- Online/virtual - Yes
- Hybrid/blended - Yes
- Self-directed study - No
- Workplace training - No
- Other (specify) - No
System demo video (Screen recording(s), website(s), or other video(s) demonstrating how the platform works, separated by commas) - What is ASSISTments, ASSISTments YouTube page
Education Levels - Select yes to all that apply
- Early childhood (birth to PreK) - (Yes/No)
- K-12 - Yes
- K-5 (elementary) - Yes
- 6-8 (middle) - Yes
- 9-12 (secondary) - Yes
- Higher Ed - Yes
- Undergraduate - Yes
- Graduate - No
- Post-Secondary Non-Degree Seeking - No
- Continuing Education (courses) - No
- Informal Learning (tasks, lessons, or activities) - No
- Professional Development - No
- Certifications for Skills (more extensive courses or classes for broad skills) - No
- Micro-credentials (relatively brief courses or training for specific skills) - No
- Workforce (on-the-job training) - No
Primary Subject Areas Covered - Select yes to all that apply
- STEM - Yes
- Science - No
- Technology/Computer Science - No
- Engineering - No
- Mathematics/Statistics - Yes
- Humanities/Arts - No
- Arts - No
- Language Arts/Literacy - No
- Foreign Languages - No
- Literature - No
- Other - No
- Social Sciences - No
- Psychology - No
- Sociology - No
- Economics - No
- Political Science - No
- History - No
- Other - No
- Interdisciplinary Studies - No
- Physical Education/Health - No
- Career and Technical Education - No
- Professional Fields - No
- Engineering - No
- Medicine/Healthcare - No
- Law - No
- Business/Economics - No
- Other - No
Institutional characteristics
(If K-12 is selected) Education Sites
- Context
- Urban - Yes
- Rural - Yes
- Suburban - Yes
- Online - Yes
- Multi-site - Yes
- Type of School
- Private - Yes
- Public - Yes
- Charter - Yes
- Homeschool - No
- Other
- Informal education - No
- Other (TEXT) - No
(If Higher Ed is selected) Higher Education
- Type
- Minority-Serving Institution - No
- Hispanic-Serving Institution - No
- HBCU (Historically Black College or University) - No
- Tribally-Controlled College or University - No
- Other Minority-Serving Institution (e.g., Asian American and Native American Pacific Islander-Serving Institutions, Predominantly Black Institutions, Alaska Native and Native Hawaiian-Serving Institutions) - No
| Institution type | 2-year College | 4-year College/University |
|---|---|---|
| Public | No (e.g., Community College, City Colleges) | No |
| Private | No (e.g., Junior College) | No |
| Technical College | No | |
- Trade or Vocational - No
- Mode of instruction
- In-person (100% of normal instruction occurs in person) - Yes
- Online - Yes
- Hybrid - Yes
- International (non-US) - No
- Cross-institutional - No
Geographic location (states and countries) of participants when they interact with the DLP
US States - (Yes/No) (Select all that apply)
- All 50 US States - Yes
- Alabama - Yes
- Alaska - Yes
- Arizona - Yes
- Arkansas - Yes
- California - Yes
- Colorado - Yes
- Connecticut - Yes
- Delaware - Yes
- District of Columbia - Yes
- Florida - Yes
- Georgia - Yes
- Hawaii - Yes
- Idaho - Yes
- Illinois - Yes
- Indiana - Yes
- Iowa - Yes
- Kansas - Yes
- Kentucky - Yes
- Louisiana - Yes
- Maine - Yes
- Maryland - Yes
- Massachusetts - Yes
- Michigan - Yes
- Minnesota - Yes
- Mississippi - Yes
- Missouri - Yes
- Montana - Yes
- Nebraska - Yes
- Nevada - Yes
- New Hampshire - Yes
- New Jersey - Yes
- New Mexico - Yes
- New York - Yes
- North Carolina - Yes
- North Dakota - Yes
- Ohio - Yes
- Oklahoma - Yes
- Oregon - Yes
- Pennsylvania - Yes
- Rhode Island - Yes
- South Carolina - Yes
- South Dakota - Yes
- Tennessee - Yes
- Texas - Yes
- Utah - Yes
- Vermont - Yes
- Virginia - Yes
- Washington - Yes
- West Virginia - Yes
- Wisconsin - Yes
- Wyoming - Yes
US territories - No
- Puerto Rico - No
- US Virgin Islands - No
- Guam - No
- American Samoa - No
- Northern Mariana Islands - No
Other Countries: Country names (separate with commas)
Dataset Size and Complexity
- Number of Participants - (Total unique participants to date)
| Entity | K12 | Postsec |
|---|---|---|
| Students | 1,300,000 | |
| Instructors | 40,000 | |
| Courses | 130,000 | |
| School districts/Postsecondary institutions | 8,800 | |
Coverage relative to your platform’s active population -
Reporting window - (YYYY-MM-DD to YYYY-MM-DD)
Students in this reporting window represent
- Data Collection Period - 2018-10-08 to present
- Data Volume - 1.5 TB
- Missing Data Percentage - ~10%
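The following minimal Python sketch illustrates how a per-column missing-data percentage can be computed on an extract. It assumes rows arrive as dicts and treats `None` or an absent key as missing. The column names in the sample are hypothetical, and this is not necessarily how the ~10% figure above was produced.

```python
from typing import Dict, Iterable, Mapping, Optional


def missing_percentage(rows: Iterable[Mapping[str, Optional[object]]]) -> Dict[str, float]:
    """Return the percentage of missing values per column.

    Assumes every column seen in any row belongs to the schema, so a key that
    is absent from a row counts as missing for that row, as does None.
    """
    rows = list(rows)
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    result = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        result[col] = 100.0 * missing / len(rows)
    return result


# Hypothetical extract: column names are illustrative, not the real schema.
sample = [
    {"user_id": 1, "school_name": "A", "grade": 7},
    {"user_id": 2, "school_name": None, "grade": 8},
    {"user_id": 3, "school_name": "B", "grade": None},
    {"user_id": 4, "school_name": "C"},
]
print(missing_percentage(sample))  # {'grade': 50.0, 'school_name': 25.0, 'user_id': 0.0}
```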
Programming Languages Supported
- Python
- SQL, via Amazon Athena queries over Apache Iceberg tables stored in Amazon S3
Example research questions/ DLP research agenda
- How can we accurately estimate students’ mastery of skills?
- How does answering open-ended questions correlate with subsequent learning?
- How do different feedback types affect student engagement and learning?
- How well does knowledge transfer across skills?
- How well do LLMs perform in scoring open-ended responses and providing effective feedback?
Access Type
How researchers can obtain and use the edtech platform's data
- Open access (publicly available) - No
- Restricted access - Yes
- Tiered access - No
- Application required - Yes
- Partnership required - No
- Commercial license - No
Usage Rights
What researchers can legally do with the data once they have access
- Research - Yes
- Educational use - Yes
- Commercial use allowed (of findings) - No
- Restricted sharing - No
- Publication rights - Yes
- Sharing restrictions - (Yes/No - describe)
- Attribution requirements (Specify citation format) - (Yes/No - specify)
- Unrestricted - No
Quantitative Data Types Available
- Assessments - Yes
- Formative Assessments - Yes
- Summative Assessments - Yes
- Performance scores - Yes
- Time-based metrics (e.g., time on task) - Yes
- Frequency counts (e.g., frequency of logins) - Yes
- Rating scales - Yes
- Physiological measures (e.g., eye tracking, heart rate) - No
- Clickstream data - Yes
- Sensor data - No
Qualitative Data Types Available
- Text responses - Yes
- Interview transcripts - No
- Observational notes - No
- Video/Audiovisual recordings - No
- Audio recordings - No
- Images/artifacts - Yes
- Drawings/creative work - Yes
Rubric the member will use to review proposals (placeholder; specifics to be added later)
- Study proposal
- Significance (for the platform, for the field)
- Research design (feasibility, soundness, etc.)
- Talent (does the team have the right people to do this work and make sense of the results?)
- Resources (does the team have the necessary resources to do the work?)
- Code
- Outputs
Member data info
Dataset Title - ASSISTments data lake
Dataset Description - A read-optimized copy of the ASSISTments platform database, refreshed daily, containing math education data that spans content (e.g. math problems), usage (e.g. when content was assigned to students), and assessment performance.
Dataset ID - assistments-data-lake-v1.0.0
Owner/Maintainer Name - Neil Heffernan
Owner/Maintainer Contact - nth@wpi.edu
Owner/Maintainer ORC ID/OSF ID (optional) -
DLP Research Organization Registry (ROR) ID (Optional) - https://ror.org/123ab4567
Last Updated - (YYYY-MM-DD)
Update cadence - Daily
Data versioning policy - Data is updated daily
Data Format - Apache Iceberg tables in AWS S3
Accessing Data within Enclave - (Instructions to be developed with the Rice SafeInsights Team)
Demo/Sample Access - Work in progress
Documentation Quality - (Minimal - work in progress)
Cost Estimation for Queries - To be estimated during enclave development
Data Lineage and Dependencies
Source Systems - Data is sourced from the ASSISTments platform's PostgreSQL database on Amazon RDS, hosted in the AWS us-east-1 region.
Transformation Process - An ETL pipeline orchestrated with Apache Airflow in AWS transforms the data and loads it into the S3 data lake nightly.
Dependencies - N/A
Downstream Usage - This data is subsequently transformed into the "gold layer" of the data lake. The gold layer holds bespoke copies of the data, each optimized for a specific purpose. Some of these copies are streamed back to the application database for in-app use, and others are used for reporting or model training.
AI features - (Yes/No)
Example research questions (When possible) -
- How can we accurately estimate students’ mastery of skills?
- How does answering open-ended questions correlate with subsequent learning?
- How do different feedback types affect student engagement and learning?
- How well does knowledge transfer across skills?
- How well do LLMs perform in scoring open-ended responses and providing effective feedback?
Technical Specifications
Dataset Size and Complexity -
- Number of Participants - 1,300,000 total users
- Number of Variables - 3,500 columns
- Number of Records - 2 billion rows
- Data Volume - 1.5TB
- Contextual Notes - Data is refreshed daily from the live database by the ETL pipeline
Data Structure
- Total Tables/Files - 200 tables
- Total Records - 2 billion rows
- Dataset Size - 1.5TB
- Update Frequency - Daily
- Partitioning - Data is organized into two top-level namespaces, cas_content and cas_core
- cas_content stores the math content that teachers assign and students complete
- cas_core stores all users and interaction data (e.g. when content was assigned and records of student performance)
Temporal Coverage -
- Date Range - 2018-10-08 to current date
- Academic Years - 2018-19 to current school year
- Reporting Periods - (Annual/Semester/Quarter/Monthly/Other (Text))
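Analyses often need to bucket timestamps into the academic years listed above. The sketch below does this under one assumption that the platform does not state: a school year starts on July 1. Change `start_month` to match the convention your analysis uses.

```python
from datetime import date


def academic_year(d: date, start_month: int = 7) -> str:
    """Label a date with its school year, e.g. '2018-19'.

    ASSUMPTION: the school year starts on the first day of `start_month`
    (July by default); this boundary is illustrative, not a platform rule.
    """
    first_year = d.year if d.month >= start_month else d.year - 1
    return f"{first_year}-{(first_year + 1) % 100:02d}"


# The first day of data collection falls in the 2018-19 school year.
print(academic_year(date(2018, 10, 8)))  # 2018-19
print(academic_year(date(2019, 3, 1)))   # 2018-19
print(academic_year(date(2019, 9, 1)))   # 2019-20
```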
Geographic Coverage -
- Geographic Scope - (National/State/District/School level)
- Number of Regions - (Count of regions covered)
- Coverage Gaps - N/A
Data Availability
Privacy Level -
- Fully anonymized (All identifying details have been removed) - No
- De-identified (Direct identifiers have been removed) - No
- Pseudonymized (Direct identifiers have been replaced with pseudonyms) - No
- Identifiable (restricted) - Yes
- Other - No
Temporal Granularity Options -
- Real-time/continuous - No
- Per interaction/click - (Yes/No)
- Per session/class - (Yes/No)
- Daily - (Yes/No)
- Weekly - (Yes/No)
- Monthly - (Yes/No)
- Semester/term - (Yes/No)
- Academic year - (Yes/No)
- Multi-year - (Yes/No)
Unit of Analysis Levels -
- Individual learner - (Yes/No)
- Event level - (Yes/No)
- Learning group/team - (Yes/No)
- Classroom/cohort - (Yes/No)
- Course/program - (Yes/No)
- Institution - (Yes/No)
- District/system - (Yes/No)
- Regional/national - (Yes/No)
Data Characteristics
Table/File Inventory
ASSISTments data has three major categories:
- Core data describes IDs, users, and courses
  - ID Tables
    - cas_core.core.external_references - stores UUIDs pointing to other records (e.g. users, assignments); used in the API
  - User Tables
    - cas_core.users.users - one row for each user, including email address and other account details
    - cas_core.master_user.t_setting - one row for each teacher user, with additional information about their usage
  - Courses
    - cas_core.groups.principal_group_definitions
      - Represents a course as a whole
    - cas_core.groups.principal_group_memberships
      - Designates a (student) user as a member of a course
- Content represents the individual assessment items (Problems), how they are organized into digital worksheets delivered to students (Problem Sets), and supplementary content that helps students complete individual problems (Tutor Strategies)
  - Problem Tables
    - cas_content.core.problems
    - cas_content.core.answer_set_definitions
    - cas_content.core.answer_set_memberships
    - cas_content.core.answer_parts
    - cas_content.core.answer_values
  - Problem Set Tables
    - cas_content.core.problem_set_definitions
    - cas_content.core.problem_set_memberships
- Assessment data describes instances where teachers assigned Problem Sets to one or more students (Assignments), as well as log data describing students' performance on their assignments (Assignment Logs, Problem Logs, and Actions)
  - cas_core.core.assignments
    - Represents an instance where an assessment was delivered to a group of students (e.g. a course) or to an individual student
  - cas_core.student_data.assignment_logs
    - Summarizes one student's performance on an entire assignment
  - cas_core.student_data.problem_logs
    - Summarizes one student's performance on a single problem (assessment item) within an assignment
  - cas_core.student_data.actions
    - Describes an individual action (e.g. submitted answer, requested hint) taken by one student while working on an assignment
| Table/File Name | Description | Record Count | Primary Key | Related Tables |
|---|---|---|---|---|
| cas_core.core.external_references | One or more rows per user and assignment, each assigning a UUID. Users can have multiple external references if they interact with different ASSISTments services, one per partner_id. Generally you will want the external_references row with partner_id=5 (TNG). | 7.5 million | id - Serial | |
| cas_core.users.users | One row for each user, including student users | 1.5 million | id - Serial | Which users own or are members of which courses is captured in the principal group tables (cas_core.groups.principal_group_definitions and cas_core.groups.principal_group_memberships) |
| cas_core.master_user.t_setting | One row for each teacher user. Points to a cas_core.users.users row and additionally describes the teacher's usage and demographics | 200K | id - Serial | teacher_id references cas_core.users.users(id) |
| cas_core.groups.principal_group_definitions | Each row represents a group of either users (e.g. a course) or other groups | 165K | id - Serial | Referenced by cas_core.groups.principal_group_memberships |
| cas_core.groups.principal_group_memberships | Each row describes one member (either a user or another group) of a designated group | 2 million | id - Serial | |
| cas_core.core.assignments | One row for each time a problem set was assigned to a student or to a group of students/course | 2 million | id - Serial | |
| cas_core.student_data.assignment_logs | One row per student per assignment, summarizing that student's performance (if 20 students work on 2 assignments, that creates 40 assignment_logs) | 28.5 million | id - Serial | |
| cas_core.student_data.problem_logs | One row per student per problem within an assignment, summarizing that student's performance (if 20 students work on 2 assignments that each contain 10 problems, that creates 400 problem_logs, provided each student at least starts each problem) | 215 million | id - Serial | |
| cas_core.student_data.actions | One row for each action (e.g. answer submitted, hint requested) taken by a student while working on an assignment | 215 million | id - Serial | |
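You can check the grain of the two log tables against the worked counts above with a small in-memory SQLite sketch. The column names (`student_id`, `assignment_id`, `problem_id`) are hypothetical stand-ins rather than the actual schema. The sketch also assumes that every student starts every problem. Under that assumption, 20 students × 2 assignments give 40 assignment logs, and 20 × 2 × 10 give 400 problem logs.

```python
import sqlite3


def build_toy_logs(n_students: int, n_assignments: int, n_problems: int) -> sqlite3.Connection:
    """Create in-memory toy log tables at the grain described in the inventory.

    HYPOTHETICAL schema: column names are illustrative only. Every student is
    assumed to start every problem of every assignment.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assignment_logs (student_id INTEGER, assignment_id INTEGER)")
    conn.execute(
        "CREATE TABLE problem_logs (student_id INTEGER, assignment_id INTEGER, problem_id INTEGER)"
    )
    for s in range(n_students):
        for a in range(n_assignments):
            # One assignment log per (student, assignment) pair.
            conn.execute("INSERT INTO assignment_logs VALUES (?, ?)", (s, a))
            for p in range(n_problems):
                # One problem log per (student, assignment, problem) started.
                conn.execute("INSERT INTO problem_logs VALUES (?, ?, ?)", (s, a, p))
    return conn


conn = build_toy_logs(n_students=20, n_assignments=2, n_problems=10)
(n_assignment_logs,) = conn.execute("SELECT COUNT(*) FROM assignment_logs").fetchone()
(n_problem_logs,) = conn.execute("SELECT COUNT(*) FROM problem_logs").fetchone()
print(n_assignment_logs, n_problem_logs)  # 40 400
```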
Schema Information
Table -
Description - Contains UUIDs mapped to records in other tables across the system; other records in turn reference these UUIDs.
Primary Use - Mainly used to cross-reference data: it serves as a central record that multiple other records point to.
Data Schema
| Column/Entity Name | Data Type | Description | Example Values | Null handling (how does the data manage and respond to the absence of a value) | Primary Key | Foreign Key |
|---|---|---|---|---|---|---|
| integer primary key | N/A | Yes | | | | |
| text UUID being applied to the record | N/A | No | | | | |
| Integer ID of the backend API the UUID is associated with for the given record (e.g. a user can have up to one xref per partner). Generally you will want to look for xrefs corresponding to the TNG partner. | N/A | Yes | | | | |
| Integer ID designating which kind of record the xref/UUID applies to. | N/A | No | | | | |
| Integer ID referencing the primary key of the record the xref/UUID applies to. Which table it is found in depends on the record-kind column described in the previous row. | N/A | No | | | | |
Table-Specific Statistics
- Total Records - 28.5 Million
- Unique Entities - Users, Groups, Assignments, Partners (APIs interacted with), Folders, Skills
Table Relationships
Join Patterns
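- The target-record column (target_id in the example query) references the primary key of a record in another table. Which table it references depends on the record-type column (xref_type in the example query), as described in the schema table above.
- Columns in other tables can hold foreign keys referencing this table. For example, assignee_xid and group_context_xid join to cas_core.core.external_references(id) in this document's example queries.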
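A single external_references row can point at a user, a group, or an assignment. Resolving it therefore takes two steps: read the record type and target ID, then look up the target in the matching table. The sketch below does this with SQLite and unqualified toy tables. It assumes the xref_type values listed in the Member Type table (1 = User, 2 = Principals Group, 3 = Assignment). The columns `id`, `partner_id`, `xref_type`, and `target_id` appear elsewhere in this document, while `uuid` and `username` are hypothetical column names.

```python
import sqlite3

# ASSUMED mapping from xref_type to target table, based on the Member Type table.
# Table names are unqualified toy stand-ins for the cas_core tables.
TARGET_TABLE = {
    1: "users",                        # cas_core.users.users
    2: "principal_group_definitions",  # cas_core.groups.principal_group_definitions
    3: "assignments",                  # cas_core.core.assignments
}


def resolve_xref(conn: sqlite3.Connection, xref_id: int):
    """Follow one external_references row to the record it points at."""
    row = conn.execute(
        "SELECT xref_type, target_id FROM external_references WHERE id = ?", (xref_id,)
    ).fetchone()
    if row is None:
        return None
    xref_type, target_id = row
    table = TARGET_TABLE.get(xref_type)
    if table is None:
        raise ValueError(f"no target table mapped for xref_type {xref_type}")
    # The table name comes from the fixed mapping above, never from user input,
    # so interpolating it into the SQL text is safe here.
    return conn.execute(f"SELECT * FROM {table} WHERE id = ?", (target_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE external_references (id INTEGER PRIMARY KEY, uuid TEXT,
        partner_id INTEGER, xref_type INTEGER, target_id INTEGER);
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE principal_group_definitions (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE assignments (id INTEGER PRIMARY KEY, problem_set_id INTEGER);
    INSERT INTO users VALUES (42, 'student_a');
    INSERT INTO principal_group_definitions VALUES (7, 'Red 1');
    INSERT INTO external_references VALUES
        (100, 'uuid-user', 5, 1, 42),
        (101, 'uuid-group', 5, 2, 7);
    """
)
print(resolve_xref(conn, 100))  # (42, 'student_a')
print(resolve_xref(conn, 101))  # (7, 'Red 1')
```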
Common Multi-Table Queries
-- Assignment info for assignments of a specific problem set, joined with the assignee user
SELECT *
FROM cas_core.core.assignment_definitions ad
INNER JOIN cas_core.core.external_references er
    ON ad.assignee_xid = er.id  -- join assignment_definitions.assignee_xid to its external_references(id) row
INNER JOIN cas_core.users.users usrs
    ON er.target_id = usrs.id
   AND er.xref_type = 1  -- xref_type 1 = User
WHERE ad.problem_set_id = 604751;
Data Quality Indicators
- Completeness - 100%
- Accuracy - No known issues
- Consistency - (Standards and constraints)
- Uniqueness - Some users have multiple rows. Generally, use the row for the TNG partner (partner_id=5), as it represents the user's interaction with the core ASSISTments system.
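The following minimal sketch applies the uniqueness rule above. It keeps only user xrefs for the TNG partner (partner_id=5, per the table inventory) and flags any user that still has more than one. It runs against a toy SQLite table. `uuid` is a hypothetical column name, and partner_id 3 in the sample data is made up.

```python
import sqlite3

TNG_PARTNER_ID = 5  # TNG partner, per the table inventory
USER_XREF_TYPE = 1  # "User" in the Member Type table


def tng_uuid_by_user(conn: sqlite3.Connection) -> dict:
    """Map each user id to the UUID of its single TNG external reference."""
    rows = conn.execute(
        """
        SELECT target_id, uuid
        FROM external_references
        WHERE xref_type = ? AND partner_id = ?
        ORDER BY target_id
        """,
        (USER_XREF_TYPE, TNG_PARTNER_ID),
    ).fetchall()
    result = {}
    for user_id, uuid in rows:
        if user_id in result:
            # A user should have at most one xref per partner; surface violations.
            raise ValueError(f"user {user_id} has more than one TNG xref")
        result[user_id] = uuid
    return result


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE external_references (id INTEGER PRIMARY KEY, uuid TEXT,
        partner_id INTEGER, xref_type INTEGER, target_id INTEGER);
    INSERT INTO external_references VALUES
        (1, 'u42-tng', 5, 1, 42),
        (2, 'u42-other', 3, 1, 42),
        (3, 'u43-tng', 5, 1, 43);
    """
)
print(tng_uuid_by_user(conn))  # {42: 'u42-tng', 43: 'u43-tng'}
```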
Table -
Description - One row for each user in the system, across all roles (e.g. teachers and students)
Primary Use - (Main purpose/analysis this table supports)
Table-Specific Statistics
- Total Records - 1.5 million
- Unique Entities - Users (students, teachers, researchers, administrators)
Data Schema
| Column/Entity Name | Data Type | Description | Example Values | Total number of unique values | Null handling (how does the data manage and respond to the absence of a value) | Primary Key | Foreign Key | Expected Cardinality (per parent) | Can be missing (Yes/No) | Parent entity | Missing data (percentage) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Primary Key | 1.5 Million | Never Null | Yes | 0% missing | |||||||
| No | No | No | N/A | No | |||||||
| No | No | No | N/A | No | |||||||
Table -
Description - One row per teacher user. Users who are not teachers (e.g. student-only users) are not included.
Primary Use - (Main purpose/analysis this table supports)
Table-Specific Statistics
- Total Records - 212K
- Unique Entities - Teacher users
Data Schema
| Column/Entity Name | Data Type | Description | Example Values | Total number of unique values | Null handling (how does the data manage and respond to the absence of a value) | Primary Key | Foreign Key | Expected Cardinality (per parent) | Can be missing (Yes/No) | Parent entity | Missing data (percentage) |
|---|---|---|---|---|---|---|---|---|---|---|---|
Query Examples
Single-Table Queries
-- Find all teacher users
SELECT teacher_id, school_name, curriculums_taught
FROM cas_core.master_user.t_setting;
Multi-Table Queries
-- Find all teacher users who teach Illustrative Mathematics, joining their settings with their user row
SELECT *
FROM cas_core.master_user.t_setting ts
INNER JOIN cas_core.users.users usr
    ON ts.teacher_id = usr.id
WHERE ts.curriculums_taught LIKE '%IM K-12 Math by Illustrative Mathematics%';
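When the curriculum name comes from a variable, you can pass it as a bound parameter instead of pasting it into the LIKE pattern. The sketch below shows this with Python's `sqlite3` module and toy tables. The `username` column and the sample curriculum strings are hypothetical. Placeholder syntax also differs between database drivers, so adapt it to the client you use with Athena.

```python
import sqlite3


def teachers_with_curriculum(conn: sqlite3.Connection, curriculum: str) -> list:
    """Return (teacher_id, username) rows whose curriculum text contains `curriculum`.

    The search term is bound as a parameter; LIKE still treats any % or _
    inside the term as a wildcard.
    """
    return conn.execute(
        """
        SELECT ts.teacher_id, usr.username
        FROM t_setting ts
        INNER JOIN users usr ON ts.teacher_id = usr.id
        WHERE ts.curriculums_taught LIKE '%' || ? || '%'
        ORDER BY ts.teacher_id
        """,
        (curriculum,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE t_setting (id INTEGER PRIMARY KEY, teacher_id INTEGER,
        curriculums_taught TEXT);
    INSERT INTO users VALUES (1, 'teacher_one'), (2, 'teacher_two');
    INSERT INTO t_setting VALUES
        (10, 1, 'IM K-12 Math by Illustrative Mathematics, Curriculum B'),
        (11, 2, 'Curriculum B');
    """
)
print(teachers_with_curriculum(conn, "IM K-12 Math by Illustrative Mathematics"))
# [(1, 'teacher_one')]
```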
Table -
Description - Each row represents a group of either users (e.g. a course) or of other groups
Primary Use - Used in conjunction with
Table-Specific Statistics
- Total Records - 165K
- Unique Entities - Groups of users and other groups
Data Schema
| Column/Entity Name | Data Type | Description | Example Values | Total number of unique values | Null handling (how does the data manage and respond to the absence of a value) | Primary Key | Foreign Key | Expected Cardinality (per parent) | Can be missing (Yes/No) | Parent entity | Missing data (percentage) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Primary Key | , | 165K | Never Null | Yes | 0% missing | ||||||
| Name of the group | Red 1 | 165K | |||||||||
| Foreign key describing which kind of group this is (see Group Type below) | No | No | No | No | 0% missing | ||||||
| No | No | No | No | ||||||||
| Whether group is currently active/enabled | 2 | ||||||||||
| When group was created | 2 |
Group Type
| id/val | xref_type | name of group_type |
|---|---|---|
| 1 | 2 | Principals |
| 2 | 8 | Plain Old Problem Sets |
| 3 | 11 | Assignments Folder |
| 4 | 11 | Content Folder |
| 5 | 9 | Meta Problem Set |
| 6 | 10 | Problem Sets Container |
| 7 | 2 | ASSISTments Course |
| 8 | 2 | Assignee Group |
| 9 | 2 | Access Group |
| 10 | 2 | Mentee Group |
Note: Courses are designated by
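A course roster combines course definitions, the memberships of those courses, and the users table. The sketch below makes two assumptions: courses are the "ASSISTments Course" group type (id 7) in the Group Type table above, and user members are member type 1 in the Member Type table below. `member_id` appears in this document's queries. The columns `group_type_id`, `group_id`, `member_type_id`, and `is_active` are hypothetical, and the tables are toy SQLite stand-ins.

```python
import sqlite3

COURSE_GROUP_TYPE = 7  # "ASSISTments Course" in the Group Type table
USER_MEMBER_TYPE = 1   # "User" in the Member Type table


def course_rosters(conn: sqlite3.Connection) -> dict:
    """Map each course name to the sorted usernames of its active user members."""
    rows = conn.execute(
        """
        SELECT d.name, u.username
        FROM principal_group_definitions d
        JOIN principal_group_memberships m ON m.group_id = d.id
        JOIN users u ON u.id = m.member_id
        WHERE d.group_type_id = ?
          AND m.member_type_id = ?
          AND m.is_active = 1
        """,
        (COURSE_GROUP_TYPE, USER_MEMBER_TYPE),
    ).fetchall()
    rosters = {}
    for course_name, username in rows:
        rosters.setdefault(course_name, []).append(username)
    return {name: sorted(members) for name, members in rosters.items()}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE principal_group_definitions (id INTEGER PRIMARY KEY, name TEXT,
        group_type_id INTEGER);
    CREATE TABLE principal_group_memberships (id INTEGER PRIMARY KEY, group_id INTEGER,
        member_type_id INTEGER, member_id INTEGER, is_active INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO principal_group_definitions VALUES (10, 'Red 1', 7), (11, 'Access', 9);
    INSERT INTO principal_group_memberships VALUES
        (100, 10, 1, 1, 1),
        (101, 10, 1, 2, 1),
        (102, 10, 1, 3, 0),
        (103, 11, 1, 3, 1),
        (104, 10, 4, 11, 1);
    """
)
# carol is soft-removed from Red 1 and only active in a non-course group;
# row 104 is a nested group, not a user, so it is filtered out.
print(course_rosters(conn))  # {'Red 1': ['alice', 'bob']}
```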
Table -
Description - Each row represents one user or group being a member of a specified group
Primary Use - Used in conjunction with
Table-Specific Statistics
- Total Records - 2 Million
Data Schema
| Column/Entity Name | Data Type | Description | Example Values | Total number of unique values | Null handling (how does the data manage and respond to the absence of a value) | Primary Key | Foreign Key | Expected Cardinality (per parent) | Can be missing (Yes/No) | Parent entity | Missing data (percentage) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Primary Key | 2 Million | Never Null | Yes | 0% missing | |||||||
| Foreign key to | |||||||||||
| Foreign key describing which kind of record the member of the group is, e.g. a user or another group. User members (member type 1 in the Member Type table) are what you will most commonly need | No | No | No | No | 0% missing | ||||||
| Foreign key to the member record, found in a table determined by the member type (e.g. for users, this references cas_core.users.users) | No | No | No | References different tables depending on the group type | No | ||||||
| Whether this member is actively in the group or was soft-removed | 2 | ||||||||||
| When group was created | 2 |
Member Type
| id/val | xref_type | name of member type |
|---|---|---|
| 1 | 1 | User |
| 4 | 2 | Principals Group |
| 2 | 5 | Problem |
| 3 | 3 | Assignment |
| 5 | 6 | Problem Set |
| 6 | 11 | Folder |
| 7 | 11 | Content Folder |
| 8 | 9 | Meta Problem Set |
| 9 | 10 | Problem Set Container |
Note: User members are designated by
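A group can contain other groups (the "Principals Group" member type, id 4), so collecting every user under a group may require following memberships recursively. The sketch below does this with a recursive CTE in SQLite. It uses the hypothetical column names `group_id` and `member_type_id` alongside `member_id` from this document's queries, over toy tables. Two points are assumptions you should check: whether nested groups occur in practice, and whether your query engine supports recursive CTEs.

```python
import sqlite3

USER_MEMBER_TYPE = 1   # "User" in the Member Type table
GROUP_MEMBER_TYPE = 4  # "Principals Group" in the Member Type table


def all_user_members(conn: sqlite3.Connection, root_group_id: int) -> list:
    """Return ids of users reachable from root_group_id through nested groups."""
    # UNION (not UNION ALL) discards group ids already reached, so a
    # membership cycle cannot make the recursion loop forever.
    rows = conn.execute(
        """
        WITH RECURSIVE reachable(group_id) AS (
            SELECT ?
            UNION
            SELECT m.member_id
            FROM principal_group_memberships m
            JOIN reachable r ON m.group_id = r.group_id
            WHERE m.member_type_id = ?
        )
        SELECT DISTINCT m.member_id
        FROM principal_group_memberships m
        JOIN reachable r ON m.group_id = r.group_id
        WHERE m.member_type_id = ?
        ORDER BY m.member_id
        """,
        (root_group_id, GROUP_MEMBER_TYPE, USER_MEMBER_TYPE),
    ).fetchall()
    return [member_id for (member_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE principal_group_memberships (group_id INTEGER,
        member_type_id INTEGER, member_id INTEGER);
    INSERT INTO principal_group_memberships VALUES
        (1, 1, 10), (1, 1, 11), (1, 4, 2),
        (2, 1, 12), (2, 4, 1);
    """
)
# Groups 1 and 2 contain each other, which forms a cycle.
print(all_user_members(conn, 1))  # [10, 11, 12]
```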
Common Multi-Table Queries
-- Find all students who were given a specific assignment (here, assignment 604751)
SELECT *
FROM cas_core.core.assignment_definitions ad
INNER JOIN cas_core.core.external_references group_er
    ON ad.group_context_xid = group_er.id  -- find the external_reference of the assigned group
INNER JOIN cas_core.groups.principal_group_memberships pgm  -- join with the members of that group
    ON pgm.group_id = group_er.target_id  -- group foreign key; column name assumed, check the schema
   AND pgm.member_type = 1                -- user members only (member type 1 = User); column name assumed
INNER JOIN cas_core.users.users usrs  -- join with users to see their usernames, etc.
    ON usrs.id = pgm.member_id
WHERE ad.id = 604751;