Member org info

Digital Learning Platform (DLP) Overview

DLP name - (Add DLP Name) ASSISTments

DLP headquarters location - (City, State, Country) MA, United States

DLP description

ASSISTments is a web-based platform that delivers K–12 mathematics practice integrated with formative assessment. Its purpose is to support student learning by providing immediate feedback, hints, and opportunities for revision while helping teachers monitor progress. The platform’s goals are to improve instructional effectiveness, personalize learning, and generate large-scale data that advance research on teaching and learning.

DLP data set description - “big picture” overview of the data available (max 75 words)

ASSISTments provides multimodal, multi-year integrated datasets that combine content, assignment, and student interaction data to capture the full learning process, reflecting how students learn, how well they perform, and the context in which learning occurs in K–12 math practice across several Common Core–aligned curricula. The data include problem content, assignment information, and detailed student interaction logs documenting submissions, actions, and system events across instructional activities.

Platform type - Select yes to all that apply

  • Learning Management System (LMS) - No
  • Student Information System (SIS) - No
  • Online digital curriculum - Yes
  • Online educational activity/practice platform - Yes
  • Graded activity - Yes
  • Game-based learning system - No
  • Math learning platform - Yes
  • Reading platform - No
  • Other subject-specificNo
  • Special education system - No
  • Assessment system - Yes
  • Administrative system - No
    • Classroom management - No
    • Other administrative data - No

Setting Types - Select yes to all that apply

  • Formal classroom - Yes
  • Online/virtual - Yes
  • Hybrid/blended - Yes
  • Self-directed study - No
  • Workplace training - No
  • Other (specify) - No

System demo video (Screen recording(s), website(s), or other video(s) demonstrating how the platform works, separated by commas) - What is ASSISTments, ASSISTments youtube page,

Education Levels - Select yes to all that apply

  • Early childhood (birth to PreK) - (Yes/No)
  • K-12 - Yes
    • K-5 (elementary) - Yes
    • 6-8 (middle) - Yes
    • 9-12 (secondary) - Yes
  • Higher Ed - Yes
    • Undergraduate - Yes
    • Graduate - No
  • Post-Secondary Non-Degree Seeking - No
    • Continuing Education (courses) - No
    • Informal Learning (tasks, lessons, or activities) - No
  • Professional Development - No
    • Certifications for Skills (more extensive courses or classes for broad skills) - No
    • Micro-credentials (relatively brief courses or training for specific skills) - No
  • Workforce (on-the-job training) - No

Primary Subject Areas Covered - Select yes to all that apply

  • STEM - Yes
    • Science - No
    • Technology/Computer Science - No
    • Engineering - No
    • Mathematics/Statistics - Yes
  • Humanities/Arts - No
    • Arts - No
    • Language Arts/Literacy - No
    • Foreign Languages - No
    • Literature - No
    • Other - No
  • Social Sciences - No
    • Psychology - No
    • Sociology - No
    • Economics - No
    • Political Science - No
    • History - No
    • Other - (Yes/NNo
  • Interdisciplinary Studies - No
  • Physical Education/Health - No
  • Career and Technical Education - No
  • Professional Fields - No
    • Engineering - No
    • Medicine/Healthcare - No
    • Law - No
    • Business/Economics - No
  • Other - No

Institutional characteristics

(if K-12 is selected) Education Sites

  • Context
    • Urban - Yes
    • Rural - Yes
    • Suburban - Yes
    • Online - Yes
  • Multi-site - Yes
  • Type of School
    • Private - Yes
    • Public - Yes
    • Charter - Yes
    • Homeschool - No
  • Other
    • Informal education - No
    • Other (TEXT) - No

(If Higher Ed is selected) Higher Education

  • Type
    • Minority-Serving Institution - No
      • Hispanic-Serving Institution - No
      • HBCU (Historically Black College or University) - No
      • Tribally-Controlled College or University - No
      • Other Minority-Serving Institution (e.g., Asian American and Native American Pacific Islander-Serving Institutions, Predominantly Black Institutions, Alaska Native and Native Hawaiian-Serving Institutions) - No
2-year College4-year College/University
PublicNo (e.g., Community College, City Colleges)No
PrivateNo (e.g., Junior College)No
Technical CollegeNo
  • Trade or Vocational - No
  • Mode of instruction
    • In-person (100% of normal instruction occurs in person) - Yes
    • Online - Yes
    • Hybrid - Yes
  • International (non-US) - No
  • Cross-institutional - No

Geographic location (states and countries) of participants when they interact with the DLP

US States - (Yes/No) (Select all that apply)

  • All 50 US States - Yes
  • Alabama - Yes
  • Alaska - Yes
  • Arizona - Yes
  • Arkansas - Yes
  • California - Yes
  • Colorado - Yes
  • Connecticut - Yes
  • Delaware - Yes
  • District of Columbia - Yes
  • Florida - Yes
  • Georgia - Yes
  • Hawaii - Yes
  • Idaho - Yes
  • Illinois - Yes
  • Indiana - Yes
  • Iowa - Yes
  • Kansas - Yes
  • Kentucky - Yes
  • Louisiana - Yes
  • Maine - Yes
  • Maryland - Yes
  • Massachusetts - Yes
  • Michigan - Yes
  • Minnesota - Yes
  • Mississippi - Yes
  • Missouri - Yes
  • Montana - Yes
  • Nebraska - Yes
  • Nevada - Yes
  • New Hampshire - Yes
  • New Jersey - Yes
  • New Mexico - Yes
  • New York - Yes
  • North Carolina - Yes
  • North Dakota - Yes
  • Ohio - Yes
  • Oklahoma - Yes
  • Oregon - Yes
  • Pennsylvania - Yes
  • Rhode Island - Yes
  • South Carolina - Yes
  • South Dakota - Yes
  • Tennessee - Yes
  • Texas - Yes
  • Utah - Yes
  • Vermont - Yes
  • Virginia - Yes
  • Washington - Yes
  • West Virginia - Yes
  • Wisconsin - Yes
  • Wyoming - Yes

US territories - No

  • Puerto Rico - No
  • US Virgin Islands - No
  • Guam - No
  • American Samoa - No
  • Northern Mariana Islands - No

Other Countries: Country names (separate with commas)

Dataset Size and Complexity

  • Number of Participants - (Total unique participants to date)
K12Postsec
Students1,300,000
Instructors40,000
Courses130,000
School districts/Postsecondary institutions8,800

Coverage relative to your platform’s active population -

Reporting window - (YYYY-MM-DD to YYYY-MM-DD)

Students in this reporting window represent

XX%
of student users of our platform

  • Data Collection Period - (2018-10-08 - Current Date)
  • Data Volume - (1.5 TB)
  • Missing Data Percentage - ~10%

Programming Languages Supported

  • Python
    • Using Athena to query iceberg formatted S3

Example research questions/ DLP research agenda

  • (Example: "How does time spent on practice problems correlate with test scores?")
  • (Example: "What learning patterns predict student dropout risk?")
  • (Example: "How do different feedback types affect student engagement?")
  • (Add 3-5 specific research questions)
  • How can we accurately estimate students’ mastery of learning?
  • How does answering open-ended questions correlate with subsequent learning?
  • How do different feedback types affect student engagement and learning?
  • How well does knowledge transfer across skills?
  • How well do LLMs perform in scoring open-ended responses and providing effective feedback?

Access Type

How researchers can obtain and use the edtech platform's data

  • Open access (publicly available) - No
  • Restricted access - Yes
  • Tiered access - No
  • Application required - Yes
  • Partnership required - No
  • Commercial license - No

Usage Rights

what researchers can legally do with the data once they have access

  • Research - Yes
  • Educational use - Yes
  • Commercial use allowed (of findings) - No
  • Restricted sharing - No
  • Publication rights - Yes
  • Sharing restrictions - (Yes/No - describe)
  • Attribution requirements (Specify citation format) - (Yes/No - specify)
  • Unrestricted - No

Quantitative Data Types Available

  • Assessments - Yes
    • Formative Assessments - Yes
    • Summative Assessments - Yes
  • Performance scores - Yes
  • Time-based metrics (e.g., time on task) - Yes
  • Frequency counts (e.g., frequency of logins) - Yes
  • Rating scales - Yes
  • Physiological measures (e.g., eye tracking, heart rate) - No
  • Clickstream data - Yes
  • Sensor data - No

Qualitative Data Types Available

  • Text responses - Yes
  • Interview transcripts - No
  • Observational notes - No
  • Video/Audiovisual recordings - No
  • Audio recordings - No
  • Images/artifacts - Yes
  • Drawings/creative work - Yes

Rubric based on which member will review (Placeholder, will add specifics later)

  • Study proposal
    • Significance (for the platform, for the field)
    • Research design (feasibility, soundness, etc.)
    • Talent (right people to do this? And make sense of it?)
    • Resources (have the necessary resources to do the work?)
  • Code
  • Outputs

Member data info

Dataset Title - ASSISTments data lake

Dataset Description - A semi-real-time updated read-optimized copy of the ASSISTments platform database with math education data spanning content (e.g. math problems), usage (e.g. times content was assigned to students), and assessment performance.

Dataset ID - assistments-data-lake-v1.0.0

Owner/Maintainer Name - Neil Heffernan

Owner/Maintainer Contact - nth@wpi.edu

Owner/Maintainer ORC ID/OSF ID (optional) -

0000-0002-3280-288X

DLP Research Organization Registry (ROR) ID (Optional) - https://ror.org/123ab4567

Last Updated - (YYYY-MM-DD)

Update cadence - (daily)

Data versioning policy - Data is updated daily

Data Format - (Iceberg format in AWS S3)

Accessing Data within Enclave - (Instructions to be developed with the Rice SafeInsights Team)

Demo/Sample Access - Work in progress

Documentation Quality - (Minimal - work in progress)

Cost Estimation for Queries - To be estimated during enclave development

Data Lineage and Dependencies

Source Systems - Data is sourced from the ASSISTments platform postgres database in AWS rds, stored in AWS us-east-1 region

Transformation Process - The dataset is populated via an ETL pipeline using Airflow in AWS to transform and populate the data lake in S3 on a nightly cadence

Dependencies - N/A

Downstream Usage - This data is subsequently transformed into the 'gold layer' of the data lake, which contains bespoke copies of data transformed to optimize for specific purposes, some of which are later streamed back to the application database for in-app consumption, and others are used for reporting or model training.

AI features - (Yes/No)

Example research questions (When possible) -

  • How can we accurately estimate students’ mastery of learning?
  • How does answering open-ended questions correlate with subsequent learning?
  • How do different feedback types affect student engagement and learning?
  • How well does knowledge transfer across skills?
  • How well do LLMs perform in scoring open-ended responses and providing effective feedback?

Technical Specifications

Dataset Size and Complexity -

  • Number of Participants - 1,300,000 total users
  • Number of Variables - 3,500 columns
  • Number of Records - 2 billion rows
  • Data Volume - 1.5TB
  • Contextual Notes - Data is updated daily from live db ETL pipeline

Data Structure

  • Total Tables/Files - 200 tables
  • Total Records - 2 billion rows
  • Dataset Size - 1.5TB
  • Update Frequency - Daily
  • Partitioning - data is organized into cas_content, and cas_core
    • cas_content stores the math content which teachers assign and students complete
    • cas_core stores all the users and interaction data (e.g. times content was assigned and records of student performance)

Temporal Coverage -

  • Date Range - 2018-10-08 to current date
  • Academic Years - 2018-19 to current school year
  • Reporting Periods - (Annual/Semester/Quarter/Monthly/Other (Text))

Geographic Coverage -

  • Geographic Scope - (National/State/District/School level)
  • Number of Regions - (Count of regions covered)
  • Coverage Gaps - N/A

Data Availability

Privacy Level -

  • Fully anonymized (All identifying details have been removed) - No
  • De-identified (Direct identifiers have been removed) - No
  • Pseudonymized (Direct identifiers have been replaced with pseudonyms) - No
  • Identifiable (restricted) - Yes
  • Other - No

Technical Specifications

Dataset Size and Complexity -

  • Number of Participants - 1,300,000 total users
  • Number of Variables - 3,500 columns
  • Number of Records - 2 billion rows
  • Data Volume - 1.5TB
  • Contextual Notes - Data is updated daily from live db ETL pipeline

Data Structure

  • Total Tables/Files - 200 tables
  • Total Records - 2 billion rows
  • Dataset Size - 1.5TB
  • Update Frequency - Daily
  • Partitioning - data is organized into cas_content, and cas_core
    • cas_content stores the math content which teachers assign and students complete
    • cas_core stores all the users and interaction data (e.g. times content was assigned and records of student performance)

Temporal Coverage -

  • Date Range - 2018-10-08 to current date
  • Academic Years - 2018-19 to current school year
  • Reporting Periods - (Annual/Semester/Quarter/Monthly/Other (Text))

Geographic Coverage -

  • Geographic Scope - (National/State/District/School level)
  • Number of Regions - (Count of regions covered)
  • Coverage Gaps - (Any notable geographic exclusions)

Temporal Granularity Options -

  • Real-time/continuous - No
  • Per interaction/click - (Yes/No)
  • Per session/class - (Yes/No)
  • Daily - (Yes/No)
  • Weekly - (Yes/No)
  • Monthly - (Yes/No)
  • Semester/term - (Yes/No)
  • Academic year - (Yes/No)
  • Multi-year - (Yes/No)

Unit of Analysis Levels -

  • Individual learner - (Yes/No)
  • Event level - (Yes/No)
  • Learning group/team - (Yes/No)
  • Classroom/cohort - (Yes/No)
  • Course/program - (Yes/No)
  • Institution - (Yes/No)
  • District/system - (Yes/No)
  • Regional/national - (Yes/No)

Data Availability

Privacy Level -

  • Fully anonymized (All identifying details have been removed) - No
  • De-identified (Direct identifiers have been removed) - No
  • Pseudonymized (Direct identifiers have been replaced with pseudonyms) - No
  • Identifiable (restricted) - Yes
  • Other - (Yes/No) (Specify)

Data Characteristics

Table/File Inventory

ASSISTments data has three major categories:

  • Core Data describes ids, users and courses

    • ID Tables
      • cas_core.core.external_references - stores uuids pointing to other records e.g. users, assignments, used in the API
    • User Tables
      • cas_core.users.users - one row for each user with email address etc
      • cas_core.master_users.t_setting - one row for each teacher user with additional information about their usage
    • Courses
      • cas_core.groups_principal_group_definitions
        • Represents a course as a whole
      • cas_core.groups_principal_group_memberships
        • Designates a (student) user as a member of a course
  • Content represents the individual assessment items (Problems), how they are organized into digital worksheets to be delivered to students (Problem Sets) and supplementary/supporting content used to assist students with completing individual problems (Tutor Strategies)

    • Problem Tables
      • cas_content.core.problems
      • cas_content.core.answer_set_definitions
      • cas_content.core.answer_set_memberships
      • cas_content.core.answer_parts
      • cas_content.core.answer_values
    • Problem Set Tables
      • cas_content.core.problem_set_definitions
      • cas_content.core.problem_set_memberships
  • Assessment data describes instances where teachers assigned Problem Sets to one or more students (Assignments), as well as log data describing students performance on their assignments (Assignment Logs, Problem Logs, and Actions)

    • cas_core.core.assignments
      • Represents an instance where an assessment was delivered to a group of students (e.g a course) or individual student
    • cas_core.student_data.assignment_logs
      • Summarizes one student’s performance on an entire assignment
    • cas_core.student_data.problem_logs
      • Summarizes one student’s performance on a single problem (assessment item) within an assignment
    • cas_core.student_data.actions
      • Describes an individual action (e.g. submitted answer, requested hint) made by one user on one
Table/File NameDescriptionRecord CountPrimary KeyRelated Tables
cas_core.core.external_references1 or more rows per user and assignment, assigning them a UUID. Users can have multiple external references if they interact with different ASSISTments Services, one per partner_id. Generally you will want the external_reference row with partner_id=5 (TNG)7.5 millionid - Serial
  • xref_type_id references core.external_reference_types(id) and indcates which kind of resource the external_references row gives a UUID (xref) to
  • target_id column points to a row in another table, depending on the external_reference_type column (e.g. 1 = users.users, 3 = core.assignments). See cas_core.core.external_reference_types for the mapping
  • Other tables e.g. cas_core.core.assignments will sometimes have a {something}_xid column. If so, the xid is a reference to the id in the external_references table e.g. cas_core.core.assignments references cas_core.core.external_references(id)
cas_core.users.users1 Row for each user, including student users1.5 millionid - SerialWhich users own or are members of which courses is captured in the principal groups tables (groups.principal_groups_definitions and groups.principal_groups_memberships)
cas_core.master_user.t_settingOne record for each teacher user. Points to a cas_coreusers.users row and additionally describes the teacher's usage and demographics200Kid - Serialteacher_id references cas_core.users.users(id)
cas_core.groups.principal_group_definitionsEach row represents a group of either users (e.g. a course) or of other groups165Kid - SerialReferenced by cas_core.groups_principal_group_memberships
cas_core.groups.principal_group_membershipsEach row describes one member (either user or other group) as being a member of a designated group2 millionid - Serial
  • group_defn_id references cas_core.groups.principal_group_memberships(id)
  • member_id references either cas_core.users.users(id) or cas_core.groups.principal_group_definitions(id) depending on whether the member_type = 1 (user) or 2 (principal group definition)
cas_core.core.assignments1 Row for each time a problem set was assigned to a student or group of students/course.2 millionid - Serial
  • problem_set_id references cas_content.core.problem_set_definitions(id), representing which problem set was assigned
  • owner_xid references cas_core.core.external_references(id) to indicate the external reference (uuid) of the user who owns/created the assignment
  • group_context references either cas_core.users.users(id) for assignments to individuals (assignee_principal_type_id = 1) or else references cas_core.groups.principal_group_definitions(id) for assignments to groups (e.g. courses) of students (assignee_principal_type = 2)
cas_core.student_data.assignment_logs1 Row for student's summarized performance on 1 assignment (if 20 students work on 2 assignments, that creates 40 assignment_logs)28.5 Millionid - Serial
  • assignment_xid references cas_core.core.external_references(id) to indicate the external_reference (uuid) of the assignment that the student was working on. Mainly used for start_time and end_time and as a record that more granular data (e.g. problem_logs and actions will reference)
  • user_xid references cas_core.core.external_references(id) to indicate the external reference (uuid) of the student user who worked on the assignment
  • More granular data e.g. problem_logs (one per student per problem) and actions (one per student click/action) have assignment_log_id references assignment_logs(id). This lets you aggregate granular data for a given student within a given assignment
cas_core.student_data.problem_logs1 Row for student's summarized performance on 1 problem within an assignment (if 20 students work on 2 assignments that each contain 10 problems, that creates 400 problem_logs if each student at least starts each problem)215 Millionid - Serial
  • assignment_log_id references cas_core.student_data.assignment_logs(id) to indicate the assignment_log (student-assignment pair) in which this problem was completed
cas_core.student_data.actions1 Row for each action (e.g. answer submitted, hint requested) taken by a student while working on an assignment215 Millionid - Serial
  • assignment_log_id references cas_core.student_data.assignment_logs(id) to indicate the assignment_log (student-assignment pair) in which this problem was completed

Schema Information

Table -
cas_core.core.external_references

Description - Contains uuid's that are mapped to records in other tables across the system and which are reciprocally referenced by other records.

Primary Use - Mainly used to cross reference data as a central record which multiple others will point to. The

xref
column contains the uuid, the
xref_type_id
describes what kind of record the uuid applies to (e.g. 1=user, 3 = assignment), and the
target_id
references the corresponding table's
id
column depending on the target type. Reciprocally any table that has a column suffixed
_xid
references external_references(id).

Data Schema

Column/Entity NameData TypeDescriptionExample ValuesNull handling (how does the data manage and respond to the absence of a value)Primary KeyForeign Key
id
Serial
integer primary key
154324
N/AYes
xref
text
text UUID being applied to the record
d3bf8359-7ad0-47fb-b259-6c9c86957ef8
N/ANo
partner_id
int4
Integer id of which backend API the UUID is associated with for the given record (e.g. a user can have up to one xref per partner). Generally you will want to look for xrefs with
partner_id=5
corresponding to the TNG partner.
5
N/AYes
references cas_core.core.partners(id)
xref_type_id
int4
Integer id designating which kind of record (e.g.
1 = user, 2 = group, 3 = assignment
) the xref/uuid applies to.
1
N/ANo
references cas_core.core.external_reference_types(id)
target_id
int4
Integer id referencing the primary key of the record the xref/uuid applies to. Which table this
target_id
can be found in depends on the
xref_type_id
.
15432
N/ANo
  • references cas_core.users.users(id)
    when
    xref_type_id
    = 1
  • references cas_core.groups.principal_group_definitions(id)
    when
    xref_type_id
    = 2
  • references cas_core.core.assignments(id)
    when
    xref_type_id
    = 3
  • references cas_core.folders.folder_definitions(id)
    when
    xref_type_id
    = 11
  • references cas_core.skills.primitive_skills(id)
    when
    xref_type_id
    = 14

Table-Specific Statistics

  • Total Records - 28.5 Million
  • Unique Entities - Users, Groups, Assignments, Partners (APIs interacted with), Folders, Skills

Table Relationships

Join Patterns
  • external_references.target_id
    references the PK of another record in one of the abovementioned tables, based on via
    xref_type
    as described in the above table
  • Any other table which has a column name ending int
    _xid
    is a foreign key referencing
    external_references(id)
Common Multi-Table Queries

external_references
is typically joined on a corresponding table for a given resource type e.g. joining users with their xref or assignments with their xref, depending on what you're looking at. Similarly, any column which references an
_xid
e.g.
assignment_definition
's
assignee_xix
. In such cases, you'll frequently join the table with the
_xid
with both its corresponding rows in
external_references
and in the other related table

SELECT * FROM cas_core.core.assignment_definitions ad INNER JOIN cas_core.core.external_references er ON ad.assignee_xid = er.id -- joining the assignment_definition.assignee_xid on its external_references(id) reference INNER JOIN cas_core.users.users usrs ON er.target_id = usrs.id and er.xref_type = 1 WHERE ad.problem_set_id = 604751; -- assignment info for assignments of a specific problem set

Data Quality Indicators

  • Completeness - 100%
  • Accuracy - No known issues
  • Consistency - (Standards and constraints)
  • Uniqueness - Some users will have multiple
    external_references
    rows. Generally, you should use the one where
    partern_id = 5
    (TNG partner) as this represents the user's interaction withe core assistments system

Table -
(cas_core.users.users)

Description - Table with one row for each user in the system (across all roles e.g. teachers + students) Primary Use - (Main purpose/analysis this table supports)

Table-Specific Statistics

  • Total Records - 1.5 million
  • Unique Entities - Users (students, teachers, researchers, administrators)

Data Schema

Column/Entity NameData TypeDescriptionExample ValuesTotal number of unique valuesNull handling (how does the data manage and respond to the absence of a value)Primary KeyForeign KeyExpected Cardinality (per parent)Can be missing (Yes/No)Parent entityMissing data (percentage)
id
int
Primary Key
11111, 15345
1.5 MillionNever NullYes
No
0% missing
username
text
User’s Email address
bob@madison.edu
200K
Never Null
No
No
N/A
No
first_name
text
User's first name
Bob
NoNoNoN/A`No
last_name
text
User's last name
Barker
NoNoNoN/A`No
display_name
text
Users full name or self-chosen nickname
Bob Barker
200K
Never Null
No
No
N/A
Yes

Table -
(cas_core.master_user.t_setting)

Description - Table describing all teacher users in the system. One row per teacher. Does not include users who are not teachers (e.g. student-only users)

Primary Use - (Main purpose/analysis this table supports)

Table-Specific Statistics

  • Total Records - 212K
  • Unique Entities - Teacher users

Data Schema

Column/Entity NameData TypeDescriptionExample ValuesTotal number of unique valuesNull handling (how does the data manage and respond to the absence of a value)Primary KeyForeign KeyExpected Cardinality (per parent)Can be missing (Yes/No)Parent entityMissing data (percentage)
teacher_id
int
the PK of this user’s row in users.users
11111, 15345
200K
Never Null
No
References users.users(id)
1-1
No
User
XX
username
text
User’s Email address
bob@madison.edu
200K
Never Null
No
No
N/A
No
school_name
text
Name of user’s school
Emmerson High School
8500
Can be null
No
No
N/A
Yes
display_name
text
Users full name
Bob Barker
200K
Never Null
No
No
N/A
Yes
nces_state
text
Name of user’s state, abbreviated
IL, CO
51
Can be null
No
No
N/A
Yes
90% missing
nces_school_level
text
Level of school
Elementary, Middle, High
9
Can be null
No
No
N/A
Yes
90% missing
curriculums_taught
text
Which curriculums the teacher assigns content from (comma separated but most commonly 1 value)
IM K-12 Math by Illustrative Mathematics
291
Can be null
No
No
N/A
Yes
2% missing

Query Example

Single Table 1 Queries

-- Find all teacher users SELECT teacher_id, school_name, curriculums_taught FROM cas_core.master_user.t_setting;

Multi-Table Queries

-- Find all teacher users who teach Illustrative Mathematics and join settings with user row SELECT * FROM cas_core.master_user.t_setting ts INNER JOIN cas_core.users.users on usr ts.teacher_id = usr.id WHERE ts.curriculums_taught like '%IM K-12 Math by Illustrative Mathematics%';

Table -
(cas_core.groups.principal_group_definitions)

Description - Each row represents a group of either users (e.g. a course) or of other groups

Primary Use - Used in conjunction with

principal_group_memberships
to identify which students are in which courses and other groups to which teachers assigned assessments

Table-Specific Statistics

  • Total Records - 165K
  • Unique Entities - Groups of users and other groups

Data Schema

Column/Entity NameData TypeDescriptionExample ValuesTotal number of unique valuesNull handling (how does the data manage and respond to the absence of a value)Primary KeyForeign KeyExpected Cardinality (per parent)Can be missing (Yes/No)Parent entityMissing data (percentage)
id
int
Primary Key
11111
,
15345
165KNever NullYes
No
0% missing
name
text
Name of the groupRed 1165K
Never Null
No
No
N/A
No
group_type_id
int
Foreign Key to
cas_core.groups.group_types
describing which kind of group this is.
8
NoNoNo
references cas_core.groups.group_types(id)
No0% missing
owner_xid
int
Foreign key to the external_references row designating the user who owns the group
184564
NoNoNo
references cas_core.core.external_references(id)
No
is_active
boolean
Whether group is currently active/enabled
true
2
Never Null
No
No
N/A
Yes
created_at
timestampz
When group was created
2022-01-12 20:45:31.869 -0500
2
Never Null
No
No
N/A
Yes

Group Type

group_type
designates which kind of group this principal_group_definition represents, and is one of the following values:

id/valxref_typename of group_type
12Principals
28Plain Old Problem Sets
311Assignments Folder
411Content Folder
59Meta Problem Set
610Problem Sets Container
72ASSISTments Course
82Assignee Group
92Access Group
102Mentee Group

Note: Courses are designated by

group_type = 7
and assignee groups (on-the-fly groups to assign to one-time) are designated by
group_type = 8

Table -
(cas_core.groups.principal_group_memberships)

Description - Each row represents a one user or group being a member of a specified group

Primary Use - Used in conjunction with

principal_group_definitions
and
assignment_definitions
to identify which students are in which courses and other groups to which teachers assigned assessments

Table-Specific Statistics

  • Total Records - 2 Million

Data Schema

Column/Entity NameData TypeDescriptionExample ValuesTotal number of unique valuesNull handling (how does the data manage and respond to the absence of a value)Primary KeyForeign KeyExpected Cardinality (per parent)Can be missing (Yes/No)Parent entityMissing data (percentage)
id
int
Primary Key
11111
165KNever NullYes
No
0% missing
group_id
int
Foreign key to
cas_core.groups.principal_group_definitions
11111
Never Null
No
No
references cas_core.groups.principal_group_definitions(id)
No
group_member_type_id
int
Foreign Key to
cas_core.groups.group_member_types
describing which kind of record the member of the group is e.g. a User or another Goup. Users belonging to groups are designated by
group_member_type_id = 1
, which is what you will most commonly need
8
NoNoNo
references cas_core.groups.group_member_types(id)
No0% missing
member_id
int
Foreign key to the member record, found in a table determined by the member_type ie for users (
group_member_type_id = 1
), this will reference
cas_core.users.users(id
)
184564
NoNoNoReferences different tables depending on the group typeNo
is_active
boolean
Whether this member is actively in the group or was soft-removed
true
2
Never Null
No
No
N/A
Yes
created_at
timestampz
When group was created
2022-01-12 20:45:31.869 -0500
2
Never Null
No
No
N/A
Yes

Member Type

group_member_type
designates which kind of record is a member of the specified group, and is one of the following values:

id/valxref_typename of group_type
11User
42Principals Group
25Problem
33Assignment
56Problem Set
611Folder
711Content Folder
89Meta Problem Set
910Problem Set Container

Note: User members are designated by

group_member_type_id = 1
which is almost always what you'll want to use

Common Multi-Table Queries

principal_group_memberships
is typically joined with
users
and
external_references
to find the usernames etc of users in groups and with
assignment_definitions
to find the users who were given a particular assignment or set of assignments

-- Find all students who were given assignment 12345 SELECT * FROM cas_core.core.assignment_definitions ad INNER JOIN cas_core.core.external_references group_er ON ad.group_context_xid = group_er.id -- find the external_reference of the group assigned INNER JOIN cas_core.groups.principal_group_memberships pgm -- join with the members of that group ON pgm.member_id = group_er.target_id and pgm.member_id = 1 INNER JOIN cas_core.users.users usrs -- join with users to see their usernames etc ON usrs.id = pgm.member_id WHERE ad.id = 604751; -- for some specified assignment