OpenStax R Research Image
https://github.com/safeinsights/openstax-research-image/r-4.5.1
Version: 4.5.1
OpenStax Dataset Connection Functions
This section explains how to use the R functions in
to access OpenStax educational datasets for your research.
3. Tutor Data
# Get something from Tutor tutor_data <- query_optimized_tutor("SELECT * FROM tutor_table")
Notes & Highlights Data
# Get highlights with basic filtering highlights <- query_notes_and_highlights( "SELECT * FROM highlights WHERE book_id = 'biology-2e' LIMIT 10000" ) # Get summary statistics (recommended for large datasets) summary_stats <- query_notes_and_highlights( "SELECT book_id, COUNT(*) as total_highlights, AVG(LENGTH(content)) as avg_length FROM highlights WHERE created_at > '2023-01-01' GROUP BY book_id" )
2. Event Capture Data
# Get page view events for a specific week page_views <- query_event_capture( event = "page_view", start_date = "2023-01-01", end_date = "2023-01-07", sql_query = "SELECT * FROM events WHERE book_id = 'biology-2e'" ) # Get daily highlight creation counts for a month daily_highlights <- query_event_capture( event = "highlight_created", start_date = "2023-01-01", end_date = "2023-01-31", sql_query = "SELECT DATE(timestamp) as day, COUNT(*) as count FROM events GROUP BY DATE(timestamp) ORDER BY day" )
Example Research Workflow
source("openstax.R") source("safeinsights_common.R") ############################################################################### # Initialize Research Container toa_job_start() ############################################################################### # Insert query code here # OpenStax Tutor Example: dataframe <- query_openstax_tutor("SELECT * FROM student_activity LIMIT 10") ############################################################################### # Insert manipulate data and analysis code here write.csv(dataframe, "results.csv", row.names = FALSE) ############################################################################### # Upload results toa_results_upload("results.csv")
Installed R Packages and Versions
ToDo: Verify versions of packages
Core packages
- httr 1.4.7
- RPostgres 1.4.7
- paws 0.7.0
- sqldf 0.4-11
- duckdb 1.1.1
- DBI 1.2.3
- here 1.0.1
- qualtRics 1.4.0
Visualization and manipulation packages
- ggplot2 3.4.3
- dplyr 1.1.4
- tidyr 1.3.1
- readr 2.1.5
- purrr 1.0.1
- tibble 3.2.1
- forcats 1.0.0
- lubridate 1.9.2
- jsonlite 1.8.8
- XML 3.99-0.12
- stringr 1.5.0
- stringi 1.8.4
Additional visualization packages
- ggvis 0.4.8
- rgl 1.2.8
- patchwork 1.2.0
- janitor 2.2.0
- magrittr 2.0.3
- vcd 1.4-12
- maps 3.4.2
- ggmap 4.0.0
- sp 2.1-4
- maptools 1.1-8
Modeling packages
- tidymodels 1.0.0
- future 1.33.2
- furrr 0.3.1
- lme4 1.1-35.5
- nlme 3.1-164
- caret 6.0-94
- survival 3.5-8
- car 3.1-2
- psych 2.4.6.26
- mice 3.16.0
- miceadds 3.17-44
- glmnet 4.1-8
Reporting packages
- rmarkdown 2.26
- shiny 1.10.0
- xtable 1.8-4
- knitr 1.47
- kableExtra 1.4.0
- excluder 0.5.1
- careless 1.2.2
Time series packages
- zoo 1.8-12
- xts 0.13-1
- quantmod 0.4-20
- fable 0.3.4
- tsibble 1.1.4
- urca 1.3-4
- feasts 0.3.2
High performance and development packages
- Rcpp 1.0.11
- data.table 1.14.8
- testthat 3.2.1.1
- devtools 2.4.5
- roxygen2 7.3.2
- lintr 3.1.2
- languageserver 0.3.16