Narrative
The workforce in data science and STEM continues to lack representation of key groups: women, minorities, and people from rural and lower socio-economic backgrounds. This novel, experiential, problem-based learning approach uses population-based Big Data to engage diverse middle school students in applying real data to solve real community health problems. The goal of the Data Detectives SEPA program is to promote greater diversity in the future science and STEM workforce through an informal science education curriculum focused on data sciences and their application to community health issues.
Abstract
Data sciences represent key advances for multiple areas of discovery in science and health. However, as in other STEM fields, and despite vast innovations in data science, key groups are significantly under-represented in the current and projected workforce, particularly women and under-represented minority groups (Hispanic or Black). In addition, individuals from rural communities and lower socio-economic backgrounds are less likely to pursue STEM careers and study data sciences. We hypothesize that providing students with a curriculum focused on using population-level Big Data for community health needs assessment, planning, analysis, evaluation, and application will improve students’ understanding of the importance of science and Big Data beyond the laboratory or classroom. We envision that such a program will engage students by making science more applicable. To address the gaps in the literature and the lack of practical tools to teach students how to both use and apply population-based Big Data, we will pursue the following Specific Aims for our new SEPA program, Data Detectives: Using Real Data to Solve Real Community Health Problems: 1) to implement a novel, problem-based, experiential learning curriculum to teach under-represented middle school students science and mathematics content and data science principles with direct application to community-based health issues; 2) to conduct a robust evaluation of the program with measures of student knowledge, attitudes, self-efficacy, and pursuit of future STEM careers; and 3) to prepare for broad dissemination of the curriculum throughout Georgia and the US. This program will provide the foundation for K-12 students to use real data to solve real problems, with a focus on improving health outcomes for communities.
The proposed SEPA program meets three NIGMS priority areas: A) teaching students to use Big Data instills needed computational and quantitative skills; B) the curriculum demonstrates applicability to the real world by using problem-based learning (PBL) to challenge students to solve real community-level health problems using real population-based data; and C) the program follows a robust mixed-methods evaluation plan to measure both quantitative and qualitative outcomes. The Research Education Program plan addresses the three Specific Aims and includes a rationale for adaptation of the Problem-Based Learning model; a detailed curriculum aligned with MS NGSS; clear identification of the population-based datasets to be used; explicit examples of PBL scenarios; a thorough diversity recruitment plan with access to a large, diverse student applicant pool; and clear input from expert community partners and evaluation experts. The Dissemination Plan will share the curriculum and materials across Georgia and the U.S. The ability to evaluate this curriculum in a cohort of middle school students, to measure its effect on the potential for future STEM careers, and then ultimately to disseminate it nationally to schools and informal science education programs has the capacity to impact K-12 educational approaches in new and important ways.