Alignment Frameworks for Complex Assessments: Score Interpretations Matter
Monday, August 17, 2020 1:00 - 2:30 PM ET
Organizer: M. Christina Schneider, NWEA
Presentations:
Evaluating Alignment Between Complex Expectations and Assessments Meant to Measure Them Ellen Forte, edCount, LLC
Examining Alignment of Test Score Interpretations on a Computer Adaptive Assessment M. Christina Schneider, NWEA; Mary Veazey, NWEA
Examining Alignment of Test Score Interpretations Using Multiple Alignment Frameworks and Multiple Measures Karla Egan, EdMetric
Embedded Standard Setting: Standard Setting as a Resolution of the Alignment Hypothesis Daniel Lewis, Creative Measurement Solutions; Robert Cook, ACT
Discussant: Paul Nichols, NWEA
The design and validation of an assessment system intended for both formative and summative purposes requires careful development processes, especially when such assessments are meant to support interpretations about how student learning grows more sophisticated over time. Under a principled approach to test design, the intended test score interpretation is defined first, the evidence needed to conclude where a student is in their learning under that interpretation is specified next, and items are then developed to elicit that evidence. For complex constructs such as student learning of NGSS and college and career readiness standards, traditional alignment and validity evidence may no longer be optimal evidence that a test is aligned to state standards and to its purpose. This session will focus on emerging frameworks for alignment and validity evidence explicitly designed to ensure that the assessment development process and evidence collection are cohesively centered on score interpretation. Experts in achievement level descriptors, alignment, principled assessment design, and standard setting will share emerging methodologies that fuse previously distinct test development activities into a cohesive whole in which score interpretations grounded in student learning are the central focus.
College Admissions: Lessons Learned from Across the Globe
Tuesday, August 18, 2020 12:00 - 1:30 PM ET
Organizer: Maria Elena Olivera
Presentations:
An Overview of Higher Education Admissions Processes Rochelle Michel, ERB LEARN
Access, Equity & Admissions Processes in South African Higher Education Naziema Jappie, University of Cape Town
Perspectives on Admissions Practices: The Case of Chilean Universities Monica Silva, Pontificia Universidad Católica de Chile
Character-Based Admissions Criteria: Validity and Diversity Rob Meijer, University of Groningen
In this session, an international group of experts on higher education admissions practices share their insights on opportunities and challenges related to the processes and criteria used in postsecondary admissions decision-making to promote access, equity, and fairness for candidates from diverse backgrounds. The session brings outside voices into the educational measurement community to engage in meaningful discussion about current and future uses of assessments that inform admissions practices, and it considers the opportunities and challenges of developing culturally responsive assessments that are sensitive to the ways of knowing and learning of diverse populations. The presenters discuss challenges in improving diversity, access, and equity in admissions processes used across the globe. Using a panel format, the session brings together voices from educational and professional communities to discuss access issues and the challenges of diversifying the admitted student pool. The panelists will describe what they see as the most critical measurement-based issues facing higher education admissions in their own countries, why those perspectives matter for fairness and access in admissions decision-making, and possible strategies for addressing the issues based on lessons learned in their own national contexts.
How to Achieve (or Partially Achieve) Comparability of Scores from Large-Scale Assessments
Wednesday, August 19, 2020 12:00 - 1:30 PM ET
Organizer: Amy Berman, National Academy of Education Chairs: Edward Haertel, Stanford University & James Pellegrino, University of Illinois, Chicago
Presentations:
Comparability of Individual Students’ Scores on the “Same Test” Charles DePascale, Center for Assessment; Brian Gong, Center for Assessment
Comparability of Aggregated Group Scores on the “Same Test” Scott Marion, Center for Assessment; Leslie Keng, Center for Assessment
Comparability Within a Single Assessment System Mark Wilson, University of California, Berkeley; Richard Wolfe, Ontario Institute for Studies in Education of the University of Toronto
Comparability Across Different Assessment Systems Marianne Perie, Measurement in Practice, LLC
Comparability When Assessing English Learner Students Molly Faulkner-Bond, WestEd, and James Soland, University of Virginia/Northwest Evaluation Association (NWEA)
Comparability When Assessing Individuals with Disabilities Stephen Sireci and Maura O’Riordan, University of Massachusetts, Amherst
Comparability in Multilingual and Multicultural Assessment Contexts Kadriye Ercikan, Educational Testing Service/University of British Columbia, and Han-Hui Por, Educational Testing Service
Interpreting Test Score Comparisons Randy E. Bennett, Educational Testing Service
How much and what types of flexibility in assessment content and procedures can be allowed while still maintaining comparability of scores obtained from large-scale assessments that operate across jurisdictions and student populations? This is the question the National Academy of Education (NAEd) set out to answer in its Study on Comparability of Scores from Large-Scale Assessments. This session presents the major findings from eight papers that explore a host of comparability issues, ranging from (a) the comparability of individual students’ scores or aggregated scores, to (b) scores obtained within single or multiple assessment systems, to (c) specific issues concerning scores obtained for English learner students and students with disabilities. In each interpretive context, the authors discuss comparability issues as well as possible approaches to addressing the information needs and policy concerns of various stakeholders, including state-level educational assessment and accountability decision-makers, leaders, and coordinators; consortia members; technical advisors; vendors; and the educational measurement community.
Development and Empirical Recovery of a Learning Progression that Incorporates Student Voice
Thursday, August 20, 2020 12:00 - 1:30 PM ET
Organizer: Edith Aurora Graf, Educational Testing Service
Presentations:
Steps in the Design and Validation of the Assessment: An Overview Edith Aurora Graf, Educational Testing Service; Maisha Moses, Young People's Project; Cheryl Eames, Southern Illinois University Edwardsville; Peter van Rijn, ETS
Eliciting Student Feedback on the Assessment Through Focus Groups Maisha Moses, Young People's Project
Response Analysis using the Finite-to-Finite Strand of the Learning Progression for the Function Concept Cheryl Eames, Southern Illinois University Edwardsville
Psychometric Results for Two Strands of a Learning Progression for the Concept of Function Peter van Rijn, ETS; Edith Aurora Graf, Educational Testing Service
Discussant: Frank E. Davis, Frank E. Davis Consulting
In keeping with the conference theme, Making Measurement Matter, we will discuss research on building and validating a learning progression (LP)-based assessment for the concept of function, a keystone in students’ understanding of higher mathematics. This effort also speaks to the goal of “bringing outside voices into the educational measurement community” and treats fairness as an equal priority. The research includes students and schools served by research collaborators seeking to improve mathematics education for students characterized as underserved in mathematics. Recently, we conducted a computer-delivered pilot of tasks in which data from 1,102 students were collected. The first two speakers will focus on the theory and design behind the assessment: the first will discuss the overall design of the project and summarize work conducted to date, and the second will discuss how student feedback on the tasks was elicited during focus groups, with the intent of making revisions that enhance meaningfulness and clarity. The last two speakers will present outcomes from the pilot: student responses and what they suggest about the validity of the LP, and psychometric results concerning the empirical recovery of the levels of the LP, an essential step in its validation.
Advancing Multidimensional Science Assessment Design for Large-scale and Classroom Use
Friday, August 21, 2020 1:00 - 2:30 PM ET
Organizer: Erin Buchanan, edCount, LLC
Presentations:
Ensuring Rigor and Strengthening Score Meaning in State and Local Assessment Systems Ellen Forte, edCount, LLC
A Principled-Design Approach for Creating Multi-Dimensional Large-Scale Science Assessments Daisy Rutstein, SRI International
A Principled-Design Approach for Creating Multi-Dimensional Classroom Science Assessments Charlene Turner, edCount, LLC
State Implementation of SCILLSS Resources: A User's Perspective Rhonda True, Nebraska Department of Education
Discussant: Elizabeth Summers, edCount, LLC
Presenters will share the goals, progress, and national significance of the Strengthening Claims-based Interpretations and Uses of Large-scale Science Assessment Scores (SCILLSS) project, funded through the US Department of Education’s Enhanced Assessment Grants (EAG) program. SCILLSS brings together a consortium of three states, four organizations, and a panel of experts to strengthen the knowledge base among state and local educators for using principled-design approaches to build quality science assessments that generate meaningful and useful scores, and to establish a means of connecting statewide assessment results with classroom assessments and student work samples in a complementary system. Presenters will also share how SCILLSS partners are applying current research, theory, and best practice to establish replicable and scalable principled-design tools that state and local educators can use to clarify and strengthen the connections among statewide assessments, local assessments, and classroom instruction, enabling all stakeholders to derive maximum meaning and utility from assessment scores.
The Changing Landscape of Statewide Assessment: Shifts towards Systems of Assessments
Monday, August 24, 2020 12:00 - 1:30 PM ET
Organizer: Nathan Dadey, Center for Assessment
Presentations:
On the Shift Towards Balanced Assessment Systems: Past, Present and Future Brian Gong, Center for Assessment
Developing a Validity Research Agenda for Louisiana’s Innovative Assessment Demonstration Authority Pilot Nathan Dadey, Center for Assessment; Michelle Boyer, Center for Assessment
On the Opportunities Provided by Through-Year Assessment Models, Including a Solution Configured for Districts in Georgia Abby Javurek, NWEA; Paul Nichols, NWEA
Discussant: Carla Evans, Center for Assessment
The landscape of statewide, large-scale educational assessment is shifting away from “stand-alone” summative assessments and towards integrated sets of assessments designed to support various interpretations and uses. For example, several states have offered interim assessments as part of their statewide assessment programs, either individually or as members of a consortium. This coordinated session will explore how the theory of systems of assessments is being applied in multiple contexts and provide insight into the challenges and opportunities inherent in developing and implementing integrated sets of assessments in real-world settings. An overview will be provided of developments in the theory and practice of balanced systems of assessments (e.g., Pellegrino, Chudowsky, & Glaser, 2001), emphasizing implications for current practice. Other presentations focus on ongoing initiatives under the Innovative Assessment Demonstration Authority waivers granted to Louisiana and Georgia. These states aim to replace single statewide summative assessments with multiple assessments that work together to produce a single summative score. This type of assessment model has been referred to as through-course (e.g., Wise, 2011) and might be seen as interim (Dadey & Gong, 2017), but at its core it is organized around the same principles as balanced systems of assessments.
Predictive Standard Setting: Improving the Method, Debating the Madness
Tuesday, August 25, 2020 4:00 - 5:30 PM ET
Organizer: Andrew Ho, Harvard University Chair: Walter (Denny) Way, College Board
Presenters: Jennifer Beimers, Pearson; Wayne J. Camara, Law School Admission Council; Laurie Laughlin Davis, Curriculum Associates; Laura Hamilton, RAND; Deanna Morgan, College Board; Yi Xe Thng, Singapore Ministry of Education
Test scores measure, and test scores predict. Predictions can anchor statements about current performance in terms of future outcomes—including test scores, grades, and graduation—through a process called "predictive standard setting." Presenters in this symposium will debate how and whether predictions should inform standard setting, whether standards should make predictions, and how predictions should count as validity evidence. Contexts include the SAT and ACT college readiness benchmarks, state accountability tests in grades 3-8, interim assessments, and NAEP. These issues are salient as educational policies, policymakers, and practitioners increasingly value such predictions within "career and college readiness" frameworks. Presenters will discuss and debate advances in three stages of predictive standard setting: 1) generating accurate predictive statements using statistical methods; 2) managing predictive data in the standard setting process; and 3) communicating results using benchmark and achievement-level descriptors. Some presenters believe strongly that predictive statements build valid consensus among standard setting panelists and help users understand the meaning and relevance of scores. Other presenters believe strongly that predictive statements build false consensus and subordinate the subject-matter relevance of scores to ambiguous future outcomes. Presenters will give short presentations and then engage in moderated discussion with each other and the audience.
Social Interaction Time
Tuesday, August 25, 2020 5:30 - 6:00 PM ET
Organizer: Andrew Ho, Harvard University
Computational Psychometrics as a Validity Framework for Process Data
Wednesday, August 26, 2020 1:00 - 2:30 PM ET
Organizer: Alina von Davier, Duolingo Chair: Ada Woo, TreeCrest Assessment Consulting
Presenters: Yuchi Huang, ACT; Alina von Davier & Burr Settles, Duolingo; John Whitmer, Chi2 Labs
Discussant: Bruno D. Zumbo, University of British Columbia
In 2015, von Davier coined the term “computational psychometrics” (CP) to describe the fusion of psychometric theories and data-driven algorithms for improving the inferences made from technology-supported learning and assessment systems (LAS). Meanwhile, “computational [insert discipline]” has become commonplace. In CP, the process data collected from virtual environments should be intentional: we should design and provide ample opportunities for people to display the skills we want to measure. CP uses expert-developed theory as a map for measurement efforts that rely on process data, and it is also concerned with knowledge discovery from process data, whether little or big. In this symposium, several applications of computational models to process data from learning systems and from assessments of 21st-century skills are presented. Psychometric theories and data-driven algorithms are fused to make accurate and valid inferences in complex, virtual learning and assessment environments.
CATs, BATs, and RATs—The Value of CAT for Educational Assessment
Thursday, August 27, 2020 12:00 - 1:30 PM ET
Organizer: Laurie Laughlin Davis, Curriculum Associates Chair: Michael Edwards, Arizona State University
Presenters: » Michelle Barrett, Edmentum » Richard Luecht, University of North Carolina, Greensboro » Laurie Laughlin Davis, Curriculum Associates » Michael Edwards, Arizona State University
Discussant: David Thissen, University of North Carolina, Chapel Hill
Computerized Adaptive Testing (CAT) turns 50 years old in 2020, which may come as a shock to many in educational assessment who are still struggling to implement CAT in a way that fully realizes its promised advantages in testing efficiency. Licensure and certification assessments have leveraged CAT successfully for years. While there have been several recent examples of CAT implementations in K-12 summative assessment (such as the Smarter Balanced Assessment Consortium and Virginia’s Standards of Learning assessments), CAT has been relatively slow to catch on in K-12 educational assessment. This is due, in part, to technology limitations and to differences between delivering tests in test centers and delivering tests to students in classrooms. However, technology is not the only consideration influencing the effective use of CAT in K-12 assessment. Frequently, constraints are placed on K-12 assessment programs in terms of educational policies, content standards coverage, and comparability that limit the degree to which CAT can deliver assessment efficiently and effectively. The result is assessment programs that are sometimes referred to as “BATs” (Barely Adaptive Tests) and “RATs” (Rarely Adaptive Tests). This panel will discuss the challenges associated with CAT in K-12 assessment and forecast its future utility.
Modeling Measurement Invariance and Response Biases in International Large-Scale Assessments
Monday, August 31, 2020 10:00 - 11:30 AM ET
Organizers: Lale Khorramdel, National Board of Medical Examiners; Artur Pokropek, Educational Research Institute (IBE), Warsaw, Poland; & Janine Buchholz, Leibniz Institute for Research and Information in Education (DIPF) Chair: Lale Khorramdel, National Board of Medical Examiners
Presentations:
A comparison of Multigroup-CFA and IRT-based item fit for measurement invariance testing Janine Buchholz, Leibniz Institute for Research and Information in Education (DIPF); Johannes Hartig, DIPF | Leibniz Institute for Research and Information in Education, Frankfurt, Germany
Comparing three-level GLMMs and multiple-group IRT models to detect group DIF Carmen Köhler, Leibniz Institute for Research and Information in Education (DIPF); Lale Khorramdel, National Board of Medical Examiners; Johannes Hartig, DIPF | Leibniz Institute for Research and Information in Education, Frankfurt, Germany
Comparability and Dimensionality of Response Time in PISA Emily Kerzabi, Technical University of Munich; Hyo Jeong Shin, ETS; Seang-Hwane Joo, Educational Testing Service; Frederic Robin, Educational Testing Service; Kentaro Yamamoto, Educational Testing Service
Validation of Extreme Response Style versus Rapid Guessing in Large-Scale Surveys Ulf Kroehne, DIPF | Leibniz Institute for Research and Information in Education, Germany; Lale Khorramdel, National Board of Medical Examiners; Frank Goldhammer, DIPF | Leibniz Institute for Research and Information in Education, Centre for International Student Assessment (ZIB), Germany; Matthias von Davier, National Board of Medical Examiners
Examining the Relation between Measurement Invariance and Response Styles in Cross-Country Surveys Artur Pokropek, Educational Research Institute (IBE), Warsaw, Poland; Lale Khorramdel, National Board of Medical Examiners
Discussant: Leslie Rutkowski, Indiana University, Bloomington
The main goal of international large-scale assessments (ILSAs) – such as PISA, PIAAC, TIMSS, and PIRLS – is to provide unbiased and comparable test scores and data that enable valid and meaningful inferences about a variety of educational systems and societies. In contrast to national surveys, ILSAs can provide a frame of reference that extends our understanding of national educational systems and cross-country variability. To enable fair group comparisons (within and across countries) and valid interpretations of statistical results in low-stakes assessments such as ILSAs, two validity aspects need to be accounted for. First, the data need to be tested and corrected for response biases such as response styles (RS) in non-cognitive scales. Second, the comparability of the data and test scores across different countries and languages needs to be established; this is achieved by testing and modelling measurement invariance (MI) assumptions. The proposed coordinated session provides an overview of state-of-the-art and new psychometric approaches to testing MI assumptions, handling the problem of response biases, and investigating the relations and interactions between the two. The goal is to provide researchers, practitioners, and policy makers with comparable and meaningful data for secondary analyses and to enable fair comparisons of groups and countries.
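As a point of orientation for the MI assumptions referenced above, a minimal sketch of the conventional invariance hierarchy in a multiple-group factor model (illustrative only, not drawn from the session papers) is: configural invariance fits $y_{pg} = \nu_g + \Lambda_g \eta_{pg} + \varepsilon_{pg}$ with the same pattern of free loadings in every group $g$; metric (weak) invariance adds the constraint $\Lambda_g = \Lambda$ for all $g$ (equal loadings); and scalar (strong) invariance further requires $\nu_g = \nu$ for all $g$ (equal intercepts), the level typically needed before latent group means can be compared across countries.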
Artificial Intelligence in Educational Measurement: Trends, Mindsets, and Practices
Monday, August 31, 2020 12:00 - 1:30 PM ET
Organizers: Andre Rupp, Independent Consultant, Mindful Measurement & Carol M. Forsyth, Educational Testing Service
Presentations:
AI in STEM Assessment: Trends, Mindsets, and Practices Janice Gobert, Apprendis; Mike Sao Pedro, Apprendis
AI in Education: New Data Sources and Modeling Opportunities Piotr Mitros, ETS; Steven Tang, eMetric
Fairness, Accountability, and Transparency in Machine Learning Collin Lynch, North Carolina State University
Bias and Fairness for Automated Feedback Generation Neil Heffernan, Worcester Polytechnic Institute; Anthony Botelho, Worcester Polytechnic Institute
Building and Picking a Model for Learning and Assessment Michael Yudelson, ACTNext by ACT
Discussants: Alina von Davier, ACTNext; Andre A. Rupp, Educational Testing Service (ETS); Carol M. Forsyth, Educational Testing Service
In this session, various experts from areas connected to artificial intelligence (AI) in assessment will provide thoughtful perspectives on how key issues in educational measurement are conceptually framed, empirically investigated, and critically communicated to key stakeholder groups. After a general overview of current trends in AI research as it pertains to educational assessment, presenters will critically discuss how the three core areas of (1) reliability / statistical modeling, (2) validity / construct representation, and (3) equity / fairness are tackled in a changing field of educational assessment. A related goal of the session is to have presenters and participants suggest key lines of work to which current NCME members can productively contribute in order to shape best practices in this new world of assessment. In addition, the session will be used as an opportunity to discuss means of cross-community outreach and engagement that can help build further bridges between the current NCME membership and members of neighboring scientific and practitioner communities working with AI technologies in assessment. This session is connected to a newly proposed SIG, “Artificial Intelligence in Assessment.”
Assessing Indigenous Students: Co-Creating a Culturally Relevant & Sustaining Assessment System
Monday, August 31, 2020 3:00 - 4:00 PM ET
Organizers: Cristina Anguiano-Carrasco, ACT & Leanne R. Ketterlin-Geller, Southern Methodist University Chair: Cristina Anguiano-Carrasco, ACT
Presenters: Sherry Saevil, Halton Catholic District School Board; Pohai Kukea Shultz, University of Hawaii; Kerry Englert, Seneca Consulting
The NCME Diversity Issues in Testing Committee is pleased to offer an invited panel session at the NCME 2020 conference in San Francisco focused on assessment issues affecting Indigenous students. In the invited panel session held at the 2019 NCME conference in Toronto, discussion focused on how to make assessments more equitable for students of color. This session extends that discussion with a focus on the equitable assessment of Indigenous students specifically, by naming and addressing the unique challenges these students face within the context of traditional systems of assessment. How can we help Indigenous students succeed in an educational system that has failed them?