This event is being organized by the NCME Artificial Intelligence in Measurement and Education (AIME) SIGIMIE.
Automated Scoring Engines (ASEs) apply artificial intelligence to the scoring of constructed response (CR) assessment items. A significant challenge in developing ASEs is the scarcity of training examples for infrequent score categories due to scoring costs. Data augmentation, particularly through generative AI, offers a promising solution to this issue.
In our first study, we explored various data augmentation techniques to assess their impact on ASE performance, creating simulated CR responses with multiple augmentation methods to supplement the training data. Our second, more focused study examined the efficacy of generative AI for data augmentation. We trained ASEs on these augmented datasets and evaluated their performance using score-point recall and quadratic weighted kappa (QWK) as measures of agreement between human and ASE scores.
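As background for the evaluation approach described above, the following is a minimal sketch of quadratic weighted kappa, the human–machine agreement metric used in both studies. The function and variable names are illustrative only and are not drawn from the studies' actual code.

```python
def quadratic_weighted_kappa(human, machine, num_categories):
    """QWK between two integer score vectors on 0..num_categories-1."""
    n = len(human)
    # Observed agreement matrix, as proportions of all scored responses.
    O = [[0.0] * num_categories for _ in range(num_categories)]
    for h, m in zip(human, machine):
        O[h][m] += 1.0 / n
    # Expected matrix from the two raters' marginal score distributions.
    hist_h = [sum(row) for row in O]
    hist_m = [sum(O[i][j] for i in range(num_categories))
              for j in range(num_categories)]
    E = [[hist_h[i] * hist_m[j] for j in range(num_categories)]
         for i in range(num_categories)]
    # Quadratic disagreement weights: larger score gaps are penalized more.
    w = [[(i - j) ** 2 / (num_categories - 1) ** 2
          for j in range(num_categories)]
         for i in range(num_categories)]
    num = sum(w[i][j] * O[i][j]
              for i in range(num_categories) for j in range(num_categories))
    den = sum(w[i][j] * E[i][j]
              for i in range(num_categories) for j in range(num_categories))
    return 1.0 - num / den

# Perfect agreement yields QWK = 1.0; chance-level agreement yields 0.
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4))  # → 1.0
```

Because the weights grow quadratically with the distance between the two scores, QWK penalizes a machine score two points from the human score far more than a score one point away, which is why it is a standard agreement metric for polytomous CR items.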
Our findings reveal that ASEs trained with generative AI-augmented data can achieve agreement levels as good as or better than agreement between human raters. This presentation will discuss the methodologies, results, and implications of our dual-study design, highlighting the potential of generative AI to enhance the reliability and accuracy of automated scoring systems.
Presenters:
National Council on Measurement in Education