Vignettes on Algorithmic Fairness and Generative AI Evaluation

This event is being organized by the NCME Artificial Intelligence in Measurement and Education (AIME) SIGIMIE.

The first part of this talk will focus on algorithmic fairness, which has conventionally adopted a perspective of racial color-blindness (i.e., difference-unaware treatment). We contend that in a range of important settings (e.g., the legal setting where the U.S. compulsory draft applies to men but not women), group difference awareness matters. Results across ten models show that difference awareness is a distinct dimension of fairness on which existing bias mitigation strategies may backfire.

The second part of this talk will discuss new results on AI evaluation. We employ a model-based evaluation framework using Item Response Theory (IRT), which decouples estimated model performance from the choice of test subset, for reliable and efficient generative AI evaluation. We propose two innovations: amortized calibration, which reduces the cost of estimating the IRT model's item parameters, and an item generator based on a large language model, which automates diverse question generation. Experiments on 25 common natural language processing benchmarks and 184 language models show that this approach is more reliable and resource-efficient than traditional evaluation methods, offering a scalable solution for evaluating generative AI models.
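For context on the IRT machinery involved, below is a minimal sketch (in Python, using numpy and scipy) of how a model's ability can be estimated under a two-parameter logistic (2PL) IRT model from its responses on a calibrated item subset. The 2PL form, function names, and synthetic data are illustrative assumptions, not the presenters' implementation; in particular, the calibration of item parameters (the step the talk's amortized-calibration innovation addresses) is assumed to have already happened.

    # Minimal sketch: ability estimation under a 2PL IRT model.
    # Assumes item parameters (discrimination a_j, difficulty b_j) are
    # already calibrated; names and data here are illustrative only.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import expit  # logistic sigmoid

    def p_correct(theta, a, b):
        """2PL IRT: probability a model with ability theta answers an
        item with discrimination a and difficulty b correctly."""
        return expit(a * (theta - b))

    def estimate_ability(responses, a, b):
        """Maximum-likelihood estimate of a model's ability from its
        0/1 responses on a (possibly small) subset of calibrated items."""
        def neg_log_lik(theta):
            p = p_correct(theta, a, b)
            return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
        return minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x

    # Synthetic demo: 50 calibrated items, one model with true ability 1.0.
    rng = np.random.default_rng(0)
    a = rng.uniform(0.5, 2.0, size=50)   # item discriminations
    b = rng.normal(0.0, 1.0, size=50)    # item difficulties
    responses = rng.binomial(1, p_correct(1.0, a, b))
    print(f"Estimated ability: {estimate_ability(responses, a, b):.2f}")

Because the ability estimate depends only on the calibrated item parameters, not on which particular items were drawn, this is one way to see how an IRT framework decouples model performance from test subset selection.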

Presenters:

  • Sanmi Koyejo, Stanford University
  • Sang Truong, Stanford University
  • Angelina Wang, Stanford University

When: Feb 26, 2025, from 4:00 PM to 5:00 PM (ET)

Location: Online

Url: http://us02web.zoom.us/j/88510261077?pwd=258jRNzrgAlsr09cGjAVBPBOfTnM68.1
Meeting ID: 885 1026 1077 / Passcode: 164209