From the President: Perspectives on our Past, Present, and Future

By Megan Welsh posted 04-09-2020 03:35 PM

At the time of this writing (April 6, 2020), I have only two weeks left as President of NCME.  I have greatly enjoyed the honor of serving as your President, and it is somewhat sad to have my term come to an end.  I am thankful my transition to Past-President is concomitant with Dr. Ye Tong's transition to President.  It has been a pleasure serving alongside Ye and my other Board colleagues, as we worked hard to increase the vibrancy and effectiveness of the Council we all love.

In this final Presidential Column, I reflect on where I see us as a field of measurement practitioners and scientists, and where I think we should be heading.  This reflection starts with a discussion of NCME's lack of influence on educational assessment policy in the No Child Left Behind era, and how that lack of influence has damaged the perception of measurement professionals among the general public.  Next, I discuss NCME's revised mission, vision, and goals, and recent actions the NCME Board has taken to move us forward on the path to accomplish our goals and improve our public engagement. I end with a list of thanks to the many members of our community who have truly been superheroes for our organization.

A Perspective on Our Recent Past: The NCLB era

The No Child Left Behind Act (NCLB) of 2001[1] ushered in mandated testing in mathematics and reading for public school students in the USA in grades 3 through 8, and in one grade in high school.  One of the explicit purposes of NCLB was,

      Ensuring that high-quality academic assessments, accountability systems, teacher preparation and training, curriculum, and instructional
      materials are aligned with challenging state academic standards, so that students, teachers, parents, and administrators can measure   
      progress against common expectations for student academic achievement. (pp. 115 Stat 1439-1440)

These laudable goals presented the measurement field with significant challenges—and we rose to the occasion to meet many of them.  Within a few years, all states had developed assessments with admirable psychometric features with respect to measurement precision, content representation, and scale score stability.  These assessments were supported by independent alignment studies that confirmed the congruence of the assessments with the state curriculum frameworks.  Test results provided information on achievement gaps and hence put a spotlight on the educational inequities that caused them.  In my opinion, these early 21st century tests met the goal of demonstrating how well students had mastered specific aspects of curricular goals at a specific point in time.  But as illustrated above, that was not the only goal of NCLB.

During the early years of the NCLB era, psychometrics became a big business.  States used testing vendors to meet their assessment needs, and established Technical Advisory Committees (TACs) to help ensure the quality of their assessment programs.  One goal of these TACs was to help the state assessments pass the Federal review and approval process known as “peer review.”  Almost all members of these TACs were members of NCME.  I served, and still serve, on several of these TACs and have been proud of the work we have all done to help states ensure high-quality, valid assessment.

However, NCLB involved requirements that went beyond simply reporting test scores.  For example, it mandated a new and specific use of test scores—to measure the “adequate yearly progress” (AYP) of schools and districts.  This mandated test score use was the most consequential aspect of the law.  School districts were taken over by the state for failure to make AYP, principals were fired, and school improvement plans were developed to improve AYP.  Over the past 17 years, fascinating explanations, discussions, and debates regarding AYP were published in our NCME journals Journal of Educational Measurement (3 articles) and Educational Measurement: Issues and Practice (35 articles).  However, these discussions were largely absent from conversations involving education policy makers and state boards of education.  While we were holding NCME conference sessions debating whether “consequences” should be part of “validity,” the consequences of AYP were affecting millions of children in the USA.


I believe it was at this point we began to lose the trust of the education community.  Test scores were being used for new accountability purposes, and there was little validity evidence supporting their use for the purposes of determining AYP.  Did such use put state departments of education in violation of the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education’s (1999, 2014) Standards for Educational and Psychological Testing?  If so, did we as responsible TAC members inform the states of such violations?


My impression of these times is that we did issue warnings, but they were not strong enough.  The AYP system had some merit, but it was implemented in different ways in different states, with little guidance from our field (Linn, 2005; Porter, Linn, & Trimble, 2005).  Thus, the door was open for distrust of using educational test scores for the improvement of education.  However, this distrust was just a drop in the bucket compared to the deluge that occurred after the next evolution of NCLB—Race-to-the-Top.


NCLB Evolves: Race-to-the-Top

In 2009, in the absence of a reauthorization of NCLB, the Obama administration released $4.35 billion in funding for the Race-to-the-Top (RTT) grant competition as part of the American Recovery and Reinvestment Act.  The RTT competitive grant program was designed to

      encourage and reward States that are…achieving significant improvement in student outcomes, including making substantial gains in
      student achievement, closing achievement gaps, improving high school graduation rates, and ensuring student preparation for success
      in college and careers…[2]

Like NCLB, these are laudable goals, but RTT also encouraged using test score data to evaluate student “growth” for the purposes of teacher and principal evaluation.  Without any research to support their use, student growth percentiles (SGPs) became the most common index for measuring student growth in the United States.  Subsequent research pointed out SGPs were inherently unreliable, particularly for the purposes of teacher evaluation (e.g., Castellano & McCaffrey, 2017; Lash, Makkonen, Tran, & Huang, 2016; McCaffrey, Castellano, & Lockwood, 2015; Sireci & Soto, 2016).  This research had no impact on practice, and to my knowledge, most TACs did not inform states of this research.  While both AERA (2015) and the American Statistical Association (2014) cautioned against the use of test scores in value-added models of teaching effectiveness, NCME remained silent on the use of derivative test scores for accountability purposes.  Silent except of course in our involvement in developing the AERA et al. Standards, which state,

      An index that is constructed by manipulating and combining test scores should be subjected to the same validity, reliability, and fairness   
      investigations that are expected for the test scores that underlie the index (p. 210). 

It is not my purpose in this article to reemphasize the misuse or misunderstanding (Clauser, Keller, & McDermott, 2016) of SGPs, but rather to illustrate how NCLB-era test scores have been used for high-stakes purposes such as AYP and teacher evaluation, even though validity evidence for such purposes is lacking.  How can we wonder why we have lost the support of teachers, and now parents and students (Bennett, 2016; Marland, Harrick, & Sireci, 2019), when we violate our own professional standards?  Specifically, we co-author standards that require evidence for the validity of test score use, and then we stand idly by, collecting our TAC honoraria, while teachers lose their jobs based on test score derivatives that lack validity evidence for teacher evaluation.  Clearly, our actions must change if we are to partner with the education community in using educational tests to help students learn.

A lack of attention to consequences

A final reason I will give today for our loss of public confidence is our lack of attention to the (other) consequences of testing.  The importance of evaluating testing consequences dates back to at least Messick (1989), although he credits Cronbach (1971) and others.  What statistical reason or psychometric justification can we provide to the public when competitive high schools in New York City use an admissions test as the single criterion for admission and then only 10 African-American students are selected for enrollment into a class of over 1,000 students[3]?  Would we break any psychometric laws if we supported Mayor Bill de Blasio’s call to change the admissions system?  If we did, they would surely be 20th-century psychometric laws that need a 21st-century revision.  The recent lawsuit against college admissions tests for the University of California system is a similar example of the public outcry against 20th-century testing systems based primarily on norm-referenced testing goals.

A serious validation effort requires serious consideration of testing consequences, and gathering evidence that confirms the intended positive consequences, and provides discussion and potential remediation of any negative consequences (Kane, 2013; Messick, 1989; Shepard, 1993, 1996).  What does our research say are the consequences of NCLB era tests?  Are the criticisms from teachers and others true—that tests have caused undue student anxiety, narrowed the curriculum, and led to even narrower curricula for historically marginalized groups?  Only specific empirical study of testing consequences can answer these questions.  If someone can show me such evidence in a technical manual for a statewide summative assessment, I will buy the authors of that manual a drink, for those authors have fulfilled the goals required in 21st-century validation.

From the Past to the Present:  NCME’s Mission, Vision, and Goals

I have perhaps complained too much about my perceived inattention on the part of psychometricians to research, public engagement, and validation in the NCLB era.  Thankfully, my experiences as NCME President allow me to easily transition from pessimism to optimism.  I am fortunate to have been elected to the NCME Board at a time when it contains some of the most dedicated and talented colleagues with whom I have ever worked.  When I complained about the NCME Mission statement, “To advance the science and practice of measurement,” the discussion gravitated to a series of actions where we evaluated the NCME Mission, Vision, and Goal statements.  Board member Andrew Ho pointed out that our goals should flow from our mission, and also set the stage for the consistency of our actions and policies across Presidential administrations.  We have a long history of excellent initiatives and priorities set by our Presidents.  However, the logic of forging consistency over time was compelling, and off we went to survey the membership on our mission, vision, and goals.  After two years of discussion with the Board and interaction with the membership, our mission statement was revised to,

      The National Council on Measurement in Education is a community of measurement scientists and practitioners who work together to
      advance theory and applications of educational measurement to benefit society.

This process also led us to articulate five goals consistent with this mission.  Currently, these goals are to: 

  1. Advance the science and scholarship of educational measurement.
  2. Promote knowledge and understanding of best practices in educational measurement.
  3. Increase NCME's partnerships to improve assessment policy and practice.
  4. Create and maintain a vibrant, diverse, and inclusive community of measurement practitioners and researchers.
  5. Provide members with a strong professional identity and intellectual home.

Through the NCME journals, book series, ITEMS modules, our annual meeting, and this Newsletter, I believe we continue to make great progress on our first goal.  My previous criticisms relate to the second goal, but progress in this area includes our recent position statements on the misconceptions present in the college admissions testing lawsuit and on the use of college admissions test scores in state accountability systems.  Progress on our third goal can be seen by our presence at CCSSO’s National Conference on Student Assessment, by our membership in the International Test Commission, and by our involvement in the Joint Committee on Standards for Educational Evaluation.  However, we need to make more progress on Goal 3 with respect to better partnerships with state and national government entities that set policies for educational tests. The question remains, “Can we do more to inform policy makers of the limitations of educational tests and proper versus improper use of test scores?”  I think we can, and with Derek Briggs, Ellen Forte, and Sharyn Rosenberg joining our Board later this month, I think we will be well poised to do so.

I believe we are successfully meeting Goals 4 and 5.  Our community has been enhanced by seven new Special Interest Groups in Measurement in Education (SIGIMIEs[4]), and we remain committed to providing a rewarding annual meeting every year.  Although it may be difficult for us to convene in 2020 due to the coronavirus pandemic, we are still working hard to see if we can reschedule the annual meeting to September 10-13 in Minneapolis.  Check the NCME web site, and your email (including your spam folder!) for information on those efforts.

From the Present to the Future

I am proud of the accomplishments NCME has made over the past year.  I will be describing those accomplishments during the online NCME 2020 Business Meeting that will occur on April 20, 2020 from 12:30-2:00 East Coast time.  The accomplishments we have made are due to a long list of incredible volunteers, starting with the NCME Board (9 members), the committee chairs and members (16 committees), SIGIMIE leaders, the Classroom Assessment Task Force, and publication editors.  The work of Drew Wiley, Ada Woo, and Thanos Patelis on the 2020 conference program has been truly amazing, as has the work of Kim Colvin and Anita Rawls on the 2020 workshop and training program.  Our editors—Sandip Sinharay, Deb Harris, Andre Rupp, Li Cai, Tao Xin, and of course Megan Welsh!—work tirelessly to provide the highest quality publications.  I, and the entire NCME membership, are indebted to you.  NCME is more than a conference—it is a community of volunteers who work hard to improve the science and practice of measurement to benefit society.  Having a brief leadership role for this community of caring, intelligent, and diligent professionals is the greatest honor I have ever had.  Thank you for that, and for all your support over the past year.


American Educational Research Association (2015). AERA statement on use of value-added models (VAM) for the evaluation of educators and educator preparation programs. Educational Researcher. Downloaded April 5, 2020 from

American Statistical Association (2014). ASA statement on using value-added models for educational assessment. Downloaded April 5, 2020 from

Bennett, R. (2016). Opt out: An examination of issues (ETS Research Report Series). Princeton, NJ: Educational Testing Services. Downloaded April 5, 2020 from

Castellano, K. E., & McCaffrey, D. F. (2017).  The accuracy of aggregate student growth percentiles as indicators of educator performance.  Educational Measurement:  Issues and Practice, 36(1), 14-27.

Clauser, A.L., Keller, L.A., McDermott, K.A. (2016). Principals’ uses and interpretations of student growth percentile data. Journal of School Leadership, 26(1), 6-33.

Cronbach, L. J. (1971).  Test Validation.  In R.L. Thorndike (Ed.) Educational measurement (2nd ed., pp. 443-507).  Washington, D.C.:  American Council on Education.

Kane, M. (2013).  Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73.

Lash, A., Makkonen, R., Tran, L., & Huang, M. (2016). Analysis of the stability of teacher-level growth scores from the student growth percentile model (REL 2016–104). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West. Retrieved from

Linn, R. L. (2005, June). Conflicting demands of No Child Left Behind and state systems: Mixed messages about school performance. Education Policy Analysis Archives, 13(33). Available at

Marland, J., Harrick, M., & Sireci, S. G. (2019).  Student assessment opt out and the impact on value-added measures of teacher quality.  Educational and Psychological Measurement. DOI: 10.1177/0013164419860574.

McCaffrey, D. F., Castellano, K. E., & Lockwood, J. R. (2015).  The impact of measurement error of individual and aggregate SGP.  Educational Measurement: Issues and Practice, 34, 15-21.

Messick, S. (1989).  Validity.  In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-100).  Washington, D.C.:  American Council on Education.

Porter, A. C., Linn, R. L., & Trimble, C. S. (2005). The effects of state decisions about NCLB adequate yearly progress targets. Educational Measurement: Issues and Practice, 24, 32-39. doi:10.1111/j.1745-3992.2005.00021.x

Shepard, L. A. (1993).  Evaluating test validity.  Review of Research in Education, 19, 405-450. 

Shepard, L. A. (1996).  The centrality of test use and consequences for test validity.  Educational Measurement: Issues and Practice, 16, 5-24.

Sireci, S. G., & Soto, A. (2016).  Validity and accountability:  Test validation for 21st-century educational assessments.  In H. Braun (Ed.).  Meeting the challenges to measurement in an era of accountability (pp. 149-167). New York:  Routledge.


[2] (p. 2).

[3] De Blasio proposes changes to New York’s elite high schools.  New York Times, June 2, 2018.

[4] To view and join a SIGIMIE visit