2008 Annual Meeting
National Council on Measurement in Education
2008 Annual Meeting & Training Sessions, March 23-27, 2008,
New York, New York
Preconference Training Sessions
- The 2008 NCME pre-conference training sessions will be held at the Crowne Plaza Times Square in New York City on Sunday, March 23, 2008, and Monday, March 24, 2008.
- Advance registration for the training sessions is strongly encouraged. The only way to register in advance for the training sessions is through the main meeting registration at http://www.expologic.com/registration/clients/aera/08/
- Registration on-site will be available only for those training sessions that have not been filled through advance registration.
- Refunds of registration fees for the training sessions cannot be made after February 15, 2008.
- Please note that internet connectivity will be unavailable in the hotel meeting rooms. When applicable, participants should download the software required prior to the training sessions.
Sunday, March 23, 2008
Developing Noncognitive Assessments
Presenters: Patrick Kyllonen, Educational Testing Service; Richard Roberts, Educational Testing Service
Fee: $80
Time: 8:00 a.m. - 5:00 p.m.
Noncognitive qualities are increasingly recognized as important determinants and reflections of success in education from K-12 through graduate and professional school. This session will review the process of developing and evaluating noncognitive assessments. The following topics will be covered:
- noncognitive construct frameworks, models, and theories (personality, attitudes, values, beliefs, and other constructs)
- developing assessments from construct definitions and item pools, including the international personality item pool (IPIP)
- various methods for assessing non-cognitive qualities (self-assessments, others' ratings, situational judgment tests, conditional reasoning, implicit association tests)
- item writing dos and don'ts
- the problem of faking on self-assessments (preventing, detecting and correcting for it)
- delivery platforms (web and paper-and-pencil)
- exploratory factor analysis and other data-structure exploration methods
- confirmatory factor analysis
- advanced methods (IRT, latent class models, unfolding models)
- special topics (rating-scale issues [optimal number of points, presence of neutral point, "do not know"] and reverse-key items)
- indirect measures (e.g., from school records)
- example noncognitive assessments (self-help for community college, institutional reporting for K-12, high stakes for graduate school)
Each of these topics will be organized as an approximately 30-minute session with empirical examples, Q&A, and some hands-on exercises where appropriate.
Student Involvement and Formative Feedback in Classroom Assessment: Measurement Concepts and Issues
Presenters: Jeffrey Beaudry, University of Southern Maine; Leslie Lukin, Lincoln Public Schools; Lori Nebelsick-Gullet, Lincoln Public Schools
Fee: $80
Time: 8:00 a.m. - 5:00 p.m.
The purpose of this session is to examine current theory and best practice regarding classroom assessment and grading, to show how this knowledge can be used to promote student learning, and to understand how students benefit from direct involvement in assessment and grading. A key element of this discussion will be the development and use of formative assessment and feedback as an important part of the learning process. Learning activities will center on issues of assessment quality and utility. Through the discussion of in-depth case studies of practitioners, participants will explore the following topics:
- development of a shared language for classroom assessment literacy
- development of an understanding of the similarities and differences between assessments that are used for system accountability versus assessments used in classrooms to support the learning process
- development and implementation of interpretable and useable formative feedback
- development of a fair and equitable learning environment
- how to create an environment at the systems level that supports the implementation of best practice in the areas of assessment and grading in classrooms
- use of data for student learning, teacher planning, and system improvement
Item Response Theory: Parameter Estimation Techniques
Presenter: Seock-Ho Kim, University of Georgia
Fee: $135
Time: 8:00 a.m. - 5:00 p.m.
Unidimensional models, statistical methods, and computer applications of item response theory to educational and psychological test data will be presented with a specific emphasis on item and ability parameter estimation techniques. Theory and methods for the educational and psychological measurement of latent variables using item response theory methodology will be discussed. The one-parameter logistic (Rasch), two-parameter logistic, and Birnbaum's three-parameter logistic models for dichotomously scored item response data will be reviewed from a theoretical viewpoint with an emphasis on the various techniques for estimating the model parameters.
Applications of these models to practical measurement situations will be studied using item response theory computer programs. Topics of the course will consist of item calibration, scoring, information, and some applications to instrument construction (e.g., equating, differential item functioning, test construction). Models for polytomously scored items will be briefly discussed.
- Prerequisites include knowledge equivalent to one graduate course in theoretical educational measurement and familiarity with differential and integral calculus treated in undergraduate mathematics courses.
- Participants are encouraged to bring their own laptop computers.
- Participants will be provided with the book Item Response Theory: Parameter Estimation Techniques (Baker & Kim, Eds., 2004), which will be used as a principal reference in the training session.
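As an illustration of the dichotomous models this session reviews, the following minimal sketch computes the probability of a correct response under the three-parameter logistic model. The function name, the parameter values, and the choice to omit the 1.7 scaling constant are assumptions made for the example; they are not taken from the session materials.

```python
import math

def irf_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability; a: item discrimination; b: item difficulty;
    c: lower asymptote (pseudo-guessing). No 1.7 scaling constant is applied
    (an assumption of this sketch).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Special cases: c = 0 gives the two-parameter model; c = 0 with a common
# discrimination across items gives the one-parameter (Rasch) form.
p_rasch = irf_3pl(theta=0.0, a=1.0, b=0.0)       # ability equals difficulty
p_3pl = irf_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)  # hypothetical item values
```

When ability equals difficulty, the one-parameter form gives a probability of one half, while a nonzero lower asymptote raises that value toward 1.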
Linking and Aligning Scores and Scales
Presenters: Neil Dorans, Educational Testing Service; Jinghua Liu, Educational Testing Service; Mary Pommerich, Defense Manpower Data Center; Michael Walker, Educational Testing Service
Fee: $110
Time: 8:00 a.m. - 12:00 noon
Communicating linking issues to test score users is critical to ensuring the validity of a linkage. This session will seek to facilitate communication about the appropriate use and interpretation of linked scores by emphasizing the different meanings that can be attached to different linkages and the necessary requirements to achieve solid linkages. It is targeted toward testing professionals who conduct linkages and/or convey the results of linkages to nonpractitioners and test score users with a measurement background. A foundations portion of the session will present a historical perspective on score linking, provide definitions and distinctions between types of linkages, discuss relevant data collection designs, and give an overview of linking methodology and assumptions. A linking scenarios portion will make expanded distinctions between types of linkages and discuss practical issues using real-world examples. Topics of discussion will be equating, tests in transition, concordance, vertical scaling, and linking group assessments to individual assessments. A tools portion will discuss indices that can be used to choose an appropriate linkage type and methods that can be used to evaluate linkage quality. A score interpretation portion will focus on the appropriate usage and interpretation of linked scores, comparing and contrasting across the different linking scenarios.
- Participants will be provided with the book Linking and Aligning Scores and Scales (Dorans, Pommerich, & Holland, Eds., 2007), plus a copy of the instructional slides.
Test Security: Practices, Policies, and Punishment
Presenters: James Impara, Caveon Test Security; Ardeshir Geranpayeh, University of Cambridge ESOL Examinations; Jamie R. Mulkey, Caveon Test Security
Fee: $45
Time: 1:00 p.m. - 5:00 p.m.
Test security is a growing concern for learning institutions, credentialing organizations, and businesses. Each week, news stories expose incidents of cheating, student coaching, teacher intervention, and even outright test theft. While these activities are increasing, new tools and methods are being developed to detect testing irregularities that are most likely caused by test fraud and theft.
This session will take a case study approach to solving test security issues. Participants will first gain an understanding of the impact of test theft on test takers and constituents. They will then be given a primer on statistical analysis techniques used to detect testing irregularities, including a review of current statistical tools that detect answer copying and test administration irregularities. Using the results of statistical analysis techniques, participants will then use a case study to make decisions about applied policies and sanctions.
Nonlinear Mixed Models Approach to Item Response Theory
Presenters: Paul De Boeck, K.U. Leuven; Frank Rijmen, Educational Testing Service; Francis Tuerlinckx, K.U. Leuven; Mark Wilson, University of California-Berkeley
Fee: $65
Time: 8:00 a.m. - 12:00 noon
The central message of the session is that it is beneficial to see item response theory (IRT) models as extensions of generalized linear regression models that seek to model facets of the measurement situation. These facets are most typically persons and items, but the set may be extended to incorporate other facets such as raters and may also be re-labeled to suit particular applications. While the link function and the random component of the regression model remain the same, the most interesting part of the extension concerns the structural part of the model: (1) the kind of predictive function (linear or nonlinear, e.g. bilinear), and (2) the effects (weights) of the predictors (fixed effects or random effects).
Starting from some well-known IRT models, other less well-known models will be framed in this approach, based on a volume published by Springer: Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach (De Boeck & Wilson, Eds., 2004). This session will illustrate how the models can be estimated with the SAS procedure NLMIXED. This session will also discuss and illustrate how multilevel modeling and structural equation modeling (SEM) for categorical data can be expressed from the perspective of nonlinear mixed modeling and vice versa. This will be illustrated with various software related approaches for multilevel analysis and SEM.
- Participants are encouraged to buy the book Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach (De Boeck & Wilson, Eds., 2004). There will be a discount available for those who buy the book at the Springer Booth at the AERA Conference after the training session.
An Introduction to the Application of BMIRT: Bayesian Multivariate Item Response Theory Software
Presenters: Lihua Yao, CTB/McGraw-Hill; Daniel M. Lewis, CTB/McGraw-Hill
Fee: $65
Time: 1:00 p.m. - 5:00 p.m.
This session is intended to support new users of BMIRT (Yao, 2003, 2004; Yao & Boughton, 2005; Yao & Schwarz, 2005), a computer program that uses the Markov Chain Monte Carlo (MCMC) method to estimate item and ability parameters in the multidimensional IRT framework; exploratory and confirmatory approaches are supported.
BMIRT has been licensed for research purposes since 2006 and has a growing audience of users. This session is intended for researchers interested in working with dichotomous or polytomous data that is multidimensional in nature and that may be generated from single or multiple groups. BMIRT supports the 3PL, 2PPC, Graded-response, and Testlet models.
- Participants should bring laptop computers and any data they would like to use.
- Data requirements and formats, sample data, and input files will be provided to participants prior to the session. Participants will be required to complete licensing agreements prior to the session. One-day licenses will be available for those who have not completed the full license agreement prior to the session.
Monday, March 24, 2008
Test Equating Methods and Practices
Presenters: Michael J. Kolen, University of Iowa; Robert L. Brennan, University of Iowa
Fee: $135
Time: 8:00 a.m. - 5:00 p.m.
The need for equating arises whenever a testing program uses multiple forms of a test that are built to the same content and statistical specifications. Equating is used to adjust scores on test forms so that scores can be used interchangeably. The goals of the session are for attendees to be able to understand the principles of equating, to conduct equating, and to interpret the results of equating in reasonable ways. Equating will be contrasted with related linking processes, traditional and IRT equating methodology will be described, and practical issues will be discussed.
The focus is on developing a conceptual understanding of equating through numerical examples and discussion of practical issues. Recent developments in equating and linking performance assessments and computer-based tests will be considered. The session is designed for upper-level graduate students, new PhDs, testing professionals with operational or oversight responsibility for equating, and others with interest in learning about equating methods and practices.
- Participants should have at least one graduate course in measurement and two graduate courses in statistics.
- Participants will be provided with the second edition of Test Equating, Scaling, and Linking: Methods and Practices (Kolen & Brennan, 2004).
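To illustrate the kind of score adjustment that equating performs, the following minimal sketch applies linear equating, one traditional method, to two small sets of hypothetical form scores. It assumes the two forms were taken by equivalent groups; the session description does not say which methods or data will be used, and the function name and scores are invented for the example.

```python
from statistics import mean, pstdev

def linear_equate(x, scores_x, scores_y):
    """Map a Form X score onto the Form Y scale by matching means and SDs.

    Assumes scores_x and scores_y come from equivalent groups of examinees.
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical observed scores on two forms built to the same specifications.
form_x = [10, 12, 14, 16, 18]
form_y = [13, 16, 19, 22, 25]

equated = linear_equate(14, form_x, form_y)  # Form X mean maps to Form Y mean
```

A Form X score at the Form X mean is mapped to the Form Y mean, and scores above or below the mean are stretched by the ratio of the two standard deviations.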
Applying Hierarchical Models to Causal Inference
Presenters: Guanglei Hong, OISE/University of Toronto; Stephen Raudenbush, University of Chicago
Fee: $80
Time: 8:00 a.m. - 5:00 p.m.
The purpose of this session is to introduce recent developments in causal inference concepts and methods for evaluating educational policy and program effects in multi-level settings when randomized experiments are infeasible. Hierarchical linear and nonlinear models in combination with propensity score-based methods for causal effect estimation will be presented. Education examples will be used throughout lecture, discussion and hands-on practice. The session is intended for researchers interested in investigating the effectiveness of educational policies, intervention programs, and various educational practices.
- Participants are expected to bring a laptop computer with SPSS installed. Participants should also download and install the free 15-day trial edition of the HLM 6 software available at http://www.ssicentral.com/hlm/downloads.html prior to attending the session.
Considerations in Setting Performance Standards
Presenters: Mary Pitoniak, Educational Testing Service; Michael Zieky, Educational Testing Service
Fee: $80
Time: 8:00 a.m. - 5:00 p.m.
This session intends to answer questions regarding how to choose a standard setting method, which methods are currently being used, and how to know if the cut scores set for an assessment yield valid interpretations within the context of a particular testing program. The fundamentals of standard setting will be presented, including required steps for all methods. Information on vertically moderated standards and adjusting committee-recommended cut scores will also be discussed. Methodologies currently being used by the states in setting performance standards will be reviewed.
Hands-on practice time will be given to allow participants to thoroughly understand the cognitive tasks involved in making the judgments for two of the most commonly used methods, Bookmark (Lewis, Mitzel, & Green, 1996) and modified Angoff (Angoff, 1971). This exercise will also prepare participants to plan and run Bookmark and modified Angoff standard setting workshops.
Finally, significant time will be devoted to studying the validity of standard setting procedures and the resulting cut scores. Using Kane's (1994, 2001) framework, the session will explore three sources of evidence: procedural, internal, and external. This session is intended for anyone who needs to understand how to run a standard setting session and the complexities involved.
- Participants will be provided with a booklet containing a series of articles relevant to the field, as well as some sample standard setting materials.
Bayesian Networks in Educational Assessment
Presenters: Russell G. Almond, Educational Testing Service; Robert J. Mislevy, University of Maryland; David M. Williamson, Educational Testing Service; Duanli Yan, Educational Testing Service
Fee: $80
Time: 8:00 a.m. - 5:00 p.m.
The Bayesian paradigm provides a convenient mathematical system for reasoning about evidence. Bayesian networks provide a graphical language for describing complex systems and reasoning about evidence in complex models. This allows assessment designers to build scoring that has fidelity to cognitive theories about the domain and yet is mathematically tractable and can be refined with observational data. Topics covered in this session will include evidence-centered assessment design, basic Bayesian network representations and computations, available software for manipulating Bayesian networks, refining Bayesian networks using data, and example systems using Bayesian networks.
- It is recommended that participants bring a laptop to run sample exercises using the student version of Netica (http://www.norsys.com/).
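As a small illustration of reasoning about evidence in the Bayesian paradigm, the sketch below updates belief about a single binary proficiency variable after each observed item response. It is the simplest possible network (one skill node with conditionally independent item nodes); the function name and all probabilities are hypothetical and are not drawn from the session or from Netica.

```python
def posterior_mastery(prior, p_correct_master, p_correct_nonmaster, correct):
    """Posterior probability of mastery after one item response (Bayes' rule)."""
    like_master = p_correct_master if correct else 1.0 - p_correct_master
    like_nonmaster = p_correct_nonmaster if correct else 1.0 - p_correct_nonmaster
    numerator = prior * like_master
    return numerator / (numerator + (1.0 - prior) * like_nonmaster)

# Hypothetical conditional probabilities for two items measuring one skill.
belief = 0.5                                                       # prior P(mastery)
belief_after_1 = posterior_mastery(belief, 0.9, 0.2, correct=True)
belief_after_2 = posterior_mastery(belief_after_1, 0.9, 0.2, correct=False)
```

Because the item responses are assumed independent given the skill, the posterior after one response serves as the prior for the next.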
Writing Diagnostic Items
Presenters: Dylan Wiliam, Institute of Education, University of London; Caroline Wylie, Educational Testing Service
Fee: $65
Time: 8:00 a.m. - 12:00 noon
Increasingly, test developers are being asked to generate items and tests that not only identify what a student can and can't do, but why and what to do about it. A number of approaches to this challenge have been explored, including the use of sub-scales and standard-by-standard reporting. At Educational Testing Service, a team has been investigating the construction of diagnostic items which can either be used singly by the teachers as part of normal classroom practice or assembled into testlets to support summative inferences. The crucial characteristic of such items (Wylie & Wiliam, NCME 2007) is that they enable teachers to distinguish between students who are operating with a correct or an incorrect cognitive rule (Bart et al., 1994). Specifically, by using the items, teachers are able to identify students who can get the correct answer using incorrect reasoning. This session will present the item-writing process, illustrate the steps with a series of examples, and show the iterative approach to refining items. Participants will have an opportunity to write, critique, and review items.
Skills Diagnosis with Latent Variable Models
Presenters: Jeffrey Douglas, University of Illinois, Urbana-Champaign; Hua-Hua Chang, University of Illinois, Urbana-Champaign; Jimmy de la Torre, Rutgers University; Robert Henson, University of North Carolina-Greensboro; Jonathan Templin, University of Georgia
Fee: $75
Time: 8:00 a.m. - 12:00 noon
The primary aim of skills diagnosis is to develop and analyze tests in ways that reveal information with more diagnostic value when compared with traditional approaches. In the methods for skills diagnosis, mastery of particular skills or states of knowledge can be represented by a list of binary latent variables indicating mastery of each of a finite set of skills under diagnosis. The main objective of skills diagnosis is to classify examinees according to this list of skills. In this training session, several popular modeling and classification approaches will be discussed. Three conjunctive latent class models known as the DINA, NIDA, and Fusion models will be introduced, and software for fitting these models with Mplus will be demonstrated. Because of the multidimensional nature of these models, estimation benefits greatly if it can adapt to previous responses. To address this, computerized adaptive testing (CAT) is considered. Because Fisher information does not apply to discrete latent variables, alternative and computationally simple item selection rules are introduced. For CAT settings in which both traditional and diagnostic models are being used, CAT algorithms are introduced for ensuring reliable information for these dual objectives. In addition to sequential methods of test construction, indices for use in fixed-length test construction are also given. The training session is meant to provide practical guidelines for implementing skills diagnosis and considers the essential topics of identifying the attributes measured by items as well as test equating.
- Participants will be given access to a website where they can download software that can be used with Mplus for fitting latent variable models for skills diagnosis.
- It is recommended that participants bring a laptop computer with Mplus installed.
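To make the conjunctive structure of the DINA model concrete, the following minimal sketch computes the probability of a correct response for one item, given an examinee's binary skill profile. The function name, the Q-matrix row, and the slip and guess values are hypothetical; the sketch is independent of the Mplus software used in the session.

```python
def dina_prob(alpha, q_row, slip, guess):
    """P(correct response) for one item under the DINA model.

    alpha: examinee's binary skill profile (1 = mastered);
    q_row: the item's Q-matrix row (1 = skill required);
    slip:  P(incorrect | all required skills mastered);
    guess: P(correct | at least one required skill missing).
    """
    has_all_required = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1.0 - slip if has_all_required else guess

# Hypothetical item requiring skills 1 and 3 out of three skills.
q_row = [1, 0, 1]
p_master = dina_prob([1, 0, 1], q_row, slip=0.1, guess=0.2)
p_partial = dina_prob([1, 0, 0], q_row, slip=0.1, guess=0.2)
```

The model is conjunctive because missing any one required skill drops the examinee to the guessing probability, no matter how many other required skills are mastered.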
ICL and ETIRM: Open Source IRT Estimation Software for Researchers
Presenters: Alan D. Mead, Illinois Institute of Technology; Werner Wothke, American Institutes for Research; Yanwei Zhang, American Institute of Certified Public Accountants
Fee: $65
Time: 1:00 p.m. - 5:00 p.m.
This session will focus on Hanson's (2002) IRT Command Language (ICL) and the Estimation Toolkit for Item Response Models (ETIRM) used by researchers. Participants will learn: (1) how to use ICL to fit dichotomous and polytomous IRT models, as well as advanced ICL features; (2) ICL features for simulation research; and (3) how ETIRM is used. The session will begin with a short "IRT refresher," but the course is designed for participants who already have at least a basic understanding of item response theory.
ICL is a stand-alone computer program for estimating parameters of dichotomous and polytomous IRT models. ICL computes maximum likelihood or Bayes modal estimates of item parameters using the EM algorithm and handles both single and multiple group estimation. The estimation routines are available separately as the ETIRM. Both ICL and ETIRM were released as open source by their author, Bradley Hanson, and may be copied and modified; ETIRM may be incorporated into other software.
- Participants should bring a Windows or Linux laptop (ICL is also available for Apple computers).
Building and Documenting a Valid Assessment System for Students with Disabilities: Psychometric and Practical Considerations for Alternate and Modified Assessments
Presenters: Karen Barton, CTB/McGraw-Hill; Lara Osleson, CTB/McGraw-Hill; Dianne Lefly, Colorado Department of Education
Fee: $65
Time: 1:00 p.m. - 5:00 p.m.
This session is intended for psychometricians, researchers, state Departments of Education personnel, and test development experts who wish to design, build, and technically document reliable, valid, and defensible assessments, particularly alternate and modified assessments for students with disabilities. Topics range from assessment policy, design, and development to appropriate statistical design and analysis, special studies, and technical documentation. The session will provide the audience with sound psychometric tools and practices to ensure that alternate (as well as modified and general) assessments can meet high standards of technical adequacy, along with practical tips and solutions for documenting evidence in a legally defensible manner.
Participants will be guided through each step in designing and building a valid and defensible alternate assessment, with approaches to collecting appropriate validity evidence linked to the Standards (AERA, NCME, APA) and Critical Elements. Parallels and distinctions will be made between alternate assessments and both modified and general assessments.
Exploring the Validity of State Accountability Systems
Presenters: Brian Gong, Center for Assessment; Marianne Perie, Center for Assessment
Fee: $65
Time: 8:00 a.m. - 12:00 noon
School accountability systems have been instituted as policy mechanisms for improving student achievement since the 1990s. Since the passage of the No Child Left Behind Act of 2001, we as a field have learned many lessons on developing strong accountability systems. However, although standards for educational testing and program and student evaluation have been developed, there are no universally adopted standards for accountability systems. This session will use a validity framework to explore the elements required of a quality accountability system, providing guidance for both developing new systems and evaluating existing systems.
In one part of the session, the focus will be on conceptualizing the validity of accountability systems as contrasted with validity of assessments, drawing on the work of Messick and Kane. This approach provides two lenses for exploring the validity of accountability. Another part of the session will present a framework of guiding questions and key elements that should be addressed in any accountability system. Examples of actual systems and lessons learned will be shared. The challenges inherent in combining the values and goals of state and federal accountability systems will also be discussed.
Tips for Graduate Students: Advice for Finishing School, Obtaining a Job, and Starting a Career
Presenters: Deborah J. Harris, ACT; Julio Sanclemente, CTB/McGraw-Hill; Andrew Ho, University of Iowa
Fee: $15
Time: 1:00 p.m. - 5:00 p.m.
The training session will have three main components of discussion: finishing up the PhD, obtaining a job, and beginning a career. Specifics that will be discussed include:
- finding a dissertation topic and how to maximize experiences while still a student (classes, internships, work experiences, networking, professional associations)
- locating where jobs are available (universities, testing companies, school districts, state departments, professional/licensing organizations, etc.), how to apply for jobs (including targeting cover letters, references, and resumes), and the interview process
- understanding job politics, adjusting to the environment, career path, publishing, professional service, being a mentor/finding a mentor, balancing work and life, and what to do if you hate your job

