The Functions of Summative Assessment
Wilson, M. (2016, September). NCME Newsletter, 24(3), 1-3.
In the previous president’s message, I made a case for the importance of classroom assessment as a site for educational measurement research and development and as a principal context in which educational measurement can be a positive influence for educational success. (See the June NCME Newsletter [vol. 23, no. 2] if you missed that first installment.) In this message, I will turn my attention to summative assessment.
Few members of NCME will need to have the importance of summative assessment pointed out to them, so I will not devote space to that here. Instead, I will start with a discussion of the multiple aspects of how summative assessments are deployed in the broad educational domain. In my view, these aspects split into two functions: the information uses of summative assessments and the signification uses.
The information uses of summative assessment are the ones that are the main focus of research, development, and application in educational measurement in general, and I would say, for the majority of NCME members. By information uses I mean the many ways that the actual information from the measurements is used in the educational system. For example, suppose that the instrument is a state test, designed to assess students in a specific domain, such as writing. Then the direct information uses of this test would be to provide estimates of student location on a variable of student performance in the domain of writing. The results may be used in a variety of ways: They might be aggregated across multiple levels and groupings of students, and they may also be combined with the results of other tests in various ways. Individual results might be used summatively by a teacher for classroom use or for sharing with parents. They might also be used summatively in aggregation and/or combination to make educational decisions by appropriate people, such as teachers, parents, administrators, those beyond the classroom (including those at the building level, up to the state levels), and educational policy makers of many kinds.
The signification uses of summative assessments are seldom referred to in research papers in educational measurement, though they are commonly understood in educational policy circles, and, in my view, actually carry greater weight in the education system. These signification uses include the signaling to teachers and others of what standards they should be teaching (i.e., because those are the standards represented by the items in the summative tests) and of the relative weighting of those standards (i.e., through the relative numbers of items (or scores) representing different standards). A second use is to give teachers concrete examples of what the standards mean through their embodiment in specific items. (A similar distinction was made in Wilson , but the term signification is new.)
These two sets of usages are, in my view, legitimately the aim of many policy-makers in using summative tests. It must be recognized that they are also somewhat limited. For instance, the summative information provided about individual students is relatively coarse, so coarse in fact that most teachers would already know as much about their students within a month or less of starting classes.
However, these positive usages need to be seen as being balanced by complementary negative effects. In the case of information uses, for instance, there may be attempts to gain from them diagnostic information for use by teachers. This is fraught with risk, however, as there is a strong temptation to try to use subscores from summative assessments at too fine a grain size compared to the actual information content of the items (or, put another way, without due recognition of the uncertainty of the results from using the subscores). Equally, the (quite appropriate) qualification of summative results using such technical concepts as standard error and reliability may give policy-makers undue confidence that the results of the summative assessments are “the right stuff.” This is sometimes referred to as the white-coat bias (i.e., because technical experts wear white coats in their labs).
And why would it not be the right stuff, you might ask? Well, this is where the other side of signification comes through. One negative signification effect is that the summative assessments may narrow the range of the curriculum that teachers aim for by leaving out standards that are hard to test and hence not on the test. A related negative effect arises from imbalances between the number of items relating to specific standards and the weight of those standards in the overall set of standards. A second negative signification effect is that summative assessments may narrow the ways that teachers think about what a standard means (i.e., the items may represent only the easily testable parts of specific standards).
These negative effects of summative tests are compounded in a school managerial setting where the success or otherwise of teachers and/or schools is predicated principally on external summative test results, a situation that can have a dire effect on teacher and school morale. My colleague, Paul Black, from King’s College, London, has devised a visual means of expressing this. He first noted the commonly used CIA–Triangle to symbolize the relationship between curriculum, instruction, and assessment, as in Figure 1.
He then contrasted it with the situation in Figure 2, which he labeled a “vicious triangle” (Black, Wilson, & Yao, 2011), where teachers’ instructional practices are squeezed between the legitimate aims of the curriculum and the focusing and narrowing effects of external tests subject to the negative effects outlined above.
These negative effects of the signification uses of summative assessments can be hard to discern from a narrow technical point of view. Nevertheless, as I mentioned above, at least in my own view, the signification uses tend to be the most important in bringing about changes in the educational system (both positive and negative) and hence need to be attended to very carefully by us, as professionals, as scholars, and as players in the policy realm.
I will continue this story, connecting back to classroom assessment, in my next newsletter message.