Assessment is one of those things you think you understand until you start to think hard about it.
Then the problem of delineating exactly what it is and what it isn’t turns out to be rather more complex than it first appears.
Part of the problem is that assessment covers such a wide range of things: some assessment is really part of pedagogy (simply good teaching); some is about measurement (operationalising a trait or attribute); and some is about evaluation, stretching into accountability and inseparable from the consequences attached to outcomes.
One common principle that links a lot of these aspects is that assessment should contain information. It seems obvious enough, but a (perhaps surprising) corollary is that an assessment that cannot surprise you is not an assessment.
An assessment should tell you something new: often enough for it to be valuable, but not so often that it undermines your judgement. Assessment should inform judgement, not replace it; but equally, a single judgement is not an assessment.
Hence an impressionistic ‘best fit’ level (whether in the old money of 4b, 7a, etc., or the new of ‘working towards’, ‘working at’ and ‘greater depth’) is not really an assessment either, or at least not a very good one.
A simple rule of thumb for how much information is in an assessment is how many independent bits of information make it up. For example, a twenty-question test contains up to twenty bits of information, provided the questions are genuinely independent. A single, holistic judgement contains one, unless it can be disaggregated into independent component parts.
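To put rough numbers on this, here is a minimal sketch in Python, taking ‘bits’ in the approximate Shannon sense and assuming each question is an independent right/wrong item; the function name and figures are illustrative, not taken from the discussion.

```python
import math

def max_information_bits(n_independent_items: int, outcomes_per_item: int = 2) -> float:
    """Upper bound on the Shannon information (in bits) carried by an assessment
    made up of independent items, each with a fixed number of possible outcomes."""
    return n_independent_items * math.log2(outcomes_per_item)

# A twenty-question test of independent right/wrong items: up to 20 bits.
print(max_information_bits(20))                        # 20.0

# A single holistic judgement on a three-category scale
# ('working towards' / 'working at' / 'greater depth'): under 2 bits.
print(max_information_bits(1, outcomes_per_item=3))    # ~1.58
```

In practice the real information content is lower still, because items are rarely fully independent, but the comparison makes the point: many independent judgements carry far more information than one global one.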
Another way in which an assessment can fail to contain information is if it is pre-constrained by what is desirable or acceptable. One example is when teachers are asked to fill tracking systems with a termly number or category that indicates each learner’s progress. Such data tend to show improbable over-regularity: a genuine assessment process would show less consistent patterns, purely as a result of chance or ‘measurement error’.
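As a small illustration of why genuine measurement should look less tidy, here is a simulation sketch with hypothetical figures (not data from any real tracking system): even a learner making perfectly steady underlying progress produces an uneven termly record once ordinary measurement error is added.

```python
import random

def observed_termly_scores(true_start: float, gain_per_term: float,
                           terms: int, error_sd: float) -> list[float]:
    """Simulate what a genuine assessment would record each term:
    steady underlying progress plus independent measurement error."""
    return [round(true_start + gain_per_term * t + random.gauss(0, error_sd), 1)
            for t in range(terms)]

random.seed(1)
# Steady progress of one point per term, with a plausible amount of noise:
# the recorded scores wobble and occasionally dip, rather than marching up
# by exactly one point per term as a pre-constrained tracker would show.
print(observed_termly_scores(true_start=20, gain_per_term=1, terms=6, error_sd=2))
```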
Further examples are found in the distributions of outcomes produced by ‘assessment’ processes such as the Early Years Foundation Stage Profile (with a massive spike at 34, the score obtained by a child reaching the ‘expected’ level on all 17 early learning goals) or the phonics screening test (with a huge discontinuity at the ‘expected standard’, especially when the qualifying score was pre-announced).
Great assessment is based on sound information, and does not show these kinds of anomalies.
There are a number of challenges and some big unanswered questions in assessment research. Assessment is an area where there is a lot of very good research, but even so some important questions remain open, especially in relation to practical implications.
Formative assessment appears to have strong backing from research as enhancing learning, although perhaps not by as much as the routinely cited ‘effect sizes between 0.4 and 0.7’: a 2011 meta-analysis by Kingston and Nash puts the estimate at 0.2. But it has also been criticised (e.g. by Bennett, 2011) as ill-defined, conceptually incoherent and impossible to operationalise and implement.
Even so, most of the things that are traditionally lumped together as ‘Assessment for Learning’ do have good research support. They also readily get support from teachers, who generally find them consistent with their beliefs about what is good practice.
Despite this, and the massive efforts and spending by numerous governments and other agencies to promote it, there is no convincing evidence that Assessment for Learning has yet led to any detectable improvements.
Apparently convincing and plausible arguments support the need for teachers to have a good understanding of assessment, and it is often claimed that the gap between current levels of knowledge and what is desirable is too large.
Some evidence does support the positive impact of training teachers in assessment literacy, though there is debate about what teachers actually need to know and, perhaps ironically, most assessments of assessment literacy do not seem to be very good measures anyway.
One action the teaching profession could take would be to support a joined-up research agenda to address questions like these. A research-engaged profession would want to find ways to tackle such applied questions: setting an agenda that includes them and bringing researchers and practitioners (and researcher-practitioners) together to conduct studies.
Related to this is the need to create and reinforce opportunities and mechanisms for plausibly effective professional learning. Persuading and enabling teachers to sign up for a programme of training that lasts more than half a day seems to be quite challenging, yet the best research suggests that anything so short is far too little to be worthwhile.
If we want teachers to learn about great assessment, we need an infrastructure for their learning.
In England at the moment, there are good reasons to be optimistic that this may happen. For example, researchED is a grass-roots, teacher-led, social media-connected organisation that has mobilised both teachers and researchers to talk about questions that matter to both.
The Education Endowment Foundation (EEF) has been funding high-quality, practically relevant research for five years now. As well as funding trials of interventions, the EEF has produced reviews of existing evidence and evaluation tools, and has led the ‘research schools’ initiative. And of course the new Chartered College of Teaching promises to be a forum for engaging teachers in research, scholarship and professional learning.
This blog post is an excerpt from a panel discussion ‘What makes great assessment?’ hosted by Evidence Based Education, in partnership with the Chartered College of Teaching. Watch the discussion.
Watch Rob Coe’s presentation, A Vision for Enhanced Professionalism, from the Chartered College of Teaching inaugural conference.