Our Experience of Using Comparative Judgement

Since Michael Gove announced the scrapping of national curriculum levels, many schools, both in the UK and internationally, have searched for a meaningful form of assessment to replace them. Levels, as originally envisioned back in the late 1980s, were a solution to the problem of how to measure the attainment and progress of students in schools at the end of each key stage.

Over time the system became so corrupted, due to the prevailing need for accountability, that many primary teachers were encouraged, as a matter of standard practice, to use summative end of key stage levels to assess individual pieces of work.

Michael Gove’s announcement, therefore, presented an opportunity for schools to explore and set up systems of assessment that could more effectively support learning. Unfortunately, many of the new systems of assessment that have appeared since the scrapping of levels have closely resembled levels and have therefore replicated their flaws.

Moderating Writing since 2004:

Since I trained as a teacher in 2004, the moderation of writing has always involved the following steps:

  • Students complete a sample of writing under test conditions.
  • The teacher then takes these samples and assesses them against level descriptors, e.g. APP, Roz Wilson, internal target descriptors.
  • The school holds a moderation meeting during which teachers moderate writing samples from other classes and year groups to check the consistency of the assessments made across the school.

This has always been a time-consuming process for teachers. Moreover, as Daisy Christodoulou has shown, and as Professor Rob Coe’s research (see below) has also shown, this type of teacher assessment is unfair and open to significant bias that affects some of the most vulnerable students.

[Image: Rob Coe slide on bias in teacher assessment]

ResearchED 2016: An Introduction to Comparative Judgement

After watching a live feed of Daisy Christodoulou’s presentation during ResearchED 2016, I became interested in the possibility of using comparative judgement as a method of assessment for Key Stage 2 writing.

I shared the film with my colleagues and we discussed the possibility of using comparative judgement. We then contacted nomoremarking.com directly and not only did Dr Wheadon kindly answer any questions we had but he also invited us to take part in a trial this academic year.

Comparative Judgement:

During the process of comparative judgement, the website presents a teacher with two scripts on the screen. All the teacher is required to do is pick the sample that they think is the better piece of writing. This process is repeated again and again by the individual teacher and by all the teachers involved in the moderation process in a school.

That’s it.

Random samples of writing from other schools are also judged as part of the process and this allows the judgements made within each school to be benchmarked against those made by other schools involved in the trial.

The claim made for comparative judgement is that it not only speeds up the process of writing assessment and moderation but, by collating all the judgements, also makes the process more reliable than standard teacher assessment and less susceptible to individual bias.
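To give a feel for how lots of simple "which is better?" decisions can be collated into a single rank order, here is a minimal sketch in Python. It assumes a Bradley-Terry model, a standard statistical model for paired comparisons; this post does not say which model nomoremarking.com actually uses. The function name, the script labels and the judgements themselves are all made up for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(judgements, iterations=200):
    """Estimate a relative 'strength' for each script from pairwise judgements.

    judgements: list of (winner, loser) tuples, one per decision.
    Returns a dict mapping script -> strength, normalised to sum to 1.
    This is an illustrative sketch, not the website's own method.
    """
    scripts = sorted({script for pair in judgements for script in pair})
    wins = defaultdict(int)         # how often each script was chosen
    pair_counts = defaultdict(int)  # how often each pair was compared
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {script: 1.0 for script in scripts}
    for _ in range(iterations):
        updated = {}
        for script in scripts:
            denominator = 0.0
            for other in scripts:
                if other == script:
                    continue
                n = pair_counts.get(frozenset((script, other)), 0)
                if n:
                    denominator += n / (strength[script] + strength[other])
            # Standard iterative update: wins divided by expected exposure.
            updated[script] = (
                wins[script] / denominator if denominator else strength[script]
            )
        total = sum(updated.values())
        strength = {script: value / total for script, value in updated.items()}
    return strength


# Hypothetical judgements on four scripts, each written as (winner, loser).
judgements = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("B", "D"),
    ("C", "D"), ("C", "D"), ("D", "C"),
]

strengths = fit_bradley_terry(judgements)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Note that no single judge has to agree with the final order: script B beats script A once in the made-up data, yet A still comes out on top because the estimate pools every decision. That pooling is the reason the approach is claimed to be less sensitive to any one person's bias.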

Our Experience of Comparative Judgement:

We completed our comparative judgement session on Wednesday 29th March. Initially, we invited all teachers in Key Stages 1 and 2 to be a part of the meeting, but teachers in the Early Years Foundation Stage and some teaching assistants also asked if they could be involved.

Based on our first session of using this method of assessment, here are our observations:

  • It is a far quicker method of assessment than the ‘traditional’ process of writing assessment and moderation. So much so that it was difficult for some staff to accept that the process was rigorous enough or reliable enough as a form of assessment!
  • Comparing two samples of writing was not always as easy as it sounds. When there was a significant difference in quality, the process was easy. When the samples were similar, it could take some time to make a judgement.
  • The advice from No More Marking was that each judgement should take around 30 seconds. One of the problems we faced with the samples we were judging was that the students had been provided with a generic sentence opener at the beginning. This meant that the opening paragraph of each sample of writing was fairly similar. This in turn impacted on our ability to make quick judgements. Based on this experience I would recommend that sentence openers are not provided for samples being assessed in this way.
  • We initially expected teachers to complete around 100 judgements each and had set aside around 1 hour 30 minutes to do this. Some teachers managed this but, due to some of the factors mentioned above, many found it difficult and didn’t manage to reach the target number.
  • Despite this, because we had 24 members of staff involved in judging 63 portfolios of writing, as a school we reached a high level of reliability (0.93), completing 1880 judgements in total.
  • For general guidance on the number of judgements required before the process becomes reliable, see this blog post.
  • Due to the speed of some of the judgements made, many staff found it difficult to understand how the process could be reliable, and many expressed concern that they had felt their own personal bias influencing their judgements. This led to a debate about the reliability score that we achieved. Dr Wheadon, who runs the website, was kind enough to respond to our questions in this area with this blog post.
  • Dr Wheadon explained that they have found no correlation between the speed and quality of judging unless the time taken per judgement drops below 10 seconds.
  • One area of feedback from the staff was that assessing writing in this way did not allow for the process of dialogue and reflection on student writing between teachers that usually occurs during moderation meetings and this was missed. We are planning to build this into the next meeting when we share the results of the session.
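The figures in the observations above can be sanity-checked with some quick arithmetic. The sketch below uses only numbers quoted in this post (1880 judgements, 63 portfolios, 24 judges, 30 seconds per judgement, a target of 100 judgements each). The variable names are my own, and it assumes for simplicity that every judgement compared two of our 63 portfolios, which may overstate the count per portfolio because scripts from other schools were also judged.

```python
# Figures quoted in the post.
total_judgements = 1880
portfolios = 63
judges = 24
seconds_per_judgement = 30   # pace advised for each judgement
target_per_judge = 100       # our initial expectation per teacher

# Each judgement shows two scripts, so it counts once for each of them.
# Assumption: all scripts shown came from our own 63 portfolios.
comparisons_per_portfolio = total_judgements * 2 / portfolios

# Average number of judgements actually completed per member of staff.
mean_judgements_per_judge = total_judgements / judges

# Time needed to hit the target at the advised pace, in minutes.
minutes_for_target = target_per_judge * seconds_per_judgement / 60

print(round(comparisons_per_portfolio, 1))   # 59.7
print(round(mean_judgements_per_judge, 1))   # 78.3
print(minutes_for_target)                    # 50.0
```

In other words, at the advised pace 100 judgements would fit into about 50 minutes, well inside the 1 hour 30 minutes we set aside. The average of roughly 78 judgements per person is consistent with many of our judgements taking longer than 30 seconds.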

All in all, we felt that comparative judgement represents an exciting development as far as writing assessment is concerned. In the next few weeks, we expect to receive a breakdown of what the session revealed about our Year 6 writers. We are looking forward to finding out what data was produced, exploring the results, and discussing and reflecting on them as a staff. Given that we assessed 63 portfolios of writing in one staff meeting and significantly increased the level of reliability with which they were assessed, it represents a substantial improvement on the previous system.

A consideration going forward might be to reflect on how this system of assessment would be affected if it is adopted as the standard method of writing assessment for schools in the UK. If this happened, how might the nature of high stakes accountability impact on the process and what would be the implications for schools using the system?


Bias in Teacher Assessment slide taken from the presentation From Evidence to Great Teaching, Robert Coe, March 2015. Accessed via the internet, April 2017.


