Comparative Judgement – what we did next.

I previously wrote about our experience of using comparative marking to judge Year 6 scripts as part of the sharing standards trial. A short time after posting that piece, the results came through. There was a lot to reflect on. As well as providing us with information on how the school was performing in relation to other schools who took part in the trial, it also gave us a lot of interesting data relating to the individual writing samples that we submitted for each child.

The data included:

  • the percentile rank of each child in relation to all the children that took part in the trial;
  • the probability of each child reaching the expected standard by the end of Year 6;
  • the probability of each child reaching the greater depth standard by the end of Year 6;
  • the current standard that the child is working at based on the samples provided (working towards, expected standard, greater depth).
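For readers curious how a percentile rank falls out of a comparative judgement session: the judging produces a scaled score for each script, and the percentile is simply each child's position within the full cohort of scores. A minimal sketch in Python (the scores and names here are invented for illustration; the real trial computed this across all participating schools):

```python
# Hypothetical scaled scores from a comparative judgement session,
# one per child. The values are invented for illustration only.
scores = {"child_A": -1.2, "child_B": 0.4, "child_C": 1.8,
          "child_D": -0.3, "child_E": 0.9}

def percentile_rank(scores, child):
    """Percentage of the cohort scoring at or below this child."""
    at_or_below = sum(1 for s in scores.values() if s <= scores[child])
    return 100 * at_or_below / len(scores)

for child in sorted(scores, key=scores.get, reverse=True):
    print(child, percentile_rank(scores, child))
```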

As an international school, any data that allows us to benchmark ourselves against other schools using the English National Curriculum is always useful. Having received all this information, the question was, what next?

In this respect, we found Jon Brunskill’s blog, ‘Comparative Judgement – now what?’ really useful. The post suggests some practical formative strategies for using the information gained through a comparative judgement session.

Sharing the results and gaining feedback:

We felt that at this point it would be useful to book a meeting to allow staff to reflect on the process so far and gain feedback from teachers based on the results we had received.

At the start of the meeting, we shared some of the wider research into comparative judgement and reminded staff why we had taken part in the trial in the first place. Drawing heavily on the book, What every teacher needs to know about psychology by David Didau and Nick Rose (2016), we discussed the following points:

  • the idea that humans find it difficult to make individual judgements of quality and that we are far better at comparing the similarities and differences between things;
  • the idea that although mark schemes, such as the APP writing scheme, provide the appearance of objectivity, they can often warp the teaching and assessment process. Lots of teachers related to the experience of adjusting their assessment of writing halfway through a marking session, based on a script that was much better or worse than the previous scripts;
  • the idea that traditional teacher assessment can give rise to systematic bias against students who speak English as an additional language, have special educational needs, or who have a personality that is different to the teacher marking the work and that the comparative process can help guard against this;
  • the idea that comparative judgement encourages quick intuitive judgements and that this can feel inaccurate but that the reliability scores are high;
  • the difference between reliability and validity and the fact that all summative tests will suffer from validity issues in relation to the inferences that we draw from them;
  • the feelings of guilt we have as teachers when making quick intuitive judgements of work that the students have taken a lot of time to produce;
  • the fact that comparative judgement speeds up the summative process so that rather than spending large amounts of time agonising over whether a piece of writing is a 3C, 3B or 3A, we can get past this through a quick comparative marking session and then focus on the formative process of planning the next appropriate teaching steps for the class.
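For anyone wondering what that quick comparative marking session produces under the hood: comparative judgement tools typically fit a Bradley–Terry-style model to the stream of "which script is better?" decisions, turning pairwise wins into a quality scale that can then be ranked. A rough sketch of the idea, using invented judgement data and a simple iterative fit (not the actual algorithm any particular tool uses):

```python
# A sketch of how pairwise judgements become a ranking. The (winner,
# loser) pairs below are invented for illustration.
judgements = [
    ("A", "B"), ("B", "C"), ("A", "C"), ("C", "A"), ("A", "B"),
]

scripts = sorted({s for pair in judgements for s in pair})
strength = {s: 1.0 for s in scripts}  # initial quality estimates

for _ in range(1000):
    for s in scripts:
        wins = sum(1 for w, _ in judgements if w == s)
        # Zermelo's update: wins divided by the sum, over every
        # comparison involving s, of 1 / (sum of the two strengths).
        denom = sum(1 / (strength[w] + strength[l])
                    for w, l in judgements if s in (w, l))
        strength[s] = wins / denom
    total = sum(strength.values())
    strength = {s: v / total for s, v in strength.items()}  # normalise

ranking = sorted(scripts, key=strength.get, reverse=True)
print(ranking)
```

Script A, which wins three of its four comparisons, comes out on top; B and C, with one win each against each other's record, converge to equal strengths.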

Reflecting on the data & the samples:

We then asked teachers to get into small groups and look again at a mix of the scripts drawn from the trial. We asked staff to discuss the scripts and consider the approaches they would use in class to develop the children’s writing further.

We also asked teachers to suggest how we might use the samples and the data gained from the trial to inform formative strategies in the classroom. Similar to Jon Brunskill’s post mentioned earlier, some of the suggestions included:

  • taking anonymised samples from the trial and asking the children to compare samples that had been ranked highly with samples that had been ranked towards the lower end and discussing the differences (making sure that the samples were not drawn from the children in that class);
  • collecting a range of exemplar scripts from the trial and keeping these as models for reflection and instruction when covering the same genres in class (a range of anonymised samples from the different schools involved in the trial was released as part of the final report);
  • the idea of using samples of writing to help hone ideas of quality during the reporting process. Anonymised samples could become models of the expected standard and greater depth standard so that children and parents could get a better idea of the quality of writing schools are aspiring to. This could be tailored to individual children so that developing writers were given realistic samples that suggested next steps.

Feedback from staff – the role of handwriting bias:

Overall, staff were positive about the experience of using comparative marking but they did have some concerns relating to the validity of the process, particularly in relation to the influence that handwriting had on intuitive, 30-second judgements. I have included a sample of the typical comments we received below:

I like CJ, I think we should use it instead of other summative levelling because of the better reliability and the time saved. It has flaws but they are fewer than APP and it has good potential.

I fear that it is even more biased towards handwriting than other assessment measures.

I think that it is worth trialling for a year as it hasn’t been proved to have any more deficiencies than the current system. It might start losing its efficacy over time though (like the current system!)

I am concerned about large anomalies in the results. They are significant. There are clear problems with judging handwriting which I think is less of an issue when you are marking your own class as you are used to their writing. Does it feed into a competitive atmosphere in the school community? What are we going to do with this ranking?

It has huge potential. I like that lots of people look at the writing.

I like the fact that the comparative marking process leads to more reliable judgements & more holistic judgements.

After receiving this feedback, I spoke with the Year 6 team and we reflected on the impact that handwriting had on the overall results. Having reviewed the samples and looked more closely at the anomalies, we came to the following conclusion: in cases where the content of scripts was of a similar quality but there was a glaring difference in the standard of handwriting, it was obvious that handwriting had played a big role in the judging.

At this point, it felt like we had two options:

  1. dismiss comparative marking as being a process that is overly biased towards scripts with better handwriting;
  2. accept the fact that if a room full of teachers are unable to filter out a clear handwriting bias then it is probably worth making the effort to try to improve our students’ handwriting so that we can judge samples more fairly in future.

In the end, the second option felt more natural to us and we are going to have a push on standards of handwriting next year.

Splitting formative and summative processes:

In my opinion, the real benefit of comparative judgement, and the reason it has so much potential, is that it splits the formative and summative processes of assessment. Rather than spending large amounts of time poring over rubrics at certain checkpoints in the year to make unreliable summative judgements about whether a child is a 3B, 3A or 4C, teachers can instead use the time to analyse writing and work out what to teach next. The summative aspect becomes a quick collaborative process that guards against individual teacher bias. Yes, it isn’t perfect, and yes, handwriting certainly seems to have been a factor in some of the judgements made during our trial, but no assessment is perfect. Perhaps the real lesson for us is that we have been avoiding a focus on handwriting because we have undervalued it, and this process has certainly been useful in re-evaluating that mindset.

We have now made a decision to extend the comparative judgement trial next year. With the help of Dr Wheadon, we have come up with an assessment plan that will allow us to move away from using national curriculum levels for the first time. I would be happy to share our plan in the near future.





