The RCTs have landed – but has anyone noticed?

Last month the Education Endowment Foundation published their first set of evaluation reports. The six studies involved 238 schools and 6,800 pupils and represented the first stage in a process that aims to deepen our knowledge of what works in education, with a particular focus on raising the attainment of those eligible for free school meals. Several of the studies were designed as randomised controlled trials (RCTs), which are widely acknowledged as the most powerful way to determine causality in education research.

Given the clamour for RCTs amongst the education community – most evident at the ResearchEd event last autumn – it was surprising how little coverage these reports received. Kevan Collins (from the EEF) did his bit with a rather self-congratulatory piece in the TES trumpeting the reports as an example of what can be achieved when schools work together. A couple of other articles commented on the findings in relation to teaching assistants, who came out of these studies slightly better than they have done in other research. But on the whole the reaction was muted: nothing groundbreaking had emerged to shake the foundations of the education system. Instead we got a preview of the kind of nuanced findings that these RCTs are likely to produce over the next few years.

It is no coincidence that the effect sizes detected were relatively small (typically ranging from 0.10 to 0.25). The attraction of RCTs is that they can do a good job of isolating the impact of a particular intervention. It is hardly surprising, therefore, that the reported gains were modest. Improving attainment is difficult, and it would be extraordinary and implausible if relatively small interventions led to huge gains.

Hopefully the publication of these reports will lead to greater questioning of the reliability of much of the existing research. The truth is that many of the studies included in Hattie's Visible Learning, for example, fail to meet basic standards of methodological rigour. Reading Recovery is a good example: as the evaluation report from the EEF notes, of 78 existing studies on Reading Recovery only 4 met basic evidential standards. In the coming years we will see an increasing divide between methodologically sound studies that report modest effect sizes and less rigorous ones that continue to claim implausibly large benefits for certain interventions. It is critical that consumers of education research are aware of that divide and are encouraged to look beyond the effect size to how robust the finding is. The EEF could help here by getting rid of the conversion the toolkit uses between effect size and months' progress. It is neither rigorous nor particularly illuminating, and it promotes a one-dimensional analysis of the evidence.
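To make those numbers a little more concrete, here is a rough sketch (my own illustration in Python, not code from any of the EEF evaluations, and using made-up figures) of how a standardised effect size of the kind quoted in these reports is typically calculated, and why the uncertainty around it matters more than any single months-progress label.

    # Illustrative only: Cohen's d (difference in group means divided by the
    # pooled standard deviation) with a rough 95% confidence interval.
    # The figures below are invented, not taken from any of the six studies.
    import math

    def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
        """Standardised mean difference between treatment and control groups."""
        pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                              / (n_t + n_c - 2))
        d = (mean_t - mean_c) / pooled_sd
        # Approximate standard error of d (Hedges & Olkin), adequate for a rough CI.
        se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
        return d, (d - 1.96 * se, d + 1.96 * se)

    # Invented example: treatment group scores slightly higher than control.
    d, ci = cohens_d(mean_t=103.0, sd_t=15.0, n_t=150,
                     mean_c=100.0, sd_c=15.0, n_c=150)
    print(f"effect size d = {d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
    # With 150 pupils per arm, d = 0.20 but the interval runs from roughly zero
    # to about 0.4, so collapsing the result to a single headline figure
    # (whether an effect size or a months-progress label) hides most of the
    # uncertainty that readers of these reports need to see.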

The other point that stands out from the first round of reports is the difference between evaluation and endorsement. Even where a positive outcome is detected, it does not follow that the intervention is worth pursuing. The costs – both financial and opportunity – need to be carefully considered. In some cases an intervention that appears effective on the surface might actually be a poor use of time and resources. The Catch-Up Numeracy study, for example, found that although the carefully designed small-group intervention programme was effective compared with equivalent time in the classroom, it was actually less effective than other small-group interventions. Until we have built up a large number of RCTs looking at a whole range of interventions we are unlikely to be able to make accurate judgements about what constitutes best practice. In this sense the utility of the toolkit is cumulative: the more reports that are included, the more powerful it will become as a guide for decision making in schools.

For now the publication of these initial reports should be welcomed. They will not immediately tell us how to improve the education system, but they will start to give us an idea about what works (Reading Recovery-style literacy catch-up programmes) and what doesn't (Catch-Up Numeracy). If this knowledge informs the decisions that school leaders make about where to invest time and money, then we are certainly moving in the right direction.


3 Responses to The RCTs have landed – but has anyone noticed?

  1. Peter Simpson says:

    We hadn’t noticed, and we were one of the trial schools. Having now read ‘our’ report as well as the others, and being broadly in favour of RCTs in education and what the EEF are doing, I do think they have a way to go. A few observations below.

    Catch-up – whilst an encouraging story about how effective TAs can be, the main problem here is that the TAs themselves selected the pupils they felt they could help. Not too surprisingly, those they went on to help ended up doing better than those they did not. As such it is hardly surprising that the catch-up programme itself had no additional impact, as it may well be that that wasn't what was really being tested here. Perhaps pupils need to be selected using objective data, and TAs selected randomly, to test the claim that TAs generally are effective?

    Switch-On – like the above, great claims about how effective TAs are, but there is confusion as to how much was actually delivered by the TAs themselves as opposed to teachers, and more importantly which TAs. As we all know, many TAs are excellent and would make (or in some cases already are) great teachers, and if only those were included in this trial then it is not surprising they had an impact. To apply the findings to TAs generally it would have been necessary to select them randomly as well. With regard to the impact on Pupil Premium / FSM children, the report first finds the impact to be 0.05 (i.e. zero months' progress), then 0.3 by ignoring the different starting points between the groups (not even Ofsted do that), and then 0.36 (4 months' progress), which looks good, but it is not clear how they arrived at that figure.

    Future Foundations – the real problem here is the huge number of children dropping out after being assigned to the programme, and even then the exact numbers taking part are not clear in the report. The executive summary appears to say that only 19% of pupils in Brighton assigned to the programme actually did it. Such a high drop-out rate makes it impossible to make any reasonable claims about impact. Finally, we are told that it cost over £500,000 to run the project with only 193 pupils, an unknown number of whom were not even part of the trial!

    Response to Intervention – this looked like a bit of a disaster, with the rather shocking conclusion that it was a 'spoilt' trial. Should it not have been stopped once it was apparent things were going wrong, particularly given the inconvenience to schools (and pupils) of doing tests etc., let alone the wasted money?

    Grammar for Writing – no impact; however, it does support the important point made in the article that well-run trials are likely to show little (or no) impact.

    • Chris Hall says:

      Very interesting. I think you are absolutely right about the need to randomly select TAs and pupils. I think most people would expect these studies to focus on typical cases rather than specifically tailoring interventions to ensure a positive impact (e.g. by allowing TAs to pick pupils or by selecting particularly competent TAs).

      I have to say I'm also pretty confused about the FSM effect size for Switch-On. Ignoring the differing starting points seems slightly odd to me. The obvious point is that the type of analysis should be agreed in advance and not – as seems to be the case here – adapted post-trial when the initial results look mixed.

      The costs of some of the trials also seemed quite high to me, especially in the cases where the trial did not work particularly well and ended up with small-ish samples. It was particularly frustrating to read about fidelity issues in some of the trials. Given the large budgets, you would expect more to have been done to ensure that the interventions delivered closely corresponded to what was envisaged in the experimental design.
