Our ratings explained

What do our ratings mean and how should you use them?

Any attempt to simplify multiple studies into a single rating score will be imperfect. Yet busy professionals need help to identify the most promising ways of working. We provide ratings for interventions in the Evidence Store in relation to:

  • Overall effectiveness: looking at the consistency of effect across different research studies

  • Strength of evidence: looking at how confident we can be about a finding, based on how the research was designed and carried out.

Each intervention in the Evidence Store is rated on a four-point scale for both overall effectiveness and for strength of evidence. Here’s how our research partners at Cardiff University created these ratings.

Overall effectiveness

For each intervention in the Evidence Store we define the effect in relation to how consistently the intervention was found to be effective across studies.

The best way of assessing the overall effectiveness of an intervention is through a meta-analysis. A well-conducted meta-analysis pools data from various studies to produce a robust measure of overall effect. Where it was not possible to carry out a meta-analysis (e.g. because different studies measured different outcomes, or measured them in different ways) we have looked at the effects found in individual studies and whether they combined to give a consistent message about effect, or whether the overall picture was mixed.

Studies also vary in the number of people involved. How should we assess one big study that found one result when three smaller studies found something different? We have taken the simple approach of requiring both a majority of the studies and a majority of the participants involved (when pooled together) to show a particular outcome.

Our approach to overall effect is therefore a four-point scale that communicates what we know about how effective (or not) an intervention is. The table below shows the scale, with an explanation of each rating.

| Name | What this means |
|---|---|
| Negative effect | Evidence tends to show negative effect. The balance of evidence suggests that the intervention has a negative effect, meaning the intervention made things worse. Where there was a meta-analysis to provide a pooled effect size, this showed a negative effect. Where there was no meta-analysis, most of the studies AND the studies involving most of the participants had a negative effect. |
| Mixed or no effect | Mixed or no effect. The balance of evidence (including the pooled effect size from meta-analysis where available) suggests that the intervention has no effect overall, or studies show a mixture of effects and the criteria for a negative or a positive effect rating are not met. |
| Tends to positive effect | Evidence tends to show positive effect. The balance of evidence suggests that the intervention has a positive effect, meaning that outcomes improved. There are one or more studies showing a negative effect, but either there was a meta-analysis which showed a positive effect; or there was no meta-analysis but most of the studies AND the studies involving most of the participants had a positive effect. |
| Consistently positive effect | Evidence shows consistently positive effect. Most published studies have positive effects and none have negative effects for this outcome. Some individual studies may show no effect. However, either the pooled effect (in a meta-analysis) or most studies AND the studies involving most of the participants have a positive effect. |
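
The decision rules in the table can be sketched in code. The following Python is an illustration only, not the procedure actually used by the research team. The names `Study`, `majority_shows` and `overall_effect` are hypothetical, each study is reduced to a single direction of effect, and "most" is assumed to mean strictly more than half. Where a pooled effect from a meta-analysis is supplied, the sketch lets it decide the direction on its own, which simplifies the wording of the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Study:
    participants: int  # number of people in the study
    direction: int     # +1 positive effect, 0 no effect, -1 negative effect


def majority_shows(studies: List[Study], direction: int) -> bool:
    """True if most studies AND the studies holding most participants show `direction`."""
    matching = [s for s in studies if s.direction == direction]
    total_people = sum(s.participants for s in studies)
    matching_people = sum(s.participants for s in matching)
    # Assumption: "most" means strictly more than half.
    return 2 * len(matching) > len(studies) and 2 * matching_people > total_people


def overall_effect(studies: List[Study], pooled_direction: Optional[int] = None) -> str:
    """Classify an intervention on the four-point overall effectiveness scale."""
    if pooled_direction is not None:
        # A meta-analysis exists, so its pooled effect decides the direction.
        positive, negative = pooled_direction > 0, pooled_direction < 0
    else:
        positive = majority_shows(studies, +1)
        negative = majority_shows(studies, -1)

    if negative:
        return "Negative effect"
    if positive:
        # Any single negative study moves the rating from "consistently" to "tends to".
        any_negative = any(s.direction < 0 for s in studies)
        return "Tends to positive effect" if any_negative else "Consistently positive effect"
    return "Mixed or no effect"


# One large negative study set against three small positive studies.
example = [Study(300, -1), Study(50, +1), Study(50, +1), Study(50, +1)]
print(overall_effect(example))
```

In this example three of the four studies are positive, but they hold only 150 of the 450 participants, so neither direction has a majority on both counts and the sketch returns "Mixed or no effect".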

Strength of evidence

The Evidence Store makes judgements about the strength of evidence included in systematic reviews. Our overall framework is provided by the EMMIE system, developed by the UCL Jill Dando Institute for use by the What Works Centre for Crime Reduction in its toolkit. EMMIE evaluates the quality of existing reviews using a specific tool called the EMMIE-Q. The Crime Reduction Toolkit draws on a well-developed and substantial global literature from criminology. When we started to apply the EMMIE-Q, however, almost no studies in children’s social care met the criteria to obtain a meaningful score. There were also cases where a well-conducted review included several poor-quality studies. We have therefore adapted the EMMIE-Q to provide a four-point rating for strength of evidence.

Our first two ratings attempt to differentiate between reviews that contain no good quality evidence and those where some good quality evidence is present. Here we make a judgement about the number of acceptable quality studies within the review we are summarising. To be acceptable, a study has to meet key quality criteria, which we have adapted from the core criteria developed and used by the Early Intervention Foundation (EIF). We judge whether there are no acceptable quality studies (which gets a 0), 1 or 2 (which gets a 1), or 3 or more. Where there are 3 or more, the review meets the threshold for us to apply the EMMIE-Q. We then use the number of EMMIE-Q requirements met to differentiate between the lower and higher of the remaining scores (2 and 3).

This process allows us to rate the strength of evidence in an existing review on a four-point scale:

| Rating | Strength of evidence | What this means |
|---|---|---|
| 0 | Very low strength evidence | No acceptable quality studies. |
| 1 | Low strength evidence | One or two acceptable quality studies. |
| 2 | Moderate strength evidence | Three or more acceptable quality studies. High quality review therefore possible. Between 0 and 3 EMMIE-Q requirements are met, indicating strong confidence cannot be placed in review findings. |
| 3 | High strength evidence | Three or more acceptable quality studies. High quality review therefore possible. Between 4 and 6 EMMIE-Q requirements are met, including all requirements marked * (see below), indicating a high quality review in which strong confidence can be placed. |
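
As a rough illustration, the scale could be expressed as a small Python function. The function name `strength_of_evidence` and its parameters are hypothetical. The starred requirements are taken to be EMMIE-Q requirements 1 to 4, as stated later on this page. One point is an assumption: the table does not say what happens when four or more requirements are met but a starred one is missing, and the sketch treats that case as a 2.

```python
from typing import Iterable

# EMMIE-Q requirements marked * on this page (requirements 1 to 4).
STARRED = frozenset({1, 2, 3, 4})


def strength_of_evidence(acceptable_studies: int, requirements_met: Iterable[int] = ()) -> int:
    """Return a 0-3 strength of evidence rating for a review.

    acceptable_studies: number of acceptable quality studies in the review.
    requirements_met:   numbers (1-6) of the EMMIE-Q requirements the review meets.
    """
    if acceptable_studies < 0:
        raise ValueError("acceptable_studies cannot be negative")
    met = set(requirements_met)
    if not met <= set(range(1, 7)):
        raise ValueError("EMMIE-Q requirements are numbered 1 to 6")

    if acceptable_studies == 0:
        return 0
    if acceptable_studies <= 2:
        return 1
    # Three or more acceptable studies: the EMMIE-Q threshold is reached.
    # Meeting all four starred requirements already implies at least four are met.
    # Assumption: a review missing any starred requirement is rated 2,
    # even if it meets four or more requirements in total.
    return 3 if STARRED <= met else 2


print(strength_of_evidence(2, {1, 2, 3, 4, 5, 6}))  # too few acceptable studies
print(strength_of_evidence(5, {1, 2, 3, 5, 6}))     # starred requirement 4 missing
print(strength_of_evidence(3, {1, 2, 3, 4}))        # all starred requirements met
```

Under these assumptions the three calls print 1, 2 and 3 respectively.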

Defining an acceptable quality study

The following definition of an acceptable quality study is consistent with key elements of the definition used by the Early Intervention Foundation (EIF). An acceptable quality study must have the following characteristics:

  1. The sample is sufficiently large to test for the desired impact. A minimum of 20 participants are subject to measures at both time points within each study group (e.g. a minimum of 20 participants in the treatment group AND comparison group).
  2. The study must use valid measures. Participants might be asked to complete measures at various points, and these measures should be reliable, standardised and validated independently of the study. Administrative data and observational measures might be used to measure programme impact.
  3. Comparability of groups is addressed in selection and/or analysis. This might be achieved through randomisation, or by selecting a comparator group based on matching criteria, or through analysis by using statistical techniques such as propensity score matching.
  4. An ‘intent-to-treat’ design is used, meaning that all participants recruited to the intervention participate in the pre/post measurement, regardless of whether or how much of the intervention they receive, even if they drop out of the intervention (this does not include dropping out of the study – which is then regarded as missing data).
  5. The study should report on overall and differential attrition (or clearly present sample size information such that this can be readily calculated).
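
The five characteristics above amount to a checklist that a study either passes in full or fails. A minimal Python sketch of that checklist follows. The class `StudyAppraisal`, its field names and the helper functions are hypothetical. Four of the five criteria rest on a reviewer's judgement, so they are recorded here as simple yes/no flags. Only the sample size rule is computed, and the sketch assumes at least one study group is listed.

```python
from dataclasses import dataclass
from typing import List

MIN_PER_GROUP = 20  # minimum participants measured at both time points, per study group


@dataclass
class StudyAppraisal:
    group_sizes: List[int]         # participants measured at both time points, one entry per group
    valid_measures: bool           # criterion 2, reviewer judgement
    comparability_addressed: bool  # criterion 3, reviewer judgement
    intent_to_treat: bool          # criterion 4, reviewer judgement
    attrition_reported: bool       # criterion 5, reviewer judgement


def failed_criteria(study: StudyAppraisal) -> List[str]:
    """List the criteria a study does not meet (an empty list means acceptable quality)."""
    checks = {
        "sample size": bool(study.group_sizes)
        and all(n >= MIN_PER_GROUP for n in study.group_sizes),
        "valid measures": study.valid_measures,
        "comparability of groups": study.comparability_addressed,
        "intent-to-treat": study.intent_to_treat,
        "attrition reporting": study.attrition_reported,
    }
    return [name for name, passed in checks.items() if not passed]


def is_acceptable_quality(study: StudyAppraisal) -> bool:
    """A study is acceptable only if every criterion is met."""
    return not failed_criteria(study)


# Treatment group of 25, comparison group of 18: fails on sample size only.
small_comparison = StudyAppraisal([25, 18], True, True, True, True)
print(failed_criteria(small_comparison))
```

Returning the list of failed criteria, instead of only a yes/no answer, makes it easier to see why a study was excluded from the count of acceptable quality studies.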

EMMIE-Q requirements

The EMMIE-Q identifies 6 requirements, each relating to a different aspect of study quality. These inform the assessment of the methodology of studies that are used to measure effect. They are as follows:

  1. A transparent and well-designed search strategy*
  2. High statistical conclusion validity (at least four of the following are necessary for a study to be considered sufficient)*
     (a) Calculation of appropriate effect sizes
     (b) The analysis of heterogeneity
     (c) Use of a random effects model where appropriate
     (d) Attention to the issue of dependency
     (e) Appropriate weighting of individual effect sizes in the calculation of mean effect sizes
  3. Sufficient assessment of the risk of bias (at least two necessary for sufficient consideration)*
     (a) Assessment of potential publication bias
     (b) Consideration of inter-rater reliability
     (c) Consideration of the influence of statistical outliers
  4. Attention to the validity of the constructs, with only comparable outcomes combined and/or exploration of the implications of combining outcome constructs*
  5. Assessment of the influence of study design (e.g. separate overall effect sizes for experimental and quasi-experimental design)
  6. Assessment of the influence of unanticipated outcomes or spin-offs on the size of the effect (e.g. quantification of displacement or diffusion of benefit)

Requirements 1-4 (highlighted by *) are considered particularly important, and are required for any review to achieve a rating of 3, which is the highest rating in the scale.
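
Requirements 2 and 3 each depend on a count of sub-items (at least four of five, and at least two of three). The following Python sketch shows one way that counting could be recorded. It is illustrative only: the function and parameter names are hypothetical, and whether each item is satisfied remains a reviewer's judgement that the code simply takes as input.

```python
from typing import Set

# Requirements marked * in the list above.
STARRED = frozenset({1, 2, 3, 4})


def requirements_met(
    search_strategy: bool,     # requirement 1
    validity_items: int,       # how many of sub-items 2(a)-(e) are satisfied, 0 to 5
    bias_items: int,           # how many of sub-items 3(a)-(c) are satisfied, 0 to 3
    construct_validity: bool,  # requirement 4
    study_design: bool,        # requirement 5
    spin_offs: bool,           # requirement 6
) -> Set[int]:
    """Return the numbers of the EMMIE-Q requirements a review meets."""
    if not 0 <= validity_items <= 5:
        raise ValueError("validity_items must be between 0 and 5")
    if not 0 <= bias_items <= 3:
        raise ValueError("bias_items must be between 0 and 3")
    checks = {
        1: search_strategy,
        2: validity_items >= 4,  # at least four of five sub-items
        3: bias_items >= 2,      # at least two of three sub-items
        4: construct_validity,
        5: study_design,
        6: spin_offs,
    }
    return {number for number, passed in checks.items() if passed}


def meets_all_starred(met: Set[int]) -> bool:
    """True if requirements 1 to 4 are all met, as needed for the highest rating."""
    return STARRED <= met


met = requirements_met(True, 3, 2, True, True, True)
print(sorted(met), meets_all_starred(met))
```

In this example only three of the five statistical conclusion validity sub-items are satisfied, so requirement 2 is not met. The review meets five requirements in total but misses a starred one.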

We then use the number of EMMIE-Q requirements present to inform a judgement on strength of evidence, differentiating between a 2 and a 3 in our strength of evidence scale as outlined above. This is different from the way the EMMIE-Q scores are used by the What Works Centre for Crime Reduction because, as discussed above, there is more high quality evidence in that field.

The What Works Centre’s outcomes framework

Research in the Evidence Store has to focus on outcomes that fit in the What Works Centre’s outcomes framework. In this framework there are three sets of primary outcomes which are:

  • The rights of children, parents, carers and families
  • Children’s and young people’s outcomes
  • Parent, carer and family outcomes

There are also process outcomes that relate to organisational factors around Children’s Social Care. These include:

  • Cost-effectiveness of services
  • Workforce outcomes
  • Skills, knowledge and experience of social workers and other social care professionals

Read a full description of the What Works Centre’s outcomes framework here.
