Improving the efficiency of evidence-based interventions
The strengths and limitations of randomised controlled trials

as randomisation, efficacy and effectiveness, and discuss the benefits of using the RCT as the standard of intervention evaluation. We discuss how ‘realist’ evaluation contributes to what policymakers need to know in order to make a decision about an evaluation and alternatives to the RCT, such as stepped wedge, regression discontinuity, non-randomised cohort, and time series designs.


Finally, we juxtapose this with a discussion of the limitations of RCTs and how other methods can be used to test interventions.
How and why is evidence built?

Efficacy and effectiveness
If policymakers propose to invest in a violence prevention intervention (a parenting programme, a life skills curriculum, reducing access to alcohol) 6 then one of the central questions should be: does that intervention achieve the outcomes that are expected of it, so that it will be a worthwhile investment of taxpayers' money? The purpose of an efficacy trial is to answer precisely that question: did the intervention make a difference, and how sure can we be that it was the intervention (and not something else) that made the difference? This is a question of internal validity (see Table 1 for a summary of definitions).

Control group
The group of individuals who do not receive the treatment condition, against which the outcomes of the intervention can be compared.

Effectiveness
The extent to which a specific intervention, when used under ordinary circumstances, does what it is intended to do.

Efficacy
The extent to which an intervention produces a beneficial result under ideal conditions.

External validity
The extent to which the results can be generalised to populations beyond the trial. Are the results valid for populations in which the intervention was not originally tested?

Internal validity
The extent to which researchers can confidently conclude that what they did in the study caused what they observed, i.e. that the outcome is the result of the treatment. A research study with high internal validity allows one explanation to be chosen over another with considerable confidence, because it avoids many possible confounds.

Intervention group
A group of participants allocated a particular treatment.

Selection bias
A systematic distortion of evidence that arises because people with certain important characteristics are disproportionately more likely to end up in one condition. Although random assignment theoretically eliminates selection bias, some bias can still occur in practice. Another common problem is bias in selection into the trial at all, not only into a particular arm of the trial.
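
The difference between random assignment and allocation by perceived need can be illustrated with a small simulation. This is a minimal sketch on invented data, not an analysis of any trial discussed here: the function names, the "severity" variable and the sample size are all assumptions made for illustration. The true effect of the intervention is set to zero, so any difference between the groups is produced purely by how people were allocated.

```python
import random
from statistics import mean


def simulate(assign, n=20000, seed=1):
    """Return the naive difference in mean outcome (intervention - control).

    The true intervention effect is zero by construction, so a non-zero
    difference reflects bias in how participants were allocated.
    """
    rng = random.Random(seed)
    intervention, control = [], []
    for _ in range(n):
        severity = rng.gauss(0, 1)            # hypothetical baseline severity
        outcome = severity + rng.gauss(0, 1)  # outcome depends on severity only
        group = intervention if assign(severity, rng) else control
        group.append(outcome)
    return mean(intervention) - mean(control)


def random_assignment(severity, rng):
    # Allocation ignores every characteristic of the participant.
    return rng.random() < 0.5


def referral_by_need(severity, rng):
    # Staff steer the more severe cases into the intervention group.
    return severity > 0


diff_random = simulate(random_assignment)
diff_referral = simulate(referral_by_need)
print(f"random assignment: {diff_random:+.2f}")
print(f"referral by need:  {diff_referral:+.2f}")
```

Under random assignment the two groups differ only by chance, so the estimated difference stays close to zero. Under referral by need the groups differ at baseline, and the comparison shows a large apparent "effect" even though the intervention does nothing.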

Generalisability
Related to issues of efficacy and effectiveness, another important question is whether the intervention will work with a different group of people.
If a parenting programme was tested in Soweto with Setswana speakers, will it also work with isiZulu speakers in Ixopo, and Afrikaans speakers in Eldorado Park? This question, one of external validity or generalisability, is crucial if policymakers wish to roll the programme out widely (see Box 1).
If it was established as effective in one place, will it remain effective when taken to other places? Efficacy and effectiveness are linked to the concept of generalisability. When a trial is conducted in an ideal setting with all factors and variables being controlled (as far as is possible) by the researcher, it may lack a measure of generalisability. Characteristics of those enrolled in a study (e.g. sex, age, severity of the disease, racial groups) are primary factors in generalisability. 10 For example, a study of a counselling intervention targeted at women may not necessarily generalise to men or children.
Geographic settings (urban versus rural) and health care systems can also be significant factors, 11 particularly when something more complex than a drug (e.g. screening programmes, behavioural therapy) is being tested. Multiple factors determine the external validity (i.e. generalisability or applicability) of studies, including of RCTs: characteristics of those taking part in the programme and in the study, the problem under investigation, costs, compliance, co-morbidities and concomitant interventions. Also, certain aspects of study design (eligibility criteria, study duration, mode of intervention, outcomes, adverse events assessment, or type of statistical analysis) greatly influence the degree of generalisability. 12

Phases of scientific discovery
For scientific evidence to be useful to policymakers, they need to distinguish which research and types of evidence will be most useful to them, which [...]

In addition, policymakers need to make decisions about how to weigh the evidence when considering implementation. 18 Victora and colleagues have proposed three levels of evidence to guide decisions: 19
• Adequacy evidence: was the intervention implemented and found to be successful?
• Plausibility evidence: were the changes found in adequacy evidence shown not to be due to other influences?
• Probability evidence: were the changes observed [...]

However, randomisation may face opposition from policymakers and practitioners, who may believe in the value of an intervention for certain individuals or groups, often regardless of its actual evidence base, and therefore oppose random allocation. 23 For instance, in one trial (testing a substance abuse intervention in a community health centre, with the hope that it would reduce substance-related aggression as well as substance misuse and HIV risk behaviours), nurses in the health centre tried to refer patients to the intervention group in the belief that the intervention would help them, regardless of the fact that the intervention had yet to be tested. However, only after the intervention has been tested in a high-quality evaluation can we have any certainty that it is effective. It is entirely possible that the intervention could have very little effect (as was in fact the case for that substance abuse intervention) 24 or even do harm. Famously, a substance abuse intervention that was rolled out widely in US high schools cost an enormous amount and made no difference to those receiving the programme: they were just as likely to use drugs and alcohol as those who did not. 25 Even more concerning, a common-sense delinquency prevention programme (taking youth at risk into prisons so that convicted offenders could scare them away from their lives of delinquency) turned out to increase offending in the young people, rather than deterring them. 26 In the long run, therefore, randomly assigning people to groups, knowing that people in need may end up in the control group and receive nothing, is more ethical than not using either random assignment or a control group, 27 [...]

Some trials screen up to 68 people for each person enrolled. 30 In many settings, RCTs emphasise standardised interventions that might be too rigid when they need to be tailored for local population needs or other settings. 31 There are also concerns about the extent to which trials conducted in high-income settings apply to low- and middle-income countries (LMIC). 32 It cannot be assumed that there will be a universal response to an intervention across contexts, since a delivery system (such as a [...]

Rahman and colleagues implemented a cognitive behavioural intervention in which local health workers, known as Lady Health Workers, delivered a mental health intervention component. 37 One of the difficulties with implementing health interventions is the lack of adequately trained professionals in most LMIC, especially in the case of mental health interventions where, in some countries, the treatment gap approaches 90%. 38 In Pakistan, Lady Health Workers are women who have completed secondary school and are trained to deliver preventive maternal, neonatal and child health care and education in the community. Lady Health Workers provide services to about 80% of the rural population of Pakistan. A cluster RCT was conducted with depressed women in their third trimester of pregnancy. Lady Health Workers were trained to deliver the intervention, while in control clusters Lady Health Workers who had not been trained in mental health made an equal number of visits to depressed women. The intervention halved the rate of prenatal depression in the intervention group. In addition, women receiving the intervention had better overall functioning and less disability up to a year later. Other health benefits included fewer episodes of diarrhoea and higher levels of immunisation in the intervention group. The intervention is a pivotal one because it is not dependent on a new or separate mental health workforce for its delivery. Rahman and colleagues argue that evidence of this sort is crucial in order to convince LMIC policymakers of the importance of integrating interventions such as these into the existing health system.
This study is frequently used as evidence for how mental health interventions can be delivered by community health workers and how they can feasibly be delivered at scale, and this is undoubtedly true. There are a number of potential problems, however, with using evidence such as this in countries other than Pakistan. One is the lack of similar existing cadres of functioning community health workers such as the Lady Health Workers. Most LMIC do not have such an extensive workforce, and when they do there are significant problems with management, care delivery and supervision. 39 In addition, it is likely that the prevailing cultural and contextual conditions in this region of Pakistan (such as maternal seclusion after birth, and not being permitted visitors unless they are family) may limit the external validity of these data.

[...] All children had to attend pre-kindergarten, and so randomisation was impossible, but the regression discontinuity design used in the evaluation provided convincing evidence that the city's investment in pre-kindergarten led to worthwhile outcomes for children. 43
• Another alternative design is what is known as non-random quantitative assignment of treatment. 44 In this design, participants are assigned to a treatment group based on need or merit, rather than random assignment.

The point is that programmes that are to be rolled out widely (and where people cannot be randomised) must still be evaluated, using the best possible research design.
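
The logic of a regression discontinuity design can be sketched briefly. The example below is an illustration on simulated data only, and it does not reproduce the pre-kindergarten evaluation cited above: the cut-off, the bandwidth, the size of the effect and the helper names `fit_line` and `rd_estimate` are all assumptions. It assumes a "sharp" design, in which everyone at or above a cut-off on some running variable (for example, age relative to an enrolment date) receives the programme and everyone below does not. A straight line is fitted on each side of the cut-off, and the gap between the two lines at the cut-off is taken as the estimated effect.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(data, cutoff=0.0, bandwidth=0.5):
    """Sharp regression discontinuity: jump in the fitted outcome at the cut-off."""
    # Centre the running variable so each intercept is the fitted value at the cut-off.
    below = [(x - cutoff, y) for x, y in data if cutoff - bandwidth <= x < cutoff]
    above = [(x - cutoff, y) for x, y in data if cutoff <= x <= cutoff + bandwidth]
    intercept_below, _ = fit_line(below)
    intercept_above, _ = fit_line(above)
    return intercept_above - intercept_below


# Simulated data with a known (invented) programme effect.
TRUE_EFFECT = 3.0
rng = random.Random(42)
data = []
for _ in range(4000):
    x = rng.uniform(-1, 1)            # running variable, centred on the cut-off
    received_programme = x >= 0       # allocation is fully determined by the cut-off
    y = 10 + 2 * x + (TRUE_EFFECT if received_programme else 0.0) + rng.gauss(0, 0.5)
    data.append((x, y))

estimate = rd_estimate(data)
print(f"estimated effect at the cut-off: {estimate:.2f}")
```

The design rests on the assumption that people just either side of the cut-off are otherwise comparable, so the estimate applies most directly to those near the cut-off. In practice the choice of bandwidth and functional form would need to be justified and checked.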

Scale-up and 'when is there enough evidence'
Attempts have been made to rank the levels [...]

Grantham-McGregor and colleagues implemented an intervention study of nutritional supplementation and psychosocial stimulation of stunted children. 53 A total of 129 children were randomly assigned to four groups: nutritional supplementation only; psychosocial stimulation only; nutritional plus psychosocial stimulation; and a control group. There was also a group of matched non-stunted children. Community health aides delivered the intervention. The results of the study were compelling and showed that nutritional supplementation had a beneficial effect on stunted children's mental development. Importantly, though, the treatment effects were additive, with the combined intervention (nutritional plus psychosocial stimulation) being significantly more effective than either of the stand-alone interventions. 54 This study is one of the most frequently cited papers in the child development literature and has had a significant impact on the design of interventions in many LMIC. 55 A recent 20-year follow-up on the same sample found that the earnings of the stimulation group were 25% higher than those of the control group and had caught up to the earnings of a non-stunted comparison group. 56 This study is unquestionably an important and seminal one. There are, however, two particular issues that should be borne in mind when using these data to inform scale-up or interventions in other countries. The first is the small sample size: only 32 children received the supplementation and psychosocial intervention. The second has to do with the relevance of these data (particularly the long-term economic finding) to most other LMIC. Jamaica has a very high rate of pre-school attendance, unlike most LMIC.
The early impact of the supplementation and psychosocial stimulation is an important and compelling finding, but it is possible that part of the explanation for the long-term benefit of the early intervention is the additive booster benefit of a high enrolment in pre-school. It is possible that in countries where enrolment in crèches or pre-school is very low, the benefits of the early intervention may disappear over time. This is of course an empirical question and should be tested, but the issue is testament to the limitations of RCTs and how longitudinal assessment in many countries is vital in order to make meaningful policy decisions.
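
The sample-size point can be made concrete with a sketch of how 129 participants might be randomly allocated across four arms. This is an assumed, simplified procedure (shuffle, then deal participants out in turn), not a description of how the original study carried out its randomisation. The arm labels follow the text, while the participant identifiers, the seed and the `allocate` helper are hypothetical.

```python
import random
from collections import Counter

ARMS = [
    "supplementation only",
    "stimulation only",
    "supplementation plus stimulation",
    "control",
]


def allocate(participant_ids, arms, seed=0):
    """Randomly allocate participants to arms of near-equal size."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # random order removes any pattern in who is enrolled when
    # Deal the shuffled participants out to the arms in turn.
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


allocation = allocate(range(1, 130), ARMS)  # hypothetical IDs 1..129
sizes = Counter(allocation.values())
for arm in ARMS:
    print(f"{arm}: {sizes[arm]}")
```

With 129 participants and four arms, no arm can hold more than 33 people, which is why estimates for any single arm, and especially long-term follow-up of that arm, rest on a small number of participants.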