Evaluation Design and Methods
Evaluation is a contested discipline. We are aware of the ongoing and healthy debate about what types of evidence are appropriate to inform policy and practice in U.S. education and in international public health and development. However, the diversity of our partners and areas of focus precludes us from promoting only certain types of evaluation evidence as acceptable for decision making.
We avoid a one-size-fits-all approach to evaluation because we want our evaluation efforts to be designed for a specific purpose and for specific intended users. This approach to evaluation design, which we call fit to purpose, has three elements:
- It allows for a range of methods, including qualitative and quantitative data collection and analysis, retrospective and prospective designs, experimentation, theory-based evaluation, and systems-based approaches.
- It requires our teams, outside evaluators, and partners to be rigorous about the inferences they make and explicit about the assumptions they use to draw conclusions.
- It requires our teams and our partners to consider evaluation evidence in the context of action so the evaluation efforts produce findings that can be acted on rather than information that is merely nice to know.
The following three designs represent the vast majority of the evaluations we support.
Evaluations to understand and strengthen program effectiveness
Evaluations that help our partners strengthen the execution of projects are among the most relevant for the foundation because they provide feedback about what is and isn’t working within a specific location or across locations.
We use this type of evaluation in the following scenarios:
- When one or more partners are delivering a combination of interventions to achieve aggregate outcomes (e.g., increased and consistent use of latrines, better student achievement, exclusive breastfeeding in the first 6 months, use of a particular crop variety, or use of mobile phone–based financial services by women) in a specific location.
- When one or more partners are delivering the same approach, product, or solution in different locations.
- When we collaborate with a partner to promote effective resource allocation, planning, and delivery of services in a specific location or sector.
Such evaluations should be designed with the following considerations in mind:
- They are not expected to assess causal relationships between the interventions and the desired outcomes.
- They should have a very specific purpose and use. Because evaluations can quickly grow overly comprehensive and expensive, the findings must closely match the partner’s decision-making needs.
- We support the use of technological innovations in data collection and analysis to increase the timeliness and accessibility of data.
- Both quantitative and qualitative data are relevant in evaluating processes, operations, cost effectiveness, key stakeholders’ perceptions, and enabling contextual factors.
Evaluations may include impact estimates if those are needed to inform important decisions—about scaling up an initiative, for example, or about the level of penetration needed to ensure a certain level of impact. Impact estimates should not be used as proof of macro-level impact, however.
Because the assumptions used to construct impact estimates can lead to large error margins, a robust baseline of key coverage indicators is essential, along with data on how these indicators have changed over time. Population-level impact can then usually be determined through modeling or use of secondary data.
In select cases, it may be necessary to determine a causal relationship between the change in coverage and the desired population-level impact. If so, the design should include a plausible counterfactual, usually obtained through modeling or comparison with national or sub-national trends.
Evaluations to test the causal effects of pilot projects, innovations, or delivery models
Evaluations that produce causal evidence can be used to decide whether to scale up or replicate pilots, innovations, or delivery models. They can also provide essential knowledge to the foundation, our partners, policymakers, and practitioners.
We use this type of evaluation in the following scenarios:
- When foundation teams and partners need evidence to determine which solutions within large programs are the most effective and cost-effective.
- When foundation teams and partners invest in pilot projects and innovations and need evidence to persuade others to scale up to larger geographies or replicate in other contexts.
- When we and our partners need evidence to make trade-offs between different implementation tactics, delivery approaches, and program components.
- When we and our partners need to assess the effectiveness of advocacy, social marketing, and awareness-raising tactics before deciding on an overall strategy to influence perceptions and behaviors.
Evaluations of causal relationships should be designed with the following considerations in mind:
- They should clearly demonstrate that the positive or negative effects observed were caused by the intervention or tactic, and they should measure the size of that effect.
- They must be able to rule out the effects of factors other than the specific intervention by including a plausible counterfactual. We suggest using experimental or certain quasi-experimental designs in this context. If constructing a counterfactual is impractical or infeasible (e.g., when a national institution provides technical support to a government partner), we suggest using the evaluation design described next.
- They are more useful when we test variations rather than a single line of inquiry (e.g., does x work or not?).
- They should examine the equally important questions of how and why the changes were caused by the intervention, by looking at processes, performance, and costs.
Evaluations of causal relationships should not be used when existing proxies of effectiveness and outcomes are sufficient. They are also not appropriate for evaluating whole packages of interventions with multiple cause-and-effect pathways.
Evaluations to improve the performance of institutions or operating models
Evaluations that provide a neutral assessment of the effectiveness of an organization or operating model can inform foundation and partner decision making about how best to use financial or technical resources, resolve challenges, and support ongoing progress.
We use this type of evaluation selectively, in the following scenarios:
- When we work with a partner organization that is essential to the success of a foundation strategy.
- When our relationship with the partner is at a critical juncture where additional, detailed information on a specific area of operation can inform next steps, strengthen collaboration, and depersonalize decision making.
- When we develop new ways of working (e.g., by establishing a deeper presence in a specific country) and an objective, systematic assessment can inform decisions about implementation and strengthen relationships with key stakeholders.
Evaluations of institutional effectiveness and operating models should be designed with the following considerations in mind:
- Such evaluations can easily become too comprehensive and burdensome to foundation staff or partners, so rigorous selection of evaluation questions and a clear purpose are essential.
- Whenever possible, these evaluations should be done in close collaboration with other donors so we can gain efficiencies, achieve a common understanding of the support that key partners need to succeed, and continue learning from our joint experience.
Such evaluations are largely qualitative and should not seek to assess the causal relationship between a partner organization or operating model and program outcomes.