By David A. Roberts
The most critical challenge philanthropy has faced over the last several years is proving with evidence, rather than inference, that financial gifts have made a significant impact. This has become even more pressing during the present economic crisis as donors give more selectively. In one of the New York Times Magazine's annual "Money" issues, impact measurement is referred to as "philanthropy's largest problem."
Most non-profits and foundations have been more talk than action when it comes to measuring impact. There are several reasons why. First, as everyone acknowledges, measuring impact is hard — and many charities and foundations believe it is too expensive. But another big part of the reason that detailed impact evaluations aren't de rigueur is that many people in the sector fear finding out the truth of their programs' impact. I recently had a conversation with a leader of a respected philanthropy in my hometown. After telling him of my interest in measuring the impact of philanthropic giving, he looked at me and said, "You're the bad guy." This typical response embodies the concern of many that there is a draconian motive behind impact measurement, rather than a desire to make things better.
But perhaps the single biggest challenge to collecting meaningful impact data is that too many philanthropic organizations simply do not know how to go about measuring. Some don't even try. Nonprofits today may say they are interested in impact, but rely for their "evidence" on self-generated reports from beneficiaries and implementing agencies. This style of reporting produces "feel-good" stories, but little else. It doesn't tell you the overall outcome of a program or how it compares to others — which is ultimately the point of measurement.
There are, of course, some organizations that have taken the first steps toward impact measurement through the use of surveys. Surveys are the most popular form of measurement because they seem to be straightforward, easy and relatively cheap compared to the alternatives. Unfortunately, these first steps are often dangerous. Why? Because surveys can easily lead to bad data and false conclusions that can result in future missteps.
Is there a way for organizations to start getting serious about impact evaluation without either breaking the bank or misleading themselves through surveys?
Yes. Here’s how:
Point 1: Design Carefully
Good surveys are not as easy as you may think. If you read Philanthropy Action with any frequency, you’ll have read about randomization in surveys and impact studies. Randomization basically means that the study sample is randomly divided into treatment and control groups so a clean comparison between them can be made. Randomization is critically important because without it, you introduce the opportunity for all sorts of bad data.
The most common problem for surveys that are not randomized is self-selection bias. Most organizations that use surveys, in the nonprofit world and elsewhere, ask for volunteers to fill out their surveys. Unfortunately, the people who volunteer to take surveys often have more in common with each other than they do with the whole group you are trying to assess. Voluntary surveys often attract only the happiest and unhappiest participants, which skews the survey data beyond repair. Even some randomized surveys suffer from problems when potential respondents can easily opt out of taking a survey. This is known as non-response bias.
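As a minimal sketch of what randomization means in practice — using a hypothetical roster, not data from any real program — random assignment can be as simple as shuffling the participant list and splitting it in half:

```python
import random

# Hypothetical roster of 100 study participants
participants = [f"person_{i}" for i in range(100)]

random.seed(0)  # fixed seed so the assignment is reproducible
random.shuffle(participants)

# Split the shuffled list into equal treatment and control groups
treatment = participants[:50]
control = participants[50:]
```

Because assignment is random, the two groups should differ only by chance, so any difference measured later can more credibly be attributed to the program rather than to who happened to show up.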
For charities dealing with sensitive problems, reporting bias is a serious issue. Put most simply, respondents often don't want to be completely honest about such issues as religion, sex practices or health. Recent work by economists Dean Karlan and Jonathan Zinman in South Africa found that 40 percent of respondents to a survey purposely or accidentally provided inaccurate information about their debts (for instance by denying that they had taken a microfinance loan).
Finally, another common problem with surveys is interviewer bias. This occurs when the interviewer asks a question a certain way, provides an opinion to the respondent or disturbs the respondent’s answering process. But surveys can be skewed by nothing more complicated than the sex or identity of the survey taker. In the South Africa lending study mentioned above, women were less likely to admit they had taken a loan to male surveyors than to female surveyors. Best practices to guard against interviewer bias include ensuring that no more than 10 percent of the sample is collected by any one person. But that can make the cost of a survey climb quickly.
Point 2: Use the Right Tools
Even well designed surveys sometimes miss the mark. That’s because they aren’t always the best tool for answering the question at hand. For instance, over the last 25 years charities of every stripe have spent billions educating people around the world about how to prevent HIV. A survey might give us good data on HIV infection rates over time, but it can’t reliably tell us what prevention education programs were most effective. A better approach is to measure how participants’ knowledge and beliefs change as a result of an education program.
But knowledge and beliefs are what we call latent or hidden traits. I’ve met plenty of people who believe that latent traits can’t truly be measured. But there are good tools for measuring latent traits—specifically Item Response Theory (IRT). IRT is an accepted and rigorously used methodology within the fields of psychology, pain management, business and education. The most famous use of IRT is in the Graduate Record Exam (GRE), required by the majority of graduate schools for entrance, which measures the latent traits of knowledge and aptitude.
Using IRT takes more time and money, but it produces meaningful data when measuring latent traits—which surveys simply cannot do.
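To make the idea concrete, here is a sketch of the two-parameter logistic (2PL) model, one of the standard item response functions used in IRT. The parameter values below are illustrative, not drawn from any particular instrument:

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: the probability that a respondent
    with latent trait level theta answers correctly an item with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A respondent whose trait level equals the item's difficulty
# has a 50/50 chance of answering correctly...
print(p_correct(0.0, a=1.0, b=0.0))   # 0.5

# ...and the probability rises as the latent trait increases
print(p_correct(1.5, a=1.0, b=0.0))   # ≈ 0.82
```

Fitting a, b and each respondent's theta from actual response data is what IRT software does; the point here is only that the "hidden" trait enters the model as an estimable quantity, which a raw survey tally never provides.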
Point 3: If you don’t have the right tools, don’t mix in faulty ones
As a student in Public Health at Johns Hopkins, I was asked by a friend in Northern India to design a survey to measure the prevalence of tuberculosis (TB) in a neighboring valley. A local hospital was trying to assess the presence of TB there in order to tailor a new program. We ran two well-designed and executed surveys—and found no TB to speak of in the valley. The TB treatment programs for the valley were canceled. Yet shortly thereafter people with advanced cases of TB began showing up at the hospital.
What went wrong? The tools we had for detecting TB simply weren't good enough. The hospital had used a cheaper method of TB detection that yielded high rates of "false negatives" — telling us that someone with TB actually didn't have the disease. The hospital had tried to save money on the survey effort but had in effect wasted all the money they'd spent. The lesson: it's better not to run a survey at all than to run a faulty one.
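A little arithmetic shows how a low-sensitivity test can make a disease nearly invisible to a survey. The numbers here are illustrative assumptions, not the actual figures from the TB study:

```python
def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    # Fraction of people who test positive:
    # true positives (sick and detected) + false positives (healthy but flagged)
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)

# Suppose 5% of the valley truly has TB, but the cheap test
# catches only 40% of real cases (sensitivity 0.40)
print(apparent_prevalence(0.05, sensitivity=0.40, specificity=0.99))
# → 0.0295: the survey reports under 3%, well below the true 5%
```

Even a flawlessly sampled survey cannot rescue a measurement instrument that misses most of what it is supposed to detect.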
What all this adds up to is a cautionary tale about wading into impact measurement without proper planning. Surveys, poorly designed or poorly executed, can do more harm than good.
That being said, correcting these problems doesn’t have to be a huge expense. Engaging experts to help design a survey upfront does cost more than doing a survey on your own, but it will likely mean the difference between a wasted and a valuable effort. You can engage in some experiments to gain experience in survey design and implementation that will help you improve your surveys—and prove to stakeholders that well-designed surveys are an absolute necessity.
For instance, you can run some test surveys using different methodologies to see how your surveys could be skewed. Here are a few ways to get started:
1. Come up with a question you want answered and a population that you know and that is easily accessible to you. This could be beneficiaries from your projects or could be donors who partner with you. If you work with beneficiaries, you could ask questions about program participation, other organizations they receive services from, how long they’ve been involved, what other needs they have and their satisfaction levels with the services they receive.
2. For a well-run survey, you'll need to determine how large a sample you'll need to get accurate results. You can use one of the many calculators on the web, such as the one at Survey System.
3. Next, if your population is over several hundred, you can randomly select which members of your population will be sampled using one of the many random number generators on the web. One way to do this is to number every person in your population; those whose numbers come up in the random number generator are the ones who participate.
4. Now you can run a parallel survey to compare against the results of the well-designed survey. For instance, you can run the survey with a smaller sample of volunteers. You can change the wording of a few questions. You can survey only a group that shares a characteristic: women only, people in their 20s only, or people who happen to be wearing jeans when you ask them to participate.
5. You can then compare the results from the various surveys and see how the average response differs—and have powerful first-hand evidence of how good survey design matters.
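Steps 2 and 3 above can be sketched in a few lines of code. Everything here is an assumption for illustration: a made-up population of 800 beneficiaries, a 95% confidence level, and a ±5% margin of error (the same inputs an online calculator like Survey System's asks for):

```python
import math
import random

def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's sample-size formula for a proportion,
    with a finite-population correction applied."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

N = 800                    # hypothetical number of beneficiaries
n = sample_size(N)         # step 2: required sample size (260 here)

random.seed(1)             # fixed seed so the draw is reproducible
selected = random.sample(range(1, N + 1), n)   # step 3: person numbers to survey
```

Comparing the averages from this randomly selected sample against a volunteer-only or single-characteristic sample (steps 4 and 5) is what makes the self-selection skew visible.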
I’m of course very interested in the results of your tests—so please let me know how it goes!
David Roberts is Executive Director of New Dominion Philanthropy Metrics. He has more than 10 years' experience working in quantitative research and public health. He can be reached at droberts (at) ndpmetrics.com