What Keeps Researchers Up at Night? Sample Quality. And They Are Right to Be Worried

Sample quality

“…issues surrounding access to quality and representative sample is without a doubt the single biggest individual challenge mentioned in the survey. General decline in the quality of sample with falling participation rates…is another concern.”

This quote comes directly from the most recent GreenBook Research Industry Trends (GRIT) report, the world’s largest ongoing survey of research professionals.

These concerns about sample quality bubbled to the surface when 1,533 researchers from around the globe, both client-side and supplier-side, were asked an open-ended question about the challenges facing the research industry today.

We’re glad to see that people are concerned about sample quality. They should be. We’ve been tracking this problem for years, and it’s getting worse, not better.

Here are four points of grave concern, and one reason for hope:

  1. River sample is not reliable;
  2. Most “panel” sample is not actually from a panel, it is river sample sold as panel;
  3. Panel sample that is recruited from sources like loyalty cards will skew the data;
  4. Publisher source panel is not only poor quality, it can be downright misleading;
  5. Sample from a well-maintained community can be both reliable and representative.

Let’s unpack these.

River sample is not reliable

The name river sample evokes visions of pristine waters flowing softly, with babbling rapids and a meandering path through a verdant forest. But that’s not exactly what it is. River sampling is an online method that takes potential respondents, sourced from panels, ads, social media pop-ups, and other websites, and drives them to a router, which then sends them to whatever survey, from anyone, happens to be available at the time.

In tracking research we have done using river sample, we found that 23% of questions showed a statistically significant difference over a one-year period. That’s almost five times the 5% we would expect by chance alone at a 95% confidence level, and far from reliable.
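The “almost five times” figure follows from what a 95% confidence level means: even if nothing really changed, about 5% of questions would still show a significant difference by chance. The sketch below compares the observed 23% with that 5% baseline and, assuming independent tests and a hypothetical 100-question tracker (the post does not report the actual question count), computes how unlikely 23 or more chance "hits" would be.

```python
from math import comb

ALPHA = 0.05  # false-positive rate implied by a 95% confidence level


def ratio_to_chance(observed_share: float, alpha: float = ALPHA) -> float:
    """How many times more significant results were seen than chance alone predicts."""
    return observed_share / alpha


def prob_at_least(k: int, n: int, p: float = ALPHA) -> float:
    """Binomial upper tail P(X >= k) for n independent tests, each with false-positive rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # 23% of questions significant vs. the 5% expected by chance
    print(f"{ratio_to_chance(0.23):.1f}x the chance rate")
    # Hypothetical tracker size (assumption, not from the study)
    n_questions = 100
    hits = round(0.23 * n_questions)
    p_tail = prob_at_least(hits, n_questions)
    print(f"P(>= {hits} of {n_questions} significant by chance) = {p_tail:.2e}")
```

Treating each question as an independent test is a simplification; correlated questions would widen the chance distribution, but a 23% rate would still sit far outside it.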

Most panel sample is not actually from a panel, it is river sample sold as panel

The advent of social media and publisher sources has radically changed the “sample” we use in recent years. We have gone from relying on panels of known respondents, people who have been vetted and profiled, to streams of unknown respondents whose motivations for answering our questions vary and are not always aligned with our aim of collecting reliable and useful data.

Research by Compete, a Millward Brown company, tracked the actual URL source of sample sold through the largest panels and other sample providers in the United States. This research shows that in recent years most sample, including sample sold by “panel” sources, had become unknown, non-panel sample simply resold from secondary sources such as social media, paid-survey sites, or routers.

Panel sample that is recruited from sources like loyalty cards will skew the data

Market research is intended to inform intelligent decision making by providing accurate information about the marketplace. If your sample source skews the results so much that they lead to misinformation, your results are useless.

We wanted to know whether a single-source community recruited from a loyalty program might be more reliable than something like river or publisher sample. To find out, we conducted a study using matched samples from Maru Voice Canada and a very large community recruited from a loyalty program.

We compared the results on a total of 69 questionnaire items and found that they differed significantly on 45 of them, nearly two-thirds of all the items we tested.

Despite being demographically identical, the people who were sampled from the loyalty program behaved very differently. The samples varied significantly on many things, including where they shopped, the technology they owned, how much they travelled, whether they had a car, what medical conditions they suffered from, where they travelled, what loyalty programs they were part of, the credit cards they used and the number of credit cards they had.

That’s a problem when you need to make decisions based on the results.

Publisher source panel is not only poor quality, it can be downright misleading

We also studied data coming from a well-known service that encourages publishers to “monetize your website’s content” and “join the hundreds of publishers who are using [this service] to earn revenue from their content.” This means the people answering the questions are doing so solely because they want to get past the survey to the content they are after.

In this study, we tracked the percentage of people who said they were active on a number of the most common social media sites over a two-year period. The data suggested wild increases and decreases in social media habits.

These data looked implausible and, indeed, when we compared them with Pew’s social media tracking data, we saw a very different story. Not only did sample from this source grossly underestimate the prevalence of social media activity, it also suggested dramatic upheaval where Pew’s data indicate there has been little change.

Sample from a well-maintained community can be both reliable and representative

Our high-quality market communities, Maru Voice Canada and Springboard America (U.S.), are well-recruited, meticulously maintained communities of well-profiled, engaged respondents. They are a far cry from river and publisher sample, and because they are recruited from many, many sources, they are not skewed by coming primarily from a single source such as a loyalty card program. We know they are quality sample because we test them continuously.

We test their validity and reliability in a few ways. One is by comparing our results with known “realities” such as election results. Forecaster Nate Silver’s rating of online sample sources shows Springboard America to be the most reliable at matching election results.

Another way we test our communities is by tracking the same measures over time, to see if our results are consistent. We have a set of questions we ask on a regular basis. They are designed to measure things that we don’t expect to change very much, like car ownership and visits to the dentist.

The results are, well, boring. We see no changes beyond what the margin of error at a 95% confidence level would lead us to expect, the proverbial 19 times out of 20. Boring, yes, but yawn-inducing in a good way. These are the kind of results that don’t keep researchers up at night.
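The “19 times out of 20” check on stable measures can be sketched as a pooled two-proportion z-test: when a question that shouldn’t move is asked in two waves, the difference should stay inside the 95% band (|z| below 1.96) about 95% of the time. The function names and numbers below (a hypothetical stable measure at 62% and then 63%, with 1,000 respondents per wave) are illustrative assumptions, not the authors’ actual procedure or data.

```python
from math import sqrt

Z_95 = 1.96  # two-sided critical value for a 95% confidence level


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of the 95% confidence interval for a single proportion."""
    return z * sqrt(p * (1 - p) / n)


def two_wave_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled two-proportion z statistic for the same question asked in two waves."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # both waves at 0% or 100%: no variation to test
        return 0.0
    return (p2 - p1) / se


def changed(p1: float, n1: int, p2: float, n2: int, z: float = Z_95) -> bool:
    """True if the wave-to-wave difference is significant at the 95% level."""
    return abs(two_wave_z(p1, n1, p2, n2)) > z


if __name__ == "__main__":
    # Hypothetical stable measure: 62% in wave 1, 63% in wave 2, 1,000 respondents each
    print(f"Margin of error at 50%, n=1000: ±{margin_of_error(0.5, 1000):.3f}")
    print("Significant change:", changed(0.62, 1000, 0.63, 1000))
```

Across many stable measures, a well-behaved source should flag roughly 1 in 20 as “changed”; a rate far above that is the warning sign described for river sample.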

When you are using research to help you make the right decision, you need results that are reliable in the way that bridges and buildings are. Our market communities are sources of dependable information because we carefully recruit from a vast array of sources, so that no single source introduces a notable bias. We also take great care in how we engage the people who join our market communities. We create true communities where we value people’s opinions, respect their time and show them how their feedback makes a difference. For us, they are not just “sample”; they are people.

Don’t hope for good sample. Ask questions about sources and quality

Researchers in the GRIT study were right to name sample quality as their number one concern, as our research has confirmed. If the sample is not good, the insights won’t be either.

We encourage all researchers to ask these three key questions whenever they make sample decisions:

  • Am I actually getting panel sample or is it repackaged river sample?
  • From what sources are the people recruited?
  • Is there evidence of the reliability and validity of the sample?

These are profoundly important questions, questions that will determine whether the research you conduct delivers insights or provides misinformation that leads to an incorrect decision.

When obtaining sample, ask questions. Know your source. And sleep easy.

Be sure to also read: The Danger of Relying on Statistical Significance.
