Read time: 7 mins
The market research data quality problem is often framed as a respondent issue, but the reality is more complex. From pricing pressure and incentive structures to recruitment methods and fraud exposure, the industry has created conditions that make poor data more likely. Understanding these upstream drivers is essential to improving data quality in a meaningful and sustainable way.
Key Takeaways
- Pricing pressure and low incentives weaken respondent engagement and contribute to declining data quality
- Recruitment methods and sample composition introduce hidden bias, even when datasets appear clean
- Fraud is amplified by low-cost, high-speed systems that create opportunities for exploitation
When market research teams talk about data quality, the conversation usually starts with what is easiest to see: poor open ends, speeders, straightliners, duplicates, and contradictions. Those are real problems, but they are usually symptoms, not the source. Industry guidance from ESOMAR/GRBN and AAPOR has long pointed out that online data quality is shaped by the full chain of recruitment, panel maintenance, respondent validation, sampling design, and fieldwork controls, not just what appears in the final file.
The harder truth is that the industry’s quality problem is not only behavioral. It is also economic.
Over time, pressure on cost per interview (CPI) has pushed the ecosystem toward lower-cost, faster-fill models. That does not just affect margins. It changes the conditions under which respondents are recruited, retained, and rewarded. AAPOR notes that online panels have become increasingly central as traditional methods grew more expensive, and that recruitment methods, attrition, panel freshening, and self-selection all directly affect sample quality and representativeness.
One consequence shows up in incentives. When the value exchange with respondents weakens, engagement can weaken with it. A 2022 peer-reviewed study in a probability-based internet panel found that raising the incentive from the equivalent of $1 to $5 increased interview completion rates with minimal impact on data quality or bias. A 2024 JMIR study similarly found that higher incentives produced quicker data collection, less ad spend, and higher response rates, with $5 proving the most cost-effective option in that experiment. In other words, better incentives do not automatically create dirty data. In many cases, they help recruit and retain better participation.
That matters because the market often behaves as if lower respondent payouts are harmless. They are not always harmless. The Market Research Society has highlighted this directly in industry commentary across 16 countries, noting that many people now ask, in effect, why they should give up their data if the money is negligible. That is a sharp warning sign for anyone who cares about long-term respondent health. When legitimate participants feel the value exchange is too low, the ecosystem becomes less attractive to thoughtful respondents and more vulnerable to disengagement and churn.
Cost pressure also affects where sample comes from. To sustain volume under tighter pricing, recruitment tends to shift toward the channels that are cheaper, easier to scale, and faster to activate. That may improve throughput, but it can also narrow the composition of the respondent pool over time. AAPOR explicitly warns that recruitment methods, self-selection, and coverage error can introduce systematic bias, and that researchers need to evaluate more than completion metrics when assessing sample quality.
That is where the conversation has to get more honest. A dataset can look “clean” and still be weak. Pew Research’s 2023 benchmarking of online samples found that the average absolute error for opt-in online samples was 5.8 percentage points, compared with 2.6 points for probability-based panels. The gap widened for key subgroups: among 18- to 29-year-olds and Hispanic adults, the average error on opt-in samples was materially higher than for all adults. That is not just a fraud story. It is a composition story.
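To make the "average absolute error" metric concrete, here is a minimal sketch of how it is typically computed: compare each weighted survey estimate against a benchmark value from a high-quality external source, then average the absolute differences. The topics and numbers below are invented for illustration; they are not the benchmark items or figures from Pew's study.

```python
# Illustrative only: how an "average absolute error" benchmark works.
# The topics and values below are invented for demonstration; they are
# not the benchmark items or figures from Pew's 2023 study.

benchmarks = {            # population values from high-quality sources (%)
    "smokes_daily": 12.0,
    "has_valid_passport": 46.0,
    "voted_in_last_election": 62.0,
}

survey_estimates = {      # weighted estimates from a hypothetical opt-in sample (%)
    "smokes_daily": 17.5,
    "has_valid_passport": 39.0,
    "voted_in_last_election": 70.0,
}

errors = [abs(survey_estimates[item] - benchmarks[item]) for item in benchmarks]
average_absolute_error = sum(errors) / len(errors)

print(f"Average absolute error: {average_absolute_error:.1f} percentage points")
# -> 6.8 percentage points in this invented example
```

The point of the metric is that it punishes a sample for being systematically off across many topics, even if each individual estimate looks plausible on its own.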
This is why lower cost can become expensive in ways that do not show up on an invoice. If narrower recruitment creates thinner audience diversity, heavier reliance on repeated pools, weaker profiling, or more hidden bias, the damage shows up later as unstable incidence, poor representation, and reduced confidence in the data.
The latest large-scale market benchmarks suggest the industry is already living with this reality. The 2025 Global Data Quality Benchmarking report, based on roughly 1.15 million research-agency records and 825,000 supplier records, found that removals across pre-survey checks, in-survey fraud, and in-survey behavior totaled 9.4% globally for research agencies and 13.7% for suppliers. Supplier-side pre-survey removals alone were 7.4% globally. In North America, the same report showed notable gaps between sold and actual incidence, including 64.7% sold vs. 49.4% actual for U.S. research agency work and 46.2% sold vs. 36.3% actual for U.S. suppliers. That is a sign of an ecosystem under pressure, one where projected feasibility and real respondent availability do not always line up.
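To see why that incidence gap matters operationally, here is a rough arithmetic sketch of how overstated incidence inflates the screening volume needed to hit a quota. The incidence rates are the U.S. research-agency figures cited above; the 1,000-complete target is a hypothetical assumption for illustration.

```python
# Illustrative arithmetic: how a gap between sold and actual incidence
# inflates the number of screened respondents needed to hit a quota.
# Incidence rates are the U.S. research-agency figures cited above;
# the 1,000-complete target is a hypothetical assumption.

target_completes = 1_000
sold_incidence = 0.647     # incidence assumed when the project was priced
actual_incidence = 0.494   # incidence observed in field

planned_entrants = target_completes / sold_incidence    # ~1,546 respondents
actual_entrants = target_completes / actual_incidence   # ~2,024 respondents
shortfall = actual_entrants - planned_entrants

print(f"Planned screening volume:       {planned_entrants:,.0f}")
print(f"Actual screening volume needed: {actual_entrants:,.0f}")
print(f"Additional respondents needed:  {shortfall:,.0f} (~{shortfall / planned_entrants:.0%} more)")
```

In this sketch, the gap means roughly 31% more respondents have to be found, screened, and paid than the original pricing assumed, which is exactly the kind of pressure that pushes projects toward whatever sample is fastest and cheapest to source.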
Fraud also becomes more attractive in weak-value, high-volume systems. SampleCon’s guidance on end-link fraud explains that bad actors exploit incentives for financial gain through manual fraud, scripted fraud, and ghost completes. The document notes that the return on this activity is sufficient for it to proliferate, and that these attacks can create tens of millions of dollars in cost across the ecosystem while reducing the amount of real data collected. It also highlights how increasingly technical fraud can bypass surveys altogether or exploit weak redirect structures.
That point is critical. Fraud is not just a matter of careless respondents anymore. It is often a function of the environment we have created: lower friction, weaker protections, thinner incentives, and greater pressure to scale cheaply. In those conditions, bad actors do not need a perfect opening. They just need enough of one.
So yes, the visible symptoms still matter. Poor open ends matter. Duplicate responses matter. Speeding matters. But if the industry focuses only on what appears in the dataset, it stays stuck in cleanup mode. It treats downstream evidence instead of upstream causes.
The more useful frame is this: data quality problems often begin with three structural issues.
First, compressed economics can weaken the respondent value exchange and push quality to the edge. Peer-reviewed evidence suggests higher incentives often improve response and completion without materially hurting data quality.
Second, narrower recruitment and heavier dependence on limited channels can increase self-selection risk, coverage gaps, overexposure, and hidden bias. AAPOR and Pew both reinforce that representativeness and subgroup accuracy are not guaranteed in opt-in ecosystems, even when studies appear operationally successful.
Third, a low-cost, high-speed environment creates more room for fraud and technical manipulation to thrive. SampleCon’s fraud guidance and the GDQ benchmark both show that a meaningful share of respondents now need to be removed before, during, and after field.
None of that means the situation is hopeless.
It does mean the industry has to stop pretending this is only a respondent problem. It is a system problem.
And system problems need system responses.
That includes better recruitment discipline, better source diversification, better quota design, stronger profile validation, more realistic incidence planning, better encryption and routing security, and better respondent experience. It also includes using modern quality tools to catch low-quality or fraudulent respondents earlier, before they distort the dataset. The 2025 GDQ benchmark explicitly calls for greater use of technology to identify removals earlier in the survey process, and industry guidance from ESOMAR/GRBN emphasizes emerging quality-assurance practices across panels, routers, and exchanges.
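As a rough illustration of what catching low-quality respondents earlier can look like in practice, here is a minimal sketch of three basic in-survey flags: speeding, straightlining, and duplicate device fingerprints. The thresholds and field names are assumptions made for this example, not industry-standard values or any particular vendor's rules.

```python
# A minimal sketch of basic in-survey quality flags. Thresholds and field
# names are illustrative assumptions, not industry standards or any
# specific platform's logic.

from collections import Counter

def flag_respondent(resp, median_duration_sec):
    """Return a list of quality flags for a single respondent record."""
    flags = []

    # Speeder: finished in well under the median interview length.
    if resp["duration_sec"] < 0.4 * median_duration_sec:
        flags.append("speeder")

    # Straightliner: gave the same answer to nearly every grid item.
    grid = resp["grid_answers"]
    if grid and Counter(grid).most_common(1)[0][1] / len(grid) >= 0.9:
        flags.append("straightliner")

    return flags

def find_duplicate_fingerprints(respondents):
    """Return device/browser fingerprints that appear more than once."""
    counts = Counter(r["fingerprint"] for r in respondents)
    return {fp for fp, n in counts.items() if n > 1}
```

Checks like these are deliberately simple; the point is that they run during or immediately after field, so low-quality records are removed before they shape the analysis rather than after.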
The takeaway is simple.
The industry did not just wake up with a quality problem. It created conditions that made one easier to grow.
If we want better data, we cannot just get better at spotting bad records.
We have to get better at fixing the environment that lets them in.
Want to talk more about data quality? Let’s connect.
FAQs
What causes poor data quality in market research?
It is driven by a combination of economic pressure, weak incentives, recruitment bias, and increased fraud opportunities across the research ecosystem.
How do incentives affect data quality?
Stronger incentives improve engagement, completion rates, and respondent quality without necessarily increasing bias.
Is fraud the biggest driver of poor data quality?
Fraud is a major factor, but it is often a symptom of deeper structural issues like pricing pressure and weak system design.


