AP STATISTICS • COLLECTING DATA

Potential Problems with Sampling

Understanding how bias, undercoverage, and nonresponse threaten the validity of statistical conclusions drawn from sample data.

SECTION 1

Historical Context & Motivation

Statistical sampling has a storied history of spectacular failures that illuminate why careful methodology matters. One of the most famous occurred in 1936, when The Literary Digest magazine mailed out over ten million questionnaires to predict the outcome of the presidential election between Franklin D. Roosevelt and Alf Landon. The magazine predicted a Landon landslide, yet Roosevelt won by one of the largest margins in American electoral history. The root cause was not a small sample, since over two million responses were collected. It was a deeply flawed sampling frame, drawn largely from telephone directories, automobile registrations, and the magazine's own subscriber list, which overrepresented affluent Americans who were more likely to favor the Republican candidate. This episode demonstrates that sample size alone does not guarantee accuracy; the method by which the sample is drawn is at least as important.

1936

The Literary Digest Polling Disaster

A massive mail-in survey that drew over 2 million responses produces a spectacularly wrong election prediction, demonstrating the devastating effects of undercoverage and nonresponse bias.

1948

Dewey Defeats Truman

Pollsters using quota sampling predicted Dewey's victory, but the non-random nature of the quotas introduced systematic bias, prompting the profession to adopt probability sampling methods.

1970s

Rise of Telephone Surveys

Random digit dialing becomes standard practice, but researchers soon discover that households without telephones—often lower-income—are systematically excluded, creating undercoverage.

2016

Modern Polling Challenges

Widespread election polling errors highlight persistent problems with nonresponse bias and the increasing difficulty of reaching representative samples in the digital age.

These historical episodes raise a fundamental question in statistics: even when we intend to collect data that faithfully represents a population, what can go wrong between our sampling plan and the conclusions we draw? Understanding the sources of sampling problems is essential because flawed data collection undermines every subsequent analysis, no matter how sophisticated the statistical techniques applied. In this lesson, we systematically examine the types of bias and error that can compromise a sample and learn to identify, diagnose, and—where possible—mitigate these threats.

SECTION 2

Core Principles & Definitions

Before diagnosing specific problems, it is critical to distinguish between two broad categories of error that affect any survey or study. Sampling error refers to the natural variability that arises because we observe only a subset of the population rather than conducting a census; it is expected, quantifiable, and decreases as sample size increases. Non-sampling error, by contrast, encompasses all other mistakes—faulty sampling frames, biased wording, nonresponse, measurement problems—and these errors do not shrink with larger samples. In fact, enlarging a biased sample merely produces a more precise wrong answer. The AP Statistics exam places heavy emphasis on identifying and explaining non-sampling errors, and these form the core of what we mean by "potential problems with sampling."
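The claim that a larger biased sample only yields "a more precise wrong answer" can be checked with a small simulation. The sketch below is illustrative only: the two groups, their sizes, and their support rates are invented numbers, and the sampling frame is assumed to reach group A alone.

```python
import random

def biased_frame_estimate(n, rng):
    """Estimate support from a frame that can only reach group A.

    Hypothetical population: 60% group A (support 0.40), 40% group B
    (support 0.70), so the true overall support is 0.52. Group B is
    missing from the frame, so every draw comes from group A.
    """
    hits = sum(rng.random() < 0.40 for _ in range(n))
    return hits / n

TRUE_SUPPORT = 0.60 * 0.40 + 0.40 * 0.70  # 0.52 under the assumed numbers

rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (100, 1_000, 10_000, 100_000):
    estimate = biased_frame_estimate(n, rng)
    print(f"n={n:>7}  estimate={estimate:.3f}  error={estimate - TRUE_SUPPORT:+.3f}")
```

As n grows, the estimates settle near 0.40 rather than 0.52: the random scatter shrinks, but the gap caused by the missing group does not.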

Bias

A systematic tendency to overestimate or underestimate a population parameter. Unlike random error, bias pushes results consistently in one direction and cannot be reduced by increasing sample size.

Undercoverage

Occurs when some members of the population are left out of the sampling frame and therefore have no chance of being selected. If the excluded group differs systematically from the included group, the results are biased.

Nonresponse Bias

Arises when selected individuals fail to participate and their characteristics differ from those who do respond. High nonresponse rates are problematic only when nonrespondents are systematically different.

Voluntary Response Bias

When individuals self-select into the sample, those with strong opinions (often negative) are overrepresented. Online reviews and call-in polls are classic examples.

Response Bias

Results from factors that cause respondents to give inaccurate answers—leading questions, social desirability, poor recall, or interviewer influence all fall under this umbrella.

✦ KEY TAKEAWAY

Think of sampling like drawing a map of an unfamiliar territory from a helicopter. Sampling error is like the natural fuzziness of the photograph—you can improve it by zooming in (larger sample). Bias is like a tilted camera that consistently crops out the left side of the landscape—no amount of better resolution fixes the missing region. The only way to correct bias is to straighten the camera itself, which means redesigning the sampling method.

SECTION 3

Visualizing Sampling Problems

The following diagram illustrates how different types of bias distort the relationship between a population and the sample that is actually analyzed. In an ideal scenario, the sampling frame perfectly matches the population, every selected individual responds, and every response accurately reflects reality. Each type of bias represents a breakdown at a different stage of this pipeline, and recognizing where the breakdown occurs is the key to diagnosing and naming the bias correctly on the AP exam.

The diagram traces the path from the full population to the data actually collected. Undercoverage occurs between the population and the sampling frame. Voluntary response bias distorts who enters the sample. Nonresponse bias reduces who actually participates, and response bias corrupts the answers themselves.

Notice that the pipeline has multiple stages, and bias can enter at any one of them—or at several simultaneously. A survey might suffer from both undercoverage (the sampling frame excludes certain demographics) and nonresponse bias (those who are contacted disproportionately refuse to participate). When answering AP exam questions, you should identify the specific stage at which bias enters and name it precisely; saying a sample is simply "biased" without specifying the type and direction will not earn full credit.

SECTION 4

How Bias Distorts Estimation

Although many sampling problems are conceptual rather than computational, it is useful to formalize how bias and variability jointly determine the accuracy of a sample statistic. The total survey error framework decomposes the difference between a sample statistic and the true population parameter into systematic and random components. Understanding this decomposition clarifies why increasing sample size does not remedy bias.

MEAN SQUARED ERROR DECOMPOSITION

MSE(θ̂) = [Bias(θ̂)]² + Var(θ̂)

Where θ̂ is the sample statistic, Bias(θ̂) = E(θ̂) − θ is the systematic deviation from the true parameter θ, and Var(θ̂) is the sampling variability. Increasing n reduces Var(θ̂) but does not affect Bias(θ̂).
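To see the decomposition numerically, the following sketch evaluates MSE for a sample proportion at several sample sizes. The function name and the specific values (a method that averages 0.55 when the truth is 0.50) are assumptions chosen for illustration.

```python
def mse_of_proportion(expected_value, true_value, n):
    """MSE = bias^2 + variance for a sample proportion.

    expected_value is the long-run average of the estimator under the
    (possibly flawed) sampling method; true_value is the parameter.
    """
    bias = expected_value - true_value
    variance = expected_value * (1 - expected_value) / n
    return bias ** 2 + variance

# Hypothetical numbers: the method averages 0.55 when the truth is 0.50.
for n in (100, 1_000, 10_000, 1_000_000):
    unbiased = mse_of_proportion(0.50, 0.50, n)
    biased = mse_of_proportion(0.55, 0.50, n)
    print(f"n={n:>9}  unbiased MSE={unbiased:.6f}  biased MSE={biased:.6f}")
```

The unbiased column keeps shrinking toward zero, while the biased column levels off at the squared bias, 0.05² = 0.0025, however large n becomes.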

MARGIN OF ERROR (FOR PROPORTIONS)

ME = z* × √(p̂(1 − p̂) / n)

The margin of error quantifies only random sampling variability. It does NOT account for bias. A survey with a margin of error of ±3 percentage points can still be off by 15 percentage points or more if systematic bias is present.
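As a quick check on the formula, here is a minimal sketch that computes the margin of error for a proportion. The poll figures (1,000 respondents, 45% answering "yes") are hypothetical, and z* is taken from the standard normal distribution via Python's statistics module.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion: z* * sqrt(p_hat(1 - p_hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 1,000 respondents, 45% answering "yes".
print(round(margin_of_error(0.45, 1000), 4))
```

Nothing in the calculation depends on how the respondents were obtained, which is why the result says nothing about bias.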

This distinction is crucial for interpreting confidence intervals on the AP exam. When a problem states that a 95% confidence interval for a proportion is (0.42, 0.48), the interval accounts for sampling variability but implicitly assumes that the sample was collected without systematic bias. If the sampling method was flawed—say, a voluntary response internet poll—then the true parameter might lie well outside the reported interval, regardless of its width. The confidence level of 95% applies to the repeated-sampling probability of capturing θ only when the sampling procedure is unbiased.
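The effect of bias on the 95% capture rate can be illustrated by simulation. This is a sketch under invented assumptions: the true proportion is 0.45, the flawed method behaves as if the proportion were 0.55, and each interval uses the usual z* = 1.96 formula.

```python
import random
from math import sqrt

Z_STAR = 1.96  # critical value for 95% confidence

def coverage(true_p, response_p, n, reps, rng):
    """Fraction of 95% intervals that capture true_p.

    response_p is the chance a sampled answer is "yes"; it equals true_p
    for an unbiased method and differs from it when the method is biased.
    """
    captured = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < response_p for _ in range(n)) / n
        me = Z_STAR * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= true_p <= p_hat + me:
            captured += 1
    return captured / reps

rng = random.Random(7)  # fixed seed so the run is repeatable
print("unbiased:", coverage(0.45, 0.45, n=400, reps=500, rng=rng))
print("biased:  ", coverage(0.45, 0.55, n=400, reps=500, rng=rng))
```

Under these assumptions the unbiased method captures the parameter close to 95% of the time, while the biased method captures it only rarely, even though both report intervals of nearly the same width.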

📝 AP EXAM TIP

Free-response questions frequently ask you to "describe a potential problem with the sampling method" and "explain how it could affect the results." A complete answer must (1) name the specific type of bias, (2) identify the direction of the bias (overestimate or underestimate), and (3) explain the mechanism. For example: "This survey is likely to suffer from voluntary response bias because people with strong negative opinions are more likely to respond, so the survey would likely overestimate the proportion of dissatisfied customers."

SECTION 5

Detailed Classification of Sampling Problems

The AP Statistics curriculum identifies several distinct categories of problems that can compromise a sample. While they share the common feature of introducing systematic error, each has a unique mechanism and requires a different remedy. The diagram below provides a classification tree that can help you quickly identify which type of bias is present in an exam scenario.

The classification tree separates total survey error into sampling error (random, reducible by increasing n) and non-sampling error (systematic, requiring methodological fixes). The three major branches of non-sampling error—selection bias, nonresponse bias, and response bias—each have distinct causes and remedies.

Summary of the five major types of bias tested on the AP Statistics exam

Type of Bias	When It Occurs	Classic Example	Likely Direction
Undercoverage	The sampling frame omits part of the population	A telephone survey that excludes cell-phone-only households, underrepresenting younger adults	Depends on how the excluded group differs from those included
Voluntary Response	Individuals self-select into the sample	An online restaurant review site where only customers with extreme experiences bother to post	Overrepresents strong (often negative) opinions
Nonresponse	Selected individuals refuse or fail to respond	A mailed health survey where sicker individuals are too ill to complete and return it	Depends on how nonrespondents differ from respondents
Response (Wording)	Question wording pushes answers in a particular direction	"Don't you agree that taxes are too high?" leads respondents toward agreement	Toward the implied "correct" answer
Response (Social Desirability)	Respondents give socially acceptable rather than truthful answers	A face-to-face survey about drug use where respondents underreport illegal behavior	Overestimates "good" behaviors, underestimates "bad" ones

SECTION 6

Worked Example: Identifying and Explaining Bias

Consider the following scenario, which is typical of what you would encounter on an AP Statistics free-response question: A school principal wants to estimate the proportion of students who have been bullied in the past year. She places a box in the main office with slips of paper for students to voluntarily report whether they have been bullied. After one month, she collects the slips and finds that 72% of respondents report being bullied. She concludes that bullying is a severe problem at her school. Identify all potential problems with this sampling method and explain how each could affect the conclusion.

Diagnosing Bias in a School Bullying Survey

Step 1 — Identify the Sampling Method

The principal placed a box in the office and allowed students to voluntarily deposit slips. No random selection mechanism was used. This is a voluntary response sample, not a probability sample.

Sampling method: Voluntary response

Step 2 — Identify Voluntary Response Bias

Students who have been bullied are more likely to feel strongly about the issue and are therefore more motivated to take the time to fill out a slip and drop it in the box. Students who have not been bullied have little reason to participate. As a result, the sample likely overrepresents bullied students.

Direction: Overestimates the true proportion of bullied students

Step 3 — Identify Potential Undercoverage

The box is located only in the main office. Students who rarely visit the office—perhaps those in certain grades, extracurricular programs, or those who avoid the office for personal reasons—have effectively zero probability of being included. This is undercoverage, and if the excluded students have different bullying experiences, the result is biased.

Additional bias: Undercoverage of students who do not visit the main office

Step 4 — Identify Potential Response Bias

Anonymity is not guaranteed: students might worry that office staff could identify them. If students fear retaliation for reporting bullying or feel embarrassed, some who were bullied may not report it, introducing response bias. This could work against the voluntary response bias, but the net effect is uncertain.

Possible response bias: Underreporting due to fear of identification

Step 5 — State the Conclusion

The principal's estimate of 72% is almost certainly an overestimate of the true proportion of bullied students because voluntary response bias is the dominant problem here. The result cannot be generalized to the entire student body because the sample is not representative. A better approach would be to use a simple random sample of students selected from the school enrollment list, with an anonymous questionnaire administered during class to maximize the response rate.

72% likely overestimates the true bullying rate; no valid generalization is possible
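A short calculation shows how strongly self-selection alone can inflate the result. The numbers below are purely hypothetical (the scenario does not give the true bullying rate or the response rates); they are chosen only to show that a figure near 72% is compatible with a much lower true proportion.

```python
def observed_proportion(true_rate, respond_if_yes, respond_if_no):
    """Expected share of 'yes' slips when response rates differ by group."""
    yes_slips = true_rate * respond_if_yes
    no_slips = (1 - true_rate) * respond_if_no
    return yes_slips / (yes_slips + no_slips)

# Hypothetical: 20% truly bullied; 30% of them submit a slip,
# versus only 3% of students who were not bullied.
print(round(observed_proportion(0.20, 0.30, 0.03), 3))  # 0.714
```

If both groups responded at the same rate, the function would return the true rate, which is why the problem lies in who chooses to respond rather than in how many respond.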

SECTION 7

Remedies and Mitigation Strategies

Knowing how to identify bias is essential, but a strong understanding of sampling problems also includes knowledge of how to prevent or mitigate them. The table below summarizes the primary remedies for each type of bias. On the AP exam, you may be asked not only to identify what went wrong but to propose an improved sampling design.

Remedies for common sources of bias in sampling

Problem	Primary Remedy	Why It Works
Undercoverage	Use a comprehensive, up-to-date sampling frame; consider stratification to ensure all subgroups are represented	A complete frame ensures every member of the population has a known, nonzero probability of selection
Voluntary Response	Replace with a probability sampling method (SRS, stratified, cluster, or systematic)	Probability sampling removes self-selection, giving the researcher control over who is included
Nonresponse	Follow up persistently with nonrespondents; offer incentives; keep surveys short and convenient	Higher response rates reduce the gap between respondents and nonrespondents, minimizing potential bias
Response Bias (Wording)	Use neutral, balanced question wording; pilot-test the instrument; avoid leading or loaded language	Neutral wording reduces the likelihood that the question itself pushes respondents toward a particular answer
Response Bias (Social Desirability)	Guarantee anonymity; use self-administered questionnaires rather than face-to-face interviews for sensitive topics	Anonymity removes the social pressure that causes respondents to shade their answers toward what is deemed acceptable
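The probability sampling remedy in the table can be sketched in a few lines. The enrollment list and the helper name simple_random_sample are hypothetical stand-ins; in practice the frame would be the actual, complete list of population members.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each subset equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical enrollment list standing in for a complete sampling frame.
enrollment = [f"student_{i:04d}" for i in range(1, 1201)]
chosen = simple_random_sample(enrollment, 50, seed=42)
print(chosen[:5])
```

Because the random number generator, not the respondents, decides who is included, self-selection is removed. Nonresponse among the chosen students would still need follow-up.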

✦ KEY TAKEAWAY

Think of bias remedies like calibrating scientific instruments in a research laboratory. A spectrometer that is systematically reading wavelengths 5 nm too high will produce consistently wrong results no matter how many measurements you take—the only fix is to recalibrate the instrument. Similarly, no amount of data collection fixes a biased sampling method; you must fix the method itself. Probability sampling is the gold standard for calibration in the world of surveys—it is the single most important tool for ensuring that sample results can be trusted to generalize to the population.

SECTION 8

Connection to Inference and Advanced Topics

Sampling problems do not exist in isolation; they directly affect the validity of every inferential procedure you will learn in AP Statistics. When you construct a confidence interval or perform a hypothesis test, the formulas assume that the data were collected using a method that gives every individual (or at least identifiable groups) a known probability of selection. The conditions for inference—randomness, independence, normality—begin with the randomness condition, which is impossible to satisfy if the sample was collected via a flawed method. The table below connects the sampling problems covered in this lesson to the inferential concepts you will encounter later in the course.

How sampling problems propagate into later inferential procedures

Sampling Problem	Effect on Inference
Undercoverage	The confidence interval or test result applies only to the covered population, not the target population. Generalization is invalid.
Voluntary Response	The randomness condition is violated. No confidence interval or p-value is meaningful because the sample does not represent any well-defined population.
Nonresponse Bias	The effective sample is no longer random even if the original selection was. The margin of error understates the true uncertainty.
Response Bias	The data values themselves are distorted. Even a perfectly random sample produces biased estimates if the measurements are systematically off.

In more advanced statistics courses and in professional survey methodology, researchers use techniques such as post-stratification weighting, propensity score adjustment, and multiple imputation to partially correct for known biases after data collection. However, these methods require strong assumptions and are imperfect substitutes for good sampling design. The fundamental lesson remains: the best defense against bias is a well-designed probability sampling plan executed with rigorous follow-up procedures to minimize nonresponse. Prevention is always preferable to correction.
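Post-stratification weighting, the simplest of these corrections, can be sketched as follows. All figures are invented for illustration, and the adjustment rests on the strong assumption that respondents within each group resemble the nonrespondents in that group.

```python
def poststratified_estimate(stratum_means, population_shares):
    """Weight each stratum's sample mean by its known population share."""
    return sum(stratum_means[s] * population_shares[s] for s in stratum_means)

# Hypothetical survey: younger adults are 40% of the population
# but only 20% of respondents.
sample_shares = {"younger": 0.20, "older": 0.80}
population_shares = {"younger": 0.40, "older": 0.60}
support = {"younger": 0.70, "older": 0.50}

unweighted = sum(support[s] * sample_shares[s] for s in support)
weighted = poststratified_estimate(support, population_shares)
print(f"unweighted={unweighted:.2f}  weighted={weighted:.2f}")
```

With these numbers the unweighted estimate is 0.54 and the weighted estimate is 0.58. The reweighting corrects the group mix but cannot repair bias within a group.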

SECTION 9

Practice Problems

PROBLEM 1 — CONCEPTUAL

A researcher mails a survey to 1,000 randomly selected households to estimate the average number of hours residents spend exercising per week. Only 230 surveys are returned. The researcher reports a 95% confidence interval based on these 230 responses. Which of the following is the most serious concern about this study?

PROBLEM 2 — BASIC CALCULATION

A school district wants to survey parents about satisfaction with the new cafeteria menu. They post the survey on the school's social media page and ask interested parents to complete it. Of the 500 parents who respond, 78% express dissatisfaction. Which type of bias is MOST clearly present?

PROBLEM 3 — INTERMEDIATE

A political polling firm conducts a phone survey by randomly dialing landline numbers. In a region where 40% of adults have only cell phones and no landline, the poll estimates that 55% of adults support a particular candidate. If cell-phone-only adults support the candidate at a rate of 65%, in which direction is the poll likely biased?

PROBLEM 4 — APPLIED

A university wants to estimate the proportion of its 20,000 students who support extending library hours to midnight. The administration sends an email survey to all students, and 1,200 students respond. Of those, 84% favor the extension. (a) Identify a type of bias that is likely present in this survey and explain how it arises. (1 point) (b) Explain the likely direction of the bias—that is, whether the 84% is likely an overestimate or underestimate of the true proportion. Justify your answer. (1 point) (c) If the administration had instead selected 400 students using a simple random sample and obtained an 80% response rate, would this method eliminate all potential for bias? Explain. (1 point) (d) Describe one additional modification to the survey design in part (c) that could further reduce bias. Explain why it would help. (1 point)

PROBLEM 5 — CRITICAL THINKING

A national news network wants to estimate public opinion on a proposed environmental regulation. They use two methods simultaneously: Method A: A random digit dialing telephone survey of 1,500 adults, achieving a 35% response rate (525 respondents). Method B: An online poll posted on the network's website, generating 12,000 responses. Method A finds 52% support. Method B finds 71% support. (a) For each method, identify the most serious type of bias and explain the mechanism by which it arises. (2 points) (b) A commentator argues that Method B is more reliable because it has a much larger sample. Explain why this reasoning is flawed, referencing the distinction between sampling error and bias. (1 point) (c) Suppose the network wants to produce the most accurate estimate possible using one of these methods. Which method would you recommend, and what specific steps could be taken to improve it? Justify your recommendation. (1 point)

SUMMARY

Lesson Summary

Sampling problems fall into two broad categories: sampling error, which is the natural random variability inherent in any sample and decreases with larger sample sizes, and non-sampling error (bias), which is systematic and cannot be reduced by increasing the sample size. The major types of bias include undercoverage (parts of the population are excluded from the sampling frame), voluntary response bias (individuals self-select into the sample, overrepresenting those with strong opinions), nonresponse bias (selected individuals fail to participate and differ from those who do), and response bias (respondents give inaccurate answers due to leading questions, social desirability, or interviewer effects).

For the AP Statistics exam, always remember the three-part framework for addressing bias: (1) name the type of bias, (2) explain the mechanism by which it arises, and (3) state the direction (overestimate or underestimate). The gold standard remedy is probability sampling—simple random samples, stratified random samples, and cluster samples—combined with vigorous follow-up to minimize nonresponse. Remember that the margin of error reported with a confidence interval quantifies only sampling variability, not bias; a biased sample can produce a confidence interval that entirely misses the true parameter.
