Expected Counts in Two-Way Tables
Help Questions
Which of the following expressions correctly calculates the expected number of people who own a cat and have allergies, under the null hypothesis of no association?
$$\frac{(180)(200)}{600}$$
$$\frac{(150)(250)}{600}$$
$$\frac{(600)}{(150)(180)}$$
$$\frac{(150)(180)}{600}$$
Explanation
The formula for the expected count in a cell of a two-way table is (row total × column total) / grand total. Here, the row total for 'owning a cat' is 150, the column total for 'having allergies' is 180, and the grand total sample size is 600. The correct expression is $$\frac{(150)(180)}{600}$$.
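As a minimal sketch, the formula can be written as a small Python function. The name `expected_count` is hypothetical and chosen only for illustration; the totals are the ones quoted in the explanation above.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count under independence: (row total * column total) / grand total."""
    if grand_total <= 0:
        raise ValueError("grand total must be positive")
    return row_total * col_total / grand_total

# Totals from the cat-ownership question: 150 cat owners, 180 with allergies, 600 people.
cat_and_allergies = expected_count(150, 180, 600)
print(cat_and_allergies)  # 45.0
```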
What is the expected number of weekdays with Low customer traffic if traffic level is independent of the type of day?
$$40$$
$$20$$
$$30$$
$$50$$
Explanation
The expected count is (row total × column total) / grand total. The row total for weekdays is 80. The column total for Low traffic is 50. The grand total is 200. The expected count is $$(80 \times 50) / 200 = 4000 / 200 = 20$$. The value of 30 is the observed count, which is a common distractor.
In a chi-square test for independence, if the observed count for a particular cell is much larger than its corresponding expected count, what does this suggest?
This cell provides evidence in favor of the null hypothesis of independence.
This cell provides evidence against the null hypothesis of independence.
The sample size was not large enough to perform the test.
The calculation of the expected count must be incorrect.
Explanation
The chi-square test statistic measures the total discrepancy between observed and expected counts. A large difference between an observed and expected count for a cell increases the value of the chi-square statistic. A large test statistic provides evidence against the null hypothesis. Therefore, a cell where the observed count is very different from the expected count suggests the variables may not be independent.
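To make the size of a discrepancy concrete, the sketch below computes a single cell's contribution to the chi-square statistic, (observed − expected)² / expected. The function name `chi_square_contribution` is hypothetical. The first pair of numbers reuses the weekday / Low-traffic cell from the earlier question (observed 30, expected 20); the second observed value, 21, is an invented comparison point.

```python
def chi_square_contribution(observed, expected):
    """One cell's term in the chi-square statistic: (observed - expected)^2 / expected."""
    return (observed - expected) ** 2 / expected

# Weekday / Low-traffic cell from the earlier question: observed 30, expected 20.
far_from_expected = chi_square_contribution(30, 20)
# Hypothetical comparison: an observed count of 21 against the same expected count.
close_to_expected = chi_square_contribution(21, 20)
print(far_from_expected, close_to_expected)  # 5.0 0.05
```

The cell whose observed count sits farther from its expected count adds far more to the statistic, which is why such a cell counts as evidence against independence.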
For a chi-square test of independence, which of the following comparisons between expected counts is correct?
The expected count for (Level 1, Level X) is equal to the expected count for (Level 2, Level X).
The expected count for (Level 1, Level X) is less than the expected count for (Level 1, Level Y).
The expected count for (Level 1, Level Y) is greater than the expected count for (Level 2, Level Y).
The expected count for (Level 2, Level X) is greater than the expected count for (Level 1, Level X).
Explanation
First, calculate the four expected counts using the formula (row total × column total) / grand total. The expected count for (Level 1, Level X) is $$(80 \times 100) / 200 = 40$$. The expected count for (Level 1, Level Y) is $$(80 \times 100) / 200 = 40$$. The expected count for (Level 2, Level X) is $$(120 \times 100) / 200 = 60$$. The expected count for (Level 2, Level Y) is $$(120 \times 100) / 200 = 60$$. Comparing these values against the choices, only D is correct: the expected count for (Level 2, Level X), which is 60, is greater than the expected count for (Level 1, Level X), which is 40.
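The four calculations above can be reproduced in one pass from the marginal totals. This is a sketch only: the dictionary layout and variable names are illustrative choices, and the totals (80, 120, 100, 100) are those used in the explanation.

```python
row_totals = {"Level 1": 80, "Level 2": 120}
col_totals = {"Level X": 100, "Level Y": 100}
grand_total = sum(row_totals.values())  # 200

# Expected count for every (row, column) cell under independence.
expected = {
    (r, c): r_total * c_total / grand_total
    for r, r_total in row_totals.items()
    for c, c_total in col_totals.items()
}

for cell, value in expected.items():
    print(cell, value)
```

Because both column totals are equal, the expected counts differ only across rows, which is what the comparison in this question turns on.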
In the context of a chi-square test for independence between two categorical variables, the calculation for expected counts is based on which assumption?
The two variables are dependent, which is the assumption of the alternative hypothesis.
The distribution of each variable is approximately normal within the population.
The two variables are independent, which is the assumption of the null hypothesis.
The sample size is large enough for the Central Limit Theorem to apply to the counts.
Explanation
Expected counts for a chi-square test of independence are calculated under the assumption that the null hypothesis is true. The null hypothesis for this test states that there is no association (i.e., the variables are independent) between the two categorical variables.
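One way to see why the formula follows from this assumption is sketched below. The symbols $R$, $C$, and $n$ are notation introduced here for the row total, column total, and grand total. Under independence, the probability of falling in a cell is the product of the two marginal proportions, so the expected count $E$ is the grand total times that product:

$$E = n \cdot \frac{R}{n} \cdot \frac{C}{n} = \frac{R \cdot C}{n}$$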
Under the null hypothesis that movie genre preference is independent of age group, how is the expected number of teens who prefer action movies calculated?
By multiplying the proportion of all moviegoers who are teens by the total number of action movies available.
By multiplying the total number of teens by the total number of people who prefer action movies, then dividing by the total number of moviegoers.
By averaging the number of moviegoers across all combinations of age group and genre.
By dividing the number of teens who were observed to prefer action movies by the total number of teens.
Explanation
The expected count for a cell under the null hypothesis of independence is calculated by the formula: (row total × column total) / grand total. In this context, this corresponds to (total number of teens × total number of people who prefer action movies) / total number of moviegoers.
A gym tracked 500 members by whether they attend group classes and whether they renewed their membership. Assuming independence, which expression calculates the expected count for the Group Classes & Renewed cell?
$\dfrac{350}{500}$
$\dfrac{(200)(350)}{500}$
$140$
$(200)(350)$
$\dfrac{200}{500}$
Explanation
To find expected counts assuming independence, we apply (row total × column total) ÷ grand total. For Group Classes & Renewed, we multiply members attending group classes (200) by members who renewed (350), then divide by all 500 members. Choice B shows this correctly: (200)(350)/500. Choice C gives 140, which is the calculated result but not the expression itself. Choices A and E show individual proportions, while D shows the product without division. The expected count formula helps us test whether the observed counts differ significantly from what independence would predict.
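As a rough illustration, the full 2×2 table of expected counts for the gym data can be built from the two quoted totals, since the remaining margins follow by subtraction from 500. The variable names are illustrative, not part of the original problem.

```python
grand_total = 500
row_totals = [200, grand_total - 200]  # group classes, no group classes
col_totals = [350, grand_total - 350]  # renewed, did not renew

# Expected count for each cell: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(expected)  # [[140.0, 60.0], [210.0, 90.0]]

# The expected table reproduces the original row and column totals.
row_sums = [sum(row) for row in expected]
col_sums = [sum(col) for col in zip(*expected)]
print(row_sums == row_totals, col_sums == col_totals)  # True True
```

The check at the end reflects a general property: expected counts always add back up to the observed margins.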
A researcher classified 120 plants by whether they received fertilizer and whether they bloomed. Under the assumption of independence, which expression calculates the expected count for the No Fertilizer & Bloomed cell?
$\dfrac{70}{120}$
$(50)(70)$
$\dfrac{50}{120}$
$30$
$\dfrac{(50)(70)}{120}$
Explanation
Expected counts in two-way tables use the formula (row total × column total) ÷ grand total. For No Fertilizer & Bloomed, we multiply plants without fertilizer (50) by plants that bloomed (70), then divide by all 120 plants. Choice E correctly shows (50)(70)/120. Choice D shows 30, which might be an observed count but isn't the expression. Choices A and C show marginal proportions that don't calculate expected counts, while B multiplies totals without dividing, yielding 3,500 instead of the reasonable expected count of about 29.
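Expected counts do not have to be whole numbers. The short sketch below evaluates the plant example exactly using Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

# (row total * column total) / grand total for No Fertilizer & Bloomed.
expected = Fraction(50 * 70, 120)
print(expected)                   # 175/6
print(round(float(expected), 2))  # 29.17

# Multiplying the totals without dividing exceeds the number of plants in the study.
product_only = 50 * 70
print(product_only > 120)         # True
```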
A clinic categorized 180 patients by whether they received a flu shot and whether they later reported flu symptoms. Under the assumption of independence, which expression calculates the expected count for the Shot & Symptoms cell?
$20$
$\dfrac{(120)(45)}{180}$
$(120)(45)$
$\dfrac{120}{180}$
$\dfrac{45}{180}$
Explanation
This problem requires calculating the expected count for a cell in a two-way table assuming independence between variables. The expected count formula is (row total × column total) ÷ grand total. For the Shot & Symptoms cell, we multiply the total who got shots (120) by the total with symptoms (45), then divide by all 180 patients. Choice B correctly represents this: (120)(45)/180. Choice A shows 20, which might be an observed count, while D and E show individual proportions. Choice C multiplies the totals but omits the crucial step of dividing by the grand total.
A study categorized 90 commuters by whether they bike to work and whether their commute is under 5 miles or 5 miles and over. Assuming independence, which expression calculates the expected count for the Bike & Under 5 miles cell?
$18$
$\dfrac{50}{90}$
$\dfrac{(36)(50)}{90}$
$\dfrac{36}{90}$
$(36)(50)$
Explanation
Expected counts under independence use the formula $(\text{row total} \times \text{column total}) \div \text{grand total}$. For Bike & Under 5 miles, we multiply commuters who bike (36) by those with commutes under 5 miles (50), then divide by all 90 commuters. Choice C shows this correctly: $(36)(50)/90$. Choice A gives 18, which might be an observed count rather than the expression. Choices B and D show individual proportions, while E shows only the product. Understanding this formula is essential for testing whether categorical variables are associated or independent.