Computing Bias
Help Questions
Which of the following is an example of bias being embedded in the design of an algorithm itself, rather than in the data used to train it?
A résumé-screening tool favors male candidates because it learned from a company's historical hiring records, which were heavily male.
A voice recognition system struggles to understand speakers with a certain accent because they were not included in the audio training files.
A facial recognition system is less accurate for women because it was trained mostly on pictures of men.
A college admissions algorithm is programmed to give extra weight to applicants who play a sport that is predominantly played by a single demographic group.
Explanation
Correct. In this case, the programmer has made an explicit choice to prioritize a specific factor (a particular sport). This is a feature of the algorithm's logic or design. If that factor correlates with a demographic group, the algorithm itself is biased, regardless of the data it processes.
- A, B, and C are all examples of data bias, where the algorithm's biased output results from training on unrepresentative or historically biased data.
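The distinction can be made concrete with a short sketch. Everything below is hypothetical (the sport, the weights, the scoring formula), but it shows how a bias can live entirely in the algorithm's design: the rule itself awards extra points, with no training data involved.

```python
# Hypothetical admissions scorer with bias hard-coded into its design.
FAVORED_SPORT = "lacrosse"   # stands in for a sport played mostly by one group

def admissions_score(gpa, test_score, sport):
    score = gpa * 10 + test_score / 40
    if sport == FAVORED_SPORT:
        score += 15   # explicit design choice by the programmer: no dataset involved
    return score

print(admissions_score(3.5, 1200, "lacrosse"))  # 80.0
print(admissions_score(3.5, 1200, "soccer"))    # 65.0
```

Two otherwise identical applicants get different scores purely because of the `if` branch the programmer wrote, which is why this counts as bias in the algorithm itself.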
What is the most likely consequence of using this system in a public setting with a diverse population?
The system will be highly accurate for everyone because the training dataset was very large.
The system will function correctly only if users have high-resolution cameras, regardless of their demographic group.
The system will be less accurate in identifying individuals from demographic groups that were underrepresented in the training data.
The system will pose a significant risk to data privacy because it stores images without user consent.
Explanation
Correct. The training data lacks diversity, which is a primary cause of data bias. The system will likely perform poorly for groups not well-represented in the training set, as it has not learned to identify their features effectively.
- A is incorrect because the size of a dataset does not guarantee fairness or prevent bias; the representativeness of the data is critical.
- B is incorrect because while camera quality can affect performance, the core issue of bias stems from the unrepresentative training data.
- D is incorrect because while data privacy is a valid concern for facial recognition, the question asks about the consequence of the biased training method, which relates to accuracy and fairness.
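The mechanism can be illustrated with a toy nearest-centroid "recognizer" (all names and feature values below are hypothetical). Group A identities have many training photos, so their learned centroids are accurate; group B identities each have a single noisy photo, so their centroids are off and B members get misidentified.

```python
from statistics import mean

# Hypothetical toy data: each "face" is one feature value; the task is
# identifying whose face it is.
train = {
    "A_alice": [1.0, 1.2, 0.9, 1.1],   # well-sampled
    "A_bob":   [3.0, 3.1, 2.9, 3.0],   # well-sampled
    "B_carol": [5.4],                  # one noisy sample; true average is 4.8
    "B_dave":  [4.6],                  # one noisy sample; true average is 5.2
}
centroids = {name: mean(vals) for name, vals in train.items()}

def identify(x):
    """Nearest-centroid 'recognizer': the closest learned centroid wins."""
    return min(centroids, key=lambda name: abs(centroids[name] - x))

# Evaluate each person at their true average appearance.
truth = {"A_alice": 1.05, "A_bob": 3.0, "B_carol": 4.8, "B_dave": 5.2}
accuracy = {}
for group in ("A", "B"):
    members = [n for n in truth if n.startswith(group)]
    accuracy[group] = sum(identify(truth[n]) == n for n in members) / len(members)
print(accuracy)  # {'A': 1.0, 'B': 0.0} -- sparse, noisy samples for group B
                 # yield bad centroids, so B members are misidentified
```

Note that the failure is not in the matching rule, which is identical for everyone; it is in how little the system learned about group B.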
What is the most likely consequence if this program is used on a broader patient population?
The program will run much more slowly when analyzing images of patients from different backgrounds.
The program will be highly accurate for all patients because it learned the fundamental features of the disease.
The program's diagnostic accuracy will likely be lower for patients from ethnic backgrounds not included in the training data.
The program will be unable to store images from patients of other ethnic backgrounds due to privacy restrictions.
Explanation
Correct. The appearance of skin conditions can vary significantly across different skin tones. By training on data from only one group, the program has not learned to recognize these variations. This data bias will likely lead to reduced accuracy and potentially harmful misdiagnoses for underrepresented groups.
- A is incorrect. The program's speed of execution is generally not affected by the content of the image in this way; the issue is with the accuracy of its analysis.
- B is incorrect. The 'fundamental features' of a visual disease are often tied to how they appear on different skin types, so the model's learning is incomplete and biased.
- D is incorrect. Privacy restrictions are a legal and data management issue, not a direct consequence of the biased training method.
What is the primary source of the algorithm's bias?
The bank's computer system lacks the necessary security features to protect applicant data.
The algorithm is too complex, making it difficult for bank employees to understand its decisions.
The historical data used to train the algorithm contained and reflected past societal biases.
Applicants from the specified geographic area are making more errors on their loan applications.
Explanation
Correct. The algorithm learned patterns from data that was generated during a time of discriminatory practices. It is therefore perpetuating this historical bias by treating new applications in a similar way, even if the discriminatory policies are no longer in place. The data itself is the source of the bias.
- A is incorrect. Security is an important but separate concern from algorithmic fairness.
- B is incorrect. Complexity and transparency are real issues, but the root cause of the bias is the data.
- D is incorrect. This response blames the victims of the bias rather than identifying the flaw in the system.
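A tiny sketch shows how biased historical data becomes a biased learned rule (the areas and outcomes below are hypothetical). A naive model that predicts the majority historical outcome for each area simply re-issues the old discriminatory decisions.

```python
from collections import Counter, defaultdict

# Hypothetical history: decisions made under a past discriminatory policy,
# where every application from area "B" was denied regardless of merit.
history = [
    ("A", "approved"), ("A", "approved"), ("A", "denied"),
    ("B", "denied"),   ("B", "denied"),   ("B", "denied"),
]

# A naive "learned" rule: predict the majority historical outcome per area.
by_area = defaultdict(Counter)
for area, outcome in history:
    by_area[area][outcome] += 1

def predict(area):
    return by_area[area].most_common(1)[0][0]

print(predict("A"))  # approved
print(predict("B"))  # denied -- the old policy, perpetuated by the data
```

Nothing in the code mentions the old policy; the bias arrives entirely through the training records, which is the point of the correct answer.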
What is the most plausible technical explanation for this bias?
The computer hardware running the translation lacks the processing power to consider gender-neutral options.
The program has a syntax error that causes it to default to male pronouns for certain professions.
The user interface of the tool makes it difficult for users to select a preferred gender for the output.
The algorithm was trained on a massive text dataset from the real world, which contains statistical biases about gender and professions.
Explanation
Correct. Language models learn from the patterns in the text they are trained on. If the training data (e.g., books, articles, websites) more frequently associates "doctor" with male pronouns and "nurse" with female pronouns, the model will learn and replicate this statistical correlation. This is a classic example of a computing innovation reflecting biases present in its training data.
- A is incorrect. This is a software and data issue, not a hardware limitation.
- B is incorrect. This is not a syntax error, but a logical outcome based on the model's training.
- C is incorrect. The bias is in the core translation logic, not the user interface.
Which of the following describes the most significant ethical concern related to computing bias in this scenario?
The program's predictions may not be 100% accurate, potentially causing police to be sent to areas where no crime occurs.
The program's user interface could be difficult for police officers to use, leading to misinterpretation of the crime predictions.
The historical arrest data may reflect biases in policing, leading the program to unfairly target certain neighborhoods and create a feedback loop.
The program might violate the privacy of citizens by collecting data about their locations without their consent.
Explanation
Correct. This highlights a critical societal impact of computing bias. If historical data reflects that certain neighborhoods were policed more heavily (regardless of actual crime rates), the algorithm will learn to identify these areas as high-risk. This can lead to increased police presence, more arrests, and a reinforcement of the original bias in future data.
- A is incorrect. Inaccuracy is a general problem for any predictive model, but the key ethical concern here is the systematic and unfair nature of the inaccuracy due to bias.
- B is incorrect. User interface design is a usability issue, not the core ethical problem of systemic bias.
- D is incorrect. Data privacy is a valid ethical concern, but the question is specifically about bias in the program's function.
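The feedback loop can be simulated directly (all numbers below are hypothetical). Both neighborhoods have the same true crime rate, but X starts with more recorded arrests because it was historically patrolled more heavily; patrols follow arrest counts, and more patrols produce more recorded arrests.

```python
# Toy feedback-loop simulation with hypothetical numbers.
arrests = {"X": 60.0, "Y": 40.0}   # biased historical record
TRUE_RATE = 0.1                    # identical actual crime rate in both areas
PATROLS = 100                      # patrols to allocate each round

for _ in range(5):
    total = sum(arrests.values())
    for area in arrests:
        patrols = PATROLS * arrests[area] / total  # sent where arrests were
        arrests[area] += patrols * TRUE_RATE       # more patrols -> more arrests
    print({a: round(n, 1) for a, n in arrests.items()})
# X's share of recorded arrests stays locked at 60%, and the absolute gap
# between the areas keeps growing, even though true crime rates are equal.
```

The data never "corrects itself": the original imbalance is fed back in every round, which is the feedback loop the explanation describes.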
Which of the following provides the best definition of algorithmic bias?
Mistakes in the syntax of a programming language that prevent a program from running.
Random errors in a program's code that cause it to produce unpredictable results for all users.
Systematic and repeatable errors in a computer system that create unfair outcomes for certain groups of people.
Security flaws in an algorithm that allow unauthorized users to gain access to sensitive information.
Explanation
Correct. This definition captures the key elements of algorithmic bias: it is systematic (not random), repeatable, and results in unfairness or prejudice against specific subgroups.
- A describes a syntax error, which is a coding mistake that a compiler or interpreter would catch.
- B describes a buggy or unreliable program, not a biased one.
- D describes a security vulnerability, which is distinct from the concept of fairness in a system's outcomes.
How could this autocomplete feature perpetuate harmful stereotypes?
The feature requires a fast internet connection to work properly, which may disadvantage some users.
The feature may be inaccurate and suggest completions that do not match what the user intended to type.
If many users have searched for stereotypical or biased phrases, the algorithm will learn to suggest those phrases to new users.
The feature stores a history of user searches, which raises concerns about data privacy and surveillance.
Explanation
Correct. The system learns from user behavior. If past user queries reflect societal biases (e.g., searching "CEOs are..." and getting completions that are gender-biased), the autocomplete feature will present these biases to new users as neutral suggestions, thereby amplifying and reinforcing the stereotypes.
- A is incorrect. Connection speed relates to the digital divide, not the biased content of the algorithm's suggestions.
- B is incorrect. Inaccuracy is a performance issue, but perpetuating stereotypes is a specific problem of bias.
- D is incorrect. Data privacy is a valid but separate concern from the content of the suggestions.
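A frequency-ranked autocomplete can be sketched in a few lines (the query log below is hypothetical). If a stereotyped phrase was typed most often in the past, it ranks first for every new user.

```python
from collections import Counter

# Hypothetical query log: past users' searches, including a biased phrase.
query_log = [
    "ceos are men", "ceos are men", "ceos are men",
    "ceos are overpaid", "ceos are visionaries",
]

def suggest(prefix, log, k=2):
    # Rank past queries matching the prefix by how often they were typed.
    counts = Counter(q for q in log if q.startswith(prefix))
    return [q for q, _ in counts.most_common(k)]

print(suggest("ceos are", query_log))
# ['ceos are men', 'ceos are overpaid'] -- the stereotyped phrase ranks first
```

The system presents the majority pattern as a neutral suggestion, which is how it amplifies the stereotype rather than merely recording it.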
A machine learning model exhibits bias against a certain demographic group. A developer decides to address this by collecting a much larger dataset. Which of the following best explains why this action might NOT eliminate the bias?
A larger dataset always makes an algorithm less accurate before it becomes more accurate.
Collecting more data is expensive and may exceed the project's budget without solving the issue.
If the method of data collection is still flawed, the new, larger dataset will likely contain the same underlying bias.
A larger dataset requires more storage space, which can make the computing innovation less scalable.
Explanation
Correct. According to EK DAT-2.C.5, bias is not eliminated by simply collecting more data. If the source or method of collection is biased, a larger sample will just amplify that original bias. For bias to be reduced, the data must be more representative, not just more plentiful.
- A is incorrect. This is not a generally true statement about machine learning; more data typically improves accuracy if the data is of good quality.
- B is incorrect. While budget is a practical concern, it doesn't explain why the approach is fundamentally flawed from a technical perspective.
- D is incorrect. Storage and scalability are system design concerns, not the reason why the bias itself would persist.
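The point of EK DAT-2.C.5 can be demonstrated with a toy collection method (the 90/10 split below is hypothetical). If the method itself oversamples group A, scaling the dataset up by 1000x preserves exactly the same skew.

```python
# Flawed hypothetical collection method: 9 of every 10 records come from
# group A, regardless of how many records are collected.
def collect(n_samples):
    return ["A"] * (9 * n_samples // 10) + ["B"] * (n_samples // 10)

small = collect(100)
large = collect(100_000)   # 1000x more data, same biased method
print(small.count("B") / len(small))   # 0.1
print(large.count("B") / len(large))   # 0.1 -- the imbalance is unchanged
```

More data only helps if the collection method is fixed first; otherwise the larger sample reproduces the original bias at scale.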
Introduction: A school district considers using facial recognition to identify visitors at building entrances.
Examples of Bias: Computing bias means a system produces unfair results for some people, often because it reflects human choices or unequal social conditions. In facial recognition, bias appears when the tool misidentifies people with darker skin tones or women more often than others, especially if the training photos include fewer examples from those groups.
How Bias Emerges: Bias can emerge when a data set overrepresents one group, when labels include stereotypes, or when designers test the tool mostly on one population.
Impacts: Misidentification can lead to students or parents being wrongly questioned, denied entry, or reported to security, which can increase stress and distrust.
Mitigation Strategies: The district can test accuracy across demographic groups, use more representative data, add human review before action, and set clear limits on when the tool is used.
Based on the text, which strategy is used to mitigate computing bias?
Rely on a single vendor’s accuracy claims
Use facial recognition for every school interaction
Remove all human judgment from decisions
Test performance across different demographic groups
Explanation
This question tests AP Computer Science Principles skills: understanding computing bias and its societal impact. Computing bias occurs when algorithms or data sets favor certain outcomes, often reflecting societal inequalities. The passage highlights the impact of bias in facial recognition at school entrances, showing how the tool can misidentify people with darker skin tones or women more often because those groups are underrepresented in the training data. Choice D is correct because it accurately reflects the passage's explanation of testing performance across different demographic groups as a specific mitigation strategy. Choice A is incorrect because relying on vendor claims without independent verification would not address bias issues. To help students: Encourage careful reading to identify the specific mitigation strategies mentioned in passages, and discuss how testing across groups helps reveal hidden biases. Watch for: Options that sound plausible but aren't actually mentioned as mitigation strategies in the given text.
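The mitigation strategy the passage names can be sketched concretely (the match results below are hypothetical): compute accuracy per demographic group rather than reporting one overall number, since a single headline figure can hide a large disparity.

```python
# Hypothetical facial-recognition match results: (group, was_match_correct).
results = [
    ("group1", True), ("group1", True), ("group1", True), ("group1", True),
    ("group2", True), ("group2", False), ("group2", False), ("group2", False),
]

overall = sum(ok for _, ok in results) / len(results)
by_group = {
    g: sum(ok for grp, ok in results if grp == g) /
       sum(1 for grp, _ in results if grp == g)
    for g in {grp for grp, _ in results}
}
print(overall)    # 0.625 -- looks passable as a single headline number
print(by_group)   # group1: 1.0, group2: 0.25 -- the disparity only
                  # appears when accuracy is broken out by group
```

Reporting `by_group` rather than `overall` is exactly the "test performance across different demographic groups" strategy from the passage.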