Data science decisions affect people: credit, hiring, healthcare, policing. Fairness and ethics are not optional add-ons—they shape whether a model should ship at all.
Sources of harm
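- Historical bias: past discrimination encoded in labels
- Representation bias: some groups under-sampled in the training data
- Measurement bias: features or labels are noisy proxies for the quantity we actually care about (for example, arrests standing in for crime), and the error differs across groups
- Deployment bias: model used outside its intended context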
Questions before launch
- Who benefits and who is harmed if wrong?
- Is there meaningful human review or appeal?
- Are we legally allowed to use these features?
- How will we monitor drift and disparate impact?
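One way to make the disparate-impact monitoring question concrete is to compare selection rates across groups on each batch of logged decisions. The sketch below is a minimal illustration, not a compliance tool: the function names, the toy batch, and the 0.8 alert threshold are all assumptions chosen for the example, and the right threshold is a legal and policy decision.

```python
from collections import defaultdict

def selection_rates(groups, decisions):
    """Share of positive decisions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for g, d in zip(groups, decisions):
        totals[g] += 1
        positives[g] += d
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(groups, decisions):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(groups, decisions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so rates are equal
    return min(rates.values()) / highest

# Hypothetical batch of logged decisions (1 = approved, 0 = denied).
groups = ["a"] * 5 + ["b"] * 5
decisions = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

ALERT_THRESHOLD = 0.8  # assumption for illustration only
ratio = disparate_impact_ratio(groups, decisions)
needs_review = ratio < ALERT_THRESHOLD
print(f"ratio={ratio:.2f} needs_review={needs_review}")
```

Running the same check on every scoring batch, and alerting when the ratio drops, gives a simple recurring signal alongside ordinary drift monitoring.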
Fairness metrics (awareness)
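Demographic parity (equal selection rates across groups), equalized odds (equal true-positive and false-positive rates across groups), and calibration by group (a given score implies the same outcome rate in every group) generally cannot all hold at once when base rates differ; choose with legal and policy stakeholders, not in isolation.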
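A small sketch can show why these definitions pull against each other. The helper below, `group_report`, is a hypothetical name and the data is a toy example: the predictions are perfect, so true-positive and false-positive rates match across groups, yet selection rates differ because the groups have different base rates. Calibration by group is left out here because it needs predicted scores rather than hard labels.

```python
def group_report(y_true, y_pred, groups):
    """Per-group selection rate, true-positive rate, and false-positive rate."""
    report = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        pos = sum(1 for t, _ in rows if t == 1)
        neg = len(rows) - pos
        report[g] = {
            "selection_rate": (tp + fp) / len(rows),
            "tpr": tp / pos if pos else None,  # None when the group has no positives
            "fpr": fp / neg if neg else None,  # None when the group has no negatives
        }
    return report

# Toy data: group "a" has a 50% base rate, group "b" has 25%.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # a perfect classifier
groups = ["a"] * 4 + ["b"] * 4

report = group_report(y_true, y_pred, groups)
for g, metrics in report.items():
    print(g, metrics)
```

Here equalized odds holds (TPR 1.0 and FPR 0.0 in both groups) while demographic parity does not (selection rates 0.5 and 0.25), so forcing equal selection rates would require changing error rates for at least one group.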
Documentation
Model cards and datasheets record intended use, limitations, and evaluation by segment—standard practice in responsible teams.
Important interview questions and answers
- Q: What is a proxy feature?
A: A column correlated with a protected class (for example, ZIP code) that can reintroduce discrimination even when the protected attribute itself is dropped.
- Q: Why is accuracy insufficient?
A: A model can be accurate overall but harmful to minority groups, so segment-level metrics are required.
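The accuracy answer above can be illustrated with a toy example. The segment names and data below are made up for the sketch; the point is only that a single overall number can hide a much worse result on a small segment.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy_by_segment(y_true, y_pred, segments):
    """Accuracy computed separately for each segment value."""
    result = {}
    for s in sorted(set(segments)):
        idx = [i for i, seg in enumerate(segments) if seg == s]
        result[s] = accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result

# Hypothetical data: 16 majority rows predicted perfectly,
# 4 minority rows where every positive case is missed.
segments = ["majority"] * 16 + ["minority"] * 4
y_true = [1] * 8 + [0] * 8 + [1, 1, 0, 0]
y_pred = [1] * 8 + [0] * 8 + [0, 0, 0, 0]

overall = accuracy(y_true, y_pred)
by_segment = accuracy_by_segment(y_true, y_pred, segments)
print(overall)     # 0.9
print(by_segment)  # {'majority': 1.0, 'minority': 0.5}
```

An overall accuracy of 0.9 looks acceptable, but the minority segment sits at 0.5, which is why the segment breakdown belongs in the evaluation report.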
Self-check
- Name two sources of bias in training data.
- What is a proxy feature?
- List two pre-launch ethics questions.
Tip: Test metrics across demographic segments when applicable.
Interview prep
- Q: How do you approach fairness?
A: Evaluate model impact across groups and mitigate disparate harm.