AI systems often rely on personal data such as behavior, biometrics, and health signals. The core privacy principles are: collect as little as possible, limit use to the stated purpose, secure what is stored, retain it briefly, and honor user rights (access, deletion) where applicable.
Key concepts
- PII / personal data: information that identifies or relates to a person
- Purpose limitation: use data only for the stated reasons
- Data minimization: fewer fields, shorter retention
- Anonymization vs pseudonymization: pseudonymized data can be re-linked to a person using additional information, and re-identification risk also remains in many "anonymized" sets
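Purpose limitation and data minimization can be expressed as a simple policy check. The sketch below is illustrative only: the `ALLOWED_FIELDS` table, the purpose names, and the field names are all hypothetical, and a real system would load such a policy from reviewed configuration.

```python
# Hypothetical purpose-to-field allowlist; purposes and field names are
# made up for illustration.
ALLOWED_FIELDS = {
    "billing": {"user_id", "plan", "country"},
    "analytics": {"user_id", "event", "timestamp"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return only the fields permitted for the stated purpose."""
    if purpose not in ALLOWED_FIELDS:
        # Purpose limitation: refuse processing for an undeclared purpose.
        raise ValueError(f"No declared purpose: {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    # Data minimization: drop every field that is not on the allowlist.
    return {key: value for key, value in record.items() if key in allowed}


raw = {
    "user_id": "u-123",
    "event": "login",
    "timestamp": "2024-01-01T00:00:00Z",
    "heart_rate": 72,
    "email": "a@example.com",
}
print(minimize(raw, "analytics"))
```

Here the health signal and the email address never reach the analytics table, because the analytics purpose does not list them.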
ML-specific risks
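- Memorization: models can regurgitate training snippets verbatim, including sensitive ones
- Membership inference: an attacker infers whether a specific person's record was in the training set
- Federated learning: raw data stays on the device, but shared model updates can still leak information, so it still needs threat modeling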
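To make membership inference concrete, here is a minimal sketch of one simple form of the attack, a loss threshold: examples the model fits unusually well are guessed to be training members. The loss values and the threshold are made up for illustration; in practice they would come from evaluating a real model.

```python
def guess_membership(losses, threshold):
    """Flag an example as a likely training member when its loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of examples whose membership the threshold rule guesses correctly."""
    guesses_in = guess_membership(member_losses, threshold)
    guesses_out = guess_membership(nonmember_losses, threshold)
    correct = sum(guesses_in) + sum(not g for g in guesses_out)
    return correct / (len(member_losses) + len(nonmember_losses))


# Illustrative numbers only, not measurements from any model.
member_losses = [0.05, 0.10, 0.02, 0.40]
nonmember_losses = [0.90, 0.35, 1.20, 0.60]
print(attack_accuracy(member_losses, nonmember_losses, threshold=0.3))
```

An accuracy near 0.5 on balanced data would suggest the rule learns little; a value well above 0.5 suggests the model treats training members differently enough to leak membership.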
Privacy-by-design habits
# Pseudonymize IDs in analytics tables
import hashlib

def pseudonym(user_id: str, salt: str) -> str:
    # Salted SHA-256, truncated to 16 hex characters (64 bits)
    return hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()[:16]

Practice: Reflect on ethics scenarios in writing; no code is required. The optional snippets illustrate policy checks only.
Hashing alone is not sufficient: if the salt leaks, or if the ID space is small enough to enumerate, an attacker can recompute the hashes and reverse the mapping. Consult privacy engineers.
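The small-space problem can be demonstrated in a few lines. This sketch assumes the attacker already holds the salt and knows that IDs are six-digit numbers; both assumptions, the salt value, and the sample ID are hypothetical.

```python
import hashlib
from typing import Iterable, Optional


def pseudonym(user_id: str, salt: str) -> str:
    # Same construction as the snippet above: salted SHA-256, truncated.
    return hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()[:16]


def reverse_by_enumeration(
    target: str, salt: str, candidate_ids: Iterable[str]
) -> Optional[str]:
    """Recompute the pseudonym of each candidate ID and return the first match."""
    for candidate in candidate_ids:
        if pseudonym(candidate, salt) == target:
            return candidate
    return None


leaked_salt = "s3cret"  # assumption: the attacker has obtained the salt
token = pseudonym("004217", leaked_salt)

# Assumption: IDs are six-digit numbers, so only 1,000,000 candidates exist.
candidates = (f"{n:06d}" for n in range(1_000_000))
print(reverse_by_enumeration(token, leaked_salt, candidates))
```

One common mitigation is a keyed hash (for example via Python's `hmac` module) with the key held in a secrets manager, though that only moves the risk to whoever can read the key.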
Important interview questions and answers
- Q: GDPR lawful basis?
A: Consent, contract, legal obligation, vital interests, public task, or legitimate interests; the basis must match the processing activity.
- Q: Delete user request?
A: Remove the data from stores and stop using it in future training where required. Model unlearning is hard, so plan retention up front.
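A deletion request can be sketched as a policy check over every place the data lives. The `DataStores` class below is a hypothetical in-memory stand-in for real storage systems, and the record shapes are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataStores:
    """Hypothetical in-memory stand-ins for real storage systems."""
    profiles: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    training_exclusions: set = field(default_factory=set)


def handle_deletion(stores: DataStores, user_id: str) -> dict:
    """Delete a user's records and block their data from future training runs."""
    profile_removed = stores.profiles.pop(user_id, None) is not None
    before = len(stores.events)
    stores.events = [e for e in stores.events if e["user_id"] != user_id]
    # This does not change models that were already trained on the data.
    stores.training_exclusions.add(user_id)
    return {
        "profile_removed": profile_removed,
        "events_removed": before - len(stores.events),
    }


def training_rows(rows: list, exclusions: set) -> list:
    """Filter a dataset (for example an older snapshot) before training."""
    return [row for row in rows if row["user_id"] not in exclusions]


stores = DataStores(
    profiles={"u1": {"plan": "pro"}, "u2": {"plan": "free"}},
    events=[
        {"user_id": "u1", "event": "login"},
        {"user_id": "u2", "event": "login"},
        {"user_id": "u1", "event": "click"},
    ],
)
old_snapshot = list(stores.events)  # a copy taken before the request
print(handle_deletion(stores, "u1"))
print(training_rows(old_snapshot, stores.training_exclusions))
```

Keeping the raw ID in an exclusion list is itself retention of an identifier; whether that is acceptable is a policy question for privacy engineers, not something this sketch settles.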
Self-check
- Define data minimization.
- Name one ML-specific privacy risk.
Tip: Plan deletion and retention before training—model unlearning is hard.
Interview prep
- Q: Data minimization?
A: Collect and retain only the fields needed for the stated purpose.
- Q: Memorization risk?
A: Models may regurgitate sensitive training snippets.