Guardrails validate inputs and outputs—toxicity classifiers, regex for secrets, JSON schema enforcement, allow-listed domains for tools.
Where to run
- Pre-call — block prompt before spend
- Post-call — strip or replace unsafe completion
- Streaming — cut off mid-generation when triggered
Vendor moderation
Many APIs return category scores (hate, violence, sexual). Tune thresholds per product—kids app vs developer docs.
Custom rules
Regex credit cards, block internal hostnames in tool args, require JSON keys for automated workflows.
Important interview questions and answers
- Q: Pre vs post moderation?
A: Pre saves cost and blocks attacks early; post catches model-generated harm.
Self-check
- Name three guardrail placement points.
- Why tune thresholds per product?
Tip: Pre-moderation saves cost; post-moderation catches model-generated toxicity.
Interview prep
- Pre vs post?
Pre blocks spend and attacks; post catches toxic completions.
- Custom rules?
Regex secrets, schema validation, tool allow lists.