Fine-tuning updates model weights on your labeled examples—use when prompts and RAG cannot hold format, tone, or domain jargon stable at scale.
LoRA / adapters
LoRA trains small adapter matrices instead of full weights—cheaper GPU jobs, easier to swap adapters per customer in some setups.
When to fine-tune
- Thousands of consistent label pairs
- Strict output schema the model resists in few-shot
- Proprietary style that must not leak via long prompts
Try prompt + RAG + tools first—fine-tuning adds MLOps debt.
Risks
Catastrophic forgetting, eval leakage, and compliance if training data includes PII. Version datasets like production code.
Important interview questions and answers
- Q: LoRA benefit?
A: Fewer trainable parameters—faster iteration than full fine-tunes.
Self-check
- When prefer fine-tuning over prompts?
- One fine-tuning risk?
Tip: Prove prompt+RAG insufficient on a golden set before committing to fine-tune ops.
Interview prep
- LoRA?
Low-rank adapters train small matrices—cheaper than full weight updates.
- Try first?
Prompts, RAG, and tools before fine-tuning ops debt.