Shipping a Gen AI product means assembling a stack: base model → optional adapters → orchestration (prompt + tools + RAG) → evaluation → guardrails → UX.
Layers explained
- Base model — general capabilities from large pretraining
- Alignment / instruction tuning — follows user intent more safely
- Application layer — your prompts, retrieval, tools, policies
- Ops layer — logging, cost caps, A/B tests, incident response
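To make the layering concrete, here is a minimal sketch of how the application and ops layers might wrap a model call. Everything in it is an assumption made for illustration: `fake_model` stands in for a real base or instruction-tuned model, `DOCS` is a toy keyword-matched knowledge base rather than a real retriever, and `OpsLog` is a hypothetical counter standing in for proper logging.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Base model + alignment: hypothetical stand-in for a real model API call.
def fake_model(prompt: str) -> str:
    return f"ANSWER based on: {prompt}"


# Application layer: toy retrieval over a keyword-indexed knowledge base.
DOCS = {
    "refund": "Refunds are processed within 5 days.",
    "shipping": "Orders ship within 2 days.",
}


def retrieve(question: str) -> List[str]:
    q = question.lower()
    return [text for key, text in DOCS.items() if key in q]


# Application layer: prompt assembly.
def build_prompt(question: str, context: List[str]) -> str:
    ctx = "\n".join(context) if context else "(no context found)"
    return f"Context:\n{ctx}\n\nQuestion: {question}"


# Application layer: a very simple policy check (illustrative only).
BLOCKED_TERMS = ("password",)


def violates_policy(text: str) -> bool:
    return any(term in text.lower() for term in BLOCKED_TERMS)


# Ops layer: minimal counters that a real system would send to logging.
@dataclass
class OpsLog:
    calls: int = 0
    blocked: int = 0
    prompt_chars: int = 0


def answer(
    question: str,
    model: Callable[[str], str] = fake_model,
    log: Optional[OpsLog] = None,
) -> str:
    log = log if log is not None else OpsLog()
    log.calls += 1
    if violates_policy(question):
        log.blocked += 1
        return "Request blocked by policy."
    prompt = build_prompt(question, retrieve(question))
    log.prompt_chars += len(prompt)
    return model(prompt)
```

Because the model is passed in as a plain callable, swapping a hosted API for a self-hosted model changes only that one argument, while the retrieval, policy, and logging code stays the same.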
Build vs buy
Most teams buy API access (OpenAI, Anthropic, Google, AWS Bedrock, Azure OpenAI) or host open-weight models (Llama, Mistral) on their own GPU fleet. Training from scratch is rare outside large labs.
When comparing options, look at latency SLAs, data retention terms, fine-tuning support, regional compliance, and price per million tokens.
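Price per million tokens is easier to compare once it is turned into a per-request and per-month figure. The sketch below shows the arithmetic; the `Pricing` class, the function names, and all prices and traffic numbers are made-up placeholders, not quotes from any vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input (prompt) tokens
    output_per_million: float  # USD per 1M output (completion) tokens


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request."""
    total = input_tokens * p.input_per_million + output_tokens * p.output_per_million
    return total / 1_000_000


def monthly_cost(
    p: Pricing,
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    days: int = 30,
) -> float:
    """Cost in USD of a month of traffic with a fixed request shape."""
    return request_cost(p, input_tokens, output_tokens) * requests_per_day * days


# Placeholder prices for illustration only.
LARGE = Pricing(input_per_million=10.0, output_per_million=30.0)
SMALL = Pricing(input_per_million=0.5, output_per_million=1.5)

if __name__ == "__main__":
    for name, pricing in (("large", LARGE), ("small", SMALL)):
        per_request = request_cost(pricing, input_tokens=2000, output_tokens=500)
        per_month = monthly_cost(pricing, 10_000, 2000, 500)
        print(f"{name}: ${per_request:.5f} per request, ${per_month:,.2f} per month")
```

With these placeholder numbers, a request with 2,000 input tokens and 500 output tokens costs about $0.035 on the larger model and about $0.00175 on the smaller one, a 20x gap that compounds with traffic.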
Open weights vs closed APIs
Open weights give you control and on-prem deployment options, but you own security patching and capacity planning. Closed APIs shift that operational burden to the vendor but add lock-in and usage-policy constraints.
Important interview questions and answers
- Q: What is instruction tuning?
A: Additional training on instruction-response examples so that the model follows user and system messages instead of merely continuing the text token by token.
Self-check
- Name four layers of the Gen AI stack.
- Give one reason teams choose APIs over training from scratch.
Pitfall: Choosing the largest model by default. Cost and latency often favor a smaller model combined with RAG.
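One way to avoid this pitfall is to treat model choice as a constrained selection: keep only candidates that meet a quality bar and a latency budget, then take the cheapest. The sketch below assumes you already have evaluation results for each candidate; the `Candidate` fields, the `pick_model` helper, and every number in `CANDIDATES` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    eval_score: float            # fraction of eval cases passed, 0.0 to 1.0
    p95_latency_ms: float        # measured 95th-percentile latency
    cost_per_1k_requests: float  # USD


def pick_model(
    candidates: List[Candidate],
    min_score: float,
    max_latency_ms: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate that meets both constraints, or None."""
    eligible = [
        c for c in candidates
        if c.eval_score >= min_score and c.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.cost_per_1k_requests)


# Made-up measurements for illustration only.
CANDIDATES = [
    Candidate("large", eval_score=0.93, p95_latency_ms=1800, cost_per_1k_requests=35.0),
    Candidate("small-with-rag", eval_score=0.91, p95_latency_ms=600, cost_per_1k_requests=4.0),
    Candidate("tiny", eval_score=0.70, p95_latency_ms=200, cost_per_1k_requests=1.0),
]

if __name__ == "__main__":
    choice = pick_model(CANDIDATES, min_score=0.90, max_latency_ms=1000)
    print(choice.name if choice else "no candidate meets the constraints")
```

Returning `None` when nothing qualifies forces an explicit decision (relax a constraint or improve a candidate) rather than silently falling back to the largest model.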
Interview prep
- Q: What are the layers of the stack?
A: Base model, alignment, application orchestration, and ops/monitoring.
- Q: Build or buy?
A: Most products buy API access or host open weights; pretraining from scratch is rare.