Most teams integrate through HTTP APIs or official Python SDKs. Keep secrets in environment variables, wrap calls in retries, and log token usage.
Minimal pattern (pseudocode)
import os

# from openai import OpenAI  # example vendor SDK

client = None  # OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def answer(question: str, context: str) -> str:
    # The system message restricts the model to the supplied context.
    messages = [
        {"role": "system", "content": "Answer only from CONTEXT."},
        {"role": "user", "content": f"CONTEXT:\n{context}\n\nQ: {question}"},
    ]
    # resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    # return resp.choices[0].message.content
    return "[Enable SDK locally; keys not in browser]"
Engineering checklist
- Timeouts, exponential backoff, idempotency keys for writes
- Structured logging: model, tokens, latency, retrieval IDs
- Feature flags to disable Gen AI during an incident
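The backoff and feature-flag items in the checklist can be sketched roughly as follows. This is a minimal illustration, not a production client: `TransientError`, `call_with_backoff`, `answer_or_fallback`, and the `GENAI_ENABLED` environment variable are all hypothetical names chosen for this example, and the per-request timeout is assumed to be set on whatever HTTP client or SDK call you pass in as `fn`.

```python
import os
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429/5xx)."""


def genai_enabled() -> bool:
    # GENAI_ENABLED is an assumed flag name; a real system would likely
    # read this from a feature-flag service instead of the environment.
    return os.environ.get("GENAI_ENABLED", "1") == "1"


def call_with_backoff(fn, *, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(); on TransientError, wait and retry with exponential backoff.

    The delay before retry k is base_delay * 2**k, capped at max_delay and
    scaled by a random factor in [0.5, 1.0] so clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


def answer_or_fallback(fn, fallback="Generative answers are temporarily disabled."):
    """Skip the model call entirely when the kill switch is off."""
    if not genai_enabled():
        return fallback
    return call_with_backoff(fn)
```

For write operations, one common convention is to generate a single idempotency key (for example a UUID) before the first attempt and reuse it on every retry, so the server can deduplicate repeated requests. Whether and how a given API accepts such a key depends on the vendor.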
Local open models
Ollama, vLLM, or cloud GPUs can host open-weight models. The RAG patterns stay the same, but the operational overhead is different because you run the serving stack yourself.
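Because Ollama and vLLM both expose an OpenAI-compatible chat endpoint, the same message structure can be pointed at a local server. The sketch below uses only the standard library and assumes the default local addresses (port 11434 for Ollama, port 8000 for vLLM). The helper names `build_chat_request` and `ask_local` are made up for this example, and the model name you pass must match one your server has actually loaded.

```python
import json
import urllib.request

# Default local addresses (assumed); adjust to match your deployment.
BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "vllm": "http://localhost:8000/v1",
}


def build_chat_request(backend: str, model: str, question: str,
                       context: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Answer only from CONTEXT."},
            {"role": "user", "content": f"CONTEXT:\n{context}\n\nQ: {question}"},
        ],
    }
    return urllib.request.Request(
        f"{BASE_URLS[backend]}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(backend: str, model: str, question: str, context: str,
              timeout: float = 30.0) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    req = build_chat_request(backend, model, question, context)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from the network call makes the payload easy to unit test without a GPU or a running server.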
Important interview questions and answers
- Q: Where should API keys be stored?
A: In environment variables or a secrets manager, never in frontend bundles.
Self-check
- Why log token counts?
- Name two resilience patterns (timeout/backoff).
Tip: Log model name, latency, and token counts per request for cost attribution.
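One way to act on this tip is to emit a single JSON log line per request. The sketch below is illustrative only: `usage_record` and `log_usage` are hypothetical helpers, `example-model` is a placeholder name, and the prices in `PRICE_PER_1M` are invented numbers that must be replaced with your vendor's current rates.

```python
import json
import logging

logger = logging.getLogger("genai")

# Hypothetical prices in USD per 1M tokens; NOT real vendor rates.
PRICE_PER_1M = {"example-model": {"input": 0.50, "output": 1.50}}


def usage_record(model, prompt_tokens, completion_tokens, latency_s, retrieval_ids):
    """Build one structured record with token counts and an estimated cost."""
    price = PRICE_PER_1M.get(model, {"input": 0.0, "output": 0.0})
    cost = (prompt_tokens * price["input"]
            + completion_tokens * price["output"]) / 1_000_000
    return {
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "latency_ms": round(latency_s * 1000, 1),
        "retrieval_ids": list(retrieval_ids),
        "est_cost_usd": round(cost, 6),
    }


def log_usage(**fields):
    """Log the record as a single JSON line so it can be aggregated later."""
    record = usage_record(**fields)
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

Summing `est_cost_usd` grouped by model, feature, or tenant then gives a rough cost breakdown, and unusually large `prompt_tokens` values point at runaway prompts.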
Interview prep
- Q: Where do API keys belong?
A: In environment variables or a secrets manager, never in frontend code or git.
- Q: Why log tokens?
A: To attribute cost and to debug runaway prompts.