Natural language processing (NLP) works with text and speech: classification, entity extraction, translation, summarization, and conversational agents. Large language models (LLMs) have shifted many of these tasks from task-specific models to general prompting plus light fine-tuning.
Classic vs modern stack
- Classic: tokenization, bag-of-words, small classifiers
- Modern: transformer embeddings, LLMs, retrieval-augmented generation (RAG) over your docs
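To make the classic side of this comparison concrete, here is a minimal bag-of-words sketch. The keyword sets `POSITIVE` and `NEGATIVE` and the `classify` helper are hypothetical stand-ins for a small trained classifier; a real one would learn its weights from labeled data.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on non-letters, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical keyword lists standing in for learned classifier weights.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "refund"}

def classify(text: str) -> str:
    """Score a text by counting positive minus negative keywords."""
    counts = bag_of_words(text)
    score = sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(bag_of_words("Fast shipping, fast support"))
print(classify("Fast shipping, love it"))
print(classify("App is slow and broken"))
```

Word order is discarded entirely here, which is the main limitation that transformer embeddings address.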
For deeper prompting patterns, see the Generative AI track.
Pipeline stages
- Ingest and normalize text (encoding, language detect)
- Chunk for long documents
- Retrieve relevant context (search index)
- Model generates or classifies
- Post-filter for safety and citations
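The stages above can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: every function name is hypothetical, language detection is skipped, retrieval is plain word overlap rather than a real search index, and `generate` is a placeholder where an LLM call would go.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Ingest step: Unicode-normalize and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, max_words: int = 7) -> list[str]:
    """Chunk step: fixed-size word windows (real systems often overlap chunks)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Retrieve step: rank chunks by words shared with the query, drop non-matches."""
    q = set(query.lower().split())
    scored = [(len(q & set(c.lower().split())), c) for c in chunks]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [c for score, c in ranked if score > 0][:k]

def generate(query: str, context: list[str]) -> str:
    """Model step: placeholder only; a real system would call an LLM here."""
    return f"(model answer to {query!r})"

def post_filter(answer: str, context: list[str]) -> str:
    """Post-filter step: refuse without context, otherwise attach the source."""
    if not context:
        return "No supporting context found."
    return f'{answer} [source: "{context[0]}"]'

def run(query: str, doc: str) -> str:
    chunks = chunk(normalize(doc))
    hits = retrieve(query, chunks)
    return post_filter(generate(query, hits), hits)

doc = ("Refunds   are processed within five business days.\n"
       "Shipping is free for orders over fifty dollars in the region.")
print(run("is shipping free", doc))
print(run("warranty", doc))
```

Keeping each stage as its own function makes it easy to swap one out, for example replacing `retrieve` with an embedding search, without touching the rest.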
Token preview
# Tokens are subword pieces, not always whole words
sample = "unbelievable pricing"
tokens = sample.split()  # simplified; real tokenizers differ
print("token count (demo):", len(tokens))
Practice: Sketch product flows on paper or in a doc; the optional Python only illustrates API response shapes.
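The token preview splits on whitespace, so it cannot show the subword point made in its comment. The sketch below uses a greedy longest-match-first split, similar in spirit to WordPiece-style tokenizers, over a hypothetical toy vocabulary `VOCAB`. Real tokenizers learn their vocabularies from data, and schemes such as BPE work differently.

```python
# Hypothetical toy vocabulary; real tokenizers learn theirs from a corpus.
VOCAB = {"un", "believ", "able", "pric", "ing"}

def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first split; unmatched characters become single pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary or one character long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces

sample = "unbelievable pricing"
tokens = [p for w in sample.split() for p in subword_tokenize(w, VOCAB)]
print(tokens)
print("whitespace count:", len(sample.split()), "subword count:", len(tokens))
```

With this vocabulary the two words become five pieces, which is why token counts used for context limits and pricing usually exceed word counts.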
Important interview questions and answers
- Q: RAG?
A: Retrieve documents from your knowledge base, then have the LLM answer grounded in them.
- Q: Hallucination?
A: Fluent but false statements; mitigate with retrieval, citations, and human review.
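One way to act on the citation and human-review mitigations is a simple post-check on the answer. The sketch below assumes a hypothetical convention in which the model is asked to cite retrieved sources with markers such as `[S1]`; the function name and the marker format are illustrative, not part of any particular library.

```python
import re

def check_citations(answer: str, sources: dict[str, str]) -> dict:
    """Flag citation markers like [S1] that do not match a retrieved source ID."""
    cited = set(re.findall(r"\[(\w+)\]", answer))
    unknown = sorted(cited - set(sources))
    return {
        "cited": sorted(cited),
        "unknown": unknown,
        # Route to human review when nothing is cited or a citation is unknown.
        "needs_review": not cited or bool(unknown),
    }

# Hypothetical retrieved sources keyed by ID.
sources = {
    "S1": "Refunds are processed within five business days.",
    "S2": "Shipping is free for orders over fifty dollars.",
}

print(check_citations("Refunds take five business days [S1].", sources))
print(check_citations("Shipping is always free [S3].", sources))
print(check_citations("We offer a lifetime warranty.", sources))
```

This only verifies that each citation points at a retrieved source. It does not confirm that the source actually supports the claim, so flagged and sampled answers still benefit from human review.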
Self-check
- What is RAG in one sentence?
- Name two NLP pipeline stages.
Tip: RAG + citations reduce hallucination risk versus raw prompting alone.
Interview prep
- Q: RAG?
A: Retrieve relevant documents, then generate answers grounded in them.
- Q: Hallucination?
A: Fluent but incorrect output; mitigate with retrieval, citations, and filters.