Deep learning trains neural networks with many layers on large datasets, usually on GPUs. It powers modern vision, speech, and language systems, including the foundation models behind generative AI.
What changed in the 2010s
- ImageNet-scale labeled datasets (millions of labeled images)
- GPU training made large networks and large mini-batches practical by parallelizing matrix operations
- Architectures: CNNs for images, Transformers for sequences
- Transfer learning—fine-tune pretrained weights
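To make the transfer-learning idea concrete, here is a minimal pure-Python sketch. It assumes a tiny hypothetical "pretrained" backbone with hand-picked frozen weights (`FROZEN_W`) standing in for a real checkpoint, and a made-up downstream task. Only a new logistic-regression head is trained, and the backbone is never updated. This is the simplest form of fine-tuning (training only the last layer). The names `backbone`, `fine_tune_head`, and `predict` are illustrative, not from any library.

```python
import math

# Hypothetical "pretrained" weights: stand-ins for layers learned on a large
# dataset. A real model would load these from a checkpoint.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]  # 2 inputs -> 3 features


def backbone(x):
    """Frozen feature extractor (linear layer + ReLU); never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(data, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on top of frozen features."""
    w = [0.0] * len(FROZEN_W)  # new task-specific head weights
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = backbone(x)  # features come from the frozen layers
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            grad = p - y  # derivative of log-loss w.r.t. the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    f = backbone(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)


# Small hypothetical downstream task: label 1 when x[0] > x[1].
data = [([2, 0], 1), ([0, 2], 0), ([3, 1], 1), ([1, 3], 0)]
w, b = fine_tune_head(data)
```

In practice a framework would load real pretrained weights, and teams often unfreeze some later layers too. The sketch only illustrates the frozen-backbone, new-head pattern, which is why four labeled examples are enough here.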
CNN, RNN, and Transformer (high level)
| Architecture | Strength |
|---|---|
| CNN | Local patterns in images/video |
| RNN/LSTM (legacy) | Sequential data before Transformers dominated |
| Transformer | Attention over all tokens in parallel; basis of LLMs and vision transformers |
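The "local patterns" strength of CNNs comes from the operation each convolutional layer computes. Below is a minimal pure-Python sketch of a "valid" 2D cross-correlation, which is what deep-learning libraries call convolution. The kernel is hand-picked to respond to a vertical dark-to-bright edge. This is an assumption for illustration, since a trained CNN learns its kernels from data. Each output value depends only on a 2x2 neighborhood, and the same weights slide across the whole image.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the core operation of a CNN layer.

    Each output depends only on a small local patch of the input, and the
    same kernel weights are reused at every position (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# Hand-picked vertical-edge kernel (a real CNN would learn this).
EDGE = [[-1, 1],
        [-1, 1]]

# 4x4 image: dark left half (0), bright right half (1).
IMG = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]

result = conv2d(IMG, EDGE)
# result == [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The strong response (2.0) appears only in the column where the edge sits. That is the local-pattern detection the table refers to.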
Practical takeaway
Most product teams use pretrained models (via an API or open weights) rather than training billion-parameter models in-house. AI literacy here means knowing when fine-tuning, retrieval-augmented generation (RAG), or prompting alone is enough.
Important interview questions and answers
- Q: Transfer learning?
  A: Start from weights trained on a large dataset, then adapt the last layers (or the whole network) to your task using much less labeled data.
- Q: Transformer key idea?
  A: Self-attention lets each token weigh the relevance of every other token in its context.
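The self-attention answer can be sketched as single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in pure Python. The example uses identity projection matrices and made-up token embeddings. This assumption keeps the numbers easy to follow. Real Transformers learn these projections and typically add multiple heads, masking, and positional information, none of which is shown here.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    """Plain list-of-lists matrix product: (n x m) @ (m x p) -> (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: n_tokens x d_model; wq, wk, wv: d_model x d_k projection matrices.
    Returns (outputs, weights), where weights[i][j] is how much token i
    attends to token j.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    weights = []
    for qi in q:
        # Similarity of token i's query to every token's key, scaled.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights.append(softmax(scores))
    outputs = matmul(weights, v)  # each output is a weighted mix of values
    return outputs, weights


# Three toy tokens with 2-dimensional embeddings (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, for illustration only
out, attn = self_attention(X, I2, I2, I2)
```

Row i of `attn` shows how token i spreads its attention. Token 0 weights itself and token 2 equally, because both share its first dimension, and gives token 1 less weight. Each output row is the corresponding weighted average of the value vectors.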
Self-check
- Name one architecture for images and one for language.
- Why do teams use pretrained models?
Tip: Default to pretrained models; training huge nets in-house is rarely step one.
Interview prep
- Transfer learning?
  - Fine-tune pretrained weights instead of training huge models from scratch.
- Transformer strength?
  - Self-attention over tokens, the foundation of modern LLMs.