Computer vision interprets images and video. Its core tasks are classification, detection (boxes around objects), segmentation (pixel labels), and optical character recognition (OCR). Mobile and edge deployment raise latency and privacy considerations.
Task types
| Task | Output |
|---|---|
| Classification | Whole image label (cat vs dog) |
| Detection | Bounding boxes + labels |
| Segmentation | Pixel-level masks |
| OCR | Text in scene → string |
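To make the output shapes in the table concrete, here is a minimal sketch in Python. The class names, field names, box convention, and sample values are all hypothetical, chosen for illustration; real vision libraries define their own result types.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Classification:
    label: str    # one label for the whole image
    score: float  # model confidence, assumed to lie in [0, 1]


@dataclass
class Detection:
    # Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
    box: Tuple[int, int, int, int]
    label: str
    score: float


# Segmentation: one class id per pixel, same height and width as the image.
SegmentationMask = List[List[int]]


@dataclass
class OcrResult:
    text: str  # text found in the scene, returned as a plain string


# Made-up outputs for a single 4x4 image containing a cat and a dog.
classification = Classification(label="cat", score=0.93)
detections = [
    Detection(box=(0, 0, 2, 2), label="cat", score=0.91),
    Detection(box=(2, 1, 4, 4), label="dog", score=0.88),
]
mask: SegmentationMask = [
    [1, 1, 0, 0],
    [1, 1, 2, 2],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]  # 0 = background, 1 = cat, 2 = dog (hypothetical class ids)
ocr = OcrResult(text="EXIT")
```

Note how classification returns a single record per image, detection returns a list with one entry per object, and segmentation returns a grid as large as the image itself.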
Product examples
- Document scanning and KYC verification
- Manufacturing defect detection
- Retail shelf analytics
- Accessibility: scene description for blind users
Data and bias
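Models trained on images from a narrow range of geographies or lighting conditions often fail in the field. Collect data across diverse capture conditions, and monitor error rates per site so that a problem at one location is not hidden by a good overall average.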
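Per-site monitoring can be sketched as follows. This is an illustrative example only: the function names, the site names, the record layout `(site, predicted_label, true_label)`, and the 0.3 alert threshold are assumptions, not part of any particular monitoring tool.

```python
from collections import defaultdict


def per_site_error_rates(records):
    """Return {site: error_rate} from (site, predicted, actual) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for site, predicted, actual in records:
        totals[site] += 1
        if predicted != actual:
            errors[site] += 1
    return {site: errors[site] / totals[site] for site in totals}


def sites_over_threshold(rates, threshold):
    """Return the sites whose error rate exceeds the threshold, sorted by name."""
    return sorted(site for site, rate in rates.items() if rate > threshold)


# Made-up labelled predictions from two hypothetical sites.
records = [
    ("warehouse_a", "ok", "ok"),
    ("warehouse_a", "defect", "defect"),
    ("warehouse_a", "ok", "ok"),
    ("warehouse_a", "ok", "defect"),
    ("warehouse_b", "ok", "defect"),
    ("warehouse_b", "ok", "defect"),
    ("warehouse_b", "defect", "defect"),
    ("warehouse_b", "ok", "ok"),
]

rates = per_site_error_rates(records)
flagged = sites_over_threshold(rates, threshold=0.3)
print(rates)    # error rate per site
print(flagged)  # sites that need attention
```

In this sample the pooled error rate is 3 out of 8, which hides the fact that one site misses half of its cases; splitting by site surfaces that difference.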
Important interview questions and answers
- Q: Detection vs classification?
A: Classification labels the whole image; detection localizes multiple objects.
- Q: Edge vision?
A: On-device inference avoids uploading raw video, which helps privacy.
Self-check
- Name two vision task types and their outputs.
- Why do diverse training images matter?
Tip: Collect training images across lighting, devices, and geographies you will serve.
Interview prep
- Q: Detection vs classification?
A: Classification labels the whole image; detection outputs a box per object.
- Q: Diverse training data?
A: It reduces failures under new lighting, devices, and geographies.