Choose embed when data is read together and bounded; reference when shared entities update often or fan-out is huge.
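A minimal sketch of the two shapes, written as plain JavaScript objects. All collection and field names (orders, customers, items, sku, and so on) are illustrative assumptions.

```javascript
// Embedded shape: the order carries its line items, which are read together
// and bounded in number. Field names are illustrative assumptions.
const embeddedOrder = {
  _id: "order-1",
  customerId: "cust-1", // reference to a shared entity
  items: [
    { sku: "A100", qty: 2, price: 9.5 },
    { sku: "B200", qty: 1, price: 20 },
  ],
};

// Referenced shape: the customer lives in its own collection, so a change
// to the customer touches one document only.
const customer = { _id: "cust-1", name: "Ada", email: "ada@example.com" };

// Everything needed to total the order arrives in a single read.
const orderTotal = embeddedOrder.items.reduce(
  (sum, item) => sum + item.qty * item.price,
  0
);
```

In mongosh these could be stored with db.orders.insertOne(embeddedOrder) and db.customers.insertOne(customer).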
Decision checklist
- Read together in UI? → lean embed
- Unbounded children (comments)? → reference or bucket
- Many writers to same subdoc? → reference
- Need SQL-style reports? → consider SQL or $lookup pipelines
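The "bucket" option in the checklist can be sketched as below: children are grouped into fixed-size documents, so no single document grows without limit. The collection name comment_buckets, the count field, and BUCKET_SIZE are assumptions made for illustration; `db` is the mongosh database handle.

```javascript
// Sketch of the bucket pattern: each document holds at most BUCKET_SIZE comments.
// comment_buckets, count, and BUCKET_SIZE are hypothetical names.
const BUCKET_SIZE = 100;

function addCommentToBucket(db, postId, comment) {
  return db.comment_buckets.updateOne(
    // Match a bucket for this post that still has room.
    { postId: postId, count: { $lt: BUCKET_SIZE } },
    // Append the comment and bump the counter in one single-document write.
    { $push: { comments: comment }, $inc: { count: 1 } },
    // If no bucket has room (or none exists yet), start a new one.
    { upsert: true }
  );
}
```

Usage in mongosh: addCommentToBucket(db, "post-1", { author: "ada", text: "hi" }). Under concurrent writers this may occasionally leave more than one partly filled bucket per post, which is usually acceptable.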
Duplication trade-off
Embedding copies the customer name onto every order for fast reads. Either accept that the copies may go stale, or update them whenever the customer changes.
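One possible way to sync on change, as a sketch: update the customer, then refresh the duplicated name on that customer's orders. The field customerName and the collection names are assumptions; `db` is the mongosh database handle.

```javascript
// Sketch: propagate a customer rename to the copies embedded in orders.
// Collection and field names (customers, orders, customerName) are assumed.
function renameCustomer(db, customerId, newName) {
  // Update the source of truth first.
  db.customers.updateOne({ _id: customerId }, { $set: { name: newName } });
  // Then refresh every duplicated copy. These are two separate writes, so
  // readers may briefly see the old name on orders unless a transaction is used.
  return db.orders.updateMany(
    { customerId: customerId },
    { $set: { customerName: newName } }
  );
}
```

Usage in mongosh: renameCustomer(db, "cust-1", "Ada L."). The returned result reports how many orders were modified.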
Anti-pattern: unbounded arrays
// Avoid: comments[] growing forever on one post document
// Prefer: db.comments with postId index
Practice: Run this on a practice database in mongosh.
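A sketch of the preferred layout: one document per comment in its own collection, indexed by postId. The createdAt, author, and text fields and the page size of 20 are assumptions; `db` is the mongosh database handle.

```javascript
// Sketch: comments stored in a child collection instead of a growing array.
// Field names other than postId are illustrative assumptions.
function ensureCommentIndex(db) {
  // Compound index: find a post's comments and return them newest first.
  return db.comments.createIndex({ postId: 1, createdAt: -1 });
}

function addComment(db, postId, author, text) {
  // Each comment is its own small document, so the post never grows.
  return db.comments.insertOne({ postId, author, text, createdAt: new Date() });
}

function latestComments(db, postId, limit = 20) {
  // Bounded read: only one page of comments is fetched.
  return db.comments
    .find({ postId })
    .sort({ createdAt: -1 })
    .limit(limit)
    .toArray();
}
```

Usage in mongosh: ensureCommentIndex(db), then addComment(db, "post-1", "ada", "hi") and latestComments(db, "post-1").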
Important interview questions and answers
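- Q: Is an update to an embedded document atomic?
A: Yes. Single-document updates are atomic, embedded fields included; multi-document updates need transactions.
- Q: What does $lookup cost?
A: The join runs at query time, so index the foreign keys it matches on, such as customerId.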
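As an illustration of the $lookup answer, here is a sketch of a pipeline that joins one customer to their orders. The collection names and the helper buildCustomerOrdersPipeline are hypothetical; the stages themselves are standard aggregation operators.

```javascript
// Sketch: build a pipeline that attaches a customer's orders at query time.
// The helper name and collection names are illustrative assumptions.
function buildCustomerOrdersPipeline(customerId) {
  return [
    // Narrow to one customer before joining, so the join does less work.
    { $match: { _id: customerId } },
    {
      $lookup: {
        from: "orders",             // collection being joined in
        localField: "_id",          // field on the customer
        foreignField: "customerId", // field on each order; this is the one to index
        as: "orders",               // output array field
      },
    },
  ];
}
```

Usage in mongosh: run db.orders.createIndex({ customerId: 1 }) once, then db.customers.aggregate(buildCustomerOrdersPipeline("cust-1")).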
Self-check
- Example of unbounded array risk.
- What makes single-doc updates atomic?
Pitfall: Unbounded arrays inside one document. A document is capped at 16 MB, so move the children to their own collection.
Interview prep
- Q: When to reference?
A: Shared entities, unbounded children, or frequent updates to the child set.
- Q: Atomicity?
A: Single-document updates are atomic; multi-document updates need transactions.
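To make the atomicity point concrete, a sketch: one updateOne call that changes two fields of the same order, embedded array included, is applied as a single atomic write. The items and total fields are assumptions; `db` is the mongosh database handle.

```javascript
// Sketch: append an embedded item and adjust the total in one atomic write.
// Field names (items, total, qty, price) are illustrative assumptions.
function addItemToOrder(db, orderId, item) {
  return db.orders.updateOne(
    { _id: orderId },
    {
      // Both operators target the same document, so readers never see
      // the new item without the matching total, or the reverse.
      $push: { items: item },
      $inc: { total: item.qty * item.price },
    }
  );
}
```

Usage in mongosh: addItemToOrder(db, "order-1", { sku: "C300", qty: 3, price: 4 }). If the same change had to touch a second document, for example a stock counter in another collection, a transaction would be needed to keep the two writes together.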