Garbage in, garbage out: track completeness, validity, consistency, timeliness, and uniqueness before modeling.
Quality dimensions
| Dimension | Question |
|---|---|
| Completeness | Are fields missing? |
| Validity | Are values within allowed ranges or formats (e.g. age > 0)? |
| Consistency | Do two tables agree on counts? |
| Timeliness | Is data fresh enough? |
| Uniqueness | Duplicate keys? |
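Each question in the table can be turned into a small, repeatable check. The sketch below is one possible way to do that with plain Python; the sample rows, the field names (`user_id`, `age`, `updated_at`), the `summary` table, and the 30-day freshness threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical sample rows; None marks a missing value.
users = [
    {"user_id": 1, "age": 34, "updated_at": datetime(2024, 1, 10)},
    {"user_id": 2, "age": None, "updated_at": datetime(2024, 1, 9)},
    {"user_id": 2, "age": -5, "updated_at": datetime(2023, 6, 1)},
    {"user_id": 3, "age": 51, "updated_at": datetime(2024, 1, 10)},
]
# Hypothetical second table that reports a user count.
summary = {"user_count": 3}


def completeness(rows, field):
    """Fraction of rows where `field` is present and not None."""
    return sum(r.get(field) is not None for r in rows) / len(rows)


def validity(rows, field, is_valid):
    """Fraction of non-missing values that pass the rule `is_valid`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(is_valid(v) for v in values) / len(values)


def duplicate_keys(rows, key):
    """Sorted keys that appear more than once (uniqueness failures)."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def count_gap(rows, key, expected):
    """Distinct keys in `rows` minus the count another table reports."""
    return len({r[key] for r in rows}) - expected


def stale_fraction(rows, field, now, max_age):
    """Fraction of rows last updated more than `max_age` before `now`."""
    return sum(now - r[field] > max_age for r in rows) / len(rows)


report = {
    "completeness_age": completeness(users, "age"),
    "validity_age": validity(users, "age", lambda a: a > 0),
    "duplicate_user_ids": duplicate_keys(users, "user_id"),
    "user_count_gap": count_gap(users, "user_id", summary["user_count"]),
    "stale_fraction": stale_fraction(
        users, "updated_at", datetime(2024, 1, 11), timedelta(days=30)
    ),
}
```

On this sample, `age` is 75% complete, two of the three non-missing ages are valid, `user_id` 2 is duplicated, the distinct user count matches the summary table, and one row in four is stale. Passing `now` in explicitly, rather than reading the clock inside the function, keeps the timeliness check reproducible.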
Data dictionary
Maintain column definitions, units, and known bugs in a data dictionary; analysts who are onboarding depend on it.
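A data dictionary is most useful when it can be checked against the data itself. As a minimal sketch, assuming the dictionary is kept as a Python mapping, the snippet below flags columns that are undocumented, documented but absent, or of an unexpected type. The entries, column names, and the `audit_columns` helper are hypothetical.

```python
# Hypothetical data dictionary: one entry per column.
DATA_DICTIONARY = {
    "user_id": {"type": int, "unit": None,
                "description": "Unique user key", "known_bugs": []},
    "age": {"type": int, "unit": "years",
            "description": "Self-reported age",
            "known_bugs": ["some rows store age as text"]},
    "country": {"type": str, "unit": None,
                "description": "Country code", "known_bugs": []},
}

# Hypothetical rows to audit against the dictionary.
rows = [
    {"user_id": 1, "age": 34, "signup_channel": "ads"},
    {"user_id": 2, "age": "41"},
]


def audit_columns(rows, dictionary):
    """Compare the columns seen in `rows` with the documented ones."""
    observed = set()
    for r in rows:
        observed.update(r.keys())
    documented = set(dictionary)
    # Documented columns holding at least one value of the wrong type.
    wrong_type = {
        col
        for r in rows
        for col, value in r.items()
        if col in dictionary
        and value is not None
        and not isinstance(value, dictionary[col]["type"])
    }
    return {
        "undocumented": sorted(observed - documented),
        "absent": sorted(documented - observed),
        "wrong_type": sorted(wrong_type),
    }


audit = audit_columns(rows, DATA_DICTIONARY)
```

Here `signup_channel` is reported as undocumented, `country` as absent, and `age` as having a type mismatch, which lines up with the known bug recorded in its entry.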
Important interview questions and answers
- Q: Completeness?
A: Fraction of non-missing values per field.
- Q: Duplicate user rows?
A: Uniqueness failure; it inflates counts.
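To see why duplicate rows matter, a small worked example helps. The rows and the `spend` field below are hypothetical; the deduplication shown keeps the last row seen per `user_id`, which is one simple policy among several.

```python
# Hypothetical table in which user 2 was loaded twice.
rows = [
    {"user_id": 1, "spend": 20.0},
    {"user_id": 2, "spend": 35.0},
    {"user_id": 2, "spend": 35.0},  # accidental duplicate
    {"user_id": 3, "spend": 10.0},
]

# Naive aggregates treat every row as a distinct user.
naive_users = len(rows)
naive_spend = sum(r["spend"] for r in rows)

# Keep one row per user_id (the last one seen wins).
deduped = list({r["user_id"]: r for r in rows}.values())
true_users = len(deduped)
true_spend = sum(r["spend"] for r in deduped)
```

The naive figures are 4 users and 100.0 in spend, while the deduplicated figures are 3 users and 65.0, so a single duplicate row overstates both the count and the total.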
Self-check
- Name three quality dimensions.
- Why a data dictionary?
Tip: Start every project by reviewing the data dictionary and running a row-count check.