Data types drive which summaries and models are valid: numeric (continuous vs discrete), categorical (nominal vs ordinal), text, datetime, and boolean.
Numeric
- Continuous — height, revenue (meaningful fractions)
- Discrete — page views, item counts (integers)
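Do not treat IDs as continuous numbers: user_id=10002 is not “twice” user_id=5001, even though the arithmetic works out. The digits are a label, so sums, means, and ratios of IDs carry no meaning.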
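A minimal sketch of the distinction, using only the Python standard library. The rows and field names (`user_id`, `page_views`, `revenue`) are made up for illustration; the point is that casting IDs to strings makes accidental arithmetic on them impossible, while counting by ID stays valid.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample rows; the values are invented for this example.
rows = [
    {"user_id": 5001, "page_views": 3, "revenue": 19.99},
    {"user_id": 10002, "page_views": 7, "revenue": 5.50},
    {"user_id": 5001, "page_views": 2, "revenue": 0.0},
]

# Store IDs as strings so they cannot be averaged or scaled by mistake.
for row in rows:
    row["user_id"] = str(row["user_id"])

# Continuous measurement: a mean is meaningful.
avg_revenue = mean(row["revenue"] for row in rows)

# Discrete count: sums and totals are meaningful.
total_views = sum(row["page_views"] for row in rows)

# Label: the only sensible summaries are counts and groupings.
rows_per_user = Counter(row["user_id"] for row in rows)

print(avg_revenue, total_views, rows_per_user)
```

With string IDs, an expression such as `mean(row["user_id"] for row in rows)` raises an error instead of silently returning a meaningless number.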
Categorical
- Nominal — country, product SKU (no natural order)
- Ordinal — survey Likert scales (ordered categories)
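The nominal/ordinal split can be sketched in plain Python as follows. The five-point Likert labels and the sample responses are assumptions chosen for the example; real surveys may use different wording or a different number of levels.

```python
from collections import Counter
from statistics import median_low

# Nominal: only counting and the most frequent value (mode) make sense.
countries = ["DE", "US", "US", "IN"]  # hypothetical sample
most_common_country = Counter(countries).most_common(1)[0][0]

# Ordinal: declare the order explicitly instead of relying on alphabetical sorting.
LIKERT_ORDER = [
    "strongly disagree",
    "disagree",
    "neutral",
    "agree",
    "strongly agree",
]
rank = {label: position for position, label in enumerate(LIKERT_ORDER)}

responses = ["agree", "neutral", "strongly agree", "disagree", "agree"]  # hypothetical

# Sorting by rank respects the scale.
sorted_responses = sorted(responses, key=lambda label: rank[label])

# A median is valid for ordinal data because it needs order, not distances.
median_rank = median_low([rank[label] for label in responses])
median_response = LIKERT_ORDER[median_rank]

print(most_common_country, sorted_responses, median_response)
```

A mean of the ranks is deliberately left out: it assumes equal spacing between categories, which an ordinal scale does not guarantee.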
Datetime and text
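Parse dates explicitly, with a stated format and timezone: the same instant recorded under two different timezones will not match as a join key. Text needs tokenization or embeddings for modeling; during exploration, start with word counts and keywords.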
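A small sketch of both points, using only the standard library. The timestamp format, the sample timestamps, and the review strings are assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed input format: date, time, and a numeric UTC offset such as +0100.
FMT = "%Y-%m-%d %H:%M:%S%z"

def parse_utc(raw: str) -> datetime:
    """Parse with an explicit format, then normalize to UTC."""
    return datetime.strptime(raw, FMT).astimezone(timezone.utc)

# The same instant, logged by two systems in different offsets.
order_ts = "2024-03-10 09:00:00+0100"
payment_ts = "2024-03-10 08:00:00+0000"

# Comparing the raw strings (or the local clock times) would miss the match.
raw_match = order_ts == payment_ts
utc_match = parse_utc(order_ts) == parse_utc(payment_ts)

# Text exploration: lowercase, split into simple word tokens, count.
reviews = ["Great battery, great screen", "Battery died fast"]  # hypothetical
tokens = [tok for text in reviews for tok in re.findall(r"[a-z']+", text.lower())]
word_counts = Counter(tokens)

print(raw_match, utc_match, word_counts.most_common(3))
```

Normalizing every timestamp to UTC before a join is one common convention; the regular expression here is a deliberately crude tokenizer, good enough for a first look at frequent words.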
Important interview questions and answers
- Q: Nominal vs ordinal?
A: Nominal has no order; ordinal ranks categories.
- Q: Why not model user_id as numeric?
A: IDs are labels, not measurements.
Self-check
- Give one nominal and one ordinal example.
- Why are IDs not continuous?
Pitfall: Treating user IDs as numeric features.
Interview prep
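- Q: What is a nominal variable?
A: A categorical variable whose categories have no meaningful order.
- Q: Is it valid to use an ID as a numeric feature?
A: No. IDs are labels, so arithmetic on them is meaningless.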