A factor stores categorical data as a fixed set of predefined levels, which matters for statistical models and for ggplot2 aesthetics (local). Base R modeling formulas such as lm(y ~ group) treat a factor as a set of categories rather than as a number.
Creating factors
grades <- factor(c("B", "A", "C", "A"), levels = c("A", "B", "C"))
print(grades)
print(table(grades))
Why factors matter
- Models treat categories correctly in lm(y ~ group)
- Control ordering of bars in charts
- Avoid treating codes as numeric accidentally
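The first and last points above can be sketched with a small example. The data here (group_code, y, and the labels "control", "low", "high") are invented for illustration and are not from the lesson. It compares what lm() estimates when the group is left as a numeric code versus converted to a factor.

```r
# Hypothetical data: three groups stored as numeric codes, plus an invented outcome
group_code <- c(1, 2, 3, 1, 2, 3)
y <- c(5.1, 6.8, 9.2, 4.9, 7.1, 8.8)

# Left as numeric, the code is treated as a measurement: one slope is fitted
fit_numeric <- lm(y ~ group_code)

# As a factor, each non-reference level gets its own coefficient
group <- factor(group_code, levels = c(1, 2, 3),
                labels = c("control", "low", "high"))
fit_factor <- lm(y ~ group)

print(names(coef(fit_numeric)))  # "(Intercept)" "group_code"
print(names(coef(fit_factor)))   # "(Intercept)" "grouplow" "grouphigh"
```

With the default treatment contrasts, the first level ("control") becomes the reference category, so the other coefficients are differences from it.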
In tidyverse workflows, forcats (local) provides helpers for reordering and recoding factor levels; the playground uses base factor().
Important interview questions and answers
- Q: Factor vs character?
A: Factors store levels with an integer backing; characters are plain strings without level metadata.
- Q: When should you convert to a factor?
A: Before modeling, or when plot order should follow category order rather than alphabetical order.
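The "integer backing" in the first answer can be inspected directly. This is a minimal sketch using the same values as the grades example; the variable names x_chr, x_fct, and codes are chosen here for illustration.

```r
x_chr <- c("B", "A", "C", "A")
x_fct <- factor(x_chr, levels = c("A", "B", "C"))

print(typeof(x_chr))      # "character"
print(typeof(x_fct))      # "integer"
print(levels(x_chr))      # NULL: a character vector carries no level metadata
print(levels(x_fct))      # "A" "B" "C"
print(as.integer(x_fct))  # 2 1 3 1: each value is a position in levels(x_fct)

# Pitfall: converting a factor of number-like labels returns the codes, not the labels
codes <- factor(c("10", "20", "10"))
print(as.numeric(codes))                # 1 2 1
print(as.numeric(as.character(codes)))  # 10 20 10
```

Going through as.character() first is the usual way to recover the original numbers from such a factor.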
Self-check
- What function creates a factor?
- Why specify levels explicitly?
Tip: Set factor levels explicitly to control plot and table order and avoid alphabetical surprises.
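As a quick illustration of the tip, the sketch below builds the same factor twice, once with default levels and once with explicit ones. The sizes vector is a made-up example, not data from the lesson.

```r
sizes <- c("medium", "small", "large", "small")

# Default: levels are the sorted unique values, so "large" comes first
default_order <- factor(sizes)

# Explicit: levels follow the order that makes sense for the data
explicit_order <- factor(sizes, levels = c("small", "medium", "large"))

print(levels(default_order))   # "large" "medium" "small"
print(table(explicit_order))   # columns appear as small, medium, large
```

The counts are identical in both cases; only the order in which the categories are listed changes, and plots built from the factor follow the same order.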
Interview prep
- Factor vs character?
Factors store level metadata for modeling and ordered legends; characters are plain strings.