Summarize data with mean(), median(), sd(), quantile(), and summary(). These descriptive statistics are the foundation to build before moving on to inference.
summary() and friends
x <- c(10, 12, 14, 18, 23)
print(summary(x))
print(sd(x))
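The other functions named in the introduction can be tried on the same vector. This is a minimal sketch using only base R; fivenum() is not mentioned above and is included as a related base function for comparison.

```r
x <- c(10, 12, 14, 18, 23)

print(mean(x))      # arithmetic average
print(median(x))    # middle value of the sorted data
print(quantile(x))  # 0%, 25%, 50%, 75%, 100% by default
print(quantile(x, probs = c(0.1, 0.9)))  # custom percentiles
print(fivenum(x))   # min, lower hinge, median, upper hinge, max
```

Note that summary() on a numeric vector also reports the mean, so it prints six values rather than five.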
Grouped summaries (base)
df <- data.frame(group = c("A", "A", "B", "B"), val = c(10, 12, 20, 22))
print(aggregate(val ~ group, data = df, FUN = mean))
dplyr's group_by() + summarise() is the tidyverse equivalent.
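As a rough sketch of the tidyverse version, assuming the dplyr package is installed (the guard below simply skips the example if it is not). The column names mean_val and n are illustrative choices, not part of the original example.

```r
df <- data.frame(group = c("A", "A", "B", "B"), val = c(10, 12, 20, 22))

# Assumption: dplyr is available; otherwise this block does nothing.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  by_group <- df %>%
    group_by(group) %>%                       # one group per level of `group`
    summarise(mean_val = mean(val), n = n())  # per-group mean and row count
  print(by_group)
}
```

The result has one row per group, which matches the shape of the aggregate() output above.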
Important interview questions and answers
- Q: Mean vs median?
A: Mean sensitive to outliers; median robust for skewed distributions.
- Q: sample sd?
A: R's sd() uses n-1 denominator (sample standard deviation).
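The n-1 denominator can be checked by hand. The sketch below computes the sample standard deviation manually and compares it with sd(); the variable names sample_sd and population_sd are illustrative.

```r
x <- c(10, 12, 14, 18, 23)
n <- length(x)

# Sum of squared deviations from the mean
ss <- sum((x - mean(x))^2)

sample_sd <- sqrt(ss / (n - 1))  # what sd() computes
population_sd <- sqrt(ss / n)    # divides by n instead; base R has no built-in for this

print(sample_sd)
print(sd(x))
print(population_sd)
```

The manual sample value agrees with sd(x), while the population version is slightly smaller because it divides by n.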
Self-check
- Which function returns the five-number summary?
- What does aggregate() do?
Tip: Report the median with skewed data; the mean alone can mislead stakeholders.
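A small illustration of why this matters, using made-up values (the times vector is hypothetical data, with one deliberately extreme observation):

```r
# Hypothetical measurements; the last value is an outlier.
times <- c(10, 12, 14, 18, 23, 400)

print(mean(times))    # pulled upward by the single large value
print(median(times))  # stays close to the bulk of the data
```

Here the mean is 79.5 while the median is 16, so reporting only the mean would suggest typical values far larger than most of the observations.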
Interview prep
- Q: Mean vs median?
A: Median resists outliers; mean reflects the arithmetic average.