The scipy.stats module provides probability distributions, descriptive statistics, correlation measures, and hypothesis tests that operate on NumPy arrays. In a data science workflow it is typically the next step after cleaning data with pandas.
What scipy.stats offers
- rv_continuous / rv_discrete: base classes behind distribution objects such as norm and binom, with pdf (pmf for discrete distributions), cdf, ppf, rvs
- describe, tstd, skew, kurtosis: descriptive summaries
- ttest_ind, mannwhitneyu, chi2_contingency: hypothesis tests
- pearsonr, spearmanr: correlation coefficients
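A minimal sketch of the descriptive and correlation helpers listed above. The arrays x and y are made-up values chosen only for illustration, and the results are unpacked as tuples, which should work across SciPy versions.

```python
import numpy as np
from scipy import stats

# Hypothetical observations, invented for this example.
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0, 8.0])

# describe() bundles nobs, minmax, mean, variance, skewness, and kurtosis.
summary = stats.describe(x)
print('n:', summary.nobs, 'mean:', summary.mean, 'variance:', summary.variance)

# skew() and kurtosis() are also available on their own;
# kurtosis() reports excess kurtosis by default (0 for a normal distribution).
print('skew:', stats.skew(x), 'excess kurtosis:', stats.kurtosis(x))

# Both correlation functions return a coefficient and a p-value.
r, r_pvalue = stats.pearsonr(x, y)
rho, rho_pvalue = stats.spearmanr(x, y)
print('pearson r:', r, 'p:', r_pvalue)
print('spearman rho:', rho, 'p:', rho_pvalue)
```

Pearson measures linear association while Spearman works on ranks, so comparing the two can hint at a monotonic but non-linear relationship.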
Distribution object pattern
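import numpy as np
from scipy import stats
# Freeze a standard normal: loc is the mean, scale is the standard deviation.
dist = stats.norm(loc=0, scale=1)
print('pdf(0):', dist.pdf(0))  # density at 0, about 0.3989
print('cdf(1.96):', dist.cdf(1.96))  # P(X <= 1.96), about 0.975
print('sample:', dist.rvs(size=5, random_state=0))  # five reproducible draws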
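The pattern above does not exercise ppf, so here is a short sketch of it, under the assumption that you want a two-sided 95% critical value. It also shows a discrete distribution (binom, with illustrative parameters), where pmf takes the place of pdf.

```python
from scipy import stats

dist = stats.norm(loc=0, scale=1)

# ppf is the inverse of cdf: it maps a probability back to a quantile.
q = dist.ppf(0.975)   # upper critical value for a two-sided 95% interval
p = dist.cdf(q)       # round trip back to the probability
print('ppf(0.975):', q)
print('cdf(ppf(0.975)):', p)

# Discrete distributions expose pmf instead of pdf.
coin = stats.binom(n=10, p=0.5)   # illustrative parameters
pmf5 = coin.pmf(5)                # P(exactly 5 successes in 10 trials)
print('binom pmf(5):', pmf5)
```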
Arrays in, results out
Most functions take one or more 1-D NumPy arrays of observations; chi2_contingency instead takes a 2-D table of counts. Most tests return a result object with statistic and pvalue attributes that you can log or report in notebooks.
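As a sketch of this arrays-in, results-out pattern, the example below runs two tests on synthetic samples. The group names, sample sizes, and the choice of Welch's variant (equal_var=False) are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data: two groups of 30 draws with different means.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.5, scale=1.0, size=30)

# Welch's t-test: does not assume equal variances.
t_result = stats.ttest_ind(group_a, group_b, equal_var=False)
print('t statistic:', t_result.statistic, 'p-value:', t_result.pvalue)

# Rank-based alternative that does not assume normality.
u_result = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
print('U statistic:', u_result.statistic, 'p-value:', u_result.pvalue)
```

Both results expose the same two attributes, so downstream reporting code can treat them uniformly.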
Important interview questions and answers
- Q: How does scipy.stats differ from NumPy functions such as np.mean?
A: NumPy computes aggregates; scipy.stats adds distributions, hypothesis tests, and standardized inference helpers.
- Q: What does rvs mean?
A: Random variates: it draws samples from the distribution.
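To make the first answer concrete, here is a small sketch contrasting a NumPy aggregate with an inference helper. The measurements and the reference value of 5.0 are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, invented for this example.
x = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.4])

# NumPy gives the point estimate only.
mean = np.mean(x)
print('mean:', mean)

# scipy.stats adds inference around it: the standard error of the mean
# and a one-sample t-test against an assumed reference value of 5.0.
sem = stats.sem(x)
result = stats.ttest_1samp(x, popmean=5.0)
print('standard error:', sem)
print('t statistic:', result.statistic, 'p-value:', result.pvalue)
```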
Self-check
- What four capabilities does scipy.stats provide?
- Name three methods on a distribution object.
Tip: Distribution objects share pdf/cdf/ppf/rvs (discrete ones use pmf in place of pdf), so learning one family teaches you the interface for all of them.
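The shared interface in the tip can be sketched by running the same calls over several continuous families. The three families and their parameters are arbitrary picks for illustration.

```python
from scipy import stats

# Three frozen continuous distributions with illustrative parameters.
families = {
    'norm': stats.norm(loc=0, scale=1),
    'expon': stats.expon(scale=2.0),
    'uniform': stats.uniform(loc=0, scale=10),  # uniform on [0, 10]
}

medians = {}
for name, dist in families.items():
    # The same method names work regardless of the family.
    medians[name] = dist.ppf(0.5)
    sample = dist.rvs(size=3, random_state=0)
    print(name, 'median:', medians[name],
          'cdf(median):', dist.cdf(medians[name]),
          'sample:', sample)
```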
Interview prep
- Distribution object?
It exposes pdf/cdf/ppf/rvs, which you use for simulation and inference.
- Test output?
A result object with statistic and pvalue; log both in production.