Export clean numeric columns from Pandas with to_numpy(), run scipy.stats tests or transforms, and attach the results back as new columns. This is a standard pattern in notebooks and pipelines.
Handoff checklist
- Drop or impute NaNs in Pandas first
- Confirm numeric dtype (select_dtypes)
- arr = df['col'].to_numpy()
- Call SciPy; store scalar or array results in DataFrame
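The four checklist steps can be sketched end to end as follows. This is a minimal illustration, not a prescribed implementation: the frame, the column names col, label, and col_z, and the choice of stats.zscore and stats.skew are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: one numeric column with a missing value, one text column.
df = pd.DataFrame({
    "col": [1.0, 2.0, np.nan, 4.0, 5.0],
    "label": ["a", "b", "c", "d", "e"],
})

# Step 1: drop rows where the column of interest is NaN.
clean = df.dropna(subset=["col"]).copy()

# Step 2: confirm the column is numeric before exporting it.
numeric_cols = clean.select_dtypes(include="number").columns
assert "col" in numeric_cols

# Step 3: export the column as a plain NumPy array.
arr = clean["col"].to_numpy()

# Step 4: call SciPy. An array result goes back as a new column...
clean["col_z"] = stats.zscore(arr)
# ...while a scalar result is kept as a single value.
skew = stats.skew(arr)

print(clean)
print("skewness:", skew)
```

Because the rows were dropped before export, the z-scores line up positionally with the rows of clean, so assigning the array back as a column is safe.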
Group-wise tests
Loop over groups with groupby, or use vectorized operations when possible. Document the sample size per group: with small n, p-values are unreliable.
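One way to loop over groups while recording the sample size is sketched below. The data, the MIN_N threshold, the reference mean of 1.0, and the choice of a one-sample t-test are assumptions for illustration only; substitute whatever test fits the analysis.

```python
import pandas as pd
from scipy import stats

# Hypothetical data: groups A and B have five observations, group C only two.
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 2,
    "value": [1.0, 1.2, 0.9, 1.1, 1.3,
              2.5, 2.7, 2.4, 2.6, 2.8,
              5.0, 5.5],
})

MIN_N = 3         # assumed minimum sample size for running the test
REFERENCE = 1.0   # assumed reference mean for the one-sample t-test

rows = []
for name, g in df.groupby("group"):
    arr = g["value"].to_numpy()
    n = arr.size
    if n >= MIN_N:
        # One-sample t-test of the group mean against the reference value.
        t, p = stats.ttest_1samp(arr, popmean=REFERENCE)
    else:
        # Too few observations: record n but skip the test.
        t, p = float("nan"), float("nan")
    rows.append({"group": name, "n": n, "t": t, "pvalue": p})

# One row per group, with the sample size next to the test result.
summary = pd.DataFrame(rows)
print(summary)
```

Keeping n in the same table as the p-value makes it obvious which results rest on very few observations.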
Example pattern
import numpy as np
import pandas as pd
from scipy import stats
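# Small illustrative frame: two groups, two observations each
df = pd.DataFrame({'group': ['A','A','B','B'], 'value': [1.0, 1.2, 2.5, 2.7]})
# Boolean-mask each group and export its values as a NumPy array
a = df.loc[df['group']=='A', 'value'].to_numpy()
b = df.loc[df['group']=='B', 'value'].to_numpy()
# Independent two-sample t-test; prints the statistic and p-value
print(stats.ttest_ind(a, b))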
Important interview questions and answers
- Q: Why clean in Pandas first?
A: Many SciPy functions do not skip NaN by default; they may raise errors, return nan, or produce wrong statistics.
- Q: to_numpy vs values?
A: Prefer to_numpy()—explicit, handles extension dtypes better than legacy .values.
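The difference shows up most clearly with a nullable extension dtype. The sketch below uses a hypothetical Int64 Series; the dtype and na_value arguments of to_numpy() are what make the conversion explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical nullable-integer Series with one missing entry.
s = pd.Series([1, 2, None], dtype="Int64")

# Legacy accessor: for an extension dtype this returns a pandas
# extension array, not a NumPy ndarray.
legacy = s.values
print(type(legacy))

# Explicit export: choose the target dtype and how missing values are encoded.
arr = s.to_numpy(dtype="float64", na_value=np.nan)
print(type(arr), arr.dtype)
```

The float array still contains a NaN in the missing position, so it needs the same NaN handling as any other column before it goes to SciPy.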
Self-check
- List four steps in the Pandas→SciPy handoff.
- How do you extract two groups for ttest_ind?
Tip: Drop or impute NaNs in Pandas before ttest_ind—SciPy may not handle NaN gracefully.
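A short sketch of why this matters, using hypothetical data with one missing value: with its default nan_policy of 'propagate', ttest_ind returns nan when either input contains a NaN, whereas dropping the missing rows in Pandas first gives a usable result.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: group A contains one missing value.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "value": [1.0, 1.2, np.nan, 2.5, 2.7, 2.6],
})

# Exporting without cleaning: the NaN travels into SciPy.
a_raw = df.loc[df["group"] == "A", "value"].to_numpy()
b_raw = df.loc[df["group"] == "B", "value"].to_numpy()
raw = stats.ttest_ind(a_raw, b_raw)
print("with NaN:", raw.statistic, raw.pvalue)

# Cleaning in Pandas first, then exporting.
clean = df.dropna(subset=["value"])
a = clean.loc[clean["group"] == "A", "value"].to_numpy()
b = clean.loc[clean["group"] == "B", "value"].to_numpy()
res = stats.ttest_ind(a, b)
print("after dropna:", res.statistic, res.pvalue)
```

Dropping rows also changes the sample size (group A falls to two observations here), which is worth reporting alongside the result.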
Interview prep
- NaN policy?
Handle missing values in Pandas before exporting to SciPy.
- Group tests?
groupby → to_numpy per arm for ttest_ind.