By default, Pandas stores numeric columns as NumPy ndarrays under the hood. Understanding ndarray semantics therefore explains much of how Pandas handles dtypes, missing values, and vectorization.
DataFrame to ndarray
df.to_numpy() (preferred) or the legacy df.values returns a 2D array. A frame with mixed dtypes may yield an object array, so select the numeric columns first.
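A minimal sketch of the difference, assuming pandas is installed. The frame and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two numeric columns and one text column.
df = pd.DataFrame({
    "height": [1.70, 1.82, 1.65],
    "weight": [68, 80, 55],
    "name": ["a", "b", "c"],
})

# Exporting everything forces a common dtype, which is object here.
mixed = df.to_numpy()

# Selecting numeric columns first gives a homogeneous numeric array.
numeric = df.select_dtypes(include="number").to_numpy()

print(mixed.dtype)    # object
print(numeric.dtype)  # float64 (int and float columns upcast to float)
print(numeric.shape)  # (3, 2)
```

Object arrays lose most of NumPy's speed advantage, which is why the numeric selection step matters before any heavy computation.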
Column as Series.values
import numpy as np
# Simulating pandas column backing
col = np.array([10.0, 20.0, 30.0, 40.0])
print('Series.values would be:', col)
print('mean:', col.mean())
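The same idea with a real Series instead of a simulated one, as a small sketch assuming pandas is installed. The Series name "price" is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical float column held in a Series.
s = pd.Series([10.0, 20.0, 30.0, 40.0], name="price")

# Export the column's data as a plain ndarray.
arr = s.to_numpy()

print(type(arr).__name__, arr.dtype)  # ndarray float64
print("mean:", arr.mean())            # mean: 25.0
```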
Interop patterns
- Apply NumPy ufuncs to a Series: the result is a Series that keeps the index
- Use np.where with Series conditions
- After wrangling in Pandas, pass X.to_numpy() to sklearn
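The three patterns above can be sketched as follows, assuming pandas is installed. The data, the labels "high"/"low", and the column names x1 and x2 are illustrative only, and the final array stands in for whatever would be handed to an sklearn estimator:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# 1. A ufunc applied to a Series returns a Series with the same index.
roots = np.sqrt(s)

# 2. np.where accepts a boolean Series as its condition and
#    returns a plain ndarray (the index is dropped).
labels = np.where(s > 3, "high", "low")

# 3. After wrangling, export the feature matrix as a 2D ndarray.
X = pd.DataFrame({"x1": s, "x2": roots}).to_numpy()

print(roots)
print(labels)
print(X.shape)  # (3, 2)
```

Note that np.where returns an ndarray rather than a Series, so wrap the result in pd.Series(labels, index=s.index) if the labels are needed downstream.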
Important interview questions and answers
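- Q: Why to_numpy() over values?
A: It is the explicit API, and it handles extension dtypes and gives control over dtype, copying, and the missing-value placeholder.
- Q: Pandas NA vs NaN?
A: Nullable extension dtypes (such as Int64 and string) use pd.NA; default float64 columns use np.nan.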
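A short sketch of both answers, assuming pandas 1.0 or later (where pd.NA and the nullable Int64 dtype are available). The values are made up:

```python
import numpy as np
import pandas as pd

# Default float column: the missing value is stored as np.nan.
floats = pd.Series([1.0, None, 3.0])

# Nullable integer column: the missing value is stored as pd.NA.
ints = pd.Series([1, None, 3], dtype="Int64")

print(floats.dtype, ints.dtype)  # float64 Int64
print(ints.iloc[1] is pd.NA)     # True

# to_numpy() lets you choose the output dtype and what stands in for NA,
# which df.values / Series.values cannot do.
arr = ints.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype)  # float64
```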
Self-check
- What method exports a DataFrame to ndarray?
- Where do Pandas numeric columns live in memory?
Tip: After wrangling, export with to_numpy() for sklearn.
Interview prep
- to_numpy?
Exports a DataFrame or Series to an ndarray, for example for sklearn.
- Column storage?
Numeric Pandas columns are backed by ndarrays by default.