Pandas and NumPy interoperate constantly: most numeric columns are backed by ndarrays, arithmetic follows NumPy broadcasting, and to_numpy() exports arrays for custom kernels or scikit-learn.
NumPy under the hood
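import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 2, 3]})
# Export the column as a plain 1-D ndarray (labels are dropped)
arr = df['a'].to_numpy()
# arr is a numpy.ndarray with an integer dtype and shape (3,)
print(type(arr), arr.dtype, arr.shape)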
Crossing the boundary
- df.to_numpy(): full 2D array (copies and upcasts to a common dtype when column dtypes are mixed)
- df['col'].values: legacy attribute; prefer to_numpy()
- pd.DataFrame(arr, columns=[...]): ndarray → labeled table
- NumPy ufuncs on Series (the result is a Series with the same index):
np.sqrt(df['x'])
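The conversions listed above can be sketched end to end as follows. This is a minimal illustration, assuming a small hypothetical frame; the column names x, label, p, and q are made up for the example and do not come from the original text.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column and one string column.
df = pd.DataFrame({"x": [1.0, 4.0, 9.0], "label": ["a", "b", "c"]})

# Mixed dtypes force a single common dtype for the 2D array, here object.
mixed = df.to_numpy()

# A single numeric column exports with its native dtype.
x_arr = df["x"].to_numpy()

# ndarray -> labeled table; the column names are supplied by hand.
arr = np.arange(6).reshape(3, 2)
table = pd.DataFrame(arr, columns=["p", "q"])

# A ufunc applied to a Series returns a Series with the same index.
roots = np.sqrt(df["x"])

print(mixed.dtype, x_arr.dtype, type(roots).__name__)
```

Note that the object-dtype array from the mixed frame is rarely what a numeric library wants, which is why exporting a single column or a numeric subset is usually the safer choice.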
Alignment caveat
NumPy ops on raw arrays ignore index labels and pair elements by position. Pandas Series ops align on index labels first, which can introduce NaN wherever a label appears in only one operand. Use .values or to_numpy() when you want purely positional NumPy behavior.
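To make the caveat concrete, here is a small sketch that adds two Series whose indexes only partly overlap. The Series s1 and s2 and their labels are hypothetical, chosen only to show the difference.

```python
import numpy as np
import pandas as pd

# Two hypothetical Series that share only the labels "b" and "c".
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Label-aligned addition: "a" and "d" exist on one side only, so they become NaN.
aligned = s1 + s2

# Positional addition: labels are dropped and elements pair up by position.
positional = s1.to_numpy() + s2.to_numpy()

print(aligned)
print(positional)
```

The aligned result has four labels with NaN at both ends, while the positional result has three values. Neither is wrong; which one is right depends on whether the labels carry meaning for the computation.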
Important interview questions and answers
- Q: to_numpy vs values?
A: to_numpy() is the explicit, modern API; values is a legacy attribute on Series.
- Q: Mixed dtype DataFrame?
A: to_numpy() may upcast to object; select numeric columns first for ML.
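One way to select numeric columns before exporting is select_dtypes. The sketch below assumes a hypothetical frame with columns age, income, and city; it is one reasonable approach, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table with two numeric columns and one string column.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [40000.0, 52000.5, 61000.0],
    "city": ["Oslo", "Lima", "Pune"],
})

# Exporting everything falls back to object dtype because of the string column.
everything = df.to_numpy()

# Keeping only numeric columns yields a float matrix (int and float upcast to float64).
X = df.select_dtypes(include="number").to_numpy()

print(everything.dtype, X.dtype, X.shape)
```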
Self-check
- Export a numeric column to ndarray.
- Build a DataFrame from a 2D NumPy array.
Tip: Prefer to_numpy() over legacy .values for explicit exports.
Interview prep
- to_numpy?
Explicit export to ndarray for sklearn and custom ufuncs.
- Alignment caveat?
Pandas ops align indexes; raw NumPy ignores labels.