Pandas sits between raw NumPy arrays and higher-level ML/visualization stacks. Most Python data workflows pass through a DataFrame at some stage.
Upstream and downstream
- NumPy: numeric columns stored as ndarrays; df.to_numpy() for ML
- Matplotlib / Seaborn: plot directly from Series and DataFrame columns
- scikit-learn: accepts DataFrames; pipelines use column names
- SciPy: stats and optimization on array exports
- SQL: databases feed DataFrames via connectors
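The NumPy hand-off in the list above can be sketched as follows. This is a minimal illustration, not a recommended modelling workflow: the column names and toy values are made up, and a plain NumPy least-squares fit stands in for whatever ML library would normally consume the arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({"height": [1.6, 1.7, 1.8], "weight": [55.0, 68.0, 80.0]})

# Export labelled columns as plain ndarrays for downstream numeric code.
X = df[["height"]].to_numpy()  # 2-D feature matrix, shape (3, 1)
y = df["weight"].to_numpy()    # 1-D target vector, shape (3,)

# Assumption: a NumPy least-squares line fit stands in for an ML library here.
A = np.column_stack([X[:, 0], np.ones(len(X))])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print("slope:", slope, "intercept:", intercept)
```

Selecting with a list of column names (`df[["height"]]`) keeps the result two-dimensional, which is the shape most estimators expect for features.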
File formats
- read_csv / to_csv: universal interchange
- read_parquet / to_parquet: columnar, typed, compressed
- read_json, read_excel: common in business data
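To make the "typed" distinction concrete, here is a small sketch of a CSV round trip. CSV stores text only, so a datetime column has to be re-declared on read, whereas Parquet keeps column types in the file. Parquet is left out of the sketch because it needs an optional engine such as pyarrow or fastparquet. The column names and the in-memory buffer are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical frame with one integer column and one datetime column.
df = pd.DataFrame({
    "id": [1, 2],
    "day": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)

# Reading back without hints: 'day' is loaded as text, not datetime.
buf.seek(0)
plain = pd.read_csv(buf)

# Reading back with parse_dates: 'day' is restored as a datetime column.
buf.seek(0)
parsed = pd.read_csv(buf, parse_dates=["day"])

print(plain.dtypes)
print(parsed.dtypes)
```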
Version check
import pandas as pd
import numpy as np
print('Pandas:', pd.__version__)
print('NumPy:', np.__version__)
df = pd.DataFrame({'x': np.arange(3)})
print(df['x'].to_numpy()) # underlying ndarray; to_numpy() is preferred over .values
Important interview questions and answers
- Q: How does Pandas relate to NumPy?
A: Numeric columns are backed by ndarrays; Pandas adds labels, alignment, and IO.
- Q: Why learn SQL alongside Pandas?
A: Production data often lives in databases: SQL extracts, Pandas transforms.
Self-check
- Name three libraries that integrate with Pandas.
- What method exposes a column as a NumPy array?
Tip: Numeric columns are backed by NumPy arrays, so check a column's .dtype and its .values (or to_numpy()) output when debugging dtypes.
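A short sketch of the kind of dtype check the tip describes. The frame below is hypothetical: one clean integer column and one column that mixes numbers with a string, which is a common reason for an unexpected object dtype.

```python
import pandas as pd

# Hypothetical data: 'n' is clean, 'mixed' contains a stray string.
df = pd.DataFrame({"n": [1, 2, 3], "mixed": [1, "two", 3.0]})

# Per-column dtypes as Pandas sees them.
print(df.dtypes)

# The exported arrays carry the same information at the NumPy level.
n_arr = df["n"].to_numpy()          # integer ndarray
mixed_arr = df["mixed"].to_numpy()  # object ndarray, because of the string

print(n_arr.dtype, mixed_arr.dtype)
```

An object dtype on a column that should be numeric usually means at least one value failed to parse as a number.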
Interview prep
- How does Pandas relate to NumPy?
Numeric columns are backed by ndarrays; to_numpy() exports them for ML and SciPy.
- What does read_sql do?
It loads the result of a database query into a DataFrame: SQL extracts, Pandas transforms.
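The extract-then-transform split can be sketched with an in-memory SQLite database from the standard library. The table name, columns, and rows are invented for illustration; a production setup would more likely connect through a driver or SQLAlchemy engine for its own database.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database with a small 'orders' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.0), ("ann", 7.5)],
)
conn.commit()

# Extract: SQL filters rows inside the database.
orders = pd.read_sql("SELECT customer, amount FROM orders WHERE amount > 6", conn)

# Transform: Pandas aggregates the extracted rows.
totals = orders.groupby("customer")["amount"].sum()
print(totals)

conn.close()
```

Filtering in SQL keeps the transferred result small, while the grouping step stays in Pandas where it is easy to inspect and extend.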