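pandas represents missing data with NaN in float columns, None or NaN in object columns, NaT in datetime columns, and pd.NA in nullable dtypes such as Int64. Detect missing values with isna(); handle them by dropping, filling, interpolating, or propagating neighboring values.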
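As a small illustration of the two markers, the sketch below builds one float Series and one nullable-integer Series. The variable names and sample values are made up for the example.

```python
import numpy as np
import pandas as pd

# Default float column: the missing marker is NaN
floats = pd.Series([1.0, np.nan, 3.0])

# Nullable integer column: the missing marker is pd.NA and the ints stay ints
nullable = pd.Series([1, None, 3], dtype="Int64")

print(floats.dtype, nullable.dtype)
print(floats.isna().tolist())    # [False, True, False]
print(nullable.isna().tolist())  # [False, True, False]
```

isna() reports the same mask in both cases, so detection code does not need to know which marker a column uses.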
Detection
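import pandas as pd
import numpy as np

# 'a' holds NaN in a float column; 'b' holds None in a column of strings
df = pd.DataFrame({'a': [1, np.nan, 3], 'b': ['x', None, 'z']})
print(df.isna())        # Boolean mask: True where a value is missing
print(df.isna().sum())  # number of missing values per column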
Handling strategies
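- df.dropna(): remove rows (or columns, with axis=1) that contain any missing value
- df.fillna(0) or df.fillna({'col': median}): replace missing values with a constant or with a per-column value
- df.interpolate(): fill numeric gaps linearly
- df.ffill() / df.bfill(): propagate the last / next valid value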
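The strategies listed above can be tried side by side on a toy frame. This is a minimal sketch: the column names temp and city and their values are invented, and interpolate() is called on the numeric column only, on the assumption that recent pandas versions object to interpolating object columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0, np.nan],
    "city": ["Oslo", None, "Rome", "Rome"],
})

# Drop: keep only rows with no missing value in any column
dropped = df.dropna()

# Fill: a constant, or a per-column value passed as a dict
zero_filled = df["temp"].fillna(0)
median_filled = df.fillna({"temp": df["temp"].median()})

# Interpolate: linear fill between the neighboring valid numbers
interpolated = df["temp"].interpolate()

# Propagate: copy the last (ffill) or next (bfill) valid value
forward = df.ffill()
backward = df.bfill()

print(dropped)
print(median_filled)
print(forward)
```

Note that bfill() leaves the final temp value missing because there is no later valid value to copy from; the same holds for ffill() at the start of a column.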
Best practice
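Document your missing-data policy. Dropping every row that contains NaN can bias results when the missingness is related to the data itself; whether to impute with the mean, median, or mode depends on the domain. Never compare with == np.nan; use isna() instead, because NaN is not equal to anything, including itself.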
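To see why the equality comparison is unreliable, the short sketch below compares both approaches on a made-up Series.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# NaN never compares equal, not even to itself
print(np.nan == np.nan)  # False

# So an equality test silently finds nothing
eq_mask = s == np.nan
print(eq_mask.tolist())  # [False, False, False]

# isna() finds the missing entry
na_mask = s.isna()
print(na_mask.tolist())  # [False, True, False]
```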
Important interview questions and answers
- Q: None vs NaN?
A: None is a Python object; NaN is a floating-point missing-value marker. isna() treats both as missing.
- Q: What does dropna's subset parameter do?
A: Pass a list of columns as subset so that only those key columns must be non-null for a row to be kept.
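Both answers can be checked quickly. In this sketch the orders frame and its column names are hypothetical, chosen so that one key column and one optional column have gaps.

```python
import numpy as np
import pandas as pd

# None and NaN side by side in an object Series: both count as missing
mixed = pd.Series([None, np.nan, "ok"], dtype=object)
print(mixed.isna().tolist())  # [True, True, False]

orders = pd.DataFrame({
    "order_id": [1, 2, None, 4],
    "note": ["gift", None, "rush", None],
})

# Only order_id must be present; a missing note does not drop the row
required = orders.dropna(subset=["order_id"])

# Without subset, any missing value drops the row
strict = orders.dropna()

print(len(required), len(strict))  # 3 1
```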
Self-check
- Count missing values per column.
- Fill numeric NaN with column median.
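One possible way to work through the two self-check tasks is sketched here. The frame and its columns (age, score, name) are invented sample data, and selecting numeric columns with select_dtypes is just one reasonable choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [30.0, np.nan, 50.0, 40.0],
    "score": [np.nan, 2.0, np.nan, 4.0],
    "name": ["Ann", None, "Cy", "Di"],
})

# Task 1: count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Task 2: fill numeric NaN with each column's own median
numeric_cols = df.select_dtypes(include="number").columns
filled = df.copy()
filled[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
print(filled)
```

The non-numeric name column is left untouched, so its missing entry still has to be handled by a separate, documented rule.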
Tip: Never test x == np.nan; always use isna().
Interview prep
- isna?
The correct way to detect missing values; never compare with == np.nan.
- Imputation policy?
Document whether you drop rows, fill with a constant, or impute with the median. It is a domain decision, not one-size-fits-all.