Each column has a dtype controlling memory and valid operations. Cast explicitly with astype, pd.to_numeric, or pd.to_datetime—especially after CSV loads.
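As a minimal sketch of explicit casting after a CSV load, the snippet below reads a small in-memory CSV and converts two columns. The CSV text and the column names (id, price, signup) are made up for illustration and are not from any real dataset.

```python
import io
import pandas as pd

# Hypothetical CSV content; column names and values are illustrative only.
csv_text = (
    "id,price,signup\n"
    "1,9.99,2024-01-05\n"
    "2,unknown,2024-02-10\n"
    "3,12.50,not a date\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# 'price' and 'signup' load as text because each contains a non-parsable value.
print(df.dtypes)

# Cast explicitly; unparsable entries become NaN / NaT instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["signup"] = pd.to_datetime(df["signup"], errors="coerce")

print(df.dtypes)
```

Printing df.dtypes before and after the casts is a quick way to confirm that each column ended up with the type you intended.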
Common dtypes
- int64, float64: numeric
- object: Python strings or mixed values (often strings from CSV)
- bool: True/False
- category: low-cardinality strings (memory efficient)
- datetime64[ns]: timestamps
- Int64 (capital I): nullable integer that represents missing values as pd.NA
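The following sketch illustrates two of the dtypes listed above: category for repeated strings and Int64 for integers with missing values. The sample data is invented, and the exact byte counts will vary by pandas version and platform.

```python
import pandas as pd

# Low-cardinality strings: only three distinct values repeated many times.
colors = pd.Series(["red", "green", "red", "blue"] * 1000)
as_category = colors.astype("category")

# deep=True counts the string payloads, not just the pointers.
plain_bytes = colors.memory_usage(deep=True)
cat_bytes = as_category.memory_usage(deep=True)
print(plain_bytes, cat_bytes)

# Without an explicit dtype, a missing value forces integers to float64.
default = pd.Series([1, 2, None])
# Int64 (capital I) keeps integer values and stores the gap as pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(default.dtype, nullable.dtype)
```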
Safe conversion
import pandas as pd
s = pd.Series(['1', '2', 'bad', '4'])
nums = pd.to_numeric(s, errors='coerce') # 'bad' → NaN
print(nums)
Downcasting
Use the smallest dtype that fits, for example pd.to_numeric(..., downcast='integer'), on large datasets to save memory. Always validate after casting: check the resulting dtype and confirm the values are unchanged.
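A small sketch of downcasting, assuming a Series whose values fit comfortably in a narrower type. The sample values are illustrative; the validation step at the end is one possible check, not the only one.

```python
import pandas as pd

values = pd.Series([1, 200, 30000], dtype="int64")

# downcast='integer' picks the smallest signed integer dtype that holds every value.
small = pd.to_numeric(values, downcast="integer")
print(values.dtype, "->", small.dtype)

# downcast='float' similarly tries float32 when the values allow it.
ratios = pd.Series([1.5, 2.5], dtype="float64")
small_ratios = pd.to_numeric(ratios, downcast="float")
print(ratios.dtype, "->", small_ratios.dtype)

# Validate after casting: values must be unchanged and memory should shrink.
same_values = bool((small == values).all())
saved = values.memory_usage(index=False) - small.memory_usage(index=False)
print(same_values, saved)
```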
Important interview questions and answers
- Q: object dtype?
A: Often means strings from CSV; convert before math operations.
- Q: errors='coerce'?
A: Invalid values become NaN instead of raising, which is good for messy real data.
Self-check
- Convert a string column to float with bad values coerced.
- What dtype supports nullable integers?
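Pitfall: astype(int) on floats truncates toward zero, so round first if you want the nearest integer. It also raises if the column contains NaN; coerce text with pd.to_numeric(..., errors='coerce') first, then fill the gaps or cast to Int64. Note that errors='coerce' is an option of pd.to_numeric, not of astype.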
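The pitfall can be sketched as follows. The sample values are invented, and catching ValueError assumes a reasonably recent pandas, where the NaN-to-integer failure is raised as a ValueError subclass.

```python
import numpy as np
import pandas as pd

floats = pd.Series([1.9, -1.9, 2.4])

# astype(int) drops the fractional part (truncation toward zero).
truncated = floats.astype(int)
# Rounding first gives the nearest integer instead.
rounded = floats.round().astype(int)
print(truncated.tolist(), rounded.tolist())

with_gap = pd.Series([1.9, np.nan])

# NaN has no int64 representation, so a plain integer cast fails.
raised = False
try:
    with_gap.astype(int)
except ValueError:
    raised = True

# One option: round, then cast to the nullable Int64 dtype to keep the gap.
nullable = with_gap.round().astype("Int64")
print(raised, nullable.tolist())
```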
Interview prep
- Q: to_numeric?
A: Converts strings to numbers; errors='coerce' turns bad values into NaN.
- Q: Int64?
A: Nullable integer extension dtype; unlike int64, it can hold missing values (stored as pd.NA).