A repeatable Pandas workflow: load → inspect (head, info, describe) → clean (dtypes, missing) → transform → aggregate → export or hand off to ML.
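The stages above can be sketched end to end on a tiny table. This is a minimal illustration, not a prescribed pipeline: the CSV content, the column names (region, units, price), and the choice to fill missing units with 0 are all assumptions made for the example.

```python
import io
import pandas as pd

# Hypothetical raw data standing in for a real file.
raw = io.StringIO(
    "region,units,price\n"
    "north,3,2.50\n"
    "south,,4.00\n"
    "north,5,2.50\n"
)

# Load
df = pd.read_csv(raw)

# Inspect
print(df.shape)
print(df.head())
df.info()
print(df.isna().sum())

# Clean: make the missing-value decision explicit (assumed here: missing units means 0)
df["units"] = df["units"].fillna(0).astype("int64")

# Transform
df["revenue"] = df["units"] * df["price"]

# Aggregate
summary = df.groupby("region", as_index=False)["revenue"].sum()

# Export (to a string here; pass a file path to write to disk instead)
csv_text = summary.to_csv(index=False)
print(csv_text)
```

Whether a missing value should become 0, be dropped, or be imputed depends on the data; the point is that the decision is written down in code rather than left to defaults.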
Inspect first
- df.shape: rows × columns
- df.head() / df.tail(): sample rows
- df.info(): dtypes and non-null counts
- df.describe(): numeric summary stats
- df.isna().sum(): missing value counts per column
Clean before analyzing
Fix dtypes (strings that should be numbers), handle missing values explicitly, and deduplicate before joins. Silent dtype bugs cause wrong aggregates.
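A minimal sketch of those three cleaning steps, assuming a hypothetical orders table whose amount column arrived as strings and a hypothetical customers lookup table. Treating exact duplicate rows as ingestion errors is also an assumption; in real data, two identical orders may be legitimate.

```python
import pandas as pd

# Hypothetical orders table: "amount" is stored as strings, one value is
# unparseable, and one row is an exact duplicate.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": ["10.5", "7", "7", "n/a"],
})

# Fix dtypes: unparseable strings become NaN instead of raising.
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")

# Handle missing values explicitly (here: drop; filling is an equally valid choice).
orders = orders.dropna(subset=["amount"])

# Deduplicate before joining so repeated rows do not inflate later aggregates.
orders = orders.drop_duplicates()

# Hypothetical lookup table; validate= raises if customer_id is not unique on the right.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
joined = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
print(joined)
```

The validate argument is optional, but it turns an unexpected duplicate key in the lookup table into an immediate error rather than a quietly larger result.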
Reproducible patterns
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': ['x', 'y', 'z']})
df.info()  # info() prints its report itself and returns None, so no print() is needed
print(df.describe())
Next steps in this track
Modules 02–05 cover basics through advanced reshaping. Module 06 previews integration with NumPy, Matplotlib, sklearn, and SciPy; module 07 prepares you for interviews and production habits before you move on to SciPy and deeper SQL.
Important interview questions and answers
- Q: Why info() before groupby?
A: It reveals object vs numeric dtypes and missing counts, which prevents silent aggregation errors.
- Q: describe() limits?
A: It summarizes numeric columns by default; categoricals need value_counts().
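The describe() answer can be checked on a small mixed-dtype frame. The column names and values below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "score": [1.0, 2.0, 3.0, 4.0],
    "grade": ["a", "b", "a", "a"],
})

# Default describe() keeps only the numeric column.
numeric_summary = df.describe()
print(numeric_summary.columns.tolist())

# value_counts() gives the frequency of each category.
grade_counts = df["grade"].value_counts()
print(grade_counts)

# describe(include="all") adds count/unique/top/freq rows for non-numeric columns.
full_summary = df.describe(include="all")
print(full_summary)
```

If you want one call that covers both kinds of column, include="all" is the option to remember; value_counts() still gives the full distribution rather than just the top category.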
Self-check
- List four inspect methods for any new DataFrame.
- What is the recommended first step after loading data?
Challenge
Inspect a new DataFrame
- Run the workflow lesson code.
- Read the df.info() output and note dtypes and null counts.
Done when: you can describe shape, dtypes, and missing data before transforming.
Interview prep
- Inspect first?
head, info, describe, isna().sum() before heavy transforms.
- Why dtypes?
Wrong dtypes cause silent math errors: strings that look like numbers are concatenated by sum() instead of added.
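A short demonstration of that dtype pitfall, using a made-up Series of digit strings stored with object dtype. No exception is raised, which is what makes the bug easy to miss.

```python
import pandas as pd

# Digits stored as strings (object dtype), as often happens after a messy load.
prices = pd.Series(["1", "2", "3"], dtype=object)

concatenated = prices.sum()    # string concatenation, no error raised
print(repr(concatenated))      # '123'

fixed = pd.to_numeric(prices)  # convert to a numeric dtype first
print(fixed.sum())             # 6
```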