pandas (the name derives from "panel data") is a Python library that provides fast, labeled data structures for tabular and other structured data. The two workhorses are Series (a 1D labeled array) and DataFrame (a 2D labeled table).
Core concepts
- Series — one column with an index (row labels)
- DataFrame — table of Series sharing the same index
- Index — row labels; can be integers, strings, or datetimes
- Alignment — operations match on labels, not just position
- Vectorization — column math delegates to NumPy under the hood
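A minimal sketch of the concepts above, using made-up sample data (the names prices, qty, and orders are hypothetical, chosen only for illustration). It builds two Series on the same index, combines them into a DataFrame, and computes a new column with vectorized arithmetic instead of a Python loop.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
prices = pd.Series([2.5, 4.0, 1.5], index=["apple", "bread", "milk"])
qty = pd.Series([4, 1, 2], index=["apple", "bread", "milk"])

# A DataFrame is a table of Series that share one row index.
orders = pd.DataFrame({"price": prices, "qty": qty})

# Vectorized column math: the whole column is multiplied at once.
orders["total"] = orders["price"] * orders["qty"]

print(orders.index.tolist())  # the row labels
print(orders)
```

Each entry in the total column is the product of price and qty for the same row label.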
Typical use cases
- Exploratory data analysis (EDA) on CSV/Parquet/SQL results
- Cleaning, filtering, and aggregating business metrics
- Feature engineering before machine learning
- Time-series analysis (sales, sensors, logs)
Import convention
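import pandas as pd  # conventional alias
import numpy as np   # conventional alias; not used in this snippet

# A Series with explicit string labels
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
# A DataFrame from the Series plus a plain list; the list is matched to s's index by position
df = pd.DataFrame({'x': s, 'y': [1, 2, 3]})
print(s)
print(df)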
Important interview questions and answers
- Q: Series vs DataFrame?
A: Series is 1D with one index; DataFrame is 2D with shared row index and multiple named columns.
- Q: Why import as pd?
A: Universal convention across documentation, Stack Overflow, and production codebases.
Self-check
- Name the two primary Pandas data structures.
- Give one real-world use case for a DataFrame.
Tip: single brackets, df['col'], return a Series; double brackets, df[['col']], return a DataFrame.
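The bracket rule in the tip can be checked directly. This is a small sketch on a made-up DataFrame; the variable names one and two are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30], "y": [1, 2, 3]}, index=["a", "b", "c"])

one = df["x"]    # single brackets with a column name: a Series
two = df[["x"]]  # double brackets (a list of column names): a DataFrame

print(type(one).__name__)
print(type(two).__name__)
print(two.shape)  # still 2D, with one column
```

The inner brackets in the second form are an ordinary Python list, which is why the same syntax selects several columns at once, for example df[["x", "y"]].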
Interview prep
- Q: Series vs DataFrame?
A: A Series is one column with an index; a DataFrame is multiple columns aligned on a shared index.
- Q: Alignment?
A: Operations match on index labels, which can introduce NaN where the labels differ.
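To make the alignment answer concrete, here is a short sketch with two made-up Series whose labels only partly overlap. The names a, b, total, and filled are hypothetical. Using add with fill_value is one possible way to avoid the NaN values, not the only one.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Addition pairs values by label, not by position.
# Labels present in only one Series ("x" and "w") become NaN.
total = a + b
print(total)

# Treat a missing label as 0 instead of producing NaN.
filled = a.add(b, fill_value=0)
print(filled)
```

Only the shared labels y and z get a numeric sum in total; in filled, every label has a value because the missing side is treated as 0.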