Python dominates data science through notebooks, NumPy, pandas, and scikit-learn. Install those locally with pip. This lesson previews the concepts in the playground using only the standard library; working with the full stack requires Jupyter on your own machine.
Typical local stack
- Jupyter — interactive notebooks for exploration
- pandas — DataFrames for tabular data
- NumPy — numerical arrays and vectorized math
- matplotlib / seaborn — visualization
Stdlib preview: statistics
import statistics

data = [10, 12, 14, 18, 23]
print(statistics.mean(data))    # arithmetic mean: 15.4
print(statistics.median(data))  # middle value of the sorted data: 14
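The same module also covers measures of spread. As a minimal sketch on the same data (the variable names here are illustrative only), the sample and population standard deviations and the quartile cut points can be computed like this:

```python
import statistics

data = [10, 12, 14, 18, 23]

# Sample statistics divide by n - 1; population statistics divide by n.
sample_variance = statistics.variance(data)
sample_stdev = statistics.stdev(data)
population_stdev = statistics.pstdev(data)

# quantiles(n=4) returns the three cut points that split the data into quarters.
quartiles = statistics.quantiles(data, n=4)

print(f"sample variance:  {sample_variance:.2f}")
print(f"sample stdev:     {sample_stdev:.2f}")
print(f"population stdev: {population_stdev:.2f}")
print(f"quartiles:        {quartiles}")
```

Use stdev when the data is a sample drawn from a larger population and pstdev when the list is the whole population.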
Compared with career paths built around R or SQL-focused pipelines, Python also serves as glue code: it connects APIs, ETL jobs, and ML serving.
Important interview questions and answers
- Q: Why Python for data science?
A: Readable syntax, a rich PyPI ecosystem, and a notebook workflow that speeds up experimentation.
- Q: pandas vs SQL?
A: SQL queries databases at scale; pandas manipulates in-memory tables—often used together.
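To make the contrast concrete without installing pandas, here is a stdlib-only sketch. It assumes a made-up sales table: sqlite3 stands in for the database side, and a plain dictionary stands in for the in-memory aggregation pandas would normally do. The table name, columns, and rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (region, amount)
rows = [("north", 10), ("south", 12), ("north", 14), ("south", 18)]

# SQL side: the database engine does the grouping.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
sql_totals = dict(conn.execute(query).fetchall())
conn.close()

# In-memory side: the Python process does the grouping itself.
memory_totals = defaultdict(int)
for region, amount in rows:
    memory_totals[region] += amount

print(sql_totals)
print(dict(memory_totals))
```

Both approaches yield the same totals; the difference is where the work happens, which is why the two are often combined (SQL to filter and aggregate at the source, in-memory tools for the exploration that follows).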
Self-check
- What stdlib module computes mean and median?
- Can pandas run in this playground?
Tip: Install pandas and Jupyter locally. The playground stays stdlib-only, so use the statistics module there to preview numeric summaries.
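One way to check what a given environment offers, sketched with the stdlib importlib module (the helper name is_installed is an illustrative choice, not part of any library):

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None

for name in ("statistics", "pandas", "numpy"):
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

The result depends on the environment: statistics ships with Python, while pandas and numpy appear only where they have been installed with pip.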
Interview prep
- Q: Why Python for data?
A: Readable syntax, the PyPI stack (pandas, NumPy), and a notebook workflow for exploration.
- Q: What does the statistics module offer?
A: Stdlib mean, median, and stdev; full analytics need a local pandas/NumPy install.