Learn Netverks

scikit-learn and Pandas preview

Last reviewed Jun 1, 2026 · Content v20260601
Track mode: server_script (server runner)
Reading: ~1 min
Level: intermediate

This lesson

This lesson previews how Pandas tabular work (indexing, dtypes, reshaping, and analysis habits for real-world tables) feeds into scikit-learn.

This track focuses on workflow; the NumPy and Pandas tracks teach the tools you will use daily in notebooks.

You will apply these ideas in contexts such as building train/test feature matrices from wrangled DataFrames.

Read the narrative, run `import pandas as pd` snippets with in-memory DataFrames (install pandas and numpy with pip if needed), inspect `.head()`, `.dtypes`, and complete MCQs.

This lesson comes near the end of the track: consolidate what you have learned before moving on to SciPy, sklearn-heavy projects, and interview prep.

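scikit-learn estimators accept NumPy arrays, and they also accept Pandas DataFrames directly as long as every column they receive is numeric. You can export a DataFrame with `to_numpy()`, but passing DataFrames into ColumnTransformer pipelines keeps feature names available. In either case, never leak test-set statistics into training features.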

Feature matrix convention

import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [25, 30], 'income': [50000, 60000]})
X = df.to_numpy()  # shape (n_samples, n_features)
print('X shape:', X.shape)
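Extending the snippet above, a sketch of pulling both X and y out of one DataFrame while keeping a feature-name list parallel to X's columns. The `bought` target column and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 30, 35, 40],
    "income": [50000, 60000, 65000, 80000],
    "bought": [0, 1, 0, 1],  # hypothetical target column
})

feature_cols = ["age", "income"]  # list parallel to X's columns
target_col = "bought"

# Confirm every feature column is numeric before exporting
print(df[feature_cols].dtypes)
assert all(pd.api.types.is_numeric_dtype(df[c]) for c in feature_cols)

X = df[feature_cols].to_numpy()  # shape (n_samples, n_features)
y = df[target_col].to_numpy()    # shape (n_samples,)
print(X.shape, y.shape)          # (4, 2) (4,)

# Column j of X corresponds to feature_cols[j]
for j, name in enumerate(feature_cols):
    print(name, X[:, j])
```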

Pipeline pattern

  • Split train/test before fit
  • Fit scalers/encoders on train only
  • Use ColumnTransformer for mixed numeric/categorical columns
  • Keep column names in a list parallel to X columns

Common pitfall

Fitting a StandardScaler on the full dataset before the split leaks test-set information (its mean and standard deviation) into training. Always fit on the training split, then transform both train and test.
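A small sketch of the difference, using synthetic data (the feature distribution and split sizes are arbitrary assumptions). The leaky scaler's mean is pulled toward the test rows, while the correct scaler's mean comes from training rows alone.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 1))  # synthetic feature

X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Leaky: statistics computed on all rows, including the test rows
leaky = StandardScaler().fit(X)

# Correct: statistics computed on the training rows only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse train mean/std; never refit

print("mean from full data: ", leaky.mean_)
print("mean from train only:", scaler.mean_)
```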

Important interview questions and answers

  1. Q: What shape should X have?
    A: (n_samples, n_features): rows are observations (samples), columns are features.
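  2. Q: How do you handle categorical columns?
    A: Encode them first with one-hot or ordinal encoding (for example OneHotEncoder or OrdinalEncoder inside a ColumnTransformer); most scikit-learn estimators do not accept raw string values.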

Self-check

  1. Export a numeric DataFrame to an X matrix.
  2. Why should the scaler be fit on the training split only?

Interview prep

Leakage?

Never fit a scaler or encoder on the full dataset before the train/test split.

Interview tip: Can you explain this lesson in 30 seconds without reading notes?
