Merge and join

Last reviewed May 28, 2026 Content v20260528
Track mode: server_script
Means: Server runner
Reading: ~1 min
Level: intermediate

This lesson

This lesson covers merge and join, part of pandas tabular manipulation: indexing, dtypes, reshaping, and analysis habits for real-world tables.

A merge that is assumed to be many-to-one silently duplicates rows when the lookup side has repeated keys. This is a routine source of bugs for analysts and ML engineers.

You will apply merge and join in contexts such as Customer 360 tables, experiment cohort joins, and feature-store enrichment.

Read the narrative, run `import pandas as pd` snippets with in-memory DataFrames (install pandas and numpy with pip if needed), inspect `.head()`, `.dtypes`, and complete MCQs. Also verify row counts before and after joins or aggregations.

Start this lesson when you can explain the previous lesson's ideas in your own words.

pd.merge combines two DataFrames on key columns, like a SQL JOIN. Choose the join type with how='inner', 'left', 'right', or 'outer' (the default is 'inner'), and compare row counts before and after the merge to catch duplicate keys.

Basic merge

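import pandas as pd

# Two small tables that share the key column 'id'; only id 1 appears in both.
left = pd.DataFrame({'id': [1, 2], 'name': ['A', 'B']})
right = pd.DataFrame({'id': [1, 3], 'score': [90, 85]})
# An inner merge keeps only the keys present in both tables.
inner = pd.merge(left, right, on='id', how='inner')
print(inner)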

Join types

| how | SQL equivalent |
| --- | --- |
| inner | INNER JOIN |
| left | LEFT JOIN |
| right | RIGHT JOIN |
| outer | FULL OUTER JOIN |
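
To make the four join types concrete, here is a minimal sketch that reuses the two small DataFrames from the basic merge and counts the rows each `how` value returns. The `row_counts` dictionary is a name introduced here for illustration only.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
right = pd.DataFrame({"id": [1, 3], "score": [90, 85]})

# Run the same merge with each join type and record how many rows come back.
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="id", how=how)
    row_counts[how] = len(merged)
    print(how, len(merged))

# indicator=True adds a _merge column that says which table each row came from.
outer = pd.merge(left, right, on="id", how="outer", indicator=True)
print(outer)
```

Only id 1 exists in both tables, so the inner merge returns one row, the left and right merges return two rows each, and the outer merge returns three. The `_merge` column is a quick way to see which keys failed to match.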

Duplicate keys

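If a key repeats, every matching pair of rows appears in the result: a key that occurs m times on the left and n times on the right yields m × n rows (a Cartesian expansion per key). Always compare len(result) with the row count you expect. Pass validate='one_to_one' (or 'one_to_many' / 'many_to_one') so that pandas raises an error when the keys are not as unique as you assumed.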
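
The following sketch shows what the `validate` argument does in practice. The `orders` and `customers` tables are hypothetical data made up for this example, with one customer deliberately entered twice in the lookup table.

```python
import pandas as pd
from pandas.errors import MergeError

orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10, 20, 30]})
# Hypothetical lookup table where customer 1 was accidentally entered twice.
customers = pd.DataFrame(
    {"customer_id": [1, 1, 2], "tier": ["gold", "silver", "basic"]}
)

# Without validation the merge succeeds and silently grows from 3 to 5 rows.
unchecked = pd.merge(orders, customers, on="customer_id", how="left")
print(len(orders), len(unchecked))

# validate="many_to_one" asserts that keys are unique in the right table.
try:
    pd.merge(orders, customers, on="customer_id", how="left",
             validate="many_to_one")
    caught = False
except MergeError as exc:
    caught = True
    print("MergeError:", exc)
```

Choosing the strictest `validate` value that matches your intent (here many orders to one customer) turns a silent row explosion into an immediate failure.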

Important interview questions and answers

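  1. Q: on vs left_on/right_on?
    A: Use on when the key column has the same name in both DataFrames; use left_on and right_on when the names differ.
  2. Q: merge vs join?
    A: df.join aligns on the index by default; merge matches on key columns (or on indexes via left_index/right_index). merge is the more common choice.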
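
Both answers can be illustrated with a short sketch. The `users` and `scores` tables and their column names are assumptions made up for this example.

```python
import pandas as pd

users = pd.DataFrame({"user_id": [1, 2], "name": ["A", "B"]})
scores = pd.DataFrame({"uid": [1, 2], "score": [90, 85]})

# The key columns have different names, so name each side explicitly.
by_column = pd.merge(users, scores, left_on="user_id", right_on="uid",
                     how="inner")
# Both key columns are kept in the result; drop the redundant one.
by_column = by_column.drop(columns="uid")

# df.join aligns on the index, so move the keys into the index first.
by_index = users.set_index("user_id").join(scores.set_index("uid"),
                                           how="inner")
print(by_column)
print(by_index)
```

The two results hold the same data; the difference is whether the key lives in a column (merge) or in the index (join).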

Self-check

  1. Perform a left merge keeping all left rows.
  2. What happens with duplicate keys in both tables?

Pitfall: Check len(merged) after join—duplicate keys multiply rows silently.
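
One way to act on this pitfall is to count repeated keys before merging and to compute the row count you expect. This is a rough sketch for an inner merge on a single key column, using small made-up tables; it is not a general-purpose checker.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 1, 2], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 1, 2], "y": [10, 20, 30]})

# Count repeated keys on each side before merging.
left_dupes = int(left["id"].duplicated().sum())
right_dupes = int(right["id"].duplicated().sum())

merged = pd.merge(left, right, on="id", how="inner")

# id 1 appears twice on each side, so it contributes 2 * 2 = 4 rows;
# id 2 contributes 1 * 1 = 1 row, for 5 rows in total.
expected = int(
    (left["id"].value_counts() * right["id"].value_counts()).sum()
)
print(left_dupes, right_dupes, len(merged), expected)
```

If `len(merged)` is larger than either input and you did not intend a many-to-many merge, deduplicate or aggregate one side first.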

Interview prep

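Inner vs left?

Inner keeps only rows whose keys match in both tables; left keeps every left row and fills the right-hand columns with missing values where there is no match.

Duplicate keys?

Keys repeated on both sides cause Cartesian expansion, which inflates the row count; pass validate= to raise an error instead.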

