Aggregate functions collapse many rows into summary values. GROUP BY computes aggregates per group—foundation of dashboards and Data Science SQL summaries.
Aggregate functions
SELECT COUNT(*) AS order_count,
SUM(total) AS revenue,
AVG(total) AS avg_order,
MIN(ordered_at) AS first_order,
MAX(ordered_at) AS last_order
FROM orders;Practice: Seed customers and orders sample data, then run each query. Verify row counts manually.
GROUP BY
SELECT customer_id,
COUNT(*) AS order_count,
SUM(total) AS spent
FROM orders
GROUP BY customer_id;Every non-aggregated SELECT column must appear in GROUP BY (or be functionally dependent in engines that allow it).
Join then aggregate
SELECT c.name,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;LEFT JOIN + GROUP BY includes customers with zero orders (COUNT(o.id) is 0).
Important interview questions and answers
- Q: COUNT(*) vs COUNT(col)?
A: COUNT(*) counts rows; COUNT(col) ignores NULLs in col. - Q: GROUP BY error symptom?
A: Selecting bare column not in GROUP BY causes error in strict SQL modes.
Self-check
- Which aggregate computes average?
- Why GROUP BY customer_id before summing totals?
Pitfall: Every selected non-aggregate column must appear in GROUP BY in strict SQL.
Interview prep
- COUNT(*) vs COUNT(col)?
COUNT(*) all rows; COUNT(col) ignores NULLs in col.
- GROUP BY rule?
Non-aggregated SELECT columns must be in GROUP BY.