Amazon CloudWatch collects metrics and logs and raises alarms on them. EC2 instances, RDS databases, and Lambda functions emit default metrics automatically; custom metrics and dashboards round out observability.
CloudWatch pillars
- Metrics: CPUUtilization (EC2), Invocations and Duration (Lambda)
- Logs: CloudWatch Logs log groups fed by apps and Lambda
- Alarms: SNS notification when a threshold is breached
- Dashboards: an operational single pane of glass
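Custom metrics are published through the PutMetricData API. The sketch below is a minimal illustration, assuming boto3 and sandbox credentials; the namespace `SandboxApp`, the metric `OrdersProcessed`, and the `Service` dimension are hypothetical names chosen for the example.

```python
from datetime import datetime, timezone


def build_custom_metric(namespace, name, value, unit="Count", dimensions=None):
    """Build keyword arguments for CloudWatch's PutMetricData call."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in dimensions.items()
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


def publish(client, params):
    """Send the metric; `client` is expected to be boto3.client("cloudwatch")."""
    return client.put_metric_data(**params)


# Hypothetical namespace, metric name, and dimension, for illustration only.
params = build_custom_metric(
    "SandboxApp", "OrdersProcessed", 3, dimensions={"Service": "checkout"}
)

# With sandbox credentials available:
#   import boto3
#   publish(boto3.client("cloudwatch"), params)
```

Keeping the payload builder separate from the API call makes it possible to inspect or unit-test the request without touching AWS.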
Create alarm concept
# Console: CloudWatch → Alarms → Create alarm
# Metric: EC2 → Per-Instance Metrics → CPUUtilization
# Threshold: Average >= 80% for 5 minutes → SNS topic
aws cloudwatch describe-alarms \
  --query 'MetricAlarms[].{Name:AlarmName,State:StateValue}' \
  --output table
Practice: Run SDK examples locally with sandbox credentials via AWS_PROFILE=sandbox. Never commit real keys; use IAM roles in deployed environments.
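The console steps above (average CPUUtilization at or above 80% for 5 minutes, notifying an SNS topic) could be expressed with the SDK roughly as follows. This is a sketch, assuming boto3; the instance ID, the topic ARN, and the alarm name are placeholders, not real resources.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold=80.0):
    """Keyword arguments for PutMetricAlarm: average CPU >= threshold over 5 minutes."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # one 5-minute window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic notified on ALARM
    }


def create_alarm(client, params):
    """Create or update the alarm; `client` is expected to be boto3.client("cloudwatch")."""
    return client.put_metric_alarm(**params)


# Placeholder identifiers, for illustration only.
alarm = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)

# With AWS_PROFILE=sandbox exported:
#   import boto3
#   create_alarm(boto3.client("cloudwatch"), alarm)
```

Once created, the alarm should appear in the output of the describe-alarms command shown earlier.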
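Logs Insights
Use CloudWatch Logs Insights to query Lambda logs, for example to filter error lines or compute latency percentiles. Python apps can ship logs with the watchtower library, or simply write to stdout, which the Lambda runtime captures into CloudWatch Logs.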
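Two Logs Insights queries of the kind described here, one filtering errors and one computing latency percentiles, might look like the sketch below. It assumes boto3 and a Lambda log group; the log group name in the usage comment is a placeholder, and matching on the literal text ERROR is an assumption about how the app logs failures.

```python
import time

# Most recent log lines containing the text ERROR.
ERRORS_QUERY = """
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
"""

# Average and p99 duration from Lambda REPORT lines, in 5-minute buckets.
LATENCY_QUERY = """
filter @type = "REPORT"
| stats avg(@duration) as avg_ms, pct(@duration, 99) as p99_ms by bin(5m)
"""


def build_query(log_group, query, minutes=60, now=None):
    """Keyword arguments for StartQuery covering the last `minutes` minutes."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": query.strip(),
    }


def run_query(client, params, poll_seconds=1.0):
    """Start the query and poll until it leaves the Scheduled/Running states.

    `client` is expected to be boto3.client("logs").
    """
    query_id = client.start_query(**params)["queryId"]
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response
        time.sleep(poll_seconds)


# With sandbox credentials (the log group name is a placeholder):
#   import boto3
#   params = build_query("/aws/lambda/my-function", LATENCY_QUERY)
#   print(run_query(boto3.client("logs"), params)["results"])
```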
Operational hygiene
- Alarm on billing, error rate, and latency SLOs
- Set log retention: log groups never expire by default, so unbounded logs keep accruing storage cost
- Know that X-Ray exists for distributed tracing (awareness level only)
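The retention item in the list above can be automated. The following sketch, assuming boto3, finds log groups with no retention set and applies a 30-day policy; the 30-day value and the sample log group names are illustrative assumptions, not recommendations from this lesson.

```python
def groups_without_retention(log_groups):
    """Names of log groups that have no retention policy (they never expire).

    `log_groups` is a list of dicts shaped like the `logGroups` entries
    returned by DescribeLogGroups; groups without retention simply lack
    the `retentionInDays` key.
    """
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def set_retention(client, names, days=30):
    """Apply a retention policy; `client` is expected to be boto3.client("logs")."""
    for name in names:
        client.put_retention_policy(logGroupName=name, retentionInDays=days)
    return len(names)


# Sample data with hypothetical log group names.
sample = [
    {"logGroupName": "/aws/lambda/orders", "retentionInDays": 14},
    {"logGroupName": "/aws/lambda/payments"},
]
unbounded = groups_without_retention(sample)

# With sandbox credentials:
#   import boto3
#   logs = boto3.client("logs")
#   pages = logs.get_paginator("describe_log_groups").paginate()
#   groups = [g for page in pages for g in page["logGroups"]]
#   set_retention(logs, groups_without_retention(groups), days=30)
```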
Incident response ties to Cybersecurity.
Important interview questions and answers
- Q: Metric vs log?
A: Metrics are numeric time series; logs are text events for debugging.
- Q: Alarm action?
A: Typically an SNS notification (email, SMS, or a PagerDuty integration) or an Auto Scaling policy.
Self-check
- What three CloudWatch features did this lesson cover?
- Why set log retention policies?
Tip: Alarm on 5xx rate and p99 latency—not just CPU—for user-facing services.
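As one way to act on this tip, a p99 latency alarm can use a percentile statistic instead of an average. The sketch below assumes the service sits behind an Application Load Balancer, which this lesson does not specify; the load balancer identifier, topic ARN, alarm name, and 1-second threshold are all hypothetical.

```python
def build_p99_latency_alarm(load_balancer, sns_topic_arn, threshold_seconds=1.0):
    """Keyword arguments for PutMetricAlarm on p99 target response time."""
    return {
        "AlarmName": "p99-latency-high",  # hypothetical name
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetResponseTime",  # reported in seconds
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "ExtendedStatistic": "p99",  # percentile, used instead of Statistic
        "Period": 60,
        "EvaluationPeriods": 5,  # breach must hold for 5 consecutive minutes
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder identifiers, for illustration only.
latency_alarm = build_p99_latency_alarm(
    "app/my-alb/0123456789abcdef",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)

# With sandbox credentials:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**latency_alarm)
```

A percentile alarm reacts to the slowest requests that users actually experience, which an average or a CPU threshold can hide.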
Interview prep
- Metrics vs logs?
Metrics are numeric time series; logs are event text for debugging.
- Billing alarm?
A CloudWatch alarm on the EstimatedCharges metric (AWS/Billing namespace) alerts you before sandbox costs become a surprise.
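A billing alarm of this kind could be defined roughly as follows. This is a sketch, assuming boto3, that billing alerts are enabled in the account's billing preferences, and that the client targets us-east-1, where billing metric data is stored; the 10 USD limit and the topic ARN are placeholders.

```python
def build_billing_alarm(sns_topic_arn, limit_usd=10.0):
    """Keyword arguments for PutMetricAlarm on estimated monthly charges."""
    return {
        "AlarmName": f"estimated-charges-over-{limit_usd:g}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; billing data is published only a few times a day
        "EvaluationPeriods": 1,
        "Threshold": limit_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder topic ARN, for illustration only.
billing_alarm = build_billing_alarm(
    "arn:aws:sns:us-east-1:123456789012:billing-alerts"
)

# With sandbox credentials:
#   import boto3
#   client = boto3.client("cloudwatch", region_name="us-east-1")
#   client.put_metric_alarm(**billing_alarm)
```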