Aman Goyal

LeetCode

Monitoring and Observability: Logs, Metrics, Alerts, and Tracing

Core Concept

Observability = the ability to understand and debug a system's internal state from the outside, using only the signals it emits


Four Pillars (VERY IMPORTANT)


1. Logging

Key practices:

Insight:


2. Metrics

Types:

Most important metrics:
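A minimal in-process sketch of the three metric types most monitoring systems share, assuming Prometheus-style semantics (counter, gauge, and histogram with cumulative buckets). The class names and fields are hypothetical. In a real service a client library would provide these and handle thread safety, which this sketch ignores.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class Counter:
    """Monotonically increasing value, e.g. total requests served."""
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount


@dataclass
class Gauge:
    """Value that can go up or down, e.g. queue depth or memory in use."""
    value: float = 0.0

    def set(self, v: float) -> None:
        self.value = v

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


@dataclass
class Histogram:
    """Counts observations into buckets, e.g. request latency in seconds."""
    bounds: List[float]                          # bucket upper bounds, ascending
    counts: List[int] = field(default_factory=list)
    total: float = 0.0                           # sum of all observations
    n: int = 0                                   # number of observations

    def __post_init__(self) -> None:
        # One slot per bound plus a final overflow (+Inf) slot.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, v: float) -> None:
        # bisect_left finds the first bound >= v, i.e. the "le" bucket.
        self.counts[bisect_left(self.bounds, v)] += 1
        self.total += v
        self.n += 1

    def cumulative(self) -> List[int]:
        """Cumulative count per bucket, the way Prometheus exposes histograms."""
        out, running = [], 0
        for c in self.counts:
            running += c
            out.append(running)
        return out
```

For example, `Histogram([0.1, 0.5, 1.0]).observe(0.3)` lands in the `le=0.5` bucket. Storing bucket counts rather than raw samples keeps memory constant regardless of traffic, at the cost of only approximate percentiles.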


3. Alerting

Key idea:

Alerts = your SLO definition: an alert should fire when the system is at risk of violating a service-level objective
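One way to make an alert encode an SLO is an error-budget burn-rate check. The burn rate is the observed error ratio divided by the error budget (1 - SLO): a value of 1.0 spends the budget exactly over the SLO period, while higher values exhaust it early. The sketch pages only when both a long and a short window burn fast. The 99.9% SLO, the 14.4 threshold, and the window sizes are illustrative assumptions, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    total: int    # requests observed in the window
    errors: int   # failed requests in the window


def burn_rate(stats: WindowStats, slo: float) -> float:
    """Observed error ratio relative to the ratio the SLO allows."""
    if stats.total == 0:
        return 0.0
    error_budget = 1.0 - slo              # e.g. 0.001 for a 99.9% SLO
    return (stats.errors / stats.total) / error_budget


def should_page(long_w: WindowStats, short_w: WindowStats,
                slo: float = 0.999, threshold: float = 14.4) -> bool:
    # The long window (e.g. 1h) shows the problem is significant;
    # the short window (e.g. 5m) shows it is still happening now.
    return (burn_rate(long_w, slo) >= threshold
            and burn_rate(short_w, slo) >= threshold)
```

Requiring both windows avoids paging on a brief blip (short window only) and stops paging soon after recovery (long window only).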

Examples:


Important:

Balance is critical: too many alerts cause alert fatigue, too few let real incidents go unnoticed
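A common way to cut noisy, flapping alerts is to require a condition to hold for several consecutive evaluations before firing, and to stay healthy for several before resolving (similar in spirit to the `for` clause in Prometheus alerting rules). The class below is a hypothetical sketch of that debounce logic; `for_n` and `clear_n` are assumed parameter names.

```python
from dataclasses import dataclass


@dataclass
class DebouncedAlert:
    """Fire after `for_n` consecutive breaches; resolve after `clear_n`
    consecutive healthy evaluations."""
    threshold: float
    for_n: int = 3
    clear_n: int = 3
    firing: bool = False
    _breaches: int = 0
    _healthy: int = 0

    def evaluate(self, value: float) -> bool:
        if value > self.threshold:
            self._breaches += 1
            self._healthy = 0
            if self._breaches >= self.for_n:
                self.firing = True
        else:
            self._healthy += 1
            self._breaches = 0
            if self._healthy >= self.clear_n:
                self.firing = False
        return self.firing
```

Larger `for_n` means fewer false pages but slower detection; the right value depends on how quickly a real problem harms users.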


Advanced:


4. Tracing (VERY IMPORTANT)

How:

Enables:
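Tracing generally works by giving each request a trace ID that all of its spans share, while each span records its own ID and its parent's ID so the call tree can be rebuilt later. A minimal sketch using Python's `contextvars` to track the active span, assuming an in-memory `FINISHED` list as a stand-in for a real exporter; the `span` helper and `Span` fields are hypothetical, not a specific library's API.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str                # shared by every span in one request
    span_id: str                 # unique to this unit of work
    parent_id: Optional[str]     # None for the root span
    start: float
    end: float = 0.0


# Currently active span for this thread / asyncio task.
_current_span = contextvars.ContextVar("current_span", default=None)
# Stand-in for an exporter that ships finished spans to a tracing backend.
FINISHED: List[Span] = []


@contextmanager
def span(name: str) -> Iterator[Span]:
    parent = _current_span.get()
    s = Span(
        name=name,
        # A root span starts a new trace; children inherit its trace id.
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        start=time.perf_counter(),
    )
    token = _current_span.set(s)
    try:
        yield s
    finally:
        s.end = time.perf_counter()
        _current_span.reset(token)   # restore the parent as active span
        FINISHED.append(s)
```

Nesting `with span("db.query")` inside `with span("GET /checkout")` yields two spans with the same `trace_id`, where the child's `parent_id` is the root's `span_id`. Using `contextvars` rather than a global keeps concurrent asyncio tasks from seeing each other's spans.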


Logging Best Practices

Log what you’ll wish you had during debugging
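A minimal sketch of structured (JSON) logging with Python's standard `logging` module, so that fields such as a request ID can be searched and correlated later. The `JsonFormatter` class, the `ctx` key passed through `extra`, and the output field names are assumptions for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge structured context passed via extra={"ctx": {...}}.
        entry.update(getattr(record, "ctx", {}))
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Logger that writes JSON lines to `stream` (stdout by default)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False          # avoid duplicate lines via the root logger
    return logger
```

Usage: `make_logger("checkout").error("payment failed", extra={"ctx": {"request_id": "req-123", "status": 502}})` emits a single JSON line carrying `request_id` and `status` as top-level fields.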


Metrics Best Practices


Request Monitoring

Track:

Use labels:
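A minimal sketch of per-request metrics keyed by labels, assuming the tracked quantities are request count and latency, labeled by method, route, and status class. Names such as `record_request` and `error_rate` are hypothetical. Routes are recorded as templates (`/users/{id}`) rather than raw paths, because every distinct label combination becomes its own time series, and unbounded values such as user IDs would explode storage.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Label set: (method, route template, status class such as "5xx").
Labels = Tuple[str, str, str]

request_count: Dict[Labels, int] = defaultdict(int)
latency_sum: Dict[Labels, float] = defaultdict(float)


def record_request(method: str, route: str, status: int, seconds: float) -> None:
    labels = (method, route, f"{status // 100}xx")
    request_count[labels] += 1
    latency_sum[labels] += seconds


def error_rate(route: str) -> float:
    """Fraction of requests to `route` that returned a 5xx status."""
    total = sum(n for (_, r, _), n in request_count.items() if r == route)
    errors = sum(n for (_, r, s), n in request_count.items()
                 if r == route and s == "5xx")
    return errors / total if total else 0.0
```

Dividing `latency_sum` by `request_count` for the same labels gives mean latency per series.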


Advanced Metrics

Helps identify:


Pull vs Push
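In the pull model (used by Prometheus), the monitoring server periodically scrapes an HTTP endpoint that each service exposes; in the push model, the service sends its metrics to a receiver (for example the Prometheus Pushgateway, intended for short-lived batch jobs). A minimal sketch of the pull side: render a counter in the Prometheus text exposition format and serve it from `/metrics` with the standard library. `render_counter`, `REQUESTS`, and `MetricsHandler` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple

LabelPairs = Tuple[Tuple[str, str], ...]   # e.g. (("method", "GET"),)

REQUESTS: Dict[LabelPairs, float] = {}     # updated by the application


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, series: Dict[LabelPairs, float]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(series.items()):
        label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Pull model: the monitoring server GETs /metrics on its own schedule."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter("http_requests_total", "Total HTTP requests.", REQUESTS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, one could run `HTTPServer(("", 8000), MetricsHandler).serve_forever()` with `HTTPServer` imported from `http.server`; the port is arbitrary. With pull, the server controls scrape frequency and can detect a dead target when a scrape fails.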


Alerting Insights

Bad alerts:

Good alerts:


Tracing Insights

With tracing:

Use tools like:
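Tracing only works across services if the trace context travels with each request. A minimal sketch of parsing and propagating the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`); it accepts only version `00` and ignores the companion `tracestate` header, both simplifications. Tracing libraries normally handle this; the function names here are hypothetical.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version 00, 32-hex trace id, 16-hex parent (span) id, 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass
class TraceContext:
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    m = _TRACEPARENT.match(header.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None                      # all-zero ids are invalid
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: Optional[TraceContext], sampled: bool = True) -> str:
    """Header for an outgoing call: same trace id, fresh span id for this hop."""
    trace_id = ctx.trace_id if ctx else secrets.token_hex(16)
    keep = ctx.sampled if ctx else sampled
    return f"00-{trace_id}-{secrets.token_hex(8)}-{'01' if keep else '00'}"
```

A service would call `parse_traceparent` on the incoming header and `child_traceparent` when building each outgoing request, so every hop joins the same trace.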


Aggregation & Storage


Why needed:


Techniques:

1. Log aggregation
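Aggregation typically means shipping logs from every service to one central store. A minimal sketch of just the merge step, assuming each service produces JSON lines already sorted by a numeric `ts` field; `merge_streams` and `by_request` are hypothetical helpers. `heapq.merge` combines the streams lazily, so memory stays proportional to the number of streams rather than the log volume.

```python
import heapq
import json
from typing import Iterable, Iterator, List


def merge_streams(streams: List[Iterable[str]]) -> Iterator[dict]:
    """Merge per-service JSON-lines logs (each sorted by "ts") into one
    time-ordered stream."""
    parsed = [(json.loads(line) for line in s) for s in streams]
    return heapq.merge(*parsed, key=lambda e: e["ts"])


def by_request(entries: Iterable[dict], request_id: str) -> List[dict]:
    """All entries for one request, across every service."""
    return [e for e in entries if e.get("request_id") == request_id]
```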


2. Downsampling
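Downsampling replaces raw high-resolution samples with one aggregate per time window, shrinking storage for older data. A minimal sketch, assuming `(unix_seconds, value)` points; it keeps min, average, max, and count per window because an average alone would hide short spikes. The function name and output tuple layout are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]                       # (unix_seconds, value)
Rollup = Tuple[int, float, float, float, int]     # (window_start, min, avg, max, count)


def downsample(points: List[Point], window_s: int) -> List[Rollup]:
    """Collapse raw samples into fixed, aligned windows of `window_s` seconds."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        # Align each sample to the start of its window.
        buckets[int(ts // window_s) * window_s].append(value)
    out = []
    for start in sorted(buckets):
        vals = buckets[start]
        out.append((start, min(vals), sum(vals) / len(vals), max(vals), len(vals)))
    return out
```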


3. Tiered storage
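Tiered storage moves data to cheaper, slower media as it ages. A minimal sketch of the routing decision only; the 7/90/365-day thresholds and the tier names are hypothetical and would depend on query patterns and retention requirements.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical retention policy: (maximum age, tier), youngest first.
TIERS = [
    (timedelta(days=7), "hot"),      # fast storage, full resolution
    (timedelta(days=90), "warm"),    # cheaper disk, typically downsampled
    (timedelta(days=365), "cold"),   # object storage / archive
]


def tier_for(ts: datetime, now: datetime) -> Optional[str]:
    """Tier for data written at `ts`, or None once it is past the last
    tier and eligible for deletion."""
    age = now - ts
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return None
```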


Key Insights

Together they give:


Trade-offs

Pros

Cons


One-line Summary

Monitoring and observability combine logs, metrics, alerting, and tracing to detect, understand, and debug issues in distributed systems before users are impacted.
