
Monitoring Setup

Caution

Configures observability stacks including Prometheus, Grafana, Alertmanager, and OpenTelemetry with dashboards, alerts, and SLO definitions.

By SRE Skills Lab | 2,680 | v1.5.0 | Updated 2026-03-10

Install

Claude Code

Copy the SKILL.md file to your project's .claude/skills/ directory

About This Skill

Monitoring Setup generates observability configurations for your infrastructure and applications. It covers the stack from metric collection through dashboards to alerting, following Google SRE and OpenTelemetry practices.

How It Works

  1. Stack assessment: identifies the services and infrastructure you run and recommends a monitoring approach to match
  2. Metric collection: generates Prometheus scrape configs, recording rules, and service discovery settings
  3. Dashboard creation: produces Grafana dashboard JSON built around RED metrics (Rate, Errors, Duration)
  4. Alert design: creates multi-level alert rules with severity labels, grouping, and inhibition
  5. SLO definition: helps define Service Level Objectives with error budget burn rate alerts
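As a rough illustration of steps 2 and 3, the sketch below builds a Prometheus recording rule group with one rule per RED signal. It is not the skill's actual output: the metric names `http_requests_total` and `http_request_duration_seconds_bucket`, the `code` label, and the `checkout` job are all assumptions chosen for the example, and the helper `red_recording_rules` is hypothetical.

```python
import json


def red_recording_rules(job: str, window: str = "5m") -> dict:
    """Build a Prometheus rule group with one recording rule per RED signal.

    Metric and label names are assumed for illustration; substitute the
    ones your services actually export.
    """
    selector = f'job="{job}"'
    rules = [
        {   # Rate: requests per second
            "record": f"job:http_requests:rate{window}",
            "expr": f"sum by (job) (rate(http_requests_total{{{selector}}}[{window}]))",
        },
        {   # Errors: 5xx responses per second
            "record": f"job:http_request_errors:rate{window}",
            "expr": (
                "sum by (job) (rate(http_requests_total"
                f'{{{selector},code=~"5.."}}[{window}]))'
            ),
        },
        {   # Duration: 99th percentile latency from a histogram
            "record": f"job:http_request_duration_seconds:p99_{window}",
            "expr": (
                "histogram_quantile(0.99, sum by (job, le) ("
                f"rate(http_request_duration_seconds_bucket{{{selector}}}[{window}])))"
            ),
        },
    ]
    return {"groups": [{"name": f"{job}_red", "rules": rules}]}


if __name__ == "__main__":
    # Print the structure as JSON; the same keys map directly onto a YAML rule file.
    print(json.dumps(red_recording_rules("checkout"), indent=2))
```

Recording the three signals once lets dashboards and alerts reuse the same precomputed series instead of repeating the raw queries.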

Best For

  • Setting up monitoring for new Kubernetes clusters or services
  • Migrating from legacy monitoring to a Prometheus/Grafana stack
  • Implementing SLO-based alerting to reduce alert fatigue
  • Adding distributed tracing with OpenTelemetry

Philosophy

Follows the Google SRE approach: monitor symptoms, not causes; alert on SLO violations, not individual metrics; and use dashboards for diagnosis, not detection.
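To make "alert on SLO violations" concrete, here is a minimal sketch of the multi-window burn rate check described in the Google SRE Workbook. The window sizes and budget fractions are the figures commonly quoted from that book for a 30-day SLO period; the class and function names are hypothetical, and real thresholds should be calibrated against your own traffic.

```python
from dataclasses import dataclass

SLO_PERIOD_HOURS = 30 * 24  # assumed 30-day SLO window


@dataclass(frozen=True)
class BurnRateWindow:
    long_window_hours: float   # window that proves the burn is significant
    short_window_hours: float  # window that proves the burn is still happening
    budget_fraction: float     # share of the error budget spent over the long window
    severity: str


def burn_rate_threshold(budget_fraction: float, window_hours: float) -> float:
    """Burn rate that would spend `budget_fraction` of the budget in `window_hours`."""
    return budget_fraction * SLO_PERIOD_HOURS / window_hours


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    return error_ratio / (1.0 - slo_target)


def should_alert(long_error_ratio: float, short_error_ratio: float,
                 slo_target: float, window: BurnRateWindow) -> bool:
    """Fire only when both the long and the short window exceed the threshold."""
    threshold = burn_rate_threshold(window.budget_fraction, window.long_window_hours)
    return (burn_rate(long_error_ratio, slo_target) > threshold
            and burn_rate(short_error_ratio, slo_target) > threshold)


# Illustrative tiers: two paging windows and one ticket window.
WINDOWS = [
    BurnRateWindow(1, 5 / 60, 0.02, "page"),
    BurnRateWindow(6, 0.5, 0.05, "page"),
    BurnRateWindow(72, 6, 0.10, "ticket"),
]

if __name__ == "__main__":
    for w in WINDOWS:
        print(w.severity, w.long_window_hours,
              round(burn_rate_threshold(w.budget_fraction, w.long_window_hours), 2))
```

Requiring both windows to exceed the threshold means an alert stops firing soon after the problem ends, which is one way this approach reduces alert fatigue.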

Use Cases

  • Set up Prometheus scrape configs and recording rules
  • Create Grafana dashboards from application metrics
  • Define SLOs and error budget alerting policies
  • Configure OpenTelemetry collectors for distributed tracing
  • Design multi-tier alerting with escalation policies
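For the dashboard use case, the sketch below assembles a small Grafana dashboard as a Python dict and serializes it to JSON. It uses only the basic fields of Grafana's dashboard JSON model (`title`, `panels`, `gridPos`, `targets`); the `schemaVersion` value, the metric names, and the helper `red_dashboard` are assumptions for illustration, so check the result against your Grafana version before importing it.

```python
import json


def red_dashboard(service: str) -> dict:
    """Build a three-panel RED dashboard for one service (illustrative only)."""
    sel = f'job="{service}"'
    queries = [
        ("Request rate", f"sum(rate(http_requests_total{{{sel}}}[5m]))"),
        ("Error ratio",
         f'sum(rate(http_requests_total{{{sel},code=~"5.."}}[5m]))'
         f" / sum(rate(http_requests_total{{{sel}}}[5m]))"),
        ("p99 latency",
         "histogram_quantile(0.99, sum by (le) ("
         f"rate(http_request_duration_seconds_bucket{{{sel}}}[5m])))"),
    ]
    panels = []
    for index, (title, expr) in enumerate(queries):
        panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": title,
            # Grafana's grid is 24 columns wide, so three panels of width 8 fill one row.
            "gridPos": {"h": 8, "w": 8, "x": 8 * index, "y": 0},
            "targets": [{"refId": "A", "expr": expr}],
        })
    return {
        "title": f"{service} RED overview",
        "schemaVersion": 39,  # assumption: adjust to match your Grafana release
        "panels": panels,
    }


if __name__ == "__main__":
    print(json.dumps(red_dashboard("checkout"), indent=2))
```

Generated layouts like this one are a starting point; panel sizes and units usually still need adjusting in the Grafana UI.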

Pros & Cons

Pros

  • Complete stack from collection to alerting in one skill
  • Follows Google SRE best practices for alert design
  • Generates importable Grafana dashboard JSON
  • Supports OpenTelemetry for modern observability

Cons

  • Dashboard layouts may need visual tuning in the Grafana UI
  • Custom exporters require manual integration
  • Alert thresholds need calibration against real traffic data
