Monitoring Setup
Caution
Configures observability stacks including Prometheus, Grafana, Alertmanager, and OpenTelemetry with dashboards, alerts, and SLO definitions.
Install
Claude Code
Copy the SKILL.md file to your project's .claude/skills/ directory.
About This Skill
Monitoring Setup is a skill that generates complete observability configurations for your infrastructure and applications. It covers the full stack from metric collection to dashboards to alerting, following Google SRE and OpenTelemetry best practices.
How It Works
- Stack assessment — Identifies what you're running and recommends the right monitoring approach
- Metric collection — Generates Prometheus scrape configs, recording rules, and service discovery
- Dashboard creation — Produces Grafana dashboard JSON with RED metrics (Rate, Errors, Duration)
- Alert design — Creates multi-level alert rules with proper severity, grouping, and inhibition
- SLO definition — Helps define Service Level Objectives with error budget burn rate alerts
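As an illustration of the metric-collection and dashboard steps, here is a minimal sketch of Prometheus recording rules for RED metrics. It assumes the application exposes a counter named `http_requests_total` with a `code` label and a histogram named `http_request_duration_seconds`. Both metric names and the rule names are hypothetical, not necessarily what the skill emits. The rule names follow Prometheus's `level:metric:operations` naming convention, so a Grafana panel can query `job:http_requests:rate5m` directly instead of re-evaluating the raw expression.

```python
import json

# Assumed (hypothetical) metric names exposed by the application.
REQUESTS = "http_requests_total"  # counter with a `code` label
DURATION_BUCKETS = "http_request_duration_seconds_bucket"  # histogram buckets


def red_recording_rules(window: str = "5m") -> dict:
    """Build a Prometheus rule-file structure with one RED rule per signal."""
    rules = [
        {
            # Rate: requests per second, aggregated per job.
            "record": f"job:http_requests:rate{window}",
            "expr": f"sum by (job) (rate({REQUESTS}[{window}]))",
        },
        {
            # Errors: fraction of requests answered with a 5xx status code.
            "record": f"job:http_requests_errors:ratio_rate{window}",
            "expr": (
                f'sum by (job) (rate({REQUESTS}{{code=~"5.."}}[{window}]))'
                f" / sum by (job) (rate({REQUESTS}[{window}]))"
            ),
        },
        {
            # Duration: 99th-percentile latency estimated from histogram buckets.
            "record": f"job:http_request_duration_seconds:p99_{window}",
            "expr": (
                "histogram_quantile(0.99, sum by (job, le) "
                f"(rate({DURATION_BUCKETS}[{window}])))"
            ),
        },
    ]
    return {"groups": [{"name": "red-metrics", "rules": rules}]}


if __name__ == "__main__":
    # YAML parsers accept JSON, so this can be saved as a rules file
    # or converted to block-style YAML with a YAML library.
    print(json.dumps(red_recording_rules(), indent=2))
```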
Best For
- Setting up monitoring for new Kubernetes clusters or services
- Migrating from legacy monitoring to Prometheus/Grafana stack
- Implementing SLO-based alerting to reduce alert fatigue
- Adding distributed tracing with OpenTelemetry
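For the distributed-tracing case, a minimal OpenTelemetry Collector traces pipeline has three parts: an OTLP receiver that accepts spans from instrumented services, a batch processor, and an OTLP exporter that forwards to a tracing backend. This is a sketch, not the skill's actual output. The backend address `tempo:4317` and the `check_pipelines` helper are hypothetical. The helper catches a common mistake: a pipeline that names a component the config never defines.

```python
import json

# Hypothetical tracing backend address; replace with your backend's OTLP endpoint.
BACKEND_ENDPOINT = "tempo:4317"


def check_pipelines(config: dict) -> None:
    """Raise if a pipeline references a receiver/processor/exporter that is not defined."""
    for name, pipeline in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in config.get(kind, {}):
                    raise ValueError(
                        f"pipeline {name!r} uses undefined {kind[:-1]} {component!r}"
                    )


def otel_collector_config(endpoint: str = BACKEND_ENDPOINT) -> dict:
    """Build a minimal OpenTelemetry Collector config with one traces pipeline."""
    config = {
        "receivers": {
            # Accept OTLP over gRPC (default port 4317) and HTTP (default port 4318).
            "otlp": {"protocols": {"grpc": {}, "http": {}}},
        },
        "processors": {
            # Group spans into batches before export to reduce outgoing requests.
            "batch": {},
        },
        "exporters": {
            # Plaintext connection, acceptable only inside a trusted network.
            "otlp": {"endpoint": endpoint, "tls": {"insecure": True}},
        },
        "service": {
            "pipelines": {
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["otlp"],
                }
            }
        },
    }
    check_pipelines(config)
    return config


if __name__ == "__main__":
    print(json.dumps(otel_collector_config(), indent=2))
```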
Philosophy
Follows the Google SRE approach: monitor symptoms, not causes; alert on SLO violations, not individual metrics; and use dashboards for diagnosis, not detection.
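Alerting on SLO violations is usually expressed through the burn rate: the observed error ratio divided by the error budget (1 minus the SLO target). A burn rate of 1 spends exactly the budget over the SLO period. The sketch below assumes a 99.9% SLO over 30 days and a hypothetical recording rule `job:slo_errors:ratio_rate<window>`. It uses the multi-window thresholds described in the Google SRE Workbook (14.4 over 1h, 6 over 6h, 1 over 3 days). Each alert fires only when both a long and a short window exceed the threshold, which keeps it from firing long after an incident has ended.

```python
from dataclasses import dataclass

# Assumptions: 99.9% SLO, 30-day period, and a hypothetical recording rule
# `job:slo_errors:ratio_rate<window>` that yields the error ratio per job.
SLO_TARGET = 0.999
SLO_PERIOD_HOURS = 30 * 24


def burn_rate(error_ratio: float, slo_target: float = SLO_TARGET) -> float:
    """How many times faster than 'exactly on budget' the error budget is spent."""
    return error_ratio / (1.0 - slo_target)


def budget_consumed(rate: float, window_hours: float,
                    period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Fraction of the whole period's error budget spent during one window."""
    return rate * window_hours / period_hours


@dataclass
class BurnRateAlert:
    long_window: str
    short_window: str
    factor: float
    severity: str

    def expr(self, slo_target: float = SLO_TARGET) -> str:
        """PromQL condition: both windows must exceed factor * error budget."""
        threshold = self.factor * (1.0 - slo_target)
        # Long window: the burn is significant. Short window: it is still happening.
        return (
            f"job:slo_errors:ratio_rate{self.long_window} > {threshold:.4g}"
            f" and job:slo_errors:ratio_rate{self.short_window} > {threshold:.4g}"
        )


ALERTS = [
    BurnRateAlert("1h", "5m", 14.4, "page"),   # 2% of the budget in 1 hour
    BurnRateAlert("6h", "30m", 6.0, "page"),   # 5% of the budget in 6 hours
    BurnRateAlert("3d", "6h", 1.0, "ticket"),  # 10% of the budget in 3 days
]

if __name__ == "__main__":
    for alert in ALERTS:
        print(alert.severity, alert.expr())
```

For example, a burn rate of 14.4 sustained for one hour consumes 14.4 × 1 / 720 = 2% of a 30-day budget.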
Use Cases
- Set up Prometheus scrape configs and recording rules
- Create Grafana dashboards from application metrics
- Define SLOs and error budget alerting policies
- Configure OpenTelemetry collectors for distributed tracing
- Design multi-tier alerting with escalation policies
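Multi-tier alerting in Alertmanager is typically built from a routing tree keyed on a `severity` label, plus inhibition rules that mute lower tiers while a higher one is firing. The sketch below assumes hypothetical receiver names and assumes alert rules set `severity` to `critical` or `warning`. Alertmanager routes by labels. Time-based escalation, such as re-paging a secondary on-call after a delay, is generally configured in the paging service the receiver points to. The `undefined_receivers` helper is also hypothetical.

```python
import json

# Hypothetical receiver names; attach real integrations (pager, chat, tickets) to each.
def alertmanager_config() -> dict:
    return {
        "route": {
            "receiver": "team-chat",  # default tier for anything unmatched
            "group_by": ["alertname", "service"],
            "group_wait": "30s",
            "group_interval": "5m",
            "repeat_interval": "4h",
            "routes": [
                {
                    # Tier 1: critical alerts page the on-call engineer.
                    "matchers": ['severity="critical"'],
                    "receiver": "oncall-pager",
                    "repeat_interval": "1h",
                },
                {
                    # Tier 2: warnings go to a ticket queue.
                    "matchers": ['severity="warning"'],
                    "receiver": "ticket-queue",
                },
            ],
        },
        "receivers": [
            {"name": "team-chat"},
            {"name": "oncall-pager"},
            {"name": "ticket-queue"},
        ],
        "inhibit_rules": [
            {
                # Mute a warning while a critical alert with the same name
                # is firing for the same service.
                "source_matchers": ['severity="critical"'],
                "target_matchers": ['severity="warning"'],
                "equal": ["alertname", "service"],
            }
        ],
    }


def undefined_receivers(config: dict) -> list:
    """Return receiver names used by the top-level route tree but never defined."""
    defined = {r["name"] for r in config["receivers"]}
    route = config["route"]
    used = [route["receiver"]] + [r["receiver"] for r in route.get("routes", [])]
    return [name for name in used if name not in defined]


if __name__ == "__main__":
    print(json.dumps(alertmanager_config(), indent=2))
```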
Pros & Cons
Pros
- Complete stack from collection to alerting in one skill
- Follows Google SRE best practices for alert design
- Generates importable Grafana dashboard JSON
- Supports OpenTelemetry for modern observability
Cons
- Dashboard layouts may need visual tuning in the Grafana UI
- Custom exporters require manual integration
- Alert thresholds need calibration against real traffic data
Related Skills
Kubernetes Deployer
Caution
Generates and validates Kubernetes manifests, Helm charts, and deployment strategies including rolling updates, canary, and blue-green deployments.
AWS Architect
Caution
Designs AWS architectures by selecting appropriate services, defining VPC layouts, IAM policies, and cost-optimized resource configurations.