Cluster Agent Swarm
Verified. Complete Platform Agent Swarm: a coordinated multi-agent system for Kubernetes and OpenShift platform operations. Includes Orchestrator (Jarvis), Cluster Op...
# Cluster Agent Swarm — Complete Platform Operations
This is the complete cluster-agent-swarm skill package. When you add this skill, you get access to ALL 7 specialized agents working together as a coordinated swarm.
Installation Options
Install All Skills (Recommended)

```bash
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills
```
This installs all 7 agents as a single combined skill with access to all capabilities.
Install Individual Skills

Each agent can also be installed separately:

```bash
# Orchestrator - Task routing and coordination
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/orchestrator

# Cluster Ops - Atlas (cluster operations)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/cluster-ops

# GitOps - Flow (ArgoCD, Helm, Kustomize)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/gitops

# Security - Shield (RBAC, policies, CVEs)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/security

# Observability - Pulse (metrics, alerts, incidents)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/observability

# Artifacts - Cache (registries, SBOM, promotions)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/artifacts

# Developer Experience - Desk (namespaces, onboarding)
npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/developer-experience
```
---
The Swarm — Agent Roster
| Agent | Code Name | Session Key | Domain |
|-------|-----------|-------------|--------|
| Orchestrator | Jarvis | `agent:platform:orchestrator` | Task routing, coordination, standups |
| Cluster Ops | Atlas | `agent:platform:cluster-ops` | Cluster lifecycle, nodes, upgrades |
| GitOps | Flow | `agent:platform:gitops` | ArgoCD, Helm, Kustomize, deploys |
| Security | Shield | `agent:platform:security` | RBAC, policies, secrets, scanning |
| Observability | Pulse | `agent:platform:observability` | Metrics, logs, alerts, incidents |
| Artifacts | Cache | `agent:platform:artifacts` | Registries, SBOM, promotion, CVEs |
| Developer Experience | Desk | `agent:platform:developer-experience` | Namespaces, onboarding, support |
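As an illustration of how the roster might be consumed programmatically, the sketch below copies the table into a small lookup structure. The `Agent` dataclass and the `session_key_for` helper are hypothetical names introduced here; the skill package itself may store this information differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    role: str
    code_name: str
    session_key: str


# Values copied from the roster table; the structure itself is an assumption.
ROSTER = [
    Agent("Orchestrator", "Jarvis", "agent:platform:orchestrator"),
    Agent("Cluster Ops", "Atlas", "agent:platform:cluster-ops"),
    Agent("GitOps", "Flow", "agent:platform:gitops"),
    Agent("Security", "Shield", "agent:platform:security"),
    Agent("Observability", "Pulse", "agent:platform:observability"),
    Agent("Artifacts", "Cache", "agent:platform:artifacts"),
    Agent("Developer Experience", "Desk", "agent:platform:developer-experience"),
]

# Index by lower-cased code name so lookups are case-insensitive.
BY_CODE_NAME = {agent.code_name.lower(): agent for agent in ROSTER}


def session_key_for(code_name: str) -> str:
    """Return the session key for a code name such as 'Shield'."""
    return BY_CODE_NAME[code_name.lower()].session_key
```

For example, `session_key_for("shield")` resolves to the Security agent's session key, and an unknown code name raises `KeyError`.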
---
Agent Capabilities Summary
What Agents CAN Do

- Read cluster state (`kubectl get`, `kubectl describe`, `oc get`)
- Deploy via GitOps (`argocd app sync`, Flux reconciliation)
- Create documentation and reports
- Investigate and triage incidents
- Provision standard resources (namespaces, quotas, RBAC)
- Run health checks and audits
- Scan images and generate SBOMs
- Query metrics and logs
- Execute pre-approved runbooks

What Agents CANNOT Do (Human-in-the-Loop Required)

- Delete production resources (`kubectl delete` in prod)
- Modify cluster-wide policies (NetworkPolicy, OPA, Kyverno cluster policies)
- Make direct changes to secrets without rotation workflow
- Modify network routes or service mesh configuration
- Scale beyond defined resource limits
- Perform irreversible cluster upgrades
- Approve production deployments (can prepare, human approves)
- Change RBAC at cluster-admin level
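One way to picture the split between permitted and human-gated actions is a default-deny check on the command an agent wants to run. This is a minimal sketch, not the package's actual guardrail mechanism: the `needs_human` function, the `READ_ONLY` allowlist, and the environment labels `"prod"` and `"staging"` are all assumptions made for illustration, and only a few of the listed rules are covered.

```python
import shlex

# Read-only commands named in the capability list (assumed allowlist).
READ_ONLY = [
    ("kubectl", "get"),
    ("kubectl", "describe"),
    ("oc", "get"),
]


def needs_human(command: str, environment: str) -> bool:
    """Return True when the command should wait for human approval."""
    tokens = tuple(shlex.split(command))

    # Reading cluster state is always allowed.
    if any(tokens[: len(prefix)] == prefix for prefix in READ_ONLY):
        return False

    # GitOps syncs are allowed, except that production deploys need approval.
    if tokens[:3] == ("argocd", "app", "sync"):
        return environment == "prod"

    # Default deny: anything not explicitly allowed goes to a human.
    return True
```

With this sketch, `needs_human("kubectl delete deploy api", "prod")` returns `True`, while `needs_human("kubectl get pods -n payments", "prod")` returns `False`. Falling through to `True` mirrors the deny-by-default stance: an unrecognised command is treated as gated rather than allowed.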
---
Communication Patterns
@Mentions

Agents communicate via @mentions in shared task comments:

```
@Shield Please review the RBAC for payment-service v3.2 before I sync.
@Pulse Is the CPU spike related to the deployment or external traffic?
@Atlas The staging cluster needs 2 more worker nodes.
```
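A comment's @mentions could be extracted with a small parser like the one below. This is a sketch under the assumption that mentions use the agents' code names exactly as written in the roster; `extract_mentions` is a hypothetical helper, not part of the skill package.

```python
import re

# Code names from the agent roster.
KNOWN_AGENTS = {"Jarvis", "Atlas", "Flow", "Shield", "Pulse", "Cache", "Desk"}
MENTION_RE = re.compile(r"@(\w+)")


def extract_mentions(comment: str) -> list[str]:
    """Return known agent code names mentioned in a comment.

    Names are returned in order of first appearance, without duplicates.
    Mentions of unknown names are ignored.
    """
    found: list[str] = []
    for name in MENTION_RE.findall(comment):
        if name in KNOWN_AGENTS and name not in found:
            found.append(name)
    return found
```

Ignoring unknown names keeps ordinary text such as email addresses or human handles from being routed to an agent.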

Thread Subscriptions

- Commenting on a task → auto-subscribe
- Being @mentioned → auto-subscribe
- Being assigned → auto-subscribe
- Once subscribed → receive ALL future comments on heartbeat
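The subscription rules can be modelled as a thread object that tracks, per agent, which comments are still unread. The sketch below is one possible reading of those rules; the `TaskThread` class and its method names are hypothetical, and it assumes that a mentioned agent should also receive the comment that mentioned it and that agents do not receive their own comments.

```python
class TaskThread:
    """A task's comment thread with auto-subscription."""

    def __init__(self):
        self.comments = []    # list of (author, text) tuples
        self.subscribers = set()
        self._cursor = {}     # agent -> index of the next unread comment

    def _subscribe(self, agent):
        # New subscribers start reading from the current end of the thread.
        if agent not in self.subscribers:
            self.subscribers.add(agent)
            self._cursor[agent] = len(self.comments)

    def assign(self, agent):
        self._subscribe(agent)

    def comment(self, author, text, mentions=()):
        # Subscribe mentioned agents first so they see this comment.
        for agent in mentions:
            self._subscribe(agent)
        self.comments.append((author, text))
        # Commenting subscribes the author, starting after their own comment.
        self._subscribe(author)

    def heartbeat(self, agent):
        """Return unread comments by others and mark them as read."""
        if agent not in self.subscribers:
            return []
        start = self._cursor[agent]
        self._cursor[agent] = len(self.comments)
        return [(a, t) for a, t in self.comments[start:] if a != agent]
```

Delivery happens only inside `heartbeat`, which matches the idea that agents pick up new comments when they wake rather than being interrupted.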

Escalation Path

1. Agent detects issue
2. Agent attempts resolution within guardrails
3. If blocked → @mention another agent or escalate to human
4. P1 incidents → all relevant agents auto-notified
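The escalation steps can be sketched as a single function that returns who should be notified. Everything here is illustrative: `handle_issue`, its parameters, and the `"human"` placeholder are assumptions, and the real swarm may decide which agents are "relevant" in a different way.

```python
from typing import Callable, Optional, Sequence


def handle_issue(
    priority: str,
    try_fix: Callable[[], bool],
    peer: Optional[str],
    relevant_agents: Sequence[str],
) -> list[str]:
    """Return the list of parties to notify for a detected issue.

    priority        -- e.g. "P1" or "P2" (labels assumed for this sketch)
    try_fix         -- attempts a resolution within guardrails, True on success
    peer            -- agent to @mention if blocked, or None to go to a human
    relevant_agents -- agents auto-notified for P1 incidents
    """
    notify: list[str] = []

    # Step 4: P1 incidents notify all relevant agents regardless of outcome.
    if priority == "P1":
        notify.extend(relevant_agents)

    # Step 2: attempt a resolution within guardrails.
    if try_fix():
        return notify

    # Step 3: blocked, so @mention a peer or escalate to a human.
    target = peer if peer is not None else "human"
    if target not in notify:
        notify.append(target)
    return notify
```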
---
Heartbeat Schedule
Agents wake on heartbeat intervals of 5, 10, or 15 minutes, depending on how quickly their domain needs to respond:

```
*/5  * * * *   Atlas         (Cluster Ops - needs fast response for incidents)
*/5  * * * *   Pulse         (Observability - needs fast response for alerts)
*/5  * * * *   Shield        (Security - fast response for CVEs and threats)
*/10 * * * *   Flow          (GitOps - deployments can wait a few minutes)
*/10 * * * *   Cache         (Artifacts - promotions are scheduled)
*/15 * * * *   Desk          (DevEx - developer requests aren't usually urgent)
*/15 * * * *   Orchestrator  (Coordination - overview and standups)
```
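To see which agents share a wake-up, the schedule can be evaluated for a given minute of the hour. The sketch below assumes standard cron semantics, where `*/N` in the minute field matches minutes divisible by N; `agents_waking_at` is a hypothetical helper.

```python
# Minute steps taken from the cron expressions in the schedule.
HEARTBEAT_MINUTES = {
    "Atlas": 5,
    "Pulse": 5,
    "Shield": 5,
    "Flow": 10,
    "Cache": 10,
    "Desk": 15,
    "Orchestrator": 15,
}


def agents_waking_at(minute: int) -> list[str]:
    """Return the agents whose */N minute field matches this minute (0-59)."""
    return [name for name, step in HEARTBEAT_MINUTES.items() if minute % step == 0]
```

Under these semantics all seven agents wake together at minutes 0 and 30. If the goal is to avoid simultaneous wake-ups, one option is to give each entry its own offset, for example `1-59/5` instead of `*/5`, in cron implementations that support range/step syntax.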
---
Key Principles
- Roles over genericism — Each agent has a defined SOUL with exactly who they are
- Files over mental notes — Only files persist between sessions
- Staggered schedules — Don't wake all agents at once
- Shared context — One source of truth for tasks and communication
- Heartbeat, not always-on — Balance responsiveness with cost
- Human-in-the-loop — Critical actions require approval
- Guardrails over freedom — Define what agents can and cannot do
- Audit everything — Every action logged to activity feed
- Reliability first — System stability always wins over new features
- Security by default — Deny access, approve by exception
---
Detailed Agent Capabilities
Orchestrator (Jarvis)

- Task routing: determining which agent should handle which request
- Workflow orchestration: coordinating multi-agent operations
- Daily standups: compiling swarm-wide status reports
- Priority management: determining urgency and sequencing of work
- Cross-agent communication: facilitating collaboration
- Accountability: tracking what was promised vs what was delivered

Cluster Ops (Atlas)

- OpenShift/Kubernetes cluster operations (upgrades, scaling, patching)
- Node pool management and autoscaling
- Resource quota management and capacity planning
- Network troubleshooting (OVN-Kubernetes, Cilium, Calico)
- Storage class management and PVC/CSI issues
- etcd backup, restore, and health monitoring
- Multi-platform expertise (OCP, EKS, AKS, GKE, ROSA, ARO)

GitOps (Flow)

- ArgoCD application management (sync, rollback, sync waves, hooks)
- Helm chart development, debugging, and templating
- Kustomize overlays and patch generation
- ApplicationSet templates for multi-cluster deployments
- Deployment strategy management (canary, blue-green, rolling)
- Git repository management and branching strategies
- Drift detection and remediation
- Secrets management integration (Vault, Sealed Secrets, External Secrets)

Security (Shield)

- RBAC audit and management
- NetworkPolicy review and enforcement
- Security policy validation (OPA, Kyverno)
- Vulnerability scanning (image scanning, CVE triage)
- Secret rotation workflows
- Security incident investigation
- Compliance reporting

Observability (Pulse)

- Prometheus/Grafana metric queries
- Log aggregation and search (Loki, Elasticsearch)
- Alert triage and investigation
- SLO tracking and error budget monitoring
- Incident response coordination
- Dashboards and visualization
- Telemetry pipeline troubleshooting

Artifacts (Cache)

- Container registry management
- Image scanning and CVE analysis
- SBOM generation and tracking
- Artifact promotion workflows
- Version management
- Registry caching and proxying

Developer Experience (Desk)

- Namespace provisioning
- Resource quota and limit range management
- Developer onboarding
- Template generation
- Developer support and troubleshooting
- Documentation generation
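Given the domains listed for each agent, the Orchestrator's task routing could be approximated by keyword matching. This is a deliberately naive sketch: the `ROUTES` keyword lists are a hand-picked subset of the capabilities above, `route` is a hypothetical function, and the actual Orchestrator presumably relies on the model's judgment rather than substring counts.

```python
# Illustrative keywords per agent, drawn from the capability lists.
ROUTES = {
    "Atlas": ("node", "upgrade", "etcd", "pvc", "autoscal"),
    "Flow": ("argocd", "helm", "kustomize", "rollback", "drift"),
    "Shield": ("rbac", "networkpolicy", "cve", "kyverno"),
    "Pulse": ("prometheus", "grafana", "alert", "slo", "loki"),
    "Cache": ("registry", "sbom", "promotion"),
    "Desk": ("namespace", "onboarding", "quota"),
}


def route(request: str) -> str:
    """Pick the agent whose keywords match the request most often.

    Falls back to the Orchestrator (Jarvis) when nothing matches.
    Ties go to the agent listed first in ROUTES.
    """
    text = request.lower()
    scores = {
        agent: sum(keyword in text for keyword in keywords)
        for agent, keywords in ROUTES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Jarvis"
```

Some topics, such as CVEs and resource quotas, appear under more than one agent in the lists above, so any real router would need a tie-breaking policy; here the order of `ROUTES` decides.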
---
File Structure
```
cluster-agent-swarm-skills/
├── SKILL.md                      # This file - combined swarm
├── AGENTS.md                     # Swarm configuration and protocols
├── skills/
│   ├── orchestrator/             # Jarvis - task routing
│   │   └── SKILL.md
│   ├── cluster-ops/              # Atlas - cluster operations
│   │   └── SKILL.md
│   ├── gitops/                   # Flow - GitOps
│   │   └── SKILL.md
│   ├── security/                 # Shield - security
│   │   └── SKILL.md
│   ├── observability/            # Pulse - monitoring
│   │   └── SKILL.md
│   ├── artifacts/                # Cache - artifacts
│   │   └── SKILL.md
│   └── developer-experience/     # Desk - DevEx
│       └── SKILL.md
├── scripts/                      # Shared scripts
└── references/                   # Shared documentation
```
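When checking a local copy of the repository, it can help to enumerate the markdown files the layout implies. The sketch below only builds the expected relative paths from the tree; `expected_skill_files` is a hypothetical helper, and where the installer actually places these files on disk is not stated here.

```python
from pathlib import PurePosixPath

# Per-agent directories under skills/, as shown in the tree.
SKILL_DIRS = [
    "orchestrator",
    "cluster-ops",
    "gitops",
    "security",
    "observability",
    "artifacts",
    "developer-experience",
]


def expected_skill_files(root: str = "cluster-agent-swarm-skills") -> list[str]:
    """Return the markdown files the layout implies, relative to root's parent."""
    base = PurePosixPath(root)
    top_level = [str(base / "SKILL.md"), str(base / "AGENTS.md")]
    per_agent = [str(base / "skills" / name / "SKILL.md") for name in SKILL_DIRS]
    return top_level + per_agent
```

Comparing this list against a checkout, for example with `pathlib.Path.exists`, gives a quick sanity check that no agent's `SKILL.md` is missing.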
---
Reference Documentation
For detailed capabilities of each agent, refer to the individual SKILL.md files:
- `skills/orchestrator/SKILL.md` - Full Orchestrator documentation
- `skills/cluster-ops/SKILL.md` - Full Cluster Ops documentation
- `skills/gitops/SKILL.md` - Full GitOps documentation
- `skills/security/SKILL.md` - Full Security documentation
- `skills/observability/SKILL.md` - Full Observability documentation
- `skills/artifacts/SKILL.md` - Full Artifacts documentation
- `skills/developer-experience/SKILL.md` - Full Developer Experience documentation
Use Cases
- Deploy a swarm of 7 specialized agents that collaborate on complex platform operations
- Distribute infrastructure, monitoring, and deployment tasks across purpose-built agents
- Coordinate multi-agent workflows where each agent handles a specific domain
- Scale operations by adding more agents to the swarm for parallel task execution
- Manage the full lifecycle of a cluster from provisioning to monitoring to incident response
Pros & Cons
Pros
- Complete package with 7 specialized agents covering the full operations spectrum
- Agents work together as a coordinated swarm rather than isolated tools
- Single skill install activates all 7 agents simultaneously
Cons
- Complexity of 7 agents makes troubleshooting issues harder to isolate
- Only available on claude-code and openclaw platforms
- Resource-intensive: running all agents simultaneously requires significant compute