Cluster Agent Swarm

Verified

Complete Platform Agent Swarm — A coordinated multi-agent system for Kubernetes and OpenShift platform operations. Includes Orchestrator (Jarvis), Cluster Op...

93 downloads

$ Add to .claude/skills/

$ openclaw install

About This Skill

# Cluster Agent Swarm — Complete Platform Operations

This is the complete cluster-agent-swarm skill package. When you add this skill, you get access to ALL 7 specialized agents working together as a coordinated swarm.

Installation Options

Install All Skills (Recommended) ```bash npx skills add https://github.com/kcns008/cluster-agent-swarm-skills ```

This installs all 7 agents as a single combined skill with access to all capabilities.

Install Individual Skills Each agent can also be installed separately: ```bash # Orchestrator - Task routing and coordination npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/orchestrator

# Cluster Ops - Atlas (cluster operations) npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/cluster-ops

# GitOps - Flow (ArgoCD, Helm, Kustomize) npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/gitops

# Security - Shield (RBAC, policies, CVEs) npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/security

# Observability - Pulse (metrics, alerts, incidents) npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/observability

# Artifacts - Cache (registries, SBOM, promotions) npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/artifacts

# Developer Experience - Desk (namespaces, onboarding) npx skills add https://github.com/kcns008/cluster-agent-swarm-skills/skills/developer-experience ```

---

The Swarm — Agent Roster

| Agent | Code Name | Session Key | Domain | |-------|-----------|-------------|--------| | Orchestrator | Jarvis | `agent:platform:orchestrator` | Task routing, coordination, standups | | Cluster Ops | Atlas | `agent:platform:cluster-ops` | Cluster lifecycle, nodes, upgrades | | GitOps | Flow | `agent:platform:gitops` | ArgoCD, Helm, Kustomize, deploys | | Security | Shield | `agent:platform:security` | RBAC, policies, secrets, scanning | | Observability | Pulse | `agent:platform:observability` | Metrics, logs, alerts, incidents | | Artifacts | Cache | `agent:platform:artifacts` | Registries, SBOM, promotion, CVEs | | Developer Experience | Desk | `agent:platform:developer-experience` | Namespaces, onboarding, support |

---

Agent Capabilities Summary

What Agents CAN Do - Read cluster state (`kubectl get`, `kubectl describe`, `oc get`) - Deploy via GitOps (`argocd app sync`, Flux reconciliation) - Create documentation and reports - Investigate and triage incidents - Provision standard resources (namespaces, quotas, RBAC) - Run health checks and audits - Scan images and generate SBOMs - Query metrics and logs - Execute pre-approved runbooks

What Agents CANNOT Do (Human-in-the-Loop Required) - Delete production resources (`kubectl delete` in prod) - Modify cluster-wide policies (NetworkPolicy, OPA, Kyverno cluster policies) - Make direct changes to secrets without rotation workflow - Modify network routes or service mesh configuration - Scale beyond defined resource limits - Perform irreversible cluster upgrades - Approve production deployments (can prepare, human approves) - Change RBAC at cluster-admin level

---

Communication Patterns

@Mentions Agents communicate via @mentions in shared task comments: ``` @Shield Please review the RBAC for payment-service v3.2 before I sync. @Pulse Is the CPU spike related to the deployment or external traffic? @Atlas The staging cluster needs 2 more worker nodes. ```

Thread Subscriptions - Commenting on a task → auto-subscribe - Being @mentioned → auto-subscribe - Being assigned → auto-subscribe - Once subscribed → receive ALL future comments on heartbeat

Escalation Path 1. Agent detects issue 2. Agent attempts resolution within guardrails 3. If blocked → @mention another agent or escalate to human 4. P1 incidents → all relevant agents auto-notified

---

Heartbeat Schedule

Agents wake on staggered 5-minute intervals: ``` */5 * * * * Atlas (Cluster Ops - needs fast response for incidents) */5 * * * * Pulse (Observability - needs fast response for alerts) */5 * * * * Shield (Security - fast response for CVEs and threats) */10 * * * * Flow (GitOps - deployments can wait a few minutes) */10 * * * * Cache (Artifacts - promotions are scheduled) */15 * * * * Desk (DevEx - developer requests aren't usually urgent) */15 * * * * Orchestrator (Coordination - overview and standups) ```

---

Key Principles

Roles over genericism — Each agent has a defined SOUL with exactly who they are
Files over mental notes — Only files persist between sessions
Staggered schedules — Don't wake all agents at once
Shared context — One source of truth for tasks and communication
Heartbeat, not always-on — Balance responsiveness with cost
Human-in-the-loop — Critical actions require approval
Guardrails over freedom — Define what agents can and cannot do
Audit everything — Every action logged to activity feed
Reliability first — System stability always wins over new features
Security by default — Deny access, approve by exception

---

Detailed Agent Capabilities

Orchestrator (Jarvis) - Task routing: determining which agent should handle which request - Workflow orchestration: coordinating multi-agent operations - Daily standups: compiling swarm-wide status reports - Priority management: determining urgency and sequencing of work - Cross-agent communication: facilitating collaboration - Accountability: tracking what was promised vs what was delivered

Cluster Ops (Atlas) - OpenShift/Kubernetes cluster operations (upgrades, scaling, patching) - Node pool management and autoscaling - Resource quota management and capacity planning - Network troubleshooting (OVN-Kubernetes, Cilium, Calico) - Storage class management and PVC/CSI issues - etcd backup, restore, and health monitoring - Multi-platform expertise (OCP, EKS, AKS, GKE, ROSA, ARO)

GitOps (Flow) - ArgoCD application management (sync, rollback, sync waves, hooks) - Helm chart development, debugging, and templating - Kustomize overlays and patch generation - ApplicationSet templates for multi-cluster deployments - Deployment strategy management (canary, blue-green, rolling) - Git repository management and branching strategies - Drift detection and remediation - Secrets management integration (Vault, Sealed Secrets, External Secrets)

Security (Shield) - RBAC audit and management - NetworkPolicy review and enforcement - Security policy validation (OPA, Kyverno) - Vulnerability scanning (image scanning, CVE triage) - Secret rotation workflows - Security incident investigation - Compliance reporting

Observability (Pulse) - Prometheus/Grafana metric queries - Log aggregation and search (Loki, Elasticsearch) - Alert triage and investigation - SLO tracking and error budget monitoring - Incident response coordination - Dashboards and visualization - Telemetry pipeline troubleshooting

Artifacts (Cache) - Container registry management - Image scanning and CVE analysis - SBOM generation and tracking - Artifact promotion workflows - Version management - Registry caching and proxying

Developer Experience (Desk) - Namespace provisioning - Resource quota and limit range management - Developer onboarding - Template generation - Developer support and troubleshooting - Documentation generation

---

File Structure

``` cluster-agent-swarm-skills/ ├── SKILL.md # This file - combined swarm ├── AGENTS.md # Swarm configuration and protocols ├── skills/ │ ├── orchestrator/ # Jarvis - task routing │ │ └── SKILL.md │ ├── cluster-ops/ # Atlas - cluster operations │ │ └── SKILL.md │ ├── gitops/ # Flow - GitOps │ │ └── SKILL.md │ ├── security/ # Shield - security │ │ └── SKILL.md │ ├── observability/ # Pulse - monitoring │ │ └── SKILL.md │ ├── artifacts/ # Cache - artifacts │ │ └── SKILL.md │ └── developer-experience/ # Desk - DevEx │ └── SKILL.md ├── scripts/ # Shared scripts └── references/ # Shared documentation ```

---

Reference Documentation

For detailed capabilities of each agent, refer to individual SKILL.md files:
`skills/orchestrator/SKILL.md` - Full Orchestrator documentation
`skills/cluster-ops/SKILL.md` - Full Cluster Ops documentation
`skills/gitops/SKILL.md` - Full GitOps documentation
`skills/security/SKILL.md` - Full Security documentation
`skills/observability/SKILL.md` - Full Observability documentation
`skills/artifacts/SKILL.md` - Full Artifacts documentation
`skills/developer-experience/SKILL.md` - Full Developer Experience documentation

Use Cases

Deploy a swarm of 7 specialized agents that collaborate on complex platform operations
Distribute infrastructure, monitoring, and deployment tasks across purpose-built agents
Coordinate multi-agent workflows where each agent handles a specific domain
Scale operations by adding more agents to the swarm for parallel task execution
Manage the full lifecycle of a cluster from provisioning to monitoring to incident response

Pros & Cons

Pros

+Complete package with 7 specialized agents covering the full operations spectrum
+Agents work together as a coordinated swarm rather than isolated tools
+Single skill install activates all 7 agents simultaneously

Cons

-Complexity of 7 agents makes troubleshooting issues harder to isolate
-Only available on claude-code and openclaw platforms
-Resource-intensive — running all agents simultaneously requires significant compute

FAQ

What does Cluster Agent Swarm do?

Complete Platform Agent Swarm — A coordinated multi-agent system for Kubernetes and OpenShift platform operations. Includes Orchestrator (Jarvis), Cluster Op...

What platforms support Cluster Agent Swarm?

Cluster Agent Swarm is available on Claude Code, OpenClaw.

What are the use cases for Cluster Agent Swarm?

Deploy a swarm of 7 specialized agents that collaborate on complex platform operations. Distribute infrastructure, monitoring, and deployment tasks across purpose-built agents. Coordinate multi-agent workflows where each agent handles a specific domain.

100+ free AI tools

Writing, PDF, image, and developer tools — all in your browser.

AI Humanizer

Make AI text undetectable

AI Detector

Free, unlimited

PDF Tools

Merge, split, compress

Next Step

Use the skill detail page to evaluate fit and install steps. For a direct browser workflow, move into a focused tool route instead of staying in broader support surfaces.

Open Free Tools Try AI Detector