Skip to content

Systems Architect

Verified

Design infrastructure, networks, and cloud systems with integration, reliability, and security patterns.

401 downloads
$ Add to .claude/skills/

About This Skill

# Systems Architecture Rules

Infrastructure Design - Design for failure at every layer — hardware fails, networks partition, regions go down - Redundancy costs money, downtime costs more — calculate acceptable risk - Prefer managed services for undifferentiated work — run less, build more - Infrastructure as code from day one — manual changes drift and break - Immutable infrastructure beats patching — replace, don't repair

Cloud Architecture - Multi-AZ minimum, multi-region for critical systems — availability zones fail together sometimes - Right-size first, auto-scale second — baseline must be correct - Reserved capacity for steady load, spot/preemptible for bursts — cost optimization requires planning - Egress costs add up — keep traffic within regions when possible - Cloud vendor lock-in is real — abstract where escape matters, accept where it doesn't

Networking - Private subnets for workloads, public only for load balancers — minimize attack surface - VPC peering and transit gateways for multi-account — plan topology before scaling - DNS for service discovery — hardcoded IPs break migrations - Zero trust: authenticate and encrypt internal traffic — perimeter security isn't enough - Network segmentation limits blast radius — flat networks let attackers roam

Integration Patterns - APIs for synchronous, queues for asynchronous — match pattern to requirements - Event-driven for loose coupling — producers don't know consumers - Service mesh for complex microservices — observability and security at network layer - Rate limiting and backpressure protect systems — don't let slow consumers crash fast producers - Dead letter queues for failed messages — don't lose data, process later

Reliability - Define SLOs before building — what does "up" mean for this system? - Error budgets allow controlled risk — 99.9% means 8 hours downtime per year is acceptable - Blast radius reduction: cell-based architecture — limit how many users one failure affects - Chaos engineering in staging first — break things intentionally before production breaks accidentally - Runbooks for every alert — 3 AM isn't debugging time

Disaster Recovery - RTO (recovery time) and RPO (data loss) are business decisions — architect for the requirement - Backups aren't recovery until tested — restore regularly - Hot/warm/cold standby each have trade-offs — cost vs speed of recovery - Cross-region replication for critical data — single region is single point of failure - DR drills reveal real problems — plan meets reality

Security - Defense in depth: multiple barriers — one layer will fail - Least privilege for services too — not just users - Secrets management centralized — no secrets in code, config files, or environment variables in images - Audit logging for compliance and forensics — you'll need it after a breach - Patch aggressively — known vulnerabilities are actively exploited

Monitoring and Observability - Metrics, logs, and traces together — each tells part of the story - Alerting on symptoms, not causes — users down matters, CPU high might not - Dashboards for each service with golden signals — latency, traffic, errors, saturation - Distributed tracing across services — follow requests end to end - Log aggregation with retention policy — balance cost and forensic needs

Capacity Planning - Measure current baseline before projecting — can't scale what you don't measure - Load test to find breaking points — theory differs from reality - Capacity leads demand — scaling takes time, be ahead - Cost modeling for growth scenarios — 10x users is rarely 10x cost - Review quarterly at minimum — patterns change

Migration and Evolution - Strangler fig pattern for legacy replacement — route traffic gradually - Blue-green or canary for infrastructure changes — test in production safely - Database migrations are hardest — plan data migration separately - Rollback plans before rollout — assume failure, prepare for it - Communicate maintenance windows — surprises damage trust

Use Cases

  • Design cloud infrastructure architectures with appropriate service selection
  • Plan network topologies with security zones, VPNs, and load balancing
  • Evaluate integration patterns — REST, gRPC, message queues, event-driven architectures
  • Design for reliability with redundancy, failover, and disaster recovery strategies
  • Review existing architectures for security gaps and scalability bottlenecks

Pros & Cons

Pros

  • +Broad scope covering infrastructure, networking, cloud, integration, and security
  • +Useful for architecture design reviews and documentation generation
  • +Security-verified with no dangerous commands or data risks

Cons

  • -Minimal skill content — relies on AI knowledge rather than embedded reference material
  • -No specific cloud provider expertise or tooling integrations

FAQ

What does Systems Architect do?
Design infrastructure, networks, and cloud systems with integration, reliability, and security patterns.
What platforms support Systems Architect?
Systems Architect is available on Claude Code, OpenClaw.
What are the use cases for Systems Architect?
Design cloud infrastructure architectures with appropriate service selection. Plan network topologies with security zones, VPNs, and load balancing. Evaluate integration patterns — REST, gRPC, message queues, event-driven architectures.

100+ free AI tools

Writing, PDF, image, and developer tools — all in your browser.

Next Step

Use the skill detail page to evaluate fit and install steps. For a direct browser workflow, move into a focused tool route instead of staying in broader support surfaces.