🛡️ DevOps & Security Agent Skills
Your AI-Powered Second Brain for Infrastructure & Security
160+ production-ready skills for Claude Code, Cursor, Codex, and every AI agent that reads files.
Explore Skills · Install in 30 Seconds · Contribute
Why This Exists
Install these skills and your agent gains expert-level knowledge of:
| Domain | Skills | What Your Agent Learns |
|---|---|---|
| 🔧 DevOps | 40+ | CI/CD pipelines, K8s ops, observability, release strategies, platform engineering |
| 🔒 Security | 35+ | Vulnerability scanning, secrets management, hardening, AI agent security, MCP security |
| ☁️ Infrastructure | 65+ | AWS, Azure, GCP, Cloudflare, databases, networking, GPU clusters, local AI |
| 🤖 AI Engineering | 20+ | LLMOps, agent evals, RAG infrastructure, inference scaling, coding agent guardrails |
| 📋 Compliance | 20+ | SOC2, HIPAA, GDPR, PCI-DSS, policy-as-code, auditing |
| 💻 IT Operations | 5+ | Device management, identity/SSO, SaaS security, troubleshooting |
30-Second Install
# Install all skills to Claude Code, Cursor, Codex, or any supported agent
npx skills add bagelhole/DevOps-Security-Agent-Skills
# Install specific skills
npx skills add bagelhole/DevOps-Security-Agent-Skills --skill kubernetes-ops --skill hashicorp-vault -a cursor -y
# Or clone directly
git clone https://github.com/bagelhole/DevOps-Security-Agent-Skills.git ~/.skills/devops-security
Works with Claude Code, Cursor, Codex, OpenCode, Cline, and many more.
What Makes This Different
Most "awesome lists" give you links. This repo gives your AI agent production-ready knowledge it can act on:
# Every skill includes real, copy-pasteable configs like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels: { app: myapp }
  template:
    metadata:
      labels: { app: myapp }
    spec:
      containers:
        - name: myapp
          image: myapp:1.0.0
          resources:
            requests: { memory: "128Mi", cpu: "100m" }
            limits: { memory: "256Mi", cpu: "500m" }
          # Container-level security settings
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
What's in Each Skill
skill/
├── SKILL.md # 250-400+ lines of expert knowledge
│ ├── When to Use # Decision guidance
│ ├── Prerequisites # What you need
│ ├── Real Configs # Copy-pasteable YAML, JSON, HCL, Bash
│ ├── CLI Commands # Exact commands to run
│ ├── Troubleshooting # Common issues + fixes
│ └── Related Skills # Cross-references
├── scripts/ # Ready-to-run automation
├── references/ # Deep-dive guides
└── assets/ # Config templates
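As a rough illustration of the layout above, the following shell sketch reports which parts a given skill directory contains. The function name `check_skill_layout` is hypothetical, and treating `SKILL.md` as mandatory while `scripts/`, `references/`, and `assets/` are optional is an assumption, not something this README states.

```shell
# Sketch: report which parts of the skill layout a directory contains.
# Assumption: SKILL.md is required; the three subdirectories are optional.
check_skill_layout() {
  # $1 = path to one skill directory; returns non-zero if SKILL.md is missing.
  dir=$1
  if [ ! -f "$dir/SKILL.md" ]; then
    echo "missing: SKILL.md"
    return 1
  fi
  echo "ok: SKILL.md ($(wc -l < "$dir/SKILL.md" | tr -d ' ') lines)"
  for sub in scripts references assets; do
    if [ -d "$dir/$sub" ]; then
      echo "ok: $sub/"
    else
      echo "absent: $sub/"
    fi
  done
}
```

For example, `check_skill_layout devops/orchestration/kubernetes-ops` run from the repo root would print one line per component of that skill.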
Hot Topics (March 2026)
Skills you won't find in other repos:
| Skill | Why It's Hot |
|---|---|
| MCP Server Security | MCP is everywhere — secure your tool servers |
| AI Coding Agent Guardrails | Safe Claude Code/Cursor/Codex usage for teams |
| eBPF Observability | Kernel-level monitoring with Cilium & Tetragon |
| Platform Engineering | Build internal developer platforms with Backstage |
| Supply Chain Attack Response | Detect & respond to compromised dependencies |
| OpenTofu Migration | Migrate from Terraform to the open-source fork |
| Dev Containers & Nix | Reproducible dev environments for teams |
| Agent Evals | CI/CD gates for AI agent quality & safety |
How It Works
Agent Skills is an open format for extending AI agents. Each SKILL.md has YAML frontmatter that agents load for matching, and detailed instructions that load only when activated:
┌────────────────────────────────────────────────────────────────┐
│  1. DISCOVER         2. MATCH              3. ACTIVATE         │
│                                                                │
│  Agent scans    →    User asks about  →    Agent reads full    │
│  skill folders       Kubernetes            SKILL.md + runs     │
│  at startup          debugging             scripts as needed   │
└────────────────────────────────────────────────────────────────┘
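The discover and match stages can be approximated in plain shell. This is only a sketch: the function names are hypothetical, it assumes the frontmatter sits between two `---` lines at the top of each SKILL.md with single-line `name:` and `description:` keys, and the keyword match is a naive stand-in for whatever matching a real agent performs.

```shell
# Sketch of DISCOVER and MATCH, under the assumptions stated above.

skill_frontmatter() {
  # $1 = SKILL.md path; prints the lines between the opening and closing '---'.
  awk '
    NR == 1 && $0 !~ /^---[[:space:]]*$/ { exit }   # no frontmatter at all
    /^---[[:space:]]*$/ { fences++; next }
    fences == 1 { print }
    fences >= 2 { exit }
  ' "$1"
}

frontmatter_field() {
  # $1 = SKILL.md path, $2 = key; prints the value of a single-line key.
  skill_frontmatter "$1" | sed -n "s/^$2:[[:space:]]*//p" | head -n 1
}

discover_skills() {
  # $1 = root directory; prints "name<TAB>description<TAB>path" per skill.
  find "$1" -type f -name SKILL.md | sort | while IFS= read -r file; do
    printf '%s\t%s\t%s\n' \
      "$(frontmatter_field "$file" name)" \
      "$(frontmatter_field "$file" description)" \
      "$file"
  done
}

match_skills() {
  # $1 = root directory, $2 = keyword; case-insensitive match on name/description.
  discover_skills "$1" | awk -F '\t' -v kw="$2" \
    'index(tolower($1 " " $2), tolower(kw)) { print }'
}
```

The activate stage then amounts to reading the file in the third column, for example `match_skills ~/.skills/devops-security kubernetes | cut -f3 | head -n 1 | xargs cat`. Only the short frontmatter is read during discovery, which is why the full instructions cost nothing until a skill is actually needed.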
🏃 Quick Start
Option 1: skills.sh CLI (Recommended)
The skills CLI discovers every SKILL.md in this repository and installs each skill into your agent's skills directory. See CLI docs and FAQ.
# Install all skills
npx skills add bagelhole/DevOps-Security-Agent-Skills
# List available skills
npx skills add bagelhole/DevOps-Security-Agent-Skills --list
# Install specific skills to a specific agent
npx skills add bagelhole/DevOps-Security-Agent-Skills --skill kubernetes-ops --skill hashicorp-vault -a cursor -y
# Global install
npx skills add bagelhole/DevOps-Security-Agent-Skills -g -y
# Install a single skill by URL
npx skills add https://github.com/bagelhole/DevOps-Security-Agent-Skills/tree/main/devops/orchestration/kubernetes-ops
To work from a local clone, run npx skills add . --list from the repo root to list the skills it contains; drop --list to install them.
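For teams that want everyone on the same subset of skills, the selective install can be wrapped in a small helper. The sketch below only builds and prints the command using the `--skill`, `-a`, and `-y` flags shown above; the function name `skills_add_cmd` is hypothetical.

```shell
# Sketch: build the selective-install command for one agent and a list of skills.
# Prints the command instead of running it, so it can be reviewed first.
skills_add_cmd() {
  # $1 = agent name (e.g. cursor); remaining arguments = skill names.
  agent=$1
  shift
  cmd="npx skills add bagelhole/DevOps-Security-Agent-Skills"
  for skill in "$@"; do
    cmd="$cmd --skill $skill"
  done
  echo "$cmd -a $agent -y"
}
```

Calling `skills_add_cmd cursor kubernetes-ops hashicorp-vault` prints the same command as the "Install specific skills" example; pipe the output to `sh` once it looks right.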
Option 2: Clone or Submodule
# Clone
git clone https://github.com/bagelhole/DevOps-Security-Agent-Skills.git ~/.skills/devops-security
# Or add as a submodule
git submodule add https://github.com/bagelhole/DevOps-Security-Agent-Skills.git .skills/devops-security
Option 3: For Humans
No agent? No problem. Browse the skills, copy the configs, run the scripts. MIT licensed — go wild.
📚 Skill Catalog
🔧 DevOps (40+ skills)
CI/CD
| Skill | Description |
|---|---|
| github-actions | Build, test, and deploy with GitHub Actions |
| gitlab-ci | GitLab CI/CD pipelines and runners |
| jenkins | Jenkins pipelines and shared libraries |
| azure-devops | Azure Pipelines and release management |
| circleci | CircleCI workflows and orbs |
Containers
| Skill | Description |
|---|---|
| docker-management | Docker images, multi-stage builds, optimization |
| docker-compose | Multi-container applications |
| podman | Rootless container management |
| container-registries | ECR, ACR, GCR, Docker Hub |
Orchestration
| Skill | Description |
|---|---|
| kubernetes-ops | Deploy, scale, troubleshoot K8s |
| helm-charts | Helm chart development and deployment |
| argocd-gitops | GitOps with ArgoCD |
| kustomize | Kubernetes manifest customization |
| openshift | OpenShift cluster management |
| model-serving-kubernetes | KServe and Triton model serving with canary deployments and GPU autoscaling |
Observability
| Skill | Description |
|---|---|
| prometheus-grafana | Metrics and dashboards |
| opentelemetry | Vendor-neutral traces, metrics, and logs |
| ebpf-observability | Kernel-level observability with Cilium, Tetragon, and bpftrace |
| elk-stack | Elasticsearch, Logstash, Kibana |
| loki-logging | Grafana Loki log aggregation |
| datadog | Datadog monitoring and APM |
| new-relic | New Relic observability |
| alerting-oncall | Alert rules and on-call rotations |
AI Engineering
| Skill | Description |
|---|---|
| agent-observability | Tracing, latency, token, and cost telemetry for agents |
| agent-evals | Automated regression and safety eval suites for agents |
| llm-cost-optimization | Cut LLM API costs with caching, batching, model routing, and self-hosting |
| llm-caching | Exact and semantic caching layers to reduce API calls by 30-70% |
| ai-pipeline-orchestration | Orchestrate RAG ingestion, training, and batch inference with Prefect/Airflow |
| llmops-platform-engineering | Build enterprise LLMOps platforms with evaluation gates, promotions, and governance |
| model-registry-governance | Model metadata, approvals, lifecycle policy, and auditable promotion controls |
| rag-observability-evals | Measure retrieval quality, groundedness, and RAG regressions continuously |
| ai-sre-incident-response | AI-specific SRE playbooks for model outages, quality regressions, and spend spikes |
Platform Engineering
| Skill | Description |
|---|---|
| platform-engineering | Build internal developer platforms with Backstage, Crossplane, and golden paths |
Developer Experience
| Skill | Description |
|---|---|
| devcontainers-nix | Reproducible dev environments with Dev Containers, Nix, and Devbox |
Release Management
| Skill | Description |
|---|---|
| git-workflow | Branching strategies and PR workflows |
| semantic-versioning | Automated versioning and changelogs |
| feature-flags | LaunchDarkly, Unleash |
| blue-green-deploy | Zero-downtime deployments |
🔒 Security (35+ skills)
Scanning
| Skill | Description |
|---|---|
| vulnerability-scanning | CVE scanning with Trivy, Grype |
| sast-scanning | Semgrep, CodeQL, SonarQube |
| dast-scanning | OWASP ZAP, Nuclei |
| dependency-scanning | Snyk, Dependabot |
| container-scanning | Image vulnerability scanning |
| sbom-supply-chain | SBOM generation, signing, and provenance verification |
| supply-chain-attack-response | Detect, respond to, and prevent software supply chain attacks |
Secrets Management
| Skill | Description |
|---|---|
| hashicorp-vault | Vault setup, policies, secrets engines |
| aws-secrets-manager | AWS secrets and rotation |
| azure-keyvault | Azure Key Vault |
| gcp-secret-manager | GCP Secret Manager |
| sops-encryption | SOPS (formerly Mozilla SOPS) |
Hardening
| Skill | Description |
|---|---|
| linux-hardening | CIS benchmarks, sysctl, SSH |
| windows-hardening | Windows security baselines |
| container-hardening | Secure Docker/K8s configs |
| kubernetes-hardening | K8s security contexts and policies |
| cis-benchmarks | CIS benchmark auditing |
| openclaw-deployment-hardening | OpenClaw CI/CD, container, and runtime hardening |
Network Security
| Skill | Description |
|---|---|
| firewall-config | iptables, UFW, cloud firewalls |
| waf-setup | AWS WAF, Cloudflare WAF |
| zero-trust | Zero-trust architecture |
| vpn-setup | WireGuard, OpenVPN |
| ssl-tls-management | Let's Encrypt, certificate management |
Security Operations
| Skill | Description |
|---|---|
| incident-response | IR playbooks and evidence collection |
| threat-modeling | STRIDE methodology |
| penetration-testing | Authorized security testing |
| security-automation | Security workflow automation |
AI Security
| Skill | Description |
|---|---|
| ai-agent-security | Defend agents against injection, tool abuse, and exfiltration |
| llm-app-security | Harden LLM app inputs, outputs, and tenant isolation |
| mcp-server-security | Secure MCP servers with auth, tool authorization, and audit logging |
| ai-coding-agent-guardrails | Safe Claude Code/Cursor/Codex usage with permission boundaries |
| ai-security-hardening | Harden LLM deployments against prompt injection and model theft |
| prompt-injection-defense | Multi-layer prompt injection defense with detection code |
| ai-red-teaming | Adversarial AI red team programs and testing frameworks |
| model-supply-chain-security | Model signing, provenance, and trusted promotion policies |
☁️ Infrastructure (65+ skills)
AWS
| Skill | Description |
|---|---|
| terraform-aws | AWS infrastructure as code |
| cloudformation | CloudFormation templates |
| aws-ec2 | EC2 instances and AMIs |
| aws-ecs-fargate | Container orchestration |
| aws-lambda | Serverless functions |
| aws-rds | Managed databases |
| aws-s3 | Object storage |
| aws-vpc | Networking |
| aws-iam | Identity and access |
| aws-cost-optimization | FinOps cost reduction and spend governance |
Cloudflare
| Skill | Description |
|---|---|
| cloudflare-workers | Edge functions and APIs with Wrangler |
| cloudflare-pages | Static/full-stack deployments with previews |
| cloudflare-r2 | S3-compatible object storage without egress fees |
| cloudflare-zero-trust | Access policies and private app protection |
Azure
| Skill | Description |
|---|---|
| terraform-azure | Azure infrastructure as code |
| arm-templates | ARM/Bicep templates |
| azure-vms | Virtual machines |
| azure-functions | Serverless |
| azure-aks | Kubernetes |
| azure-sql | Databases |
| azure-networking | VNets and NSGs |
GCP
| Skill | Description |
|---|---|
| terraform-gcp | GCP infrastructure as code |
| gcp-compute | Compute Engine |
| gcp-cloud-functions | Serverless |
| gcp-gke | Kubernetes |
| gcp-cloud-sql | Databases |
| gcp-networking | VPCs and firewall |
IaC
| Skill | Description |
|---|---|
| opentofu-migration | Migrate from Terraform to the open-source OpenTofu fork |
Server Management
| Skill | Description |
|---|---|
| linux-administration | Core Linux admin |
| windows-server | Windows administration |
| ssh-configuration | SSH and bastion hosts |
| user-management | Users, groups, sudo |
| systemd-services | Services and timers |
| performance-tuning | System optimization |
| gpu-server-management | NVIDIA GPU driver setup, MIG partitioning, DCGM monitoring |
Networking
| Skill | Description |
|---|---|
| dns-management | DNS and Route53 |
| load-balancing | ALB, nginx, HAProxy |
| cdn-setup | CloudFront, Cloudflare |
| reverse-proxy | nginx, Traefik |
| service-mesh | Istio, Linkerd |
| llm-gateway | Unified LLM API gateway with routing, rate limiting, and semantic caching |
| ai-inference-service-mesh | Service mesh for mTLS, canary inference routing, and resilient AI traffic |
Databases
| Skill | Description |
|---|---|
| postgresql | PostgreSQL admin |
| mysql | MySQL/MariaDB |
| planetscale | Branch-based MySQL schema deployments |
| mongodb | MongoDB clusters |
| redis | Redis caching |
| database-backups | Backup strategies |
| vector-database-ops | Qdrant, Weaviate, and pgvector for AI search and RAG |
Storage
| Skill | Description |
|---|---|
| block-storage | EBS, LVM |
| object-storage | S3, MinIO |
| nfs-storage | NFS servers |
| backup-recovery | Backup with restic |
Platforms
| Skill | Description |
|---|---|
| vercel-deployments | Preview and production web app deployments |
| convex-backend | Realtime managed backend with typed functions |
| firebase-app-platform | Firebase auth, data, functions, and hosting |
Local AI Infrastructure
| Skill | Description |
|---|---|
| ollama-stack | Private local inference stack with Ollama and Open WebUI |
| mac-mini-llm-lab | Mac mini setup for always-on local LLM serving |
| openclaw-local-mac-mini | OpenClaw local development and Mac mini hosting |
| openclaw-security-hardening | OpenClaw host, auth, secrets, and network hardening |
| vllm-server | High-throughput LLM serving with vLLM and PagedAttention |
| llm-inference-scaling | Auto-scale LLM inference on Kubernetes with KEDA |
| rag-infrastructure | Production RAG with vector stores, hybrid search, and reranking |
| llm-fine-tuning | QLoRA and full fine-tuning with Axolotl and DeepSpeed |
| gpu-kubernetes-operations | GPU Kubernetes with MIG, autoscaling, and AI cost controls |
| multi-tenant-llm-hosting | Multi-tenant LLM hosting with quotas and isolation |
IT Operations
| Skill | Description |
|---|---|
| startup-it-troubleshooting | Practical IT troubleshooting for small teams |
| mdm-device-management | Manage and secure company devices with Fleet, Jamf, or Intune |
| identity-access-management | SSO, SCIM provisioning, and MFA with Google Workspace or Okta |
| saas-security-posture | Audit and harden your SaaS stack (GitHub, Slack, Google Workspace) |
📋 Compliance (20+ skills)
Frameworks
| Skill | Description |
|---|---|
| soc2-compliance | SOC2 Trust Services Criteria |
| hipaa-compliance | HIPAA security rules |
| gdpr-compliance | GDPR data protection |
| pci-dss-compliance | PCI-DSS requirements |
| iso27001-compliance | ISO 27001 ISMS |
| fedramp-compliance | FedRAMP controls |
Governance
| Skill | Description |
|---|---|
| policy-as-code | OPA, Kyverno, Checkov |
| access-review | IAM access reviews |
| change-management | Change control |
| asset-inventory | Asset tracking |
| vendor-management | Third-party security |
Auditing
| Skill | Description |
|---|---|
| audit-logging | Centralized audit logs |
| aws-cloudtrail | CloudTrail configuration |
| azure-monitor-audit | Azure Monitor logs |
| gcp-audit-logs | GCP Cloud Audit Logs |
Business Continuity
| Skill | Description |
|---|---|
| disaster-recovery | DR strategies |
| business-continuity | BCP planning |
| incident-management | Incident processes |
| runbook-creation | Operational runbooks |
🤝 Contributing
Found a gap? Want to add a skill? PRs are welcome!
See CONTRIBUTING.md for guidelines.
If this made your agent smarter, star this repo — it helps others find it.
Built by Toby Miller