About DevOps-Security-Agent-Skills

Agent-ready DevOps, security, infrastructure, and compliance knowledge base with 80+ skills across Kubernetes, Terraform, AWS/Azure/GCP, AI platform operations, container hardening, SOC2/ISO27001, and incident response—plus ready-to-run scripts, templates, and playbooks for SRE, platform, and security teams.

Published by bagelhole

🛡️ DevOps & Security Agent Skills

Your AI-Powered Second Brain for Infrastructure & Security

160+ production-ready skills for Claude Code, Cursor, Codex, and every AI agent that reads files.

Explore Skills · Install in 30 Seconds · Contribute

Why This Exists

Install these skills and your agent gains expert-level knowledge of:

Domain	Skills	What Your Agent Learns
🔧 DevOps	40+	CI/CD pipelines, K8s ops, observability, release strategies, platform engineering
🔒 Security	35+	Vulnerability scanning, secrets management, hardening, AI agent security, MCP security
☁️ Infrastructure	65+	AWS, Azure, GCP, Cloudflare, databases, networking, GPU clusters, local AI
🤖 AI Engineering	20+	LLMOps, agent evals, RAG infrastructure, inference scaling, coding agent guardrails
📋 Compliance	20+	SOC2, HIPAA, GDPR, PCI-DSS, policy-as-code, auditing
💻 IT Operations	5+	Device management, identity/SSO, SaaS security, troubleshooting

30-Second Install

# Install all skills to Claude Code, Cursor, Codex, or any supported agent
npx skills add bagelhole/DevOps-Security-Agent-Skills

# Install specific skills
npx skills add bagelhole/DevOps-Security-Agent-Skills --skill kubernetes-ops --skill hashicorp-vault -a cursor -y

# Or clone directly
git clone https://github.com/bagelhole/DevOps-Security-Agent-Skills.git ~/.skills/devops-security

Works with Claude Code, Cursor, Codex, OpenCode, Cline, and many more.

What Makes This Different

Most "awesome lists" give you links. This repo gives your AI agent production-ready knowledge it can act on:

# Every skill includes real, copy-pasteable configs like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
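spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec: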
      containers:
      - name: myapp
        image: myapp:1.0.0
        resources:
          requests: { memory: "128Mi", cpu: "100m" }
          limits: { memory: "256Mi", cpu: "500m" }
        securityContext:
          runAsNonRoot: true
          readOnlyRootFilesystem: true

What's in Each Skill

skill/
├── SKILL.md          # 250-400+ lines of expert knowledge
│   ├── When to Use   # Decision guidance
│   ├── Prerequisites # What you need
│   ├── Real Configs  # Copy-pasteable YAML, JSON, HCL, Bash
│   ├── CLI Commands  # Exact commands to run
│   ├── Troubleshooting # Common issues + fixes
│   └── Related Skills  # Cross-references
├── scripts/          # Ready-to-run automation
├── references/       # Deep-dive guides
└── assets/           # Config templates
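
A quick way to check a skill folder against this layout is sketched below in Python. It is only an illustration: `inspect_skill` and `EXPECTED_DIRS` are hypothetical names, and treating `scripts/`, `references/`, and `assets/` as optional is an assumption, since the tree does not say which parts are required.

```python
from pathlib import Path

# Subdirectories taken from the tree above; assumed optional here.
EXPECTED_DIRS = ("scripts", "references", "assets")


def inspect_skill(skill_dir: Path) -> dict:
    """Report which parts of the layout exist in one skill folder."""
    skill_md = skill_dir / "SKILL.md"
    report = {
        "has_skill_md": skill_md.is_file(),
        "line_count": 0,
        "dirs_present": [d for d in EXPECTED_DIRS if (skill_dir / d).is_dir()],
    }
    if report["has_skill_md"]:
        text = skill_md.read_text(encoding="utf-8")
        report["line_count"] = len(text.splitlines())
    return report
```

Calling `inspect_skill(Path("devops/orchestration/kubernetes-ops"))` from a clone would return a small dict you can compare against the line counts and folders described above.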

Hot Topics (March 2026)

Skills you won't find in other repos:

Skill	Why It's Hot
MCP Server Security	MCP is everywhere — secure your tool servers
AI Coding Agent Guardrails	Safe Claude Code/Cursor/Codex usage for teams
eBPF Observability	Kernel-level monitoring with Cilium & Tetragon
Platform Engineering	Build internal developer platforms with Backstage
Supply Chain Attack Response	Detect & respond to compromised dependencies
OpenTofu Migration	Migrate from Terraform to the open-source fork
Dev Containers & Nix	Reproducible dev environments for teams
Agent Evals	CI/CD gates for AI agent quality & safety

How It Works

Agent Skills is an open format for extending AI agents. Each SKILL.md has YAML frontmatter that agents load for matching, and detailed instructions that load only when activated:

┌────────────────────────────────────────────────────────────────┐
│  1. DISCOVER         2. MATCH            3. ACTIVATE           │
│                                                                │
│  Agent scans      →  User asks about  →  Agent reads full     │
│  skill folders       Kubernetes          SKILL.md + runs      │
│  at startup          debugging           scripts as needed    │
└────────────────────────────────────────────────────────────────┘
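
The match and activate steps in the diagram can be sketched roughly as follows. This is a simplified illustration, not how any particular agent implements matching. The `name` and `description` fields, the flat `key: value` parser (no full YAML support), and the keyword-overlap scoring are assumptions. `SkillMeta`, `parse_skill`, `match_skills`, and `SAMPLE` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical SKILL.md content used only for this example.
SAMPLE = """---
name: kubernetes-ops
description: Deploy, scale, troubleshoot K8s
---
# Kubernetes Ops
Use kubectl to inspect pods.
"""


@dataclass
class SkillMeta:
    name: str
    description: str
    body: str  # full instructions, only needed once the skill is activated


def parse_skill(text: str) -> SkillMeta:
    """Split a SKILL.md string into flat frontmatter fields and the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter opening '---'")
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("missing frontmatter closing '---'")
    fields = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return SkillMeta(
        name=fields.get("name", ""),
        description=fields.get("description", ""),
        body="\n".join(lines[end + 1:]).strip(),
    )


def match_skills(query: str, skills: list[SkillMeta]) -> list[SkillMeta]:
    """Rank skills by how many query words occur in the name or description."""
    words = {w.lower() for w in query.split()}
    scored = []
    for skill in skills:
        haystack = f"{skill.name} {skill.description}".lower()
        score = sum(1 for w in words if w in haystack)
        if score:
            scored.append((score, skill))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [skill for _, skill in scored]
```

Only the short metadata is compared against the query. The `body` is kept aside until a skill is selected, which mirrors the idea that detailed instructions load only on activation.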

🏃 Quick Start

Option 1: skills.sh CLI (Recommended)

The skills CLI discovers every SKILL.md in this repository and installs them into your agent's skills directory. See CLI docs and FAQ.

# Install all skills
npx skills add bagelhole/DevOps-Security-Agent-Skills

# List available skills
npx skills add bagelhole/DevOps-Security-Agent-Skills --list

# Install specific skills to a specific agent
npx skills add bagelhole/DevOps-Security-Agent-Skills --skill kubernetes-ops --skill hashicorp-vault -a cursor -y

# Global install
npx skills add bagelhole/DevOps-Security-Agent-Skills -g -y

# Install a single skill by URL
npx skills add https://github.com/bagelhole/DevOps-Security-Agent-Skills/tree/main/devops/orchestration/kubernetes-ops

To work from a local clone, run npx skills add . --list from the repo root to list the skills it contains.
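
The discovery step described in this option (finding every SKILL.md in the repository) can be approximated with a few lines of Python. This is a sketch of the idea only, not the skills CLI's actual implementation, and `discover_skills` is a hypothetical helper name.

```python
from pathlib import Path


def discover_skills(repo_root: Path) -> list[str]:
    """Return repo-relative folders that contain a SKILL.md, sorted by path."""
    found = []
    for skill_md in repo_root.rglob("SKILL.md"):
        if skill_md.is_file():
            found.append(skill_md.parent.relative_to(repo_root).as_posix())
    return sorted(found)
```

Run against a local clone, for example `discover_skills(Path("."))` from the repo root, this returns one path per skill folder.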

Option 2: Clone or Submodule

# Clone
git clone https://github.com/bagelhole/DevOps-Security-Agent-Skills.git ~/.skills/devops-security

# Or add as a submodule
git submodule add https://github.com/bagelhole/DevOps-Security-Agent-Skills.git .skills/devops-security

Option 3: For Humans

No agent? No problem. Browse the skills, copy the configs, run the scripts. MIT licensed — go wild.

📚 Skill Catalog

🔧 DevOps (40+ skills)

CI/CD

Skill	Description
github-actions	Build, test, and deploy with GitHub Actions
gitlab-ci	GitLab CI/CD pipelines and runners
jenkins	Jenkins pipelines and shared libraries
azure-devops	Azure Pipelines and release management
circleci	CircleCI workflows and orbs

Containers

Skill	Description
docker-management	Docker images, multi-stage builds, optimization
docker-compose	Multi-container applications
podman	Rootless container management
container-registries	ECR, ACR, GCR, Docker Hub

Orchestration

Skill	Description
kubernetes-ops	Deploy, scale, troubleshoot K8s
helm-charts	Helm chart development and deployment
argocd-gitops	GitOps with ArgoCD
kustomize	Kubernetes manifest customization
openshift	OpenShift cluster management
model-serving-kubernetes	KServe and Triton model serving with canary deployments and GPU autoscaling

Observability

Skill	Description
prometheus-grafana	Metrics and dashboards
opentelemetry	Vendor-neutral traces, metrics, and logs
ebpf-observability	Kernel-level observability with Cilium, Tetragon, and bpftrace
elk-stack	Elasticsearch, Logstash, Kibana
loki-logging	Grafana Loki log aggregation
datadog	Datadog monitoring and APM
new-relic	New Relic observability
alerting-oncall	Alert rules and on-call rotations

AI Engineering

Skill	Description
agent-observability	Tracing, latency, token, and cost telemetry for agents
agent-evals	Automated regression and safety eval suites for agents
llm-cost-optimization	Cut LLM API costs with caching, batching, model routing, and self-hosting
llm-caching	Exact and semantic caching layers to reduce API calls by 30-70%
ai-pipeline-orchestration	Orchestrate RAG ingestion, training, and batch inference with Prefect/Airflow
llmops-platform-engineering	Build enterprise LLMOps platforms with evaluation gates, promotions, and governance
model-registry-governance	Model metadata, approvals, lifecycle policy, and auditable promotion controls
rag-observability-evals	Measure retrieval quality, groundedness, and RAG regressions continuously
ai-sre-incident-response	AI-specific SRE playbooks for model outages, quality regressions, and spend spikes

Platform Engineering

Skill	Description
platform-engineering	Build internal developer platforms with Backstage, Crossplane, and golden paths

Developer Experience

Skill	Description
devcontainers-nix	Reproducible dev environments with Dev Containers, Nix, and Devbox

Release Management

Skill	Description
git-workflow	Branching strategies and PR workflows
semantic-versioning	Automated versioning and changelogs
feature-flags	LaunchDarkly, Unleash
blue-green-deploy	Zero-downtime deployments

🔒 Security (35+ skills)

Scanning

Skill	Description
vulnerability-scanning	CVE scanning with Trivy, Grype
sast-scanning	Semgrep, CodeQL, SonarQube
dast-scanning	OWASP ZAP, Nuclei
dependency-scanning	Snyk, Dependabot
container-scanning	Image vulnerability scanning
sbom-supply-chain	SBOM generation, signing, and provenance verification
supply-chain-attack-response	Detect, respond to, and prevent software supply chain attacks

Secrets Management

Skill	Description
hashicorp-vault	Vault setup, policies, secrets engines
aws-secrets-manager	AWS secrets and rotation
azure-keyvault	Azure Key Vault
gcp-secret-manager	GCP Secret Manager
sops-encryption	Mozilla SOPS

Hardening

Skill	Description
linux-hardening	CIS benchmarks, sysctl, SSH
windows-hardening	Windows security baselines
container-hardening	Secure Docker/K8s configs
kubernetes-hardening	K8s security contexts and policies
cis-benchmarks	CIS benchmark auditing
openclaw-deployment-hardening	OpenClaw CI/CD, container, and runtime hardening

Network Security

Skill	Description
firewall-config	iptables, UFW, cloud firewalls
waf-setup	AWS WAF, Cloudflare WAF
zero-trust	Zero-trust architecture
vpn-setup	WireGuard, OpenVPN
ssl-tls-management	Let's Encrypt, certificate management

Security Operations

Skill	Description
incident-response	IR playbooks and evidence collection
threat-modeling	STRIDE methodology
penetration-testing	Authorized security testing
security-automation	Security workflow automation

AI Security

Skill	Description
ai-agent-security	Defend agents against injection, tool abuse, and exfiltration
llm-app-security	Harden LLM app inputs, outputs, and tenant isolation
mcp-server-security	Secure MCP servers with auth, tool authorization, and audit logging
ai-coding-agent-guardrails	Safe Claude Code/Cursor/Codex usage with permission boundaries
ai-security-hardening	Harden LLM deployments against prompt injection and model theft
prompt-injection-defense	Multi-layer prompt injection defense with detection code
ai-red-teaming	Adversarial AI red team programs and testing frameworks
model-supply-chain-security	Model signing, provenance, and trusted promotion policies

☁️ Infrastructure (65+ skills)

AWS

Skill	Description
terraform-aws	AWS infrastructure as code
cloudformation	CloudFormation templates
aws-ec2	EC2 instances and AMIs
aws-ecs-fargate	Container orchestration
aws-lambda	Serverless functions
aws-rds	Managed databases
aws-s3	Object storage
aws-vpc	Networking
aws-iam	Identity and access
aws-cost-optimization	FinOps cost reduction and spend governance

Cloudflare

Skill	Description
cloudflare-workers	Edge functions and APIs with Wrangler
cloudflare-pages	Static/full-stack deployments with previews
cloudflare-r2	S3-compatible object storage without egress fees
cloudflare-zero-trust	Access policies and private app protection

Azure

Skill	Description
terraform-azure	Azure infrastructure as code
arm-templates	ARM/Bicep templates
azure-vms	Virtual machines
azure-functions	Serverless
azure-aks	Kubernetes
azure-sql	Databases
azure-networking	VNets and NSGs

GCP

Skill	Description
terraform-gcp	GCP infrastructure as code
gcp-compute	Compute Engine
gcp-cloud-functions	Serverless
gcp-gke	Kubernetes
gcp-cloud-sql	Databases
gcp-networking	VPCs and firewall

IaC

Skill	Description
opentofu-migration	Migrate from Terraform to the open-source OpenTofu fork

Server Management

Skill	Description
linux-administration	Core Linux admin
windows-server	Windows administration
ssh-configuration	SSH and bastion hosts
user-management	Users, groups, sudo
systemd-services	Services and timers
performance-tuning	System optimization
gpu-server-management	NVIDIA GPU driver setup, MIG partitioning, DCGM monitoring

Networking

Skill	Description
dns-management	DNS and Route53
load-balancing	ALB, nginx, HAProxy
cdn-setup	CloudFront, Cloudflare
reverse-proxy	nginx, Traefik
service-mesh	Istio, Linkerd
llm-gateway	Unified LLM API gateway with routing, rate limiting, and semantic caching
ai-inference-service-mesh	Service mesh for mTLS, canary inference routing, and resilient AI traffic

Databases

Skill	Description
postgresql	PostgreSQL admin
mysql	MySQL/MariaDB
planetscale	Branch-based MySQL schema deployments
mongodb	MongoDB clusters
redis	Redis caching
database-backups	Backup strategies
vector-database-ops	Qdrant, Weaviate, and pgvector for AI search and RAG

Storage

Skill	Description
block-storage	EBS, LVM
object-storage	S3, MinIO
nfs-storage	NFS servers
backup-recovery	Backup with restic

Platforms

Skill	Description
vercel-deployments	Preview and production web app deployments
convex-backend	Realtime managed backend with typed functions
firebase-app-platform	Firebase auth, data, functions, and hosting

Local AI Infrastructure

Skill	Description
ollama-stack	Private local inference stack with Ollama and Open WebUI
mac-mini-llm-lab	Mac mini setup for always-on local LLM serving
openclaw-local-mac-mini	OpenClaw local development and Mac mini hosting
openclaw-security-hardening	OpenClaw host, auth, secrets, and network hardening
vllm-server	High-throughput LLM serving with vLLM and PagedAttention
llm-inference-scaling	Auto-scale LLM inference on Kubernetes with KEDA
rag-infrastructure	Production RAG with vector stores, hybrid search, and reranking
llm-fine-tuning	QLoRA and full fine-tuning with Axolotl and DeepSpeed
gpu-kubernetes-operations	GPU Kubernetes with MIG, autoscaling, and AI cost controls
multi-tenant-llm-hosting	Multi-tenant LLM hosting with quotas and isolation

IT Operations

Skill	Description
startup-it-troubleshooting	Practical IT troubleshooting for small teams
mdm-device-management	Manage and secure company devices with Fleet, Jamf, or Intune
identity-access-management	SSO, SCIM provisioning, and MFA with Google Workspace or Okta
saas-security-posture	Audit and harden your SaaS stack (GitHub, Slack, Google Workspace)

📋 Compliance (20+ skills)

Frameworks

Skill	Description
soc2-compliance	SOC2 Trust Services Criteria
hipaa-compliance	HIPAA security rules
gdpr-compliance	GDPR data protection
pci-dss-compliance	PCI-DSS requirements
iso27001-compliance	ISO 27001 ISMS
fedramp-compliance	FedRAMP controls

Governance

Skill	Description
policy-as-code	OPA, Kyverno, Checkov
access-review	IAM access reviews
change-management	Change control
asset-inventory	Asset tracking
vendor-management	Third-party security

Auditing

Skill	Description
audit-logging	Centralized audit logs
aws-cloudtrail	CloudTrail configuration
azure-monitor-audit	Azure Monitor logs
gcp-audit-logs	GCP Cloud Audit Logs

Business Continuity

Skill	Description
disaster-recovery	DR strategies
business-continuity	BCP planning
incident-management	Incident processes
runbook-creation	Operational runbooks

🤝 Contributing

Found a gap? Want to add a skill? PRs are welcome!

See CONTRIBUTING.md for guidelines.

If this made your agent smarter, star this repo — it helps others find it.

Built by Toby Miller
