Home
Softono
openclaw-ops

openclaw-ops

Open source MIT Shell
223
Stars
24
Forks
3
Issues
2
Watchers
3 weeks
Last Commit

About openclaw-ops

OpenClaw operations skill with health checks, repair scripts, watchdogs, update triage, and security scans.

Platforms

Web Self-hosted

Languages

Shell

Links

openclaw-ops

OpenClaw gateway operations skill for agent environments. Covers health checks, repair workflows, continuous monitoring, session analysis, update-change detection, and security review for a local or self-hosted OpenClaw install.

Tested against OpenClaw 2026.5.4.

What it does

Skill

  • /openclaw-ops — full triage and configuration: gateway, auth, exec approvals, cron jobs, channels, sessions, and installation

Scripts

Script Purpose
scripts/heal.sh One-shot auto-fix for the most common gateway issues
scripts/post-update.sh Explicit post-update orchestrator: check-update, heal, workspace reconcile, security scan, final health check, policy-guard sentinel trigger
scripts/watchdog.sh Runs every 5 min, restarts gateway if down, escalates after 3 failures
scripts/watchdog-install.sh Installs the watchdog as a macOS LaunchAgent
scripts/watchdog-uninstall.sh Removes the LaunchAgent
scripts/check-update.sh Detects version changes, explains broken config, auto-fix with --fix
scripts/health-check.sh Declarative URL/process health checks; auto-generates targets file on first run
scripts/security-scan.sh Config hardening and credential exposure scan (0–100 score); skips bulky runtime/log/session history unless --include-sessions is passed
scripts/skill-audit.sh Static security audit for third-party skills before installation
scripts/codex-perf-check.sh Check and fix GPT-5.x performance opt-ins that ship disabled by default; --fix to apply
scripts/workspace-auto-commit.sh Local-only git snapshot helper for OpenClaw workspace repos; defaults to ~/.openclaw/workspace, supports --workspace and --all, never pushes
scripts/workspace-git-audit.sh Audits ~/.openclaw/workspace* repos for git status and auto-commit cron coverage; --show-cron prints setup commands for uncovered repos
scripts/session-monitor.sh Behavioral checks over live session JSONL files; detects retry loops, stuck runs, auth errors
scripts/session-search.sh Full-text session search with structured output and secret redaction
scripts/session-resume.sh Compaction-first markdown resume for a single session, including failure context
scripts/daily-digest.sh Incident, activity, watchdog, and cost summary for the last N hours
scripts/incident-manager.sh Shared machine-readable incident lifecycle helper (sourced by other scripts; state/logs under ~/.openclaw/logs)
scripts/remediation-board.sh Human/agent repair board for recurring bugs, hacks/workarounds, upstream watches, incident notes, status transitions, evidence, and verification checks
scripts/lib.sh Shared helpers: logging, port resolution, state files, sanitization (sourced)

Prerequisites

Tool Required for
openclaw everything
python3 heal.sh, lib.sh, watchdog.sh, session scripts
curl watchdog.sh, health-check.sh HTTP checks
openssl heal.sh auth token generation
rg (ripgrep) session-search.sh
launchctl + macOS watchdog-install.sh (LaunchAgent)
osascript watchdog.sh macOS notifications (optional)

Linux: watchdog-install.sh is macOS only. Use cron instead:

*/5 * * * * bash /path/to/scripts/watchdog.sh >> ~/.openclaw/logs/watchdog.log 2>&1

Minimum version

v2026.2.12 or later. Versions before this contain critical CVEs (including CVE-2026-25253 plus additional SSRF, path traversal, and prompt-injection fixes).

openclaw --version

Quick start

# 1. One-time heal pass
bash scripts/heal.sh

# 2. After every OpenClaw update, run the explicit post-update hook
bash scripts/post-update.sh

#    The hook also runs the VPS workspace reconcile script when present and
#    touches ~/.openclaw/state/policy-guard.trigger so a VPS can react through
#    openclaw-policy-guard.path after the update.

# 3. If you only want the update triage report:
bash scripts/check-update.sh        # report only
bash scripts/check-update.sh --fix  # report + auto-fix

# 4. Install always-on watchdog (macOS)
bash scripts/watchdog-install.sh

# 5. View watchdog log
tail -f ~/.openclaw/logs/watchdog.log

# 6. View incident history
cat ~/.openclaw/logs/heal-incidents.jsonl

# 7. Run health checks — targets file is auto-generated on first run
bash scripts/health-check.sh --verbose

# 8. Audit workspace git protection and print cron setup suggestions
bash scripts/workspace-git-audit.sh --show-cron

Gateway port

Scripts read the gateway port directly from ~/.openclaw/openclaw.json — no hardcoded port, no manual setup. Override with the OPENCLAW_GATEWAY_PORT env var if needed.

Session monitoring

# Scan all active sessions for behavioral issues
bash scripts/session-monitor.sh --verbose

# Search session history
bash scripts/session-search.sh "unauthorized" --limit 10

# Build a resume for a specific session
bash scripts/session-resume.sh ~/.openclaw/agents/knox/sessions/<session>.jsonl

# 24-hour digest: incidents, activity, costs
bash scripts/daily-digest.sh --hours 24

# Track cron/ops findings to completion
bash scripts/remediation-board.sh import-cron-errors
bash scripts/remediation-board.sh list

# Mandatory: when investigation finds a real local OpenClaw error,
# regression, hack/workaround, security concern, or recurring ops finding,
# create/update a board item immediately. The board is the local repair loop;
# upstream links are optional metadata only when an external fix also exists.
bash scripts/remediation-board.sh add-incident log-gap "Log sweep missed runtime errors" --evidence "tmp OpenClaw log contained active failure"
bash scripts/remediation-board.sh close-criteria log-gap "Local log-sweep catches the failure class and the installed workflow is synced"

# Track a recurring bug without starting over next time
bash scripts/remediation-board.sh list --type incident
bash scripts/remediation-board.sh show telegram-split
bash scripts/remediation-board.sh add-incident telegram-split "Telegram topic replies split into many messages" --evidence "Observed in forum topic" --next "Check upstream issue and local channel config"
bash scripts/remediation-board.sh hypothesis telegram-split "Preview draft lane may send instead of edit" --confidence medium
bash scripts/remediation-board.sh tried telegram-split --step "Checked release notes" --result "Found related Telegram delivery fixes"
bash scripts/remediation-board.sh workaround telegram-split "Use explicit message.send for topic-visible replies"
bash scripts/remediation-board.sh export-note telegram-split

Watchdog escalation model

  1. Tier 1 — HTTP ping every 5 min (LaunchAgent or cron)
  2. Tier 2 — Gateway restart + heal.sh if simple restart fails
  3. Tier 3 — macOS notification after 3 failed attempts in 15 min

Platform support

Platform heal.sh watchdog LaunchAgent
macOS
Linux ✓ (via cron)
Windows WSL2 ✓ (via cron)

Viewing logs

macOS:

tail -f ~/.openclaw/logs/gateway.err.log
tail -f ~/.openclaw/logs/watchdog.log

Linux (systemd):

journalctl --user -u openclaw-gateway -f

Notes

  • health-check.sh creates ~/.openclaw/health-targets.conf automatically on first run using the port from openclaw.json. Edit the file to add custom targets or adjust thresholds.
  • health-check.sh can report a process uptime failure immediately after openclaw gateway restart if the target has a minimum uptime threshold (e.g. 300s). That is expected — lower the threshold during smoke tests, then restore it.
  • security-scan.sh reports file paths and line numbers for suspected secrets, but redacts the secret values themselves.
  • check-update.sh is intended for real post-upgrade triage. It is normal to report a version change the first time it runs after an upgrade.
  • post-update.sh is the explicit post-update orchestrator. It skips the heavy sequence when the current version matches the stored state and otherwise runs check-update.sh --fix, heal.sh, the workspace reconcile script if present, security-scan.sh, and a final openclaw health --json.
  • On the VPS, the workspace reconcile stage refreshes model policy, auth/profile state, voice defaults, and the gateway service through openclaw_post_update_reconcile.py (or the equivalent systemd oneshot wrapper).
  • After the health check it best-effort touches ~/.openclaw/state/policy-guard.trigger (creating parent dirs if needed). The VPS can wire openclaw-policy-guard.path to that sentinel after updates.
  • Set OPENCLAW_POST_UPDATE_RECONCILE_SCRIPT (and optionally OPENCLAW_POST_UPDATE_RECONCILE_INTERPRETER) if the reconcile script lives somewhere other than the default workspace path.
  • If another wrapper or automation layer launches the post-update hook, set OPENCLAW_SKIP_WRAPPER_BACKUP=1 for nested openclaw calls so internal subcommands do not trigger backup loops.
  • codex-perf-check.sh requires v2026.4.x or later — the four settings it checks do not exist in earlier releases.

Running tests

bash tests/run.sh

Open-Source Release Checklist

  • Remove any local ~/.openclaw state, logs, or example outputs from the repository.
  • Do not publish screenshots or pasted scan output that contain real tokens, session material, or private channel identifiers.
  • Keep examples generic: placeholder tokens, placeholder user IDs, and non-sensitive hostnames only.
  • Run bash tests/run.sh before publishing changes.