muad-dib

Installation | Usage | Features | VS Code | CI/CD

Why MUAD'DIB?

npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.

MUAD'DIB combines 20 parallel scanners (264 detection rules), a deobfuscation engine, inter-module dataflow analysis, compound scoring (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is currently inactive (see Evaluation Metrics → ML Classifier section).

Positioning

MUAD'DIB is an educational tool and a free first line of defense. It detects known npm and PyPI threats (225,000+ IOCs) and suspicious behavioral patterns.

For enterprise protection, use:

Socket.dev - ML behavioral analysis, cloud sandboxing
Snyk - Massive vulnerability database, CI/CD integrations
Opengrep - Advanced dataflow analysis, Semgrep rules

Installation

npm (recommended)

npm install -g muaddib-scanner

From source

git clone https://github.com/DNSZLSK/muad-dib
cd muad-dib
npm install
npm link

Usage

Basic scan

muaddib scan .
muaddib scan /path/to/project

Scans both npm (package.json, node_modules) and Python (requirements.txt, setup.py, pyproject.toml) dependencies.

Interactive mode

muaddib

Safe install

muaddib install <package>
muaddib install lodash axios --save-dev
muaddib install suspicious-pkg --force    # Force install despite threats

Scans packages for threats BEFORE installing. Blocks known malicious packages.

Risk score

Each scan displays a 0-100 risk score:

[SCORE] 58/100 [***********---------] HIGH
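
As a rough illustration of how the bar relates to the score, the sketch below renders a 20-cell bar. The function name `renderScoreBar` and the floor rounding are assumptions chosen to reproduce the sample line above; the severity label is passed in because the cut-offs between levels are not documented here.

```javascript
// Render a 0-100 risk score as a fixed-width bar (20 cells by default).
// Assumption: filled cells = floor(score / 100 * width), which matches the sample line.
function renderScoreBar(score, label, width = 20) {
  const clamped = Math.max(0, Math.min(100, score));
  const filled = Math.floor((clamped / 100) * width);
  const bar = '*'.repeat(filled) + '-'.repeat(width - filled);
  return `[SCORE] ${clamped}/100 [${bar}] ${label}`;
}

console.log(renderScoreBar(58, 'HIGH'));
// [SCORE] 58/100 [***********---------] HIGH
```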

Explain mode

muaddib scan . --explain

Shows rule ID, MITRE ATT&CK technique, references, and response playbook for each detection.

Export

muaddib scan . --json > results.json     # JSON
muaddib scan . --html report.html        # HTML
muaddib scan . --sarif results.sarif     # SARIF (GitHub Security)
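
The SARIF export can be post-processed with a few lines of Node.js. The sketch below counts results per level and relies only on the generic SARIF 2.1.0 layout (`runs[].results[].level`), not on any MUAD'DIB-specific field. The helper name `summarizeSarif` is hypothetical, and treating a missing level as `warning` is a simplification of the spec's default.

```javascript
// Count SARIF results per level ("error", "warning", "note", "none").
// Simplification: a result without an explicit level is counted as "warning".
function summarizeSarif(sarif) {
  const counts = {};
  for (const run of sarif.runs || []) {
    for (const result of run.results || []) {
      const level = result.level || 'warning';
      counts[level] = (counts[level] || 0) + 1;
    }
  }
  return counts;
}

// Inline sample; with a real export you would load the file instead:
//   const sarif = JSON.parse(require('fs').readFileSync('results.sarif', 'utf8'));
const sample = { runs: [{ results: [{ level: 'error' }, { level: 'error' }, {}] }] };
console.log(summarizeSarif(sample)); // { error: 2, warning: 1 }
```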

Severity threshold

muaddib scan . --fail-on critical  # Fail only on CRITICAL
muaddib scan . --fail-on high      # Fail on HIGH and CRITICAL (default)
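
The threshold logic can be pictured as a comparison against an ordered severity list. This is a minimal sketch, not the tool's code: only HIGH and CRITICAL are named above, so the LOW and MEDIUM levels, the `shouldFail` name, and the `severity` field are assumptions.

```javascript
// Severity order from least to most severe.
// Assumption: LOW and MEDIUM exist below HIGH; only HIGH and CRITICAL are documented.
const SEVERITY_ORDER = ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL'];

// Return true when at least one finding is at or above the --fail-on level.
function shouldFail(findings, failOn = 'high') {
  const threshold = SEVERITY_ORDER.indexOf(failOn.toUpperCase());
  if (threshold === -1) throw new Error(`Unknown severity: ${failOn}`);
  return findings.some(
    (f) => SEVERITY_ORDER.indexOf(String(f.severity).toUpperCase()) >= threshold
  );
}

console.log(shouldFail([{ severity: 'HIGH' }], 'critical')); // false
console.log(shouldFail([{ severity: 'HIGH' }]));             // true
```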

Paranoid mode

muaddib scan . --paranoid

Ultra-strict detection with lower tolerance. Detects any network access, subprocess execution, dynamic code evaluation, and sensitive file access.

Webhook alerts

muaddib scan . --webhook "https://discord.com/api/webhooks/..."

Strict filtering (v2.1.2): alerts only for IOC matches, sandbox-confirmed threats, or canary token exfiltration. Priority triage (v2.10.21): P1 (red, IOC/sandbox/canary), P2 (orange, high-score/compounds), P3 (yellow, rest).
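
The P1/P2/P3 split described above can be sketched as a small decision function. The alert fields (`iocMatch`, `sandboxConfirmed`, `canaryExfil`, `score`, `compounds`) and the `HIGH_SCORE` cut-off are hypothetical; the real alert shape and threshold are not documented in this README.

```javascript
// Assumed cut-off for "high-score"; the actual value is not stated in the README.
const HIGH_SCORE = 75;

// Map an alert to a triage priority:
//   P1: IOC match, sandbox-confirmed threat, or canary token exfiltration
//   P2: high score or at least one compound rule hit
//   P3: everything else
function triagePriority(alert) {
  if (alert.iocMatch || alert.sandboxConfirmed || alert.canaryExfil) return 'P1';
  if ((alert.score || 0) >= HIGH_SCORE || (alert.compounds || []).length > 0) return 'P2';
  return 'P3';
}

console.log(triagePriority({ iocMatch: true, score: 10 }));                   // P1
console.log(triagePriority({ score: 30, compounds: ['MUADDIB-COMPOUND-016'] })); // P2
console.log(triagePriority({ score: 30 }));                                   // P3
```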

Behavioral anomaly detection (v2.0)

muaddib scan . --temporal-full     # All 4 temporal features
muaddib scan . --temporal          # Sudden lifecycle script detection
muaddib scan . --temporal-ast      # AST diff between versions
muaddib scan . --temporal-publish  # Publish frequency anomaly
muaddib scan . --temporal-maintainer # Maintainer change detection

Detects supply-chain attacks before they appear in IOC databases by analyzing changes between package versions. See Evaluation Methodology for details.
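
To make the idea of version-to-version comparison concrete, here is a minimal sketch of the "sudden lifecycle script" check: it reports install-time hooks that exist in the newer `package.json` but not in the previous one. The function name and the exact hook list are assumptions, and the real detector likely considers more context.

```javascript
// Install-time hooks that npm runs automatically when a package is installed.
const LIFECYCLE_HOOKS = ['preinstall', 'install', 'postinstall'];

// Return hooks present in the new package.json but absent from the previous one.
function newLifecycleScripts(prevPkg, nextPkg) {
  const prev = prevPkg.scripts || {};
  const next = nextPkg.scripts || {};
  return LIFECYCLE_HOOKS.filter((hook) => next[hook] && !prev[hook]);
}

const v1 = { name: 'demo', version: '1.0.0', scripts: { test: 'node test.js' } };
const v2 = { name: 'demo', version: '1.0.1', scripts: { test: 'node test.js', postinstall: 'node setup.js' } };
console.log(newLifecycleScripts(v1, v2)); // [ 'postinstall' ]
```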

Docker sandbox

muaddib sandbox <package-name>
muaddib sandbox <package-name> --strict

Dynamic analysis in an isolated Docker container: strace, tcpdump, filesystem diff, canary tokens, CI-aware environment, and monkey-patching preload for time-bomb detection (multi-run at [0h, 72h, 7d] offsets).
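
The multi-run time-bomb check can be illustrated with a toy clock shift. The sketch below patches only `Date.now` and replays a dormant payload at the three offsets listed above; the helper `withClockOffset` and the 48-hour toy bomb are hypothetical, and the actual sandbox preload presumably covers far more than this.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const OFFSETS_MS = [0, 72 * HOUR_MS, 7 * 24 * HOUR_MS]; // 0h, 72h, 7d

// Run fn with Date.now() shifted forward by offsetMs, then restore the original.
// Only Date.now is patched in this sketch; `new Date()` is left untouched.
function withClockOffset(offsetMs, fn) {
  const realNow = Date.now;
  Date.now = () => realNow.call(Date) + offsetMs;
  try {
    return fn();
  } finally {
    Date.now = realNow;
  }
}

// Toy time bomb: stays dormant until 48h after "install".
const installedAt = Date.now();
const isArmed = () => Date.now() - installedAt >= 48 * HOUR_MS;

const results = OFFSETS_MS.map((offset) => withClockOffset(offset, isArmed));
console.log(results); // [ false, true, true ]
```

A single run at the real time would miss this payload; the shifted runs expose it.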

Other commands

muaddib watch .                    # Real-time monitoring
muaddib daemon                     # Daemon mode (auto-scan npm install)
muaddib update                     # Update IOCs (fast, ~5s)
muaddib scrape                     # Full IOC refresh (~5min)
muaddib diff HEAD~1                # Compare threats with previous commit
muaddib init-hooks                 # Pre-commit hooks (husky/pre-commit/git)
muaddib scan . --breakdown         # Explainable score decomposition
muaddib replay                     # Ground truth validation (90/94 TPR@3, v2.11.48)

Features

20 parallel scanners

Scanner	Detection
AST Parse (acorn)	eval, Function, credential theft, binary droppers, prototype hooks
Pattern Matching	Shell commands, reverse shells, dead man's switch
Dataflow Analysis	Credential read + network send (intra-file and cross-file)
Obfuscation Detection	JS obfuscation patterns (skips .min.js files)
Deobfuscation Pre-processing	String concat, charcode, base64, hex array, const propagation
Inter-module Dataflow	Cross-file taint propagation (3-hop chains, class methods)
Intent Coherence	Intra-file source-sink pairing (credential + eval/network)
Typosquatting	npm + PyPI (Levenshtein distance)
Python Scanner	requirements.txt, setup.py, pyproject.toml, 14K+ PyPI IOCs
Shannon Entropy	High-entropy strings (5.5 bits + 50 chars min)
AI Config Scanner	.cursorrules, CLAUDE.md, copilot-instructions.md injection
Package/Dependencies	Lifecycle scripts, IOC matching (225K+ packages)
GitHub Actions	Shai-Hulud backdoor detection
Hash Scanner	Known malicious file hashes
IOC Strings (intel-triage P1.1)	YARA-style string matching (Axios 2026, TeamPCP, GlassWorm, CanisterSprawl)
Anti-Forensic AST (intel-triage P1.2)	XOR loop + self-delete + decoy write compound (csec autodelete)
Stub Package (intel-triage P1.3)	Tiny main file + external dep URL + lifecycle hook (ltidi chain)
Monorepo Scanner	Lerna/pnpm-workspace/turbo detection (Sprint 1 audit MR-C2 fix)
Trusted-Dep-Diff (opt-in)	Diff against trusted dep tarballs from registry (v2.10.x)
Python Source (PYSRC)	Import-time / install-time RCE patterns in `__init__.py` / `setup.py` (v2.11.41 — closes TrapDoor PyPI gap)
Python AST (PYAST)	Tree-sitter-Python AST with taint-aware detectors (v2.11.42+)
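
As an illustration of the Shannon Entropy row, the sketch below computes entropy in bits per character and applies the two thresholds from the table (5.5 bits, 50 characters minimum). Function names are hypothetical, and whether the real scanner uses `>=` or `>` is an assumption.

```javascript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(str) {
  const counts = new Map();
  let total = 0;
  for (const ch of str) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  if (total === 0) return 0;
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Thresholds taken from the scanner table: at least 50 chars and 5.5 bits.
function isHighEntropy(str, minBits = 5.5, minLength = 50) {
  return str.length >= minLength && shannonEntropy(str) >= minBits;
}

const base64Alphabet =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
console.log(shannonEntropy('abcd'));          // 2
console.log(isHighEntropy(base64Alphabet));   // true (64 distinct chars, 6 bits)
console.log(isHighEntropy('a'.repeat(60)));   // false (long but zero entropy)
```

Reaching 5.5 bits requires at least 46 distinct characters, which is why ordinary identifiers and prose rarely trip the check.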

264 detection rules

All rules (259 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See SECURITY.md for the complete rules reference.

Detected campaigns

Campaign	Status
GlassWorm (2026, 433+ packages)	Detected
Shai-Hulud v1/v2/v3 (2025)	Detected
event-stream (2018)	Detected
eslint-scope (2018)	Detected
Protestware (node-ipc, colors, faker)	Detected
Typosquats (crossenv, mongose, babelcli)	Detected
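
The typosquat examples in the table are each one edit away from a popular package name. The sketch below shows a standard Levenshtein distance and a simple lookup against a list of popular names. The list, the `maxDistance = 1` default, and the function names are assumptions for illustration, not the scanner's actual configuration.

```javascript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the popular package a name is suspiciously close to, or null.
// Exact matches (distance 0) are the legitimate package and are not flagged.
function findTyposquatTarget(name, popular, maxDistance = 1) {
  for (const target of popular) {
    const d = levenshtein(name, target);
    if (d > 0 && d <= maxDistance) return target;
  }
  return null;
}

const popular = ['cross-env', 'mongoose', 'babel-cli'];
console.log(findTyposquatTarget('crossenv', popular)); // cross-env
console.log(findTyposquatTarget('mongose', popular));  // mongoose
console.log(findTyposquatTarget('mongoose', popular)); // null
```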

VS Code

The VS Code extension automatically scans your npm projects.

code --install-extension dnszlsk.muaddib-vscode

MUAD'DIB: Scan Project - Scan entire project
MUAD'DIB: Scan Current File - Scan current file
Settings: muaddib.autoScan, muaddib.webhookUrl, muaddib.failLevel

See vscode-extension/README.md for full documentation.

CI/CD

GitHub Actions (Marketplace)

name: Security Scan

on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: DNSZLSK/muad-dib@v1
        with:
          path: '.'
          fail-on: 'high'
          sarif: 'results.sarif'

Input	Description	Default
`path`	Path to scan	`.`
`fail-on`	Minimum severity to fail	`high`
`sarif`	SARIF output file path
`paranoid`	Ultra-strict detection	`false`

Pre-commit hooks

muaddib init-hooks                        # Auto-detect (husky/pre-commit/git)
muaddib init-hooks --type husky           # Force husky
muaddib init-hooks --mode diff            # Only block NEW threats
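
Blocking only new threats amounts to a set difference between a baseline scan and the current one. The sketch below assumes each threat can be identified by a rule ID and a file path; both field names and the `newThreats` helper are hypothetical.

```javascript
// Assumption: a threat's identity is its rule ID plus the file it was found in.
const threatKey = (t) => `${t.ruleId}|${t.file}`;

// Return threats present in the current scan but not in the baseline.
function newThreats(baseline, current) {
  const known = new Set(baseline.map(threatKey));
  return current.filter((t) => !known.has(threatKey(t)));
}

const baseline = [{ ruleId: 'PKG-022', file: 'package.json' }];
const current = [
  { ruleId: 'PKG-022', file: 'package.json' },
  { ruleId: 'AST-092', file: 'lib/index.js' },
];
console.log(newThreats(baseline, current));
// [ { ruleId: 'AST-092', file: 'lib/index.js' } ]
```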

With pre-commit framework:

repos:
  - repo: https://github.com/DNSZLSK/muad-dib
    rev: v2.11.76
    hooks:
      - id: muaddib-scan

Evaluation Metrics

Latest measurement: v2.11.48 (2026-05-26, Track D + PyPI download fix). Ground truth holds 96 samples (94 in-scope, 2 out-of-scope protestware). This run measures the full 94 in-scope set after the 2026-05-25 enrichment (Track C synthetic for the new PYSRC/PYAST/AST-092/AICONF-004/PKG-022 rules, Track A real-world tarballs recovered from VPS archive, Track B reconstructions from the in-house security-review benchmark).

Operational metrics (what an operator actually gets)

These are the numbers a user gets when running muaddib scan against npm or PyPI packages. The pipeline executes scanners + FP caps only — no ML filter is applied (see ML Classifier note below).

Metric	Result	Details
Wild TPR (Datadog 17K)	92.8% (13,538/14,587 in-scope)	17,922 packages. 3,335 skipped (no JS). By category: compromised_lib 97.8%, malicious_intent 92.1% — last measurement v2.9.4, independent of GT.
TPR@3 (detection rate, v2.11.48)	95.74% (90/94 in-scope)	Full GT re-measurement. Threshold=3: any signal. 13 PyPI samples (was 0). 4 misses incl. 3 browser-only (lottie-player, polyfill-io, trojanized-jquery).
TPR@20 (alert rate, v2.11.48)	88.30% (83/94 in-scope)	Operational alert threshold=20. +3.1pp vs v2.11.47 — Track D `recon_exfil_direct_ip` compound (MUADDIB-COMPOUND-016) closed the GT-095 gap (risk 3→50) and boosted GT-091 byvendors / GT-092 heloo131313 through `linux_fingerprint_exec`.
FPR rules (Benign curated, v2.11.48 measure)	1.10% (6/545 scanned, 548 total)	Unchanged after Track D — the new compound + types created zero new FPs (sameFile gate + public-IP-only filter). Drop from 15.6% (v2.10.95) is attributable to FP caps F1-F14 (v2.10.97 → v2.11.31). 6 remaining FPs are real (meteor, prisma, @prisma/client, drizzle-orm, scrypt, liquid).
FPR (Benign random, v2.11.48)	2.50% (5/200)	200 random npm packages, unchanged.
FPR PyPI (v2.11.48, first honest measurement)	9.68% (12/124 scanned, 132 total)	Track D fixed the PyPI downloader — removed `pip --no-binary :all:` flag (forced compile of wheel-only packages, timed out 38% of the time) + added `.whl` extraction via `extractArchive()`. Brought 42 previously-skipped giants (numpy/pandas/django/matplotlib/scikit-learn/...) into scope. All 12 FPs cluster at score 25-35: this is the cap-PyPI-35 artifact, not new rule misfires. Lifting the cap (Track E) would drop FPR PyPI to ≈0%. 8 residual fails are >500MB packages (torch, tensorflow, scipy, opencv-python, ansible…) hitting the 30s `PACK_TIMEOUT_MS`.
ADR (Adversarial + Holdout, v2.11.48)	96.26% (103/107)	67 adversarial + 40 holdout, global threshold=20. Stable vs v2.10.95.
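
The percentages in the table are plain ratios of the counts shown next to them. The short helper below (a hypothetical name, not part of the CLI) reproduces several of them and can be used to sanity-check future updates to the table.

```javascript
// Percentage with two decimals, matching the table's formatting.
function rate(hits, total) {
  if (total <= 0) throw new RangeError('total must be positive');
  return ((hits / total) * 100).toFixed(2) + '%';
}

console.log(rate(90, 94));   // TPR@3        -> 95.74%
console.log(rate(83, 94));   // TPR@20       -> 88.30%
console.log(rate(6, 545));   // FPR curated  -> 1.10%
console.log(rate(12, 124));  // FPR PyPI     -> 9.68%
console.log(rate(103, 107)); // ADR          -> 96.26%
```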

4132 tests across 115 files. 264 rules (259 RULES + 5 PARANOID; v2.11.67/70 Phantom Gyp added PKG-023 + COMPOUND-017).

Known issues (v2.11.48):

PyPI score cap at 35/100: Python samples are capped at riskScore=35 even when globalRiskScore=100. Confirmed empirically: all 12 PyPI FPs score 25-35 (flask 32, django 35, tornado 35, bottle 30, pandas 25, matplotlib 25, plotly 25, bokeh 25, pymongo 35, coverage 32, fabric 35, websockets 35). Lifting the cap will simultaneously drop FPR PyPI to ≈0% and unblock PyPI malware detection at higher thresholds. Track E target.

Operational coverage (v2.11.67-76)

The static ground-truth TPR above is measured offline. Since v2.11.67 the monitor also tracks operational coverage on live npm/PyPI ingestion:

A per-scan ledger (data/scan-ledger.jsonl) records every scanned package's outcome; computeLedgerRollup() produces a 24h rollup (alertRate, per-ecosystem). Note: alertRate is a throughput signal, not detection TPR.
An active GHSA poller (~15 min; npm, pypi, crates) builds an authoritative "what should we have caught" denominator (data/ghsa-malware.jsonl), plus a feed-health alarm that fires when an IOC feed silently goes dark.
The Phase 5 coverage-audit (scripts/coverage-audit.js, daily 05:00 UTC) joins that denominator against ledger outcomes + the tarball archive to compute an honest GHSA-denominated operational TPR (alerted / total), and surfaces scannedClean misses as human-gated ground-truth candidates.
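
The join described in the list above can be sketched as follows. The record shapes are hypothetical, since the schemas of `data/ghsa-malware.jsonl` and `data/scan-ledger.jsonl` are not shown in this README, and the function is an illustration rather than the code in `scripts/coverage-audit.js`.

```javascript
// Hypothetical record shapes:
//   advisory: { ecosystem, name, version }
//   ledger:   { ecosystem, name, version, outcome }  // e.g. 'alerted' or 'scannedClean'
const key = (r) => `${r.ecosystem}:${r.name}@${r.version}`;

// GHSA-denominated operational TPR: alerted / total advisories.
// Advisories whose package was scanned but not alerted are returned as misses.
function operationalTpr(advisories, ledgerEntries) {
  const outcomes = new Map(ledgerEntries.map((e) => [key(e), e.outcome]));
  let alerted = 0;
  const misses = [];
  for (const adv of advisories) {
    const outcome = outcomes.get(key(adv));
    if (outcome === 'alerted') alerted += 1;
    else if (outcome === 'scannedClean') misses.push(key(adv));
  }
  const total = advisories.length;
  return { alerted, total, tpr: total ? alerted / total : null, misses };
}

const advisories = [
  { ecosystem: 'npm', name: 'evil-a', version: '1.0.0' },
  { ecosystem: 'npm', name: 'evil-b', version: '2.0.0' },
  { ecosystem: 'pypi', name: 'evil-c', version: '0.1.0' },
];
const ledger = [
  { ecosystem: 'npm', name: 'evil-a', version: '1.0.0', outcome: 'alerted' },
  { ecosystem: 'npm', name: 'evil-b', version: '2.0.0', outcome: 'scannedClean' },
];
console.log(operationalTpr(advisories, ledger));
```

Advisories that never reached the ledger (here `evil-c`) still count in the denominator, which keeps the rate honest about ingestion gaps.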

This operational TPR is the real production detection rate, distinct from the static GT TPR (which has not been re-measured since v2.11.48).

ML Classifier (offline only)

src/ml/classifier.js is not wired into muaddib scan. The XGBoost model is currently exercised only by muaddib evaluate (offline metric replay) and muaddib monitor (LOG-ONLY since 2026-04-08, model collapsed pending retrain — see src/monitor/queue.js:628). The v2.11.48 evaluate-time replay shows the same 1.10% FPR (no additional FPs filtered) — kept as a reference for retrain validation, but the published operational FPR is the rules-only number above.

Static evaluation caveats:

TPR measured on the full 94 in-scope samples from the 96-sample ground truth (2 out-of-scope protestware GT-005/GT-009 with min_threats=0)

TPR@3 = detection rate (any signal); TPR@20 = operational alert threshold

FPR rules measured on 548 curated popular npm packages (not a random sample)

FPR PyPI: 124/132 scanned (8 download fails on >500MB giants — torch/tensorflow/ansible/…). Smaller N than npm.

ADR measured with global threshold (score >= 20) as of v2.6.5

See Evaluation Methodology for the full experimental protocol, holdout history, and Datadog benchmark details.

ML Classifier — R&D, currently inactive

Status (2026-04-08 → present): The XGBoost classifier (src/ml/classifier.js) is not wired into muaddib scan at all, and in muaddib monitor it runs in LOG-ONLY mode since 2026-04-08 — the trained model collapsed (predicts p≈0.002 for every input, including clearly malicious lifecycle+exec+staged_payload patterns) and was disabled pending retrain on balanced JSONL data. The metrics below come from offline muaddib evaluate replay against a frozen bench. They describe what the model would contribute if it worked, not what an operator gets today.

Metric (offline `evaluate` replay)	Result	Details
ML FPR	2.85% (239/8,393 holdout)	XGBoost retrained on 56,564 samples, 64 features, threshold=0.710
ML TPR	99.93% (2,918/2,920 holdout)	377 confirmed_malicious via OSSF/GHSA/npm correlation
FPR after ML T1 (offline replay, v2.11.48)	1.10% (6/545 scanned)	Classifier filters 0/6 raw FPs in this run (filtered 1 at v2.11.47). Not applied during real scans — `muaddib scan` never invokes the classifier.

Retrain methodology (v2.10.51):

Ground truth: 377 confirmed_malicious via auto-labeler (OSSF malicious-packages, GitHub Advisory Database, npm registry takedown correlation)

Dataset: 56,564 samples (14,602 malicious, 41,962 clean). Stratified 80/20 split

Grid search: depth=4, estimators=300, lr=0.05. AUC-ROC=0.999, F1=0.960

Leaky feature filter: 23 dead/leaky features removed (source-identity proxies)

The shadow model continues to log predictions in muaddib monitor for retraining validation. When the next model passes shadow validation, the LOG-ONLY guard in src/monitor/queue.js:660 will be flipped and the metrics above will move back into the operational table.

Contributing

Add IOCs

Edit YAML files in iocs/:

- id: NEW-MALWARE-001
  name: "malicious-package"
  version: "*"
  severity: critical
  confidence: high
  source: community
  description: "Threat description"
  references:
    - https://example.com/article
  mitre: T1195.002
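
Before opening a pull request, an entry can be sanity-checked once it has been parsed into a plain object. The sketch below is not the project's validator: the set of required fields is inferred from the example above, and the `validateIoc` name is hypothetical. The MITRE pattern follows the public ATT&CK ID format (`T` plus four digits, optional three-digit sub-technique).

```javascript
// Assumption: every field shown in the example entry except `references` is required.
const REQUIRED_FIELDS = [
  'id', 'name', 'version', 'severity', 'confidence', 'source', 'description', 'mitre',
];
const MITRE_ID = /^T\d{4}(\.\d{3})?$/; // e.g. T1195 or T1195.002

// Return a list of human-readable problems; an empty list means the entry looks well-formed.
function validateIoc(entry) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (entry[field] === undefined || entry[field] === '') {
      errors.push(`missing field: ${field}`);
    }
  }
  if (entry.mitre && !MITRE_ID.test(entry.mitre)) {
    errors.push(`invalid MITRE technique ID: ${entry.mitre}`);
  }
  if (entry.references !== undefined && !Array.isArray(entry.references)) {
    errors.push('references must be a list');
  }
  return errors;
}

const entry = {
  id: 'NEW-MALWARE-001',
  name: 'malicious-package',
  version: '*',
  severity: 'critical',
  confidence: 'high',
  source: 'community',
  description: 'Threat description',
  references: ['https://example.com/article'],
  mitre: 'T1195.002',
};
console.log(validateIoc(entry)); // []
```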

Development

git clone https://github.com/DNSZLSK/muad-dib
cd muad-dib
npm install
npm test

Testing

4132 tests across 115 modular test files
56 fuzz tests - Malformed inputs, ReDoS, unicode, binary
Datadog 17K benchmark - 14,587 confirmed malware samples (in-scope)
Ground truth validation - 96 samples, 94 in-scope (95.74% TPR@3, 88.30% TPR@20; v2.11.48 full measurement)
False positive validation (v2.11.48 measure) - 1.10% FPR rules (6/545 scanned), 2.50% on 200 random, 9.68% on 124/132 PyPI (first honest measurement post-Track-D download fix). ML classifier currently inactive — see Evaluation Metrics → ML Classifier.

Community

Discord: https://discord.gg/y8zxSmue

Documentation

Blog - Technical articles on supply-chain threat detection
Carnet de bord - Development journal (in French)
Documentation Index - All documentation in one place
Evaluation Methodology - Experimental protocol, holdout scores
Threat Model - What MUAD'DIB detects and doesn't detect
Security Policy - Detection rules reference (264 rules: 259 RULES + 5 PARANOID)
Security Audit - Bypass validation report
FP Analysis - Historical false positive analysis

License

MIT

The spice must flow. The worms must die.
