Why MUAD'DIB?
npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
MUAD'DIB combines 20 parallel scanners (264 detection rules), a deobfuscation engine, inter-module dataflow analysis, compound scoring (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is currently inactive (see Evaluation Metrics → ML Classifier section).
Positioning
MUAD'DIB is an educational tool and a free first line of defense. It detects known npm and PyPI threats (225,000+ IOCs) and suspicious behavioral patterns.
For enterprise protection, use:
- Socket.dev - ML behavioral analysis, cloud sandboxing
- Snyk - Massive vulnerability database, CI/CD integrations
- Opengrep - Advanced dataflow analysis, Semgrep rules
Installation
npm (recommended)
npm install -g muaddib-scanner
From source
git clone https://github.com/DNSZLSK/muad-dib
cd muad-dib
npm install
npm link
Usage
Basic scan
muaddib scan .
muaddib scan /path/to/project
Scans both npm (package.json, node_modules) and Python (requirements.txt, setup.py, pyproject.toml) dependencies.
Interactive mode
muaddib
Safe install
muaddib install <package>
muaddib install lodash axios --save-dev
muaddib install suspicious-pkg --force # Force install despite threats
Scans packages for threats BEFORE installing. Blocks known malicious packages.
Risk score
Each scan displays a 0-100 risk score:
[SCORE] 58/100 [***********---------] HIGH
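The score line above can be reproduced with a few lines of JavaScript. This is a minimal sketch, not MUAD'DIB's actual renderer: `renderScoreBar` is a hypothetical name, and the 20-cell width and floor rounding are inferred from the sample output (58/100 fills 11 cells).

```javascript
// Hypothetical helper: renders a risk score line like the sample above.
// Width (20) and floor rounding are inferred from that sample only.
function renderScoreBar(score, label, width = 20) {
  const clamped = Math.max(0, Math.min(100, score));
  const filled = Math.floor((clamped / 100) * width);
  const bar = "*".repeat(filled) + "-".repeat(width - filled);
  return `[SCORE] ${clamped}/100 [${bar}] ${label}`;
}

console.log(renderScoreBar(58, "HIGH"));
// [SCORE] 58/100 [***********---------] HIGH
```

The severity label is passed in rather than derived, because the score-to-label cutoffs are not stated here.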
Explain mode
muaddib scan . --explain
Shows rule ID, MITRE ATT&CK technique, references, and response playbook for each detection.
Export
muaddib scan . --json > results.json # JSON
muaddib scan . --html report.html # HTML
muaddib scan . --sarif results.sarif # SARIF (GitHub Security)
Severity threshold
muaddib scan . --fail-on critical # Fail only on CRITICAL
muaddib scan . --fail-on high # Fail on HIGH and CRITICAL (default)
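The threshold semantics can be illustrated as a comparison on an ordered severity ladder. The sketch below is an assumption-laden illustration: `shouldFail` and the finding shape are hypothetical, and the LOW and MEDIUM levels are assumed (only HIGH and CRITICAL are named above).

```javascript
// Assumed severity ladder; only HIGH and CRITICAL appear in the text above.
const SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

// Returns true when at least one finding is at or above the threshold.
function shouldFail(findings, failOn = "high") {
  const threshold = SEVERITY_ORDER.indexOf(failOn.toUpperCase());
  if (threshold === -1) throw new Error(`unknown severity: ${failOn}`);
  return findings.some(
    (f) => SEVERITY_ORDER.indexOf(f.severity.toUpperCase()) >= threshold
  );
}

// Hypothetical finding used for illustration.
const findings = [{ rule: "EXAMPLE-001", severity: "HIGH" }];
console.log(shouldFail(findings, "critical")); // false
console.log(shouldFail(findings, "high"));     // true
```

In a CI wrapper, a `true` result would typically map to a non-zero exit code.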
Paranoid mode
muaddib scan . --paranoid
Ultra-strict detection with lower tolerance. Detects any network access, subprocess execution, dynamic code evaluation, and sensitive file access.
Webhook alerts
muaddib scan . --webhook "https://discord.com/api/webhooks/..."
Strict filtering (v2.1.2): alerts only for IOC matches, sandbox-confirmed threats, or canary token exfiltration. Priority triage (v2.10.21): P1 (red, IOC/sandbox/canary), P2 (orange, high-score/compounds), P3 (yellow, rest).
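The P1/P2/P3 triage can be read as a short decision ladder. The following sketch assumes a hypothetical alert shape and a hypothetical `HIGH_SCORE` cutoff, since neither is documented here; only the three tiers and their colors come from the description above.

```javascript
// Hypothetical alert shape:
//   { iocMatch, sandboxConfirmed, canaryExfil, score, compounds }
// HIGH_SCORE is an assumed cutoff, not a documented value.
const HIGH_SCORE = 75;

function triagePriority(alert) {
  if (alert.iocMatch || alert.sandboxConfirmed || alert.canaryExfil) {
    return { priority: "P1", color: "red" };
  }
  if (alert.score >= HIGH_SCORE || (alert.compounds || []).length > 0) {
    return { priority: "P2", color: "orange" };
  }
  return { priority: "P3", color: "yellow" };
}

console.log(triagePriority({ iocMatch: true, score: 10 }).priority); // P1
console.log(triagePriority({ score: 80 }).priority);                 // P2
console.log(triagePriority({ score: 10 }).priority);                 // P3
```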
Behavioral anomaly detection (v2.0)
muaddib scan . --temporal-full # All 4 temporal features
muaddib scan . --temporal # Sudden lifecycle script detection
muaddib scan . --temporal-ast # AST diff between versions
muaddib scan . --temporal-publish # Publish frequency anomaly
muaddib scan . --temporal-maintainer # Maintainer change detection
Detects supply-chain attacks before they appear in IOC databases by analyzing changes between package versions. See Evaluation Methodology for details.
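To make the "sudden lifecycle script" idea concrete, the sketch below compares the `scripts` field of two package.json versions and reports install hooks that appear only in the newer one. It is an illustration of the general technique, not MUAD'DIB's implementation; `newInstallHooks` is a hypothetical name.

```javascript
// Scripts that npm runs automatically when a package is installed.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// Returns install hooks present in the new manifest but absent in the old one.
function newInstallHooks(prevManifest, nextManifest) {
  const prev = prevManifest.scripts || {};
  const next = nextManifest.scripts || {};
  return INSTALL_HOOKS.filter((hook) => hook in next && !(hook in prev));
}

// Hypothetical manifests for two consecutive versions.
const v1 = { name: "example", version: "1.0.0", scripts: { test: "jest" } };
const v2 = {
  name: "example",
  version: "1.0.1",
  scripts: { test: "jest", postinstall: "node setup.js" },
};
console.log(newInstallHooks(v1, v2)); // [ 'postinstall' ]
```

A hook that appears in a patch release of a previously hook-free package is a cheap signal worth a closer look, even before any IOC exists for it.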
Docker sandbox
muaddib sandbox <package-name>
muaddib sandbox <package-name> --strict
Dynamic analysis in an isolated Docker container: strace, tcpdump, filesystem diff, canary tokens, CI-aware environment, and monkey-patching preload for time-bomb detection (multi-run at [0h, 72h, 7d] offsets).
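The time-bomb idea can be sketched by shifting the clock a payload sees and re-running it at each offset. This is a simplified illustration under stated assumptions: `withClockOffset` is a hypothetical helper that patches only `Date.now`, whereas a real preload would need to cover `new Date()`, timers, and more.

```javascript
const HOUR_MS = 60 * 60 * 1000;
// Offsets taken from the description above: 0h, 72h, 7d.
const OFFSETS_MS = [0, 72 * HOUR_MS, 7 * 24 * HOUR_MS];

// Runs fn with Date.now() shifted forward by offsetMs, then restores it.
// Only Date.now is patched; `new Date()` and timers are left untouched.
function withClockOffset(offsetMs, fn) {
  const realNow = Date.now;
  Date.now = () => realNow.call(Date) + offsetMs;
  try {
    return fn();
  } finally {
    Date.now = realNow;
  }
}

// Toy "time bomb" that only triggers 48 hours after it is armed.
const deadline = Date.now() + 48 * HOUR_MS;
const triggered = OFFSETS_MS.map((offset) =>
  withClockOffset(offset, () => Date.now() >= deadline)
);
console.log(triggered); // [ false, true, true ]
```

The first run looks benign; the shifted runs expose the delayed branch.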
Other commands
muaddib watch . # Real-time monitoring
muaddib daemon # Daemon mode (auto-scan npm install)
muaddib update # Update IOCs (fast, ~5s)
muaddib scrape # Full IOC refresh (~5min)
muaddib diff HEAD~1 # Compare threats with previous commit
muaddib init-hooks # Pre-commit hooks (husky/pre-commit/git)
muaddib scan . --breakdown # Explainable score decomposition
muaddib replay # Ground truth validation (90/94 TPR@3, v2.11.48)
Features
20 parallel scanners
| Scanner | Detection |
|---|---|
| AST Parse (acorn) | eval, Function, credential theft, binary droppers, prototype hooks |
| Pattern Matching | Shell commands, reverse shells, dead man's switch |
| Dataflow Analysis | Credential read + network send (intra-file and cross-file) |
| Obfuscation Detection | JS obfuscation patterns (skip .min.js) |
| Deobfuscation Pre-processing | String concat, charcode, base64, hex array, const propagation |
| Inter-module Dataflow | Cross-file taint propagation (3-hop chains, class methods) |
| Intent Coherence | Intra-file source-sink pairing (credential + eval/network) |
| Typosquatting | npm + PyPI (Levenshtein distance) |
| Python Scanner | requirements.txt, setup.py, pyproject.toml, 14K+ PyPI IOCs |
| Shannon Entropy | High-entropy strings (5.5 bits + 50 chars min) |
| AI Config Scanner | .cursorrules, CLAUDE.md, copilot-instructions.md injection |
| Package/Dependencies | Lifecycle scripts, IOC matching (225K+ packages) |
| GitHub Actions | Shai-Hulud backdoor detection |
| Hash Scanner | Known malicious file hashes |
| IOC Strings (intel-triage P1.1) | YARA-style string matching (Axios 2026, TeamPCP, GlassWorm, CanisterSprawl) |
| Anti-Forensic AST (intel-triage P1.2) | XOR loop + self-delete + decoy write compound (csec autodelete) |
| Stub Package (intel-triage P1.3) | Tiny main file + external dep URL + lifecycle hook (ltidi chain) |
| Monorepo Scanner | Lerna/pnpm-workspace/turbo detection (Sprint 1 audit MR-C2 fix) |
| Trusted-Dep-Diff (opt-in) | Diff against trusted dep tarballs from registry (v2.10.x) |
| Python Source (PYSRC) | Import-time / install-time RCE patterns in __init__.py / setup.py (v2.11.41 — closes TrapDoor PyPI gap) |
| Python AST (PYAST) | Tree-sitter-Python AST with taint-aware detectors (v2.11.42+) |
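As an illustration of the Shannon Entropy row, the sketch below computes entropy in bits per character and applies the two thresholds from the table (5.5 bits, 50 characters minimum). The function names are hypothetical, and whether the real scanner uses a strict or inclusive comparison is an assumption.

```javascript
// Shannon entropy in bits per character.
function shannonEntropy(str) {
  const counts = new Map();
  let n = 0;
  for (const ch of str) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    n++;
  }
  let entropy = 0;
  for (const c of counts.values()) {
    const p = c / n;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Thresholds from the table above; the strict ">" is an assumption.
function isHighEntropy(str, minBits = 5.5, minLength = 50) {
  return str.length >= minLength && shannonEntropy(str) > minBits;
}

const alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
console.log(shannonEntropy(alphabet));       // 6 (64 distinct characters)
console.log(isHighEntropy(alphabet));        // true
console.log(isHighEntropy("a".repeat(60))); // false
```

Ordinary identifiers and English text sit well below 5.5 bits per character, which is why the length floor and the entropy floor are combined.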
264 detection rules
All rules (259 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See SECURITY.md for the complete rules reference.
Detected campaigns
| Campaign | Status |
|---|---|
| GlassWorm (2026, 433+ packages) | Detected |
| Shai-Hulud v1/v2/v3 (2025) | Detected |
| event-stream (2018) | Detected |
| eslint-scope (2018) | Detected |
| Protestware (node-ipc, colors, faker) | Detected |
| Typosquats (crossenv, mongose, babelcli) | Detected |
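The typosquat row relies on Levenshtein distance, as noted in the scanner table. The sketch below shows the standard dynamic-programming algorithm applied to two of the names above; the `maxDistance = 1` cutoff and the `popular` list are assumptions for illustration, not the project's configuration.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns popular names within maxDistance of the candidate.
// Exact matches are excluded: a package is not a typosquat of itself.
function typosquatCandidates(name, popular, maxDistance = 1) {
  return popular.filter((p) => {
    const d = levenshtein(name, p);
    return d > 0 && d <= maxDistance;
  });
}

// Assumed list of legitimate package names.
const popular = ["cross-env", "mongoose", "babel-cli"];
console.log(typosquatCandidates("crossenv", popular)); // [ 'cross-env' ]
console.log(typosquatCandidates("mongose", popular));  // [ 'mongoose' ]
```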
VS Code
The VS Code extension automatically scans your npm projects.
code --install-extension dnszlsk.muaddib-vscode
- MUAD'DIB: Scan Project - Scan entire project
- MUAD'DIB: Scan Current File - Scan current file
- Settings: muaddib.autoScan, muaddib.webhookUrl, muaddib.failLevel
See vscode-extension/README.md for full documentation.
CI/CD
GitHub Actions (Marketplace)
name: Security Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: DNSZLSK/muad-dib@v1
        with:
          path: '.'
          fail-on: 'high'
          sarif: 'results.sarif'
| Input | Description | Default |
|---|---|---|
| path | Path to scan | . |
| fail-on | Minimum severity to fail | high |
| sarif | SARIF output file path | |
| paranoid | Ultra-strict detection | false |
Pre-commit hooks
muaddib init-hooks # Auto-detect (husky/pre-commit/git)
muaddib init-hooks --type husky # Force husky
muaddib init-hooks --mode diff # Only block NEW threats
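Blocking only new threats amounts to a set difference between a baseline scan and the current one. The sketch below assumes a hypothetical finding shape and a hypothetical fingerprint (rule plus file); how MUAD'DIB actually identifies a finding across scans is not specified here.

```javascript
// Hypothetical finding shape: { rule, file, severity }.
// Identifying a finding by rule + file is an assumption.
const fingerprint = (f) => `${f.rule}::${f.file}`;

// Returns findings in `current` that were not present in `baseline`.
function newThreats(baseline, current) {
  const known = new Set(baseline.map(fingerprint));
  return current.filter((f) => !known.has(fingerprint(f)));
}

const baseline = [{ rule: "EXAMPLE-001", file: "lib/a.js", severity: "HIGH" }];
const current = [
  { rule: "EXAMPLE-001", file: "lib/a.js", severity: "HIGH" },
  { rule: "EXAMPLE-002", file: "lib/b.js", severity: "CRITICAL" },
];
console.log(newThreats(baseline, current).map(fingerprint));
// [ 'EXAMPLE-002::lib/b.js' ]
```

This keeps a hook usable on a codebase with pre-existing findings: the commit is blocked only when it introduces something the baseline did not have.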
With pre-commit framework:
repos:
  - repo: https://github.com/DNSZLSK/muad-dib
    rev: v2.11.76
    hooks:
      - id: muaddib-scan
Evaluation Metrics
Latest measurement: v2.11.48 (2026-05-26, Track D + PyPI download fix). Ground truth holds 96 samples (94 in-scope, 2 out-of-scope protestware). This run measures the full 94 in-scope set after the 2026-05-25 enrichment (Track C synthetic for the new PYSRC/PYAST/AST-092/AICONF-004/PKG-022 rules, Track A real-world tarballs recovered from VPS archive, Track B reconstructions from the in-house security-review benchmark).
Operational metrics (what an operator actually gets)
These are the numbers a user gets when running muaddib scan against npm or PyPI packages. The pipeline executes scanners + FP caps only — no ML filter is applied (see ML Classifier note below).
| Metric | Result | Details |
|---|---|---|
| Wild TPR (Datadog 17K) | 92.8% (13,538/14,587 in-scope) | 17,922 packages. 3,335 skipped (no JS). By category: compromised_lib 97.8%, malicious_intent 92.1% — last measurement v2.9.4, independent of GT. |
| TPR@3 (detection rate, v2.11.48) | 95.74% (90/94 in-scope) | Full GT re-measurement. Threshold=3: any signal. 13 PyPI samples (was 0). 4 misses incl. 3 browser-only (lottie-player, polyfill-io, trojanized-jquery). |
| TPR@20 (alert rate, v2.11.48) | 88.30% (83/94 in-scope) | Operational alert threshold=20. +3.1pp vs v2.11.47 — Track D recon_exfil_direct_ip compound (MUADDIB-COMPOUND-016) closed the GT-095 gap (risk 3→50) and boosted GT-091 byvendors / GT-092 heloo131313 through linux_fingerprint_exec. |
| FPR rules (Benign curated, v2.11.48 measure) | 1.10% (6/545 scanned, 548 total) | Unchanged after Track D — the new compound + types created zero new FPs (sameFile gate + public-IP-only filter). Drop from 15.6% (v2.10.95) is attributable to FP caps F1-F14 (v2.10.97 → v2.11.31). 6 remaining FPs are real (meteor, prisma, @prisma/client, drizzle-orm, scrypt, liquid). |
| FPR (Benign random, v2.11.48) | 2.50% (5/200) | 200 random npm packages, unchanged. |
| FPR PyPI (v2.11.48, first honest measurement) | 9.68% (12/124 scanned, 132 total) | Track D fixed the PyPI downloader — removed pip --no-binary :all: flag (forced compile of wheel-only packages, timed out 38% of the time) + added .whl extraction via extractArchive(). Brought 42 previously-skipped giants (numpy/pandas/django/matplotlib/scikit-learn/...) into scope. All 12 FPs cluster at score 25-35: this is the cap-PyPI-35 artifact, not new rule misfires. Lifting the cap (Track E) would drop FPR PyPI to ≈0%. 8 residual fails are >500MB packages (torch, tensorflow, scipy, opencv-python, ansible…) hitting the 30s PACK_TIMEOUT_MS. |
| ADR (Adversarial + Holdout, v2.11.48) | 96.26% (103/107) | 67 adversarial + 40 holdout, global threshold=20. Stable vs v2.10.95. |
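The percentages in the table follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them with a small hypothetical helper, `pct`, using only numerators and denominators quoted in the table.

```javascript
// Formats hits/total as a percentage with two decimals.
function pct(hits, total) {
  return ((hits / total) * 100).toFixed(2) + "%";
}

console.log(pct(90, 94));   // TPR@3:               95.74%
console.log(pct(83, 94));   // TPR@20:              88.30%
console.log(pct(6, 545));   // FPR, benign curated: 1.10%
console.log(pct(5, 200));   // FPR, benign random:  2.50%
console.log(pct(12, 124));  // FPR, PyPI:           9.68%
console.log(pct(103, 107)); // ADR:                 96.26%
```

Note that the FPR denominators are the scanned counts (545 and 124), not the totals (548 and 132), so packages that failed to download do not count as clean.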
4132 tests across 115 files. 264 rules (259 RULES + 5 PARANOID; v2.11.67/70 Phantom Gyp added PKG-023 + COMPOUND-017).
Known issues (v2.11.48):
- PyPI score capped at 35/100: Python samples top out at riskScore=35 even when globalRiskScore=100. Confirmed empirically: all 12 PyPI FPs score 25-35 (flask 32, django 35, tornado 35, bottle 30, pandas 25, matplotlib 25, plotly 25, bokeh 25, pymongo 35, coverage 32, fabric 35, websockets 35). Lifting the cap will simultaneously drop FPR PyPI to ≈0% and unblock PyPI malware detection at higher thresholds. Track E target.
Operational coverage (v2.11.67-76)
The static ground-truth TPR above is measured offline. Since v2.11.67 the monitor also tracks operational coverage on live npm/PyPI ingestion:
- A per-scan ledger (data/scan-ledger.jsonl) records every scanned package's outcome; computeLedgerRollup() produces a 24h rollup (alertRate, per-ecosystem). Note: alertRate is a throughput signal, not detection TPR.
- An active GHSA poller (~15 min; npm, pypi, crates) builds an authoritative "what should we have caught" denominator (data/ghsa-malware.jsonl), plus a feed-health alarm that fires when an IOC feed silently goes dark.
- The Phase 5 coverage-audit (scripts/coverage-audit.js, daily 05:00 UTC) joins that denominator against ledger outcomes + the tarball archive to compute an honest GHSA-denominated operational TPR (alerted / total), and surfaces scannedClean misses as human-gated ground-truth candidates.
This operational TPR is the real production detection rate, distinct from the static GT TPR (which has not been re-measured since v2.11.48).
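The GHSA-denominated join described above can be sketched as a lookup of each advisory in the ledger. Everything about the record shapes here is an assumption (the real JSONL schemas are not shown): the field names, the `"alerted"` outcome value, and the `operationalTpr` function are hypothetical; only the `scannedClean` outcome and the alerted / total ratio come from the text.

```javascript
// Hypothetical record shapes:
//   advisory: { ecosystem, name, version }
//   ledger:   { ecosystem, name, version, outcome }
//             outcome is "alerted" or "scannedClean"
const key = (r) => `${r.ecosystem}:${r.name}@${r.version}`;

function operationalTpr(advisories, ledger) {
  const outcomes = new Map(ledger.map((r) => [key(r), r.outcome]));
  let alerted = 0;
  const missed = [];    // scanned but not alerted: ground-truth candidates
  const unscanned = []; // never seen by the monitor
  for (const adv of advisories) {
    const outcome = outcomes.get(key(adv));
    if (outcome === "alerted") alerted++;
    else if (outcome === "scannedClean") missed.push(key(adv));
    else unscanned.push(key(adv));
  }
  const total = advisories.length;
  return { alerted, total, tpr: total ? alerted / total : null, missed, unscanned };
}

// Hypothetical data for illustration.
const advisories = [
  { ecosystem: "npm", name: "evil-a", version: "1.0.0" },
  { ecosystem: "npm", name: "evil-b", version: "2.1.0" },
  { ecosystem: "pypi", name: "evil-c", version: "0.3.0" },
  { ecosystem: "pypi", name: "evil-d", version: "0.1.0" },
];
const ledger = [
  { ecosystem: "npm", name: "evil-a", version: "1.0.0", outcome: "alerted" },
  { ecosystem: "npm", name: "evil-b", version: "2.1.0", outcome: "scannedClean" },
  { ecosystem: "pypi", name: "evil-c", version: "0.3.0", outcome: "alerted" },
];
console.log(operationalTpr(advisories, ledger).tpr); // 0.5
```

Keeping `unscanned` separate from `missed` distinguishes ingestion gaps from detection gaps, which call for different fixes.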
ML Classifier (offline only)
src/ml/classifier.js is not wired into muaddib scan. The XGBoost model is currently exercised only by muaddib evaluate (offline metric replay) and muaddib monitor (LOG-ONLY since 2026-04-08, model collapsed pending retrain — see src/monitor/queue.js:628). The v2.11.48 evaluate-time replay shows the same 1.10% FPR (no additional FPs filtered) — kept as a reference for retrain validation, but the published operational FPR is the rules-only number above.
Static evaluation caveats:
- TPR measured on the full 94 in-scope samples from the 96-sample ground truth (2 out-of-scope protestware GT-005/GT-009 with min_threats=0)
- TPR@3 = detection rate (any signal); TPR@20 = operational alert threshold
- FPR rules measured on 548 curated popular npm packages (not a random sample)
- FPR PyPI: 124/132 scanned (8 download fails on >500MB giants — torch/tensorflow/ansible/…). Smaller N than npm.
- ADR measured with global threshold (score >= 20) as of v2.6.5
See Evaluation Methodology for the full experimental protocol, holdout history, and Datadog benchmark details.
ML Classifier — R&D, currently inactive
Status (2026-04-08 → present): The XGBoost classifier (src/ml/classifier.js) is not wired into muaddib scan at all, and in muaddib monitor it has run in LOG-ONLY mode since 2026-04-08: the trained model collapsed (it predicts p≈0.002 for every input, including clearly malicious lifecycle+exec+staged_payload patterns) and was disabled pending retrain on balanced JSONL data. The metrics below come from offline muaddib evaluate replay against a frozen bench. They describe what the model would contribute if it worked, not what an operator gets today.
| Metric (offline evaluate replay) | Result | Details |
|---|---|---|
| ML FPR | 2.85% (239/8,393 holdout) | XGBoost retrained on 56,564 samples, 64 features, threshold=0.710 |
| ML TPR | 99.93% (2,918/2,920 holdout) | 377 confirmed_malicious via OSSF/GHSA/npm correlation |
| FPR after ML T1 (offline replay, v2.11.48) | 1.10% (6/545 scanned) | Classifier filters 0/6 raw FPs in this run (filtered 1 at v2.11.47). Not applied during real scans — muaddib scan never invokes the classifier. |
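For readers unfamiliar with how the ML FPR and ML TPR rows are derived, the sketch below applies a decision threshold to holdout predictions and computes both rates. The sample shape and the `holdoutRates` name are hypothetical, and treating `p >= threshold` as "flagged" is an assumption; only the 0.710 threshold comes from the table.

```javascript
// Threshold from the table above; samples are { p, malicious } (assumed shape).
const THRESHOLD = 0.71;

function holdoutRates(samples, threshold = THRESHOLD) {
  let tp = 0, fn = 0, fp = 0, tn = 0;
  for (const { p, malicious } of samples) {
    const flagged = p >= threshold; // inclusive comparison is an assumption
    if (malicious) {
      if (flagged) tp++; else fn++;
    } else {
      if (flagged) fp++; else tn++;
    }
  }
  // TPR = TP / (TP + FN); FPR = FP / (FP + TN). NaN if a class is empty.
  return { tpr: tp / (tp + fn), fpr: fp / (fp + tn) };
}

// Tiny hypothetical holdout.
const holdout = [
  { p: 0.93, malicious: true },
  { p: 0.42, malicious: true },
  { p: 0.88, malicious: false },
  { p: 0.05, malicious: false },
];
console.log(holdoutRates(holdout)); // { tpr: 0.5, fpr: 0.5 }
```

With the counts reported in the table, the same formulas give 2,918/2,920 ≈ 99.93% TPR and 239/8,393 ≈ 2.85% FPR.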
Retrain methodology (v2.10.51):
- Ground truth: 377 confirmed_malicious via auto-labeler (OSSF malicious-packages, GitHub Advisory Database, npm registry takedown correlation)
- Dataset: 56,564 samples (14,602 malicious, 41,962 clean). Stratified 80/20 split
- Grid search: depth=4, estimators=300, lr=0.05. AUC-ROC=0.999, F1=0.960
- Leaky feature filter: 23 dead/leaky features removed (source-identity proxies)
The shadow model continues to log predictions in muaddib monitor for retraining validation. When the next model passes shadow validation, the LOG-ONLY guard in src/monitor/queue.js:660 will be flipped and the metrics above will move back into the operational table.
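One simple check a shadow validation could include is a guard against the collapse described earlier, where every input receives nearly the same probability. The sketch below is a hypothetical illustration: `looksCollapsed` and the `minSpread` tolerance are assumptions, not part of the project's validation procedure.

```javascript
// Flags a model whose logged predictions barely vary across inputs.
// minSpread is an assumed tolerance, not a value from the project.
function looksCollapsed(predictions, minSpread = 0.01) {
  if (predictions.length === 0) return true; // nothing logged: treat as failing
  let min = Infinity;
  let max = -Infinity;
  for (const p of predictions) {
    if (p < min) min = p;
    if (p > max) max = p;
  }
  return max - min < minSpread;
}

console.log(looksCollapsed([0.002, 0.002, 0.0021])); // true
console.log(looksCollapsed([0.01, 0.44, 0.97]));     // false
```

A spread check is deliberately crude; it catches a constant-output model cheaply but says nothing about whether a varying model is accurate.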
Contributing
Add IOCs
Edit YAML files in iocs/:
- id: NEW-MALWARE-001
  name: "malicious-package"
  version: "*"
  severity: critical
  confidence: high
  source: community
  description: "Threat description"
  references:
    - https://example.com/article
  mitre: T1195.002
Development
git clone https://github.com/DNSZLSK/muad-dib
cd muad-dib
npm install
npm test
Testing
- 4132 tests across 115 modular test files
- 56 fuzz tests - Malformed inputs, ReDoS, unicode, binary
- Datadog 17K benchmark - 14,587 confirmed malware samples (in-scope)
- Ground truth validation - 96 real-world attacks (95.74% TPR@3, 88.30% TPR@20 — v2.11.48 full measure on 94 in-scope)
- False positive validation (v2.11.48 measure) - 1.10% FPR rules (6/545 scanned), 2.50% on 200 random, 9.68% on 124/132 PyPI (first honest measurement post-Track-D download fix). ML classifier currently inactive — see Evaluation Metrics → ML Classifier.
Community
- Discord: https://discord.gg/y8zxSmue
Documentation
- Blog - Technical articles on supply-chain threat detection
- Carnet de bord - Development journal (in French)
- Documentation Index - All documentation in one place
- Evaluation Methodology - Experimental protocol, holdout scores
- Threat Model - What MUAD'DIB detects and doesn't detect
- Security Policy - Detection rules reference (259 rules)
- Security Audit - Bypass validation report
- FP Analysis - Historical false positive analysis
License
MIT
The spice must flow. The worms must die.