muad-dib

About muad-dib

Real-time npm/PyPI supply-chain threat detection. Behavioral chain analysis, AST scanning, IOC feeds, and compound scoring engine.


Why MUAD'DIB?

npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.

MUAD'DIB combines 20 parallel scanners (264 detection rules), a deobfuscation engine, inter-module dataflow analysis, compound scoring (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is currently inactive (see Evaluation Metrics → ML Classifier section).


Positioning

MUAD'DIB is an educational tool and a free first line of defense. It detects known npm and PyPI threats (225,000+ IOCs) and suspicious behavioral patterns.

For enterprise protection, use:

  • Socket.dev - ML behavioral analysis, cloud sandboxing
  • Snyk - Massive vulnerability database, CI/CD integrations
  • Opengrep - Advanced dataflow analysis, Semgrep rules

Installation

npm (recommended)

npm install -g muaddib-scanner

From source

git clone https://github.com/DNSZLSK/muad-dib
cd muad-dib
npm install
npm link

Usage

Basic scan

muaddib scan .
muaddib scan /path/to/project

Scans both npm (package.json, node_modules) and Python (requirements.txt, setup.py, pyproject.toml) dependencies.

Interactive mode

muaddib

Safe install

muaddib install <package>
muaddib install lodash axios --save-dev
muaddib install suspicious-pkg --force    # Force install despite threats

Scans packages for threats BEFORE installing. Blocks known malicious packages.

Risk score

Each scan displays a 0-100 risk score:

[SCORE] 58/100 [***********---------] HIGH
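
The sample line above appears to use a 20-cell bar in which each filled cell stands for 5 points, rounded down. The following is a minimal sketch that reproduces that layout under this assumption; `renderScoreLine` is a hypothetical helper, not part of the MUAD'DIB codebase, and the severity label is passed in because the score-to-label thresholds are not stated here.

```javascript
// Hypothetical helper: formats a score line in the style of the sample above.
// Assumption: the bar is 20 cells wide and each cell represents 5 points.
function renderScoreLine(score, label) {
  if (!Number.isInteger(score) || score < 0 || score > 100) {
    throw new RangeError('score must be an integer between 0 and 100');
  }
  const width = 20;
  const filled = Math.floor((score * width) / 100);
  const bar = '*'.repeat(filled) + '-'.repeat(width - filled);
  return `[SCORE] ${score}/100 [${bar}] ${label}`;
}

console.log(renderScoreLine(58, 'HIGH'));
```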

Explain mode

muaddib scan . --explain

Shows rule ID, MITRE ATT&CK technique, references, and response playbook for each detection.

Export

muaddib scan . --json > results.json     # JSON
muaddib scan . --html report.html        # HTML
muaddib scan . --sarif results.sarif     # SARIF (GitHub Security)

Severity threshold

muaddib scan . --fail-on critical  # Fail only on CRITICAL
muaddib scan . --fail-on high      # Fail on HIGH and CRITICAL (default)
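
The threshold can be read as "fail when any finding is at or above this severity". A small sketch of that comparison follows; `shouldFail` and the shape of a finding are hypothetical, and the `low` and `medium` levels are assumed for illustration since only HIGH and CRITICAL are named here.

```javascript
// Assumption: severities are ordered as below. Only HIGH and CRITICAL are
// named in this README; "low" and "medium" are illustrative.
const SEVERITY_ORDER = ['low', 'medium', 'high', 'critical'];

// Hypothetical helper: true when any finding meets or exceeds the threshold.
function shouldFail(findings, failOn = 'high') {
  const threshold = SEVERITY_ORDER.indexOf(failOn.toLowerCase());
  if (threshold === -1) throw new Error(`unknown severity: ${failOn}`);
  return findings.some(
    (f) => SEVERITY_ORDER.indexOf(String(f.severity).toLowerCase()) >= threshold
  );
}

const sampleFindings = [{ rule: 'example-rule', severity: 'HIGH' }];
console.log(shouldFail(sampleFindings, 'critical')); // false
console.log(shouldFail(sampleFindings, 'high'));     // true
```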

Paranoid mode

muaddib scan . --paranoid

Ultra-strict detection with lower tolerance. Detects any network access, subprocess execution, dynamic code evaluation, and sensitive file access.

Webhook alerts

muaddib scan . --webhook "https://discord.com/api/webhooks/..."

Strict filtering (v2.1.2): alerts only for IOC matches, sandbox-confirmed threats, or canary token exfiltration. Priority triage (v2.10.21): P1 (red, IOC/sandbox/canary), P2 (orange, high-score/compounds), P3 (yellow, rest).
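
The three priority tiers described above can be sketched as a simple cascade. Everything in this example is an assumption made for illustration: the `triagePriority` name, the field names on the result object, and the `highScoreThreshold` parameter (the actual cut-off for "high-score" is not given here).

```javascript
// Hypothetical triage sketch. Field names are illustrative, not the real schema.
function triagePriority(result, highScoreThreshold) {
  // P1: IOC match, sandbox-confirmed threat, or canary token exfiltration.
  if (result.iocMatch || result.sandboxConfirmed || result.canaryExfiltration) {
    return 'P1';
  }
  // P2: high score or at least one compound rule fired.
  const compounds = result.compounds || [];
  if (result.score >= highScoreThreshold || compounds.length > 0) {
    return 'P2';
  }
  // P3: everything else.
  return 'P3';
}

console.log(triagePriority({ iocMatch: true, score: 10 }, 50));            // P1
console.log(triagePriority({ score: 10, compounds: ['example'] }, 50));    // P2
console.log(triagePriority({ score: 10 }, 50));                            // P3
```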

Behavioral anomaly detection (v2.0)

muaddib scan . --temporal-full        # All 4 temporal features
muaddib scan . --temporal             # Sudden lifecycle script detection
muaddib scan . --temporal-ast         # AST diff between versions
muaddib scan . --temporal-publish     # Publish frequency anomaly
muaddib scan . --temporal-maintainer  # Maintainer change detection

Detects supply-chain attacks before they appear in IOC databases by analyzing changes between package versions. See Evaluation Methodology for details.
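
To illustrate the idea behind sudden lifecycle script detection, here is a minimal sketch that compares the `scripts` field of two package.json manifests and reports install-time hooks that exist only in the newer version. `newLifecycleScripts` is a hypothetical helper written for this example; it is not the scanner's implementation.

```javascript
// npm runs these scripts automatically during `npm install`.
const INSTALL_HOOKS = ['preinstall', 'install', 'postinstall'];

// Hypothetical helper: returns install hooks present in the current manifest
// but absent from the previous one.
function newLifecycleScripts(previousManifest, currentManifest) {
  const before = previousManifest.scripts || {};
  const after = currentManifest.scripts || {};
  return INSTALL_HOOKS
    .filter((hook) => !(hook in before) && hook in after)
    .map((hook) => ({ hook, command: after[hook] }));
}

const v1 = { name: 'example-pkg', version: '1.0.0', scripts: { test: 'jest' } };
const v2 = {
  name: 'example-pkg',
  version: '1.0.1',
  scripts: { test: 'jest', postinstall: 'node setup.js' },
};
console.log(newLifecycleScripts(v1, v2));
```

A hook that appears between two patch releases is not proof of compromise, but it is the kind of change worth reviewing before it reaches an IOC feed.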

Docker sandbox

muaddib sandbox <package-name>
muaddib sandbox <package-name> --strict

Dynamic analysis in an isolated Docker container: strace, tcpdump, filesystem diff, canary tokens, CI-aware environment, and monkey-patching preload for time-bomb detection (multi-run at [0h, 72h, 7d] offsets).
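
The time-bomb runs rely on making the package believe time has passed. The sketch below shows one way such a clock shift could be done in Node.js by replacing the global `Date`; it is an illustration under stated assumptions, not the sandbox's preload. `installClockOffset` is a hypothetical name, and the offsets simply mirror the [0h, 72h, 7d] schedule mentioned above.

```javascript
const HOUR_MS = 60 * 60 * 1000;
// Assumption: one run per offset, matching the 0h / 72h / 7d schedule.
const OFFSETS_MS = [0, 72 * HOUR_MS, 7 * 24 * HOUR_MS];

// Hypothetical helper: shifts the clock seen through Date and returns a
// function that restores the original. Calling Date() without `new` is not
// supported by this simplified version.
function installClockOffset(offsetMs) {
  const RealDate = Date;
  class ShiftedDate extends RealDate {
    constructor(...args) {
      if (args.length === 0) {
        super(RealDate.now() + offsetMs); // "now" moves forward by the offset
      } else {
        super(...args); // explicit timestamps are left untouched
      }
    }
    static now() {
      return RealDate.now() + offsetMs;
    }
  }
  globalThis.Date = ShiftedDate;
  return () => {
    globalThis.Date = RealDate;
  };
}

const realNow = Date.now();
const restore = installClockOffset(OFFSETS_MS[1]);
console.log(Math.round((Date.now() - realNow) / HOUR_MS)); // 72
restore();
```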

Other commands

muaddib watch .                    # Real-time monitoring
muaddib daemon                     # Daemon mode (auto-scan npm install)
muaddib update                     # Update IOCs (fast, ~5s)
muaddib scrape                     # Full IOC refresh (~5min)
muaddib diff HEAD~1                # Compare threats with previous commit
muaddib init-hooks                 # Pre-commit hooks (husky/pre-commit/git)
muaddib scan . --breakdown         # Explainable score decomposition
muaddib replay                     # Ground truth validation (90/94 TPR@3, v2.11.48)

Features

20 parallel scanners

| Scanner | Detection |
| --- | --- |
| AST Parse (acorn) | eval, Function, credential theft, binary droppers, prototype hooks |
| Pattern Matching | Shell commands, reverse shells, dead man's switch |
| Dataflow Analysis | Credential read + network send (intra-file and cross-file) |
| Obfuscation Detection | JS obfuscation patterns (skip .min.js) |
| Deobfuscation Pre-processing | String concat, charcode, base64, hex array, const propagation |
| Inter-module Dataflow | Cross-file taint propagation (3-hop chains, class methods) |
| Intent Coherence | Intra-file source-sink pairing (credential + eval/network) |
| Typosquatting | npm + PyPI (Levenshtein distance) |
| Python Scanner | requirements.txt, setup.py, pyproject.toml, 14K+ PyPI IOCs |
| Shannon Entropy | High-entropy strings (5.5 bits + 50 chars min) |
| AI Config Scanner | .cursorrules, CLAUDE.md, copilot-instructions.md injection |
| Package/Dependencies | Lifecycle scripts, IOC matching (225K+ packages) |
| GitHub Actions | Shai-Hulud backdoor detection |
| Hash Scanner | Known malicious file hashes |
| IOC Strings (intel-triage P1.1) | YARA-style string matching (Axios 2026, TeamPCP, GlassWorm, CanisterSprawl) |
| Anti-Forensic AST (intel-triage P1.2) | XOR loop + self-delete + decoy write compound (csec autodelete) |
| Stub Package (intel-triage P1.3) | Tiny main file + external dep URL + lifecycle hook (ltidi chain) |
| Monorepo Scanner | Lerna/pnpm-workspace/turbo detection (Sprint 1 audit MR-C2 fix) |
| Trusted-Dep-Diff (opt-in) | Diff against trusted dep tarballs from registry (v2.10.x) |
| Python Source (PYSRC) | Import-time / install-time RCE patterns in __init__.py / setup.py (v2.11.41; closes TrapDoor PyPI gap) |
| Python AST (PYAST) | Tree-sitter-Python AST with taint-aware detectors (v2.11.42+) |
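
The Shannon Entropy scanner's two thresholds (5.5 bits and a 50-character minimum) can be illustrated with a short sketch. The function names are hypothetical, and whether the real comparison is strict or inclusive is an assumption; only the two numeric thresholds come from the scanner list above.

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(text) {
  const chars = Array.from(text);
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Hypothetical check using the thresholds quoted above.
// Assumption: entropy must exceed minBits and length must reach minLength.
function isHighEntropy(text, minBits = 5.5, minLength = 50) {
  return text.length >= minLength && shannonEntropy(text) > minBits;
}

const base64Alphabet =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
console.log(isHighEntropy(base64Alphabet));  // true: 64 distinct symbols, 6 bits
console.log(isHighEntropy('a'.repeat(60)));  // false: long but zero entropy
```

Reaching 5.5 bits requires at least 46 distinct characters (2^5.5 is about 45.3), which is why ordinary identifiers and English text stay well below the threshold.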

264 detection rules

All rules (259 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See SECURITY.md for the complete rules reference.

Detected campaigns

| Campaign | Status |
| --- | --- |
| GlassWorm (2026, 433+ packages) | Detected |
| Shai-Hulud v1/v2/v3 (2025) | Detected |
| event-stream (2018) | Detected |
| eslint-scope (2018) | Detected |
| Protestware (node-ipc, colors, faker) | Detected |
| Typosquats (crossenv, mongose, babelcli) | Detected |
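
The typosquats listed above each sit one edit away from a popular package name, which is what a Levenshtein-distance check picks up. The following is a standard dynamic-programming implementation offered as a sketch; the distance cut-off MUAD'DIB actually applies is not stated here.

```javascript
// Classic Levenshtein distance using two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('crossenv', 'cross-env')); // 1
console.log(levenshtein('mongose', 'mongoose'));   // 1
console.log(levenshtein('babelcli', 'babel-cli')); // 1
```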

VS Code

The VS Code extension automatically scans your npm projects.

code --install-extension dnszlsk.muaddib-vscode
  • MUAD'DIB: Scan Project - Scan entire project
  • MUAD'DIB: Scan Current File - Scan current file
  • Settings: muaddib.autoScan, muaddib.webhookUrl, muaddib.failLevel

See vscode-extension/README.md for full documentation.


CI/CD

GitHub Actions (Marketplace)

name: Security Scan

on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: DNSZLSK/muad-dib@v1
        with:
          path: '.'
          fail-on: 'high'
          sarif: 'results.sarif'

| Input | Description | Default |
| --- | --- | --- |
| path | Path to scan | . |
| fail-on | Minimum severity to fail | high |
| sarif | SARIF output file path | |
| paranoid | Ultra-strict detection | false |

Pre-commit hooks

muaddib init-hooks                        # Auto-detect (husky/pre-commit/git)
muaddib init-hooks --type husky           # Force husky
muaddib init-hooks --mode diff            # Only block NEW threats
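
Blocking only new threats amounts to a set difference between a baseline scan and the current one. The sketch below shows that idea; `newThreats` is a hypothetical helper, and identifying a threat by rule ID plus file path is an assumption made for this example.

```javascript
// Hypothetical helper: returns threats in `current` that are not in `baseline`.
// Assumption: a threat is identified by its rule ID and file path.
function newThreats(baseline, current) {
  const fingerprint = (t) => `${t.rule}|${t.file}`;
  const known = new Set(baseline.map(fingerprint));
  return current.filter((t) => !known.has(fingerprint(t)));
}

const baselineScan = [{ rule: 'RULE-A', file: 'lib/old.js' }];
const currentScan = [
  { rule: 'RULE-A', file: 'lib/old.js' }, // already present, not blocking
  { rule: 'RULE-B', file: 'lib/new.js' }, // introduced by this commit
];
console.log(newThreats(baselineScan, currentScan));
```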

With pre-commit framework:

repos:
  - repo: https://github.com/DNSZLSK/muad-dib
    rev: v2.11.76
    hooks:
      - id: muaddib-scan

Evaluation Metrics

Latest measurement: v2.11.48 (2026-05-26, Track D + PyPI download fix). Ground truth holds 96 samples (94 in-scope, 2 out-of-scope protestware). This run measures the full 94 in-scope set after the 2026-05-25 enrichment (Track C synthetic for the new PYSRC/PYAST/AST-092/AICONF-004/PKG-022 rules, Track A real-world tarballs recovered from VPS archive, Track B reconstructions from the in-house security-review benchmark).

Operational metrics (what an operator actually gets)

These are the numbers a user gets when running muaddib scan against npm or PyPI packages. The pipeline executes scanners + FP caps only — no ML filter is applied (see ML Classifier note below).

| Metric | Result | Details |
| --- | --- | --- |
| Wild TPR (Datadog 17K) | 92.8% (13,538/14,587 in-scope) | 17,922 packages. 3,335 skipped (no JS). By category: compromised_lib 97.8%, malicious_intent 92.1%. Last measurement v2.9.4, independent of GT. |
| TPR@3 (detection rate, v2.11.48) | 95.74% (90/94 in-scope) | Full GT re-measurement. Threshold=3: any signal. 13 PyPI samples (was 0). 4 misses incl. 3 browser-only (lottie-player, polyfill-io, trojanized-jquery). |
| TPR@20 (alert rate, v2.11.48) | 88.30% (83/94 in-scope) | Operational alert threshold=20. +3.1pp vs v2.11.47: Track D recon_exfil_direct_ip compound (MUADDIB-COMPOUND-016) closed the GT-095 gap (risk 3→50) and boosted GT-091 byvendors / GT-092 heloo131313 through linux_fingerprint_exec. |
| FPR rules (Benign curated, v2.11.48 measure) | 1.10% (6/545 scanned, 548 total) | Unchanged after Track D: the new compound + types created zero new FPs (sameFile gate + public-IP-only filter). Drop from 15.6% (v2.10.95) is attributable to FP caps F1-F14 (v2.10.97 → v2.11.31). 6 remaining FPs are real (meteor, prisma, @prisma/client, drizzle-orm, scrypt, liquid). |
| FPR (Benign random, v2.11.48) | 2.50% (5/200) | 200 random npm packages, unchanged. |
| FPR PyPI (v2.11.48, first honest measurement) | 9.68% (12/124 scanned, 132 total) | Track D fixed the PyPI downloader: removed pip --no-binary :all: flag (forced compile of wheel-only packages, timed out 38% of the time) + added .whl extraction via extractArchive(). Brought 42 previously-skipped giants (numpy/pandas/django/matplotlib/scikit-learn/...) into scope. All 12 FPs cluster at score 25-35: this is the cap-PyPI-35 artifact, not new rule misfires. Lifting the cap (Track E) would drop FPR PyPI to ≈0%. 8 residual fails are >500MB packages (torch, tensorflow, scipy, opencv-python, ansible…) hitting the 30s PACK_TIMEOUT_MS. |
| ADR (Adversarial + Holdout, v2.11.48) | 96.26% (103/107) | 67 adversarial + 40 holdout, global threshold=20. Stable vs v2.10.95. |
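
The percentages above follow directly from the quoted counts. As a quick check, this sketch recomputes each rate from its numerator and denominator; `pct` is a hypothetical helper and the counts are copied from the table.

```javascript
// Hypothetical helper: hits / total as a percentage with two decimals.
function pct(hits, total) {
  return `${((hits / total) * 100).toFixed(2)}%`;
}

// [metric, hits, total], copied from the operational metrics table.
const reportedCounts = [
  ['Wild TPR (Datadog)', 13538, 14587], // 92.81%, reported as 92.8%
  ['TPR@3', 90, 94],                    // 95.74%
  ['TPR@20', 83, 94],                   // 88.30%
  ['FPR rules (curated)', 6, 545],      // 1.10%
  ['FPR (random)', 5, 200],             // 2.50%
  ['FPR PyPI', 12, 124],                // 9.68%
  ['ADR', 103, 107],                    // 96.26%
];

for (const [name, hits, total] of reportedCounts) {
  console.log(name.padEnd(22), pct(hits, total));
}

// The Datadog in-scope denominator is total packages minus those skipped.
console.log(17922 - 3335); // 14587
```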

4132 tests across 115 files. 264 rules (259 RULES + 5 PARANOID; v2.11.67/70 Phantom Gyp added PKG-023 + COMPOUND-017).

Known issues (v2.11.48):

  • PyPI cap at 35/100: Python samples are capped at riskScore=35 even when globalRiskScore=100. Confirmed empirically: all 12 PyPI FPs score 25-35 (flask 32, django 35, tornado 35, bottle 30, pandas 25, matplotlib 25, plotly 25, bokeh 25, pymongo 35, coverage 32, fabric 35, websockets 35). Lifting the cap will simultaneously drop FPR PyPI to ≈0% and unblock PyPI malware detection at higher thresholds. Track E target.

Operational coverage (v2.11.67-76)

The static ground-truth TPR above is measured offline. Since v2.11.67 the monitor also tracks operational coverage on live npm/PyPI ingestion:

  • A per-scan ledger (data/scan-ledger.jsonl) records every scanned package's outcome; computeLedgerRollup() produces a 24h rollup (alertRate, per-ecosystem). Note: alertRate is a throughput signal, not detection TPR.
  • An active GHSA poller (~15 min; npm, pypi, crates) builds an authoritative "what should we have caught" denominator (data/ghsa-malware.jsonl), plus a feed-health alarm that fires when an IOC feed silently goes dark.
  • The Phase 5 coverage-audit (scripts/coverage-audit.js, daily 05:00 UTC) joins that denominator against ledger outcomes + the tarball archive to compute an honest GHSA-denominated operational TPR (alerted / total), and surfaces scannedClean misses as human-gated ground-truth candidates.
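
The join described in the last bullet can be sketched as follows. This is an illustration only and does not reflect the contents of scripts/coverage-audit.js: the `coverageAuditSketch` name, the record fields (`ecosystem`, `name`, `outcome`), and the outcome values are all assumptions.

```javascript
// Hypothetical join of an advisory denominator against ledger outcomes.
// Assumption: ledger outcome is either 'alerted' or 'clean'.
function coverageAuditSketch(advisories, ledger) {
  const outcomes = new Map(
    ledger.map((entry) => [`${entry.ecosystem}:${entry.name}`, entry.outcome])
  );
  let alerted = 0;
  const scannedClean = []; // scanned but not alerted: ground-truth candidates
  const neverScanned = []; // in the denominator but absent from the ledger
  for (const adv of advisories) {
    const key = `${adv.ecosystem}:${adv.name}`;
    const outcome = outcomes.get(key);
    if (outcome === 'alerted') alerted += 1;
    else if (outcome === 'clean') scannedClean.push(key);
    else neverScanned.push(key);
  }
  const total = advisories.length;
  return {
    total,
    alerted,
    tpr: total > 0 ? alerted / total : null, // alerted / total
    scannedClean,
    neverScanned,
  };
}

console.log(
  coverageAuditSketch(
    [
      { ecosystem: 'npm', name: 'pkg-a' },
      { ecosystem: 'npm', name: 'pkg-b' },
      { ecosystem: 'pypi', name: 'pkg-c' },
    ],
    [
      { ecosystem: 'npm', name: 'pkg-a', outcome: 'alerted' },
      { ecosystem: 'npm', name: 'pkg-b', outcome: 'clean' },
    ]
  )
);
```

Keeping the denominator external to the scanner is what makes the resulting rate meaningful: packages the monitor never saw count against it instead of silently dropping out.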

This operational TPR is the real production detection rate, distinct from the static GT TPR (which has not been re-measured since v2.11.48).

ML Classifier (offline only)

src/ml/classifier.js is not wired into muaddib scan. The XGBoost model is currently exercised only by muaddib evaluate (offline metric replay) and muaddib monitor (LOG-ONLY since 2026-04-08, model collapsed pending retrain — see src/monitor/queue.js:628). The v2.11.48 evaluate-time replay shows the same 1.10% FPR (no additional FPs filtered) — kept as a reference for retrain validation, but the published operational FPR is the rules-only number above.

Static evaluation caveats:

  • TPR measured on the full 94 in-scope samples from the 96-sample ground truth (2 out-of-scope protestware GT-005/GT-009 with min_threats=0)
  • TPR@3 = detection rate (any signal); TPR@20 = operational alert threshold
  • FPR rules measured on 548 curated popular npm packages (not a random sample)
  • FPR PyPI: 124/132 scanned (8 download fails on >500MB giants — torch/tensorflow/ansible/…). Smaller N than npm.
  • ADR measured with global threshold (score >= 20) as of v2.6.5

See Evaluation Methodology for the full experimental protocol, holdout history, and Datadog benchmark details.

ML Classifier — R&D, currently inactive

Status (2026-04-08 → present): The XGBoost classifier (src/ml/classifier.js) is not wired into muaddib scan at all, and in muaddib monitor it runs in LOG-ONLY mode since 2026-04-08 — the trained model collapsed (predicts p≈0.002 for every input, including clearly malicious lifecycle+exec+staged_payload patterns) and was disabled pending retrain on balanced JSONL data. The metrics below come from offline muaddib evaluate replay against a frozen bench. They describe what the model would contribute if it worked, not what an operator gets today.

| Metric (offline evaluate replay) | Result | Details |
| --- | --- | --- |
| ML FPR | 2.85% (239/8,393 holdout) | XGBoost retrained on 56,564 samples, 64 features, threshold=0.710 |
| ML TPR | 99.93% (2,918/2,920 holdout) | 377 confirmed_malicious via OSSF/GHSA/npm correlation |
| FPR after ML T1 (offline replay, v2.11.48) | 1.10% (6/545 scanned) | Classifier filters 0/6 raw FPs in this run (filtered 1 at v2.11.47). Not applied during real scans: muaddib scan never invokes the classifier. |

Retrain methodology (v2.10.51):

  • Ground truth: 377 confirmed_malicious via auto-labeler (OSSF malicious-packages, GitHub Advisory Database, npm registry takedown correlation)
  • Dataset: 56,564 samples (14,602 malicious, 41,962 clean). Stratified 80/20 split
  • Grid search: depth=4, estimators=300, lr=0.05. AUC-ROC=0.999, F1=0.960
  • Leaky feature filter: 23 dead/leaky features removed (source-identity proxies)

The shadow model continues to log predictions in muaddib monitor for retraining validation. When the next model passes shadow validation, the LOG-ONLY guard in src/monitor/queue.js:660 will be flipped and the metrics above will move back into the operational table.


Contributing

Add IOCs

Edit YAML files in iocs/:

- id: NEW-MALWARE-001
  name: "malicious-package"
  version: "*"
  severity: critical
  confidence: high
  source: community
  description: "Threat description"
  references:
    - https://example.com/article
  mitre: T1195.002
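
Before opening a pull request, it can help to sanity-check a new entry once it has been parsed from YAML. The sketch below validates a plain object with the same fields as the example above. Treating every field as required is an assumption, and `validateIocEntry` is a hypothetical helper, not a project script; the technique ID pattern follows the public MITRE ATT&CK format (T plus four digits, optional three-digit sub-technique).

```javascript
// Assumption: every field shown in the example entry is required.
const REQUIRED_FIELDS = [
  'id', 'name', 'version', 'severity', 'confidence',
  'source', 'description', 'references', 'mitre',
];
const MITRE_ID = /^T\d{4}(\.\d{3})?$/;

// Hypothetical helper: returns a list of problems (empty when the entry looks fine).
function validateIocEntry(entry) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (entry[field] === undefined || entry[field] === '') {
      errors.push(`missing field: ${field}`);
    }
  }
  if (entry.references !== undefined && !Array.isArray(entry.references)) {
    errors.push('references must be a list');
  }
  if (entry.mitre !== undefined && !MITRE_ID.test(entry.mitre)) {
    errors.push(`unexpected MITRE technique id: ${entry.mitre}`);
  }
  return errors;
}

const exampleEntry = {
  id: 'NEW-MALWARE-001',
  name: 'malicious-package',
  version: '*',
  severity: 'critical',
  confidence: 'high',
  source: 'community',
  description: 'Threat description',
  references: ['https://example.com/article'],
  mitre: 'T1195.002',
};
console.log(validateIocEntry(exampleEntry)); // []
```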

Development

git clone https://github.com/DNSZLSK/muad-dib
cd muad-dib
npm install
npm test

Testing

  • 4132 tests across 115 modular test files
  • 56 fuzz tests - Malformed inputs, ReDoS, unicode, binary
  • Datadog 17K benchmark - 14,587 confirmed malware samples (in-scope)
  • Ground truth validation - 96 real-world attacks (95.74% TPR@3, 88.30% TPR@20 — v2.11.48 full measure on 94 in-scope)
  • False positive validation (v2.11.48 measure) - 1.10% FPR rules (6/545 scanned), 2.50% on 200 random, 9.68% on 124/132 PyPI (first honest measurement post-Track-D download fix). ML classifier currently inactive — see Evaluation Metrics → ML Classifier.

License

MIT


The spice must flow. The worms must die.