🎯 OVERWATCH
An open, hackable take on connected-warfare-style perception — running on a $500 dev kit.
Multi-sensor fusion · Cross-camera tracking · Tactical AR HUD · Edge inference on Jetson Orin Nano
Inspired by Anduril Connected Warfare & the Lattice OS concept —
built as a community reference implementation, not affiliated with Anduril Industries.
Demo · Inspiration · Features · Architecture · Quick Start · Deployment · Testing · API · Troubleshooting
Demo Video
Watch on YouTube — prototype iteration 1.
🛰️ Inspiration & scope
OVERWATCH is a publicly-available reference implementation of the multi-sensor situational-awareness concept popularised by Anduril's Connected Warfare and its Lattice software platform — the idea that a network of low-cost, heterogeneous sensors can be fused at the edge into a single, AI-driven view of the battlespace.
This project takes that idea and runs with it on commodity hardware:
- A $500 NVIDIA Jetson Orin Nano instead of a hardened tactical server
- IP webcams + mobile phone cameras instead of dedicated military-grade sensors
- YOLOv8 + Kalman + homography instead of classified perception stacks
- A FastAPI/React stack instead of proprietary tactical software
The visual language — diamond IFF markers, compass ribbon, threat rings, ghost predictions — is inspired by Anduril's EagleEye HUD aesthetic (one of the publicly-shown UI surfaces of Lattice). It is not a clone, not affiliated with or endorsed by Anduril Industries, and not a substitute for their products. Trademarks belong to their respective owners.
Scope honesty: this is a research/educational project. It demonstrates the principles of connected sensing — sensor fusion, cross-camera re-ID, edge inference, real-time broadcast — at a scale that fits in a backpack. It is not military-grade, not C2-system-grade, and not certified for any operational use.
What's the same idea, what's different
| | Anduril Lattice / Connected Warfare | OVERWATCH (this repo) |
|---|---|---|
| Goal | Unified situational awareness across heterogeneous sensors | Same — at hobbyist scale |
| Sensor mix | Cameras, radar, RF, sonar, drones, ground vehicles, … | IP cameras + phone cameras (extensible) |
| Fusion | Proprietary, classified | Open: Kalman + Hungarian + homography |
| Edge compute | Hardened tactical hardware | Jetson Orin Nano dev kit |
| HUD style | EagleEye tactical UI | EagleEye-inspired canvas overlay |
| Autonomy | Multi-asset autonomous teaming | Single-pipeline perception only |
| Use | Defense / national security | Research, learning, civilian situational awareness |
| Source | Closed | Public on GitHub (license: see LICENSE) |
If you're building something in this space — researchers, students, civilian defense-tech tinkerers, public-safety folks — this repo is meant to be a starting point you can fork, hack on, and learn from.
Overview
OVERWATCH is a real-time multi-camera situational awareness platform built for edge deployment on NVIDIA Jetson Orin Nano. It fuses video from IP cameras and mobile phones into a unified world model using YOLOv8 detection, Hungarian-assignment tracking, adaptive Kalman filtering, and cross-camera appearance re-identification — all at TensorRT FP16 speeds.
The system runs a singleton perception pipeline: detection, tracking, and fusion execute once per tick regardless of how many viewers are connected, then broadcast pre-serialized snapshots to all clients over binary WebSocket.
1 camera + 10 viewers = 1 GPU inference, not 10.
🚀 Features
Core perception
| Capability | Implementation |
|---|---|
| Person detection | YOLOv8n with NMS-level class filter (classes=[0]) — person-only |
| TensorRT FP16 | .engine export on Jetson — ~8 MiB, sub-10 ms inference |
| Hungarian tracking | scipy.optimize.linear_sum_assignment — 0.6 × IoU + 0.4 × cosine appearance cost |
| Tracker fallback chain | DeepSORT (MobileNet) → Hungarian (scipy) → Centroid |
| Adaptive Kalman filter | 6-state [x, y, z, vx, vy, vz] — measurement noise scales by confidence, bbox area, sensor trust |
| Cross-camera re-ID | 64-dim HSV histogram descriptors, L2-normalized, EMA-smoothed (α = 0.3) |
| Sensor trust scoring | Per-sensor trust ∈ [0.1, 1.0] — increases for consistent measurements, decays for innovation outliers |
| Cross-camera homography | Self-calibrating ground-plane H from shared foot-point observations via cv2.findHomography + RANSAC |
| 3-path ghost predictions | (A) homography projection from any source camera (green), (B) pixel extrapolation with adaptive budget (red), (C) world-coordinate pinhole projection fallback (orange) |
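As a rough illustration of the Hungarian tracking row above, the sketch below blends IoU and appearance into one cost and picks the cheapest pairing. It assumes the blend is applied to dissimilarities, i.e. 0.6 × (1 − IoU) + 0.4 × (1 − cosine), which the table does not spell out. The helper names and the dict layout are hypothetical, and a brute-force search stands in for scipy.optimize.linear_sum_assignment so the example has no dependencies.

```python
from itertools import permutations
from math import sqrt


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def match_cost(track, det):
    # Assumed form: both similarities are converted to costs before blending.
    return (0.6 * (1.0 - iou(track["box"], det["box"]))
            + 0.4 * (1.0 - cosine(track["feat"], det["feat"])))


def assign(tracks, dets):
    """Brute-force minimum-cost assignment; a stand-in for linear_sum_assignment.

    Assumes len(tracks) <= len(dets) and only a handful of objects.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(dets)), len(tracks)):
        cost = sum(match_cost(tracks[i], dets[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(enumerate(best)) if best is not None else []


tracks = [{"box": (0, 0, 10, 20), "feat": [1.0, 0.0]},
          {"box": (50, 0, 60, 20), "feat": [0.0, 1.0]}]
dets = [{"box": (51, 1, 61, 21), "feat": [0.1, 0.9]},
        {"box": (1, 0, 11, 20), "feat": [0.9, 0.1]}]
pairs = assign(tracks, dets)  # list of (track_index, detection_index)
```

Here track 0 pairs with detection 1 and track 1 with detection 0, because both the box overlap and the appearance vectors agree on that pairing.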
Platform
| Capability | Implementation |
|---|---|
| Multi-camera | Up to 4 concurrent streams (physical MJPEG/RTSP + mobile virtual cameras) |
| Mobile streaming | Phone browsers → getUserMedia → binary JPEG over WebSocket → VirtualCamera |
| GPS + IMU fusion | Mobile geolocation → equirectangular projection; DeviceOrientationEvent → camera rotation |
| AR overlays | Canvas-based: cyan detection brackets, amber track boxes, green/orange/red ghost predictions |
| Binary protocol | msgpack-serialized snapshots — zero-copy broadcast to all viewers |
| SSL/TLS | Self-signed certificates with SAN for LAN IP access (required for getUserMedia) |
| Optional JWT auth | Default-off; enable with AUTH_ENABLED=true. Token issuance via POST /api/token; WS endpoints accept ?token=... query param |
| Edge deployment | Automated SSH/SFTP deployment to Jetson Orin Nano via paramiko, with atomic staging swap and --rollback |
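The GPS + IMU row mentions an equirectangular projection. A minimal sketch of that mapping is shown here, assuming a fixed reference point (for example the first GPS fix) and a spherical Earth; the function name and the choice of reference are illustrative, not taken from the codebase.

```python
from math import cos, radians

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def gps_to_local_xy(lat, lon, ref_lat, ref_lon):
    """Equirectangular projection: metres east (x) and north (y) of the reference."""
    x = EARTH_RADIUS_M * radians(lon - ref_lon) * cos(radians(ref_lat))
    y = EARTH_RADIUS_M * radians(lat - ref_lat)
    return x, y


# A phone 0.0001 degrees of latitude north of the reference point.
x, y = gps_to_local_xy(18.5205, 73.8567, 18.5204, 73.8567)
```

The approximation is only reasonable over short distances, which fits a setup where all phones sit on the same LAN.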
🏗️ Architecture
┌─────────────────────────────────┐
│ OVERWATCH v2.0.0 │
└─────────────────────────────────┘
╔═══════════════╗ ╔═══════════════════════════════════════════════════╗
║ DATA SOURCES ║ ║ JETSON ORIN NANO (backend :8000) ║
╠═══════════════╣ ╠═══════════════════════════════════════════════════╣
║ ║ ║ ║
║ 📷 IP Camera ─────────► CameraCapture (OpenCV, MJPEG/RTSP) ║
║ ║ ║ │ ║
║ 📱 Mobile ─────────► VirtualCamera (binary JPEG push) ║
║ Phone ║ws/cam ║ │ + GPS/IMU sensor data ║
║ ║ ║ ▼ ║
║ ║ ║ ┌──────────────────────────────────────────┐ ║
║ ║ ║ │ PerceptionPipeline (singleton) │ ║
║ ║ ║ │ │ ║
║ ║ ║ │ 1. DETECT YOLOv8n TensorRT FP16 │ ║
║ ║ ║ │ + HSV appearance features │ ║
║ ║ ║ │ │ │ ║
║ ║ ║ │ 2. TRACK Hungarian assignment │ ║
║ ║ ║ │ IoU + cosine appearance │ ║
║ ║ ║ │ │ │ ║
║ ║ ║ │ 3. FUSE Adaptive Kalman 6-state │ ║
║ ║ ║ │ Cross-camera matching │ ║
║ ║ ║ │ Sensor trust scoring │ ║
║ ║ ║ │ │ │ ║
║ ║ ║ │ 4. SNAPSHOT Pre-serialized msgpack │ ║
║ ║ ║ └──────────────┬───────────────────────────┘ ║
║ ║ ║ │ ║
╚═══════════════╝ ║ ▼ broadcast ║
║ WebSocketManager (/ws, msgpack binary) ║
║ │ │ │ ║
╚═════════╪═══════════╪═══════════╪═════════════════╝
│ │ │
┌─────────▼──┐ ┌─────▼──┐ ┌─────▼─────┐
│ Viewer 1 │ │Viewer 2│ │ Viewer N │
│ React │ │ React │ │ React │
│ AR Canvas │ │ ... │ │ ... │
└────────────┘ └────────┘ └───────────┘
Pipeline design
OVERWATCH runs a single shared pipeline rather than per-viewer. The PerceptionPipeline singleton executes detect → track → fuse once per tick, produces a PerceptionSnapshot with pre-serialized msgpack packets, and all connected viewers read from the latest snapshot.
- 1 camera + 10 viewers = 1 GPU inference
- Zero-copy broadcast via pre-serialized binary packets
- Slow viewers gracefully skip intermediate frames (per-client 2 s send timeout)
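The fan-out described above can be sketched roughly as follows. This is not the project's WebSocketManager: FakeViewer is a hypothetical stand-in for a WebSocket connection, and json stands in for msgpack so the example runs on the standard library alone. The point is that the snapshot is serialized once and that each send gets its own timeout.

```python
import asyncio
import json

SEND_TIMEOUT_S = 2.0  # per-client send budget from the list above


class FakeViewer:
    """Hypothetical stand-in for a connected viewer."""

    def __init__(self, delay=0.0):
        self.delay = delay
        self.received = []

    async def send_bytes(self, payload):
        await asyncio.sleep(self.delay)  # simulate network latency
        self.received.append(payload)


async def broadcast(viewers, snapshot, timeout=SEND_TIMEOUT_S):
    packet = json.dumps(snapshot).encode()  # serialized once, not once per viewer

    async def send(viewer):
        try:
            await asyncio.wait_for(viewer.send_bytes(packet), timeout)
            return True
        except asyncio.TimeoutError:
            return False  # slow viewer misses this frame and catches the next one

    return await asyncio.gather(*(send(v) for v in viewers))


fast, slow = FakeViewer(), FakeViewer(delay=0.2)
results = asyncio.run(
    broadcast([fast, slow], {"tick": 1, "tracks": []}, timeout=0.05)
)
```

With the shortened timeout used in the example, the fast viewer receives the packet and the slow one is skipped without delaying anyone else.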
📁 Project structure
OVERWATCH/
├── backend/ # FastAPI + perception engine
│ ├── main.py # App entry, lifespan, REST + WS endpoints
│ ├── requirements.txt # Python deps (CPU/Windows dev)
│ ├── requirements-jetson.txt # Jetson Orin Nano deps (pinned)
│ ├── .env.example # Config template
│ └── app/
│ ├── domain/entities.py # Detection, Track, WorldObject, ...
│ ├── application/
│ │ ├── ports.py # Repository interfaces
│ │ └── services.py # PerceptionPipelineService
│ └── infrastructure/
│ ├── auth.py # Optional JWT verify/issue
│ ├── camera_adapter.py # OpenCV capture + virtual cameras
│ ├── config_adapter.py # Pydantic settings
│ ├── container.py # DI container
│ ├── detection_adapter.py # YOLO wrapper
│ ├── frame_encoder_adapter.py # JPEG encode
│ ├── tracking_adapter.py # Hungarian + DeepSORT
│ ├── websocket_adapter.py # msgpack broadcast
│ └── world_model_adapter.py # Kalman fusion + homography
│ └── tests/
│ ├── conftest.py # Shared fixtures
│ └── unit/ # 57 unit tests
│
├── frontend/ # React 18 admin dashboard
│ ├── package.json
│ └── src/
│ ├── pages/
│ │ ├── AdminDashboard.jsx # Main camera grid
│ │ └── MobileCamera.jsx # Phone camera streaming UI
│ ├── components/
│ │ ├── CameraDisplay.jsx # Canvas AR overlay renderer
│ │ ├── ErrorBoundary.jsx # Top-level error fallback
│ │ ├── StatsPanel.jsx
│ │ └── ConnectionStatus.jsx
│ ├── application/hooks/ # useCameraData, useWebSocket, useSystemStats
│ └── infrastructure/ # websocketAdapter, cameraStreamAdapter, apiAdapter
│
├── scripts/ # Deployment & ops
│ ├── _jetson_common.py # Shared SSH/SFTP helper (env-driven creds)
│ ├── deploy_jetson.py # Atomic deploy with --rollback
│ ├── restart_jetson.py # Quick backend restart
│ ├── check_logs.py / check_status.py
│ ├── ws_test.py
│ └── archive/ # Retired/duplicate scripts (reference only)
│
├── certs/ # SSL certificates (gitignored)
├── .github/workflows/ci.yml # GitHub Actions test runner
├── pyproject.toml # pytest + project metadata
└── README.md
⚡ Quick start
Prerequisites
- Python 3.10+ with pip
- Node.js 18+ with npm
- NVIDIA Jetson Orin Nano for production, or any machine with CUDA for development
1. Clone
git clone https://github.com/mandarwagh9/overwatch.git
cd overwatch
2. Local development
# Backend
cd backend
pip install -r requirements.txt
python main.py
# Frontend (new terminal)
cd frontend
npm install
npm start
Open https://localhost:3000 — accept the self-signed certificate warning.
3. Single-binary mode (frontend served by backend)
cd frontend && npm install && npm run build && cd ..
cd backend && python main.py
Backend serves both the React app and the API at https://localhost:8000.
🚀 Deployment
Deploy to Jetson
Credentials are read from environment — never hardcoded:
export JETSON_HOST=192.168.1.10 # default if unset
export JETSON_USER=mandar # default if unset
export JETSON_PASS=... # or use JETSON_KEY=/path/to/id_rsa
python scripts/deploy_jetson.py
The script:
- Connects via SSH/SFTP using paramiko
- Uploads backend, frontend build, and certs to <remote>.new/ (staging)
- Atomically swaps <remote>.new → <remote>, keeping the previous version at <remote>.bak
- Generates a fresh JWT_SECRET and writes a chmod 600 .env
- Installs Python dependencies
- Starts the backend
Rollback
python scripts/deploy_jetson.py --rollback
Swaps the last .bak directory back into place. Use after a bad deploy.
Quick operations
# Restart backend without redeploying
python scripts/restart_jetson.py
# Tail logs
python scripts/check_logs.py
# Quick status
python scripts/check_status.py
Access (replace with your JETSON_HOST)
| Service | URL |
|---|---|
| Admin Dashboard | https://<jetson-host>:8000 |
| Mobile Camera (standalone) | https://<jetson-host>:8000/mobile |
🧪 Testing
The backend has 57 unit tests covering domain primitives, Kalman filtering, coordinate transforms, tracking, and configuration. CI runs them on every push and PR.
python -m pytest backend/tests/unit -v
Tests are pure-Python and do not require CUDA, ultralytics, or torch. They use pytest.importorskip("cv2") where OpenCV is needed.
📡 API reference
REST endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Serves the React app (or returns API status if no build present) |
| GET | /health | Detailed health status |
| GET | /status | System status (cameras, clients, detection engine, pipeline metrics) |
| GET | /cameras | Active camera list |
| POST | /cameras/{id}/start | Start a physical camera |
| POST | /cameras/{id}/stop | Stop a camera |
| POST | /api/token | Issue a JWT (only when AUTH_ENABLED=true) |
WebSocket endpoints
| Endpoint | Direction | Format | Purpose |
|---|---|---|---|
| /ws | Server → client | msgpack binary | Viewer stream (frames + detections + tracks + predictions) |
| /ws/camera | Client → server | Binary JPEG + JSON | Mobile camera source |
When AUTH_ENABLED=true, both endpoints require a ?token=<jwt> query parameter; unauthorized connections close with 1008.
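For a scripted client, attaching the token could look like the sketch below. The helper name with_token and the example host are made up for illustration; only the ?token= query parameter comes from the description above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_token(ws_url, token):
    """Return ws_url with a token query parameter appended.

    A falsy token (auth disabled) leaves the URL unchanged.
    """
    if not token:
        return ws_url
    parts = urlsplit(ws_url)
    query = "&".join(q for q in (parts.query, urlencode({"token": token})) if q)
    return urlunsplit(parts._replace(query=query))


# Hypothetical host and token, for illustration only.
url = with_token("wss://jetson.local:8000/ws", "abc.def.ghi")
```

The same helper works for /ws/camera, since both endpoints take the token the same way.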
Mobile registration handshake
Client → { "type": "register", "role": "camera_source", "camera_id": null }
Server → { "type": "registered", "camera_id": 0, "target_fps": 15 }
Client → [binary JPEG frames at target FPS]
Client → { "type": "sensor_data", "gps": {...}, "orientation": {...} }
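A client-side sketch of the first two handshake messages, using only the fields shown above. The function names are hypothetical, and the sensor_data payload is left out because its inner fields are not listed here.

```python
import json


def register_message(camera_id=None):
    """Build the initial registration message (camera_id=None asks the server to assign one)."""
    return json.dumps(
        {"type": "register", "role": "camera_source", "camera_id": camera_id}
    )


def parse_registered(raw):
    """Return (camera_id, seconds_between_frames) from the server's reply."""
    msg = json.loads(raw)
    if msg.get("type") != "registered":
        raise ValueError(f"unexpected message type: {msg.get('type')!r}")
    return msg["camera_id"], 1.0 / msg["target_fps"]


camera_id, frame_interval = parse_registered(
    '{"type": "registered", "camera_id": 0, "target_fps": 15}'
)
```

After this exchange the client would push one binary JPEG roughly every frame_interval seconds.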
🔧 Configuration
Backend .env
| Variable | Default | Description |
|---|---|---|
| MODEL_PATH | yolov8n.pt | Model file: .pt, .engine (TensorRT), or .onnx |
| DEVICE | auto | Compute device: auto, cpu, cuda:0 |
| HALF_PRECISION | false | FP16 inference (set true on Jetson with .engine) |
| DETECTION_CLASSES | [0] | COCO class IDs to detect (0 = person) |
| CONFIDENCE_THRESHOLD | 0.5 | Detection confidence threshold |
| IOU_THRESHOLD | 0.45 | NMS IoU threshold |
| TARGET_FPS | 24 | Processing framerate target |
| MAX_CAMERAS | 4 | Maximum concurrent camera streams |
| TRACKING_MAX_AGE | 30 | Max frames to keep lost tracks |
| TRACKING_MIN_HITS | 3 | Min hits to confirm a track |
| TRACKING_IOU_THRESHOLD | 0.25 | IoU threshold for tracking |
| MOBILE_CAMERA_FPS | 15 | Mobile camera target FPS |
| MOBILE_CAMERA_MAX_WIDTH | 640 | Mobile camera max width |
| SSL_ENABLED | true | Enable HTTPS/WSS |
| SSL_CERTFILE | certs/cert.pem | SSL certificate path |
| SSL_KEYFILE | certs/key.pem | SSL private key path |
| HOST | 0.0.0.0 | Bind address |
| PORT | 8000 | Bind port |
| AUTH_ENABLED | false | Require JWT on WS + REST when true |
| JWT_SECRET | (empty) | HS256 signing key; required when AUTH_ENABLED=true |
| CORS_ORIGINS | ["*"] | JSON list of allowed origins |
| MAX_WS_CLIENTS | 100 | Hard cap on concurrent viewer WebSocket connections |
Frontend .env
| Variable | Default | Description |
|---|---|---|
| REACT_APP_BACKEND_HOST | window.location.hostname | Backend IP or hostname |
| REACT_APP_BACKEND_PORT | 8000 | Backend port |
| REACT_APP_BACKEND_PROTOCOL | wss (https) / ws (http) | WebSocket protocol |
| REACT_APP_MAX_CAMERAS | 4 | Maximum cameras to display |
| REACT_APP_CAMERA_INACTIVITY_TIMEOUT | 3000 | ms before marking camera offline |
| REACT_APP_MOBILE_TARGET_FPS | 15 | Mobile streaming FPS |
| REACT_APP_MOBILE_JPEG_QUALITY | 0.5 | Mobile JPEG quality (0–1) |
| REACT_APP_MOBILE_MAX_WIDTH | 640 | Mobile frame width |
Deploy environment
| Variable | Default | Description |
|---|---|---|
| JETSON_HOST | 192.168.1.10 | Jetson IP or hostname |
| JETSON_USER | mandar | SSH user |
| JETSON_PASS | (prompt) | SSH password; falls back to a getpass prompt if unset |
| JETSON_KEY | (unset) | Path to private key (preferred over password) |
🎯 AR overlay system — EagleEye-inspired tactical HUD
The frontend renders a tactical HUD inspired by Anduril's EagleEye UI — diamond IFF markers, compass ribbon, threat rings — implemented entirely in HTML5 Canvas. (Visual style only; rendered from open code, no Anduril assets used.)
| Layer | Color | Elements |
|---|---|---|
| Detections | Slate-blue #64b5f6 | Diamond markers, corner brackets, PERSON confidence pill |
| Tracks | Amber #ffd740 | Diamond/chevron markers, velocity vector arrows, track ID callouts |
| Predictions (H-PROJ) | Green #00ff82 solid | Homography-projected ghost: accurate, real-time cross-camera |
| Predictions (EXTRAP) | Red #ff5050 dashed | Pixel-extrapolated ghost: time-decaying dead-reckoning |
| Predictions (WORLD) | Orange #ff9800 dashed | World-coordinate projection: pinhole-model fallback |
| Compass ribbon | — | Heading ribbon with N/E/S/W and bearing tick marks |
| Threat ring | Per-IFF color | Inner ring around feed showing bearing to off-screen predictions |
Detection overlays show what the model sees right now. Track overlays show persistent identity across frames. Predictions show cross-camera projections — green for homography (most accurate), orange for world-model fallback (rough but always available), red for pixel extrapolation (last resort).
🌍 World model & sensor fusion
Kalman filter
Each fused world object maintains a 6-state Kalman filter [x, y, z, vx, vy, vz] with constant-velocity dynamics. Measurement noise R adapts per-update based on detection confidence, bounding box area, and sensor trust — higher-quality observations tighten the filter, while noisy or untrusted sensors widen it. dt is clamped to ≥ 0 to defend against cross-camera clock skew.
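To make the two safeguards concrete, here is a small sketch of the dt clamp and of a noise term that grows as observation quality drops. The constant-velocity prediction follows the description above; the exact scaling law in measurement_noise (and its ref_area and floor values) is a made-up example, since the real formula is not given here.

```python
def clamp_dt(t_now, t_prev):
    """Never step the filter backwards, even if camera clocks disagree."""
    return max(0.0, t_now - t_prev)


def predict(state, dt):
    """Constant-velocity prediction for [x, y, z, vx, vy, vz] (state only, no covariance)."""
    x, y, z, vx, vy, vz = state
    return [x + vx * dt, y + vy * dt, z + vz * dt, vx, vy, vz]


def measurement_noise(base_var, confidence, bbox_area, trust, ref_area=10_000.0):
    """Hypothetical scaling: low confidence, small boxes, and low trust inflate R."""
    conf = max(confidence, 0.05)                      # avoid dividing by ~0
    area = min(1.0, max(bbox_area / ref_area, 0.05))  # small boxes count as noisier
    return base_var / (conf * area * trust)


state = [0.0, 0.0, 0.0, 1.0, 2.0, 0.0]
moved = predict(state, clamp_dt(10.5, 10.0))    # normal 0.5 s step
skewed = predict(state, clamp_dt(9.8, 10.0))    # negative skew is treated as dt = 0
clean = measurement_noise(1.0, confidence=0.9, bbox_area=20_000, trust=1.0)
noisy = measurement_noise(1.0, confidence=0.5, bbox_area=2_500, trust=0.5)
```

A larger R makes the filter lean on its own prediction, so the noisy observation above moves the estimate far less than the clean one.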
Cross-camera association
Objects from different cameras are matched when:
- Euclidean distance < 2 m
- Same class_id
- Appearance cosine similarity > 0.5 (when feature vectors available)
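The three conditions can be read as a single predicate. The sketch below assumes each object is a dict with class_id, pos (world coordinates in metres), and an optional feat vector; that layout is illustrative rather than the project's actual entity type.

```python
from math import dist, sqrt

MAX_DISTANCE_M = 2.0  # Euclidean gate from the list above
MIN_COSINE = 0.5      # appearance gate from the list above


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def same_object(a, b):
    """True if two per-camera objects should be fused into one world object."""
    if a["class_id"] != b["class_id"]:
        return False
    if dist(a["pos"], b["pos"]) >= MAX_DISTANCE_M:
        return False
    if a.get("feat") is not None and b.get("feat") is not None:
        return cosine(a["feat"], b["feat"]) > MIN_COSINE
    return True  # no appearance features available: position and class decide


cam0 = {"class_id": 0, "pos": (1.0, 2.0, 0.0), "feat": [1.0, 0.0]}
cam1 = {"class_id": 0, "pos": (1.5, 2.5, 0.0), "feat": [0.9, 0.1]}
far = {"class_id": 0, "pos": (5.0, 2.0, 0.0), "feat": [1.0, 0.0]}
```

With these values, same_object(cam0, cam1) holds, while same_object(cam0, far) fails on distance alone even though the appearance vectors are identical.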
Sensor trust
Each camera/sensor earns trust through consistency:
- Consistent measurements → trust increases (capped at 1.0)
- Innovation outliers → trust decays (floored at 0.1)
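One simple way to get this behaviour is an additive reward and a multiplicative penalty, clamped to the stated range. The gate, gain, and decay values below are invented for illustration; only the [0.1, 1.0] bounds come from this README.

```python
TRUST_MIN, TRUST_MAX = 0.1, 1.0


def update_trust(trust, innovation, gate=3.0, gain=0.02, decay=0.9):
    """Update a sensor's trust from one normalized innovation (residual).

    gate, gain, and decay are hypothetical tuning values.
    """
    if abs(innovation) <= gate:
        trust += gain   # consistent measurement: trust creeps up
    else:
        trust *= decay  # outlier: trust drops proportionally
    return min(TRUST_MAX, max(TRUST_MIN, trust))


trust = 0.5
for _ in range(30):   # a run of outliers drives trust to the floor
    trust = update_trust(trust, innovation=8.0)
floor_value = trust

for _ in range(100):  # a long run of consistent readings restores it
    trust = update_trust(trust, innovation=0.5)
cap_value = trust
```

Because of the floor, a sensor that misbehaves for a while is down-weighted rather than ignored, and it can earn its way back.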
Appearance re-ID
- 64-dimensional HSV histogram descriptors computed per detection (~0.1 ms each)
- L2-normalized for cosine similarity
- Exponential moving average (α = 0.3) for descriptor stability across frames
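The smoothing step can be sketched as below, using 4-bin toy vectors in place of the 64-bin histograms. It assumes α weights the new observation and that the blended descriptor is re-normalized afterwards; neither detail is confirmed by the list above.

```python
from math import sqrt

ALPHA = 0.3  # EMA weight from the list above


def l2_normalize(v):
    """Scale v to unit length (returned unchanged if it is all zeros)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def ema_update(stored, new, alpha=ALPHA):
    """Blend a new descriptor into the stored one, then re-normalize."""
    new = l2_normalize(new)
    blended = [(1 - alpha) * s + alpha * n for s, n in zip(stored, new)]
    return l2_normalize(blended)


def similarity(u, v):
    """For unit vectors, cosine similarity is just the dot product."""
    return sum(a * b for a, b in zip(u, v))


stored = l2_normalize([1.0, 0.0, 0.0, 0.0])
observed = [0.0, 2.0, 0.0, 0.0]
updated = ema_update(stored, observed)
```

One update moves the stored descriptor only part of the way towards the new observation, which keeps a single bad crop from rewriting a person's identity.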
📐 Cross-camera homography — how it works
The signature feature is ghost prediction: when Camera 0 can't see a person but Camera 1 can, the system renders a ghost overlay on Camera 0's feed showing where that person is.
The problem with naive extrapolation
Sliding a person's last-known pixel position forward in time fails within seconds because:
- Different cameras have completely different pixel coordinate systems
- The mapping between camera views is a projective transformation, not a linear offset
- A person at pixel (400, 300) in Camera 1 might correspond to (800, 500) in Camera 0
The solution: learn the camera-to-camera transform
When both cameras simultaneously observe the same person (matched via appearance re-ID), the system records foot-point correspondence pairs — the bottom-center of the bounding box in each view. These foot points project to the same physical ground-plane location.
With ≥ 4 such pairs, cv2.findHomography() + RANSAC computes a 3×3 homography matrix $H$ that maps any ground-plane point from one camera's pixel space to another's:
$$\begin{pmatrix} x' \\ y' \\ w \end{pmatrix} = H \cdot \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$
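Applying $H$ to a pixel is a matrix-vector product followed by a division by $w$. A dependency-free sketch, with an example matrix chosen by hand so that (400, 300) lands on (800, 500); a learned $H$ would come from cv2.findHomography instead:

```python
def project(H, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return xp / w, yp / w  # back from homogeneous to pixel coordinates


# Hand-picked example: scale by 2, then shift y by -100.
H_1_to_0 = [[2.0, 0.0, 0.0],
            [0.0, 2.0, -100.0],
            [0.0, 0.0, 1.0]]
foot_cam0 = project(H_1_to_0, 400.0, 300.0)
```

The division by w is what separates a homography from a plain affine map; when the last row is (0, 0, 1), as in this example, it has no effect.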
Self-calibrating pipeline
- Collect: when re-ID matches a person across Camera 0 and Camera 1, record a (foot_cam0, foot_cam1) pair
- Estimate: after 4+ pairs, compute $H_{0\to 1}$ and $H_{1\to 0}$ via RANSAC (re-estimated every 5 new pairs)
- Project: when Camera 0 loses a person but Camera 1 still sees them, apply $H_{1\to 0}$ to Camera 1's current foot point → position on Camera 0's feed
- Validate: monitor reprojection error; if it spikes (camera moved), flush and re-learn
Computational cost
- Homography estimation: < 0.1 ms (called every 5 new pairs, not every frame)
- Per-prediction projection: < 0.001 ms (one 3×3 matrix multiply)
- Total overhead per frame: effectively zero on Jetson Orin Nano
Visual indicators
| Ghost color | Tag | Source | Meaning |
|---|---|---|---|
| 🟢 Green solid | H-PROJ | Path A: homography | Cross-camera ground-plane projection. Tries all source cameras with valid $H$ to the target, picks the freshest. Most accurate. |
| 🟠 Orange dashed | WORLD | Path C: world projection | Fused 3D world position → pinhole camera model. Rough but always works even when no homography exists and the target camera has never seen the person. |
| 🔴 Red dashed | EXTRAP | Path B: pixel extrapolation | Slides last-known pixel position by velocity × time. Adaptive budget: min(250 px, 80 + 40 × t). Only works if the target camera previously saw the person. |
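The Path B budget is easy to tabulate. The sketch below implements the min(250 px, 80 + 40 × t) formula from the table and, as an assumption about how the budget is used, caps the extrapolated displacement at that many pixels; the function names are hypothetical.

```python
from math import hypot


def extrapolation_budget_px(t):
    """Pixel budget after t seconds without a sighting."""
    return min(250.0, 80.0 + 40.0 * t)


def extrapolate(last_px, velocity_px_s, t):
    """Slide the last-known pixel by velocity x time, capped at the budget.

    Treating the budget as a cap on displacement is an assumption.
    """
    dx, dy = velocity_px_s[0] * t, velocity_px_s[1] * t
    moved = hypot(dx, dy)
    budget = extrapolation_budget_px(t)
    if moved > budget:
        scale = budget / moved
        dx, dy = dx * scale, dy * scale
    return last_px[0] + dx, last_px[1] + dy


budgets = [extrapolation_budget_px(t) for t in (0.0, 2.0, 10.0)]
ghost = extrapolate((100.0, 100.0), (300.0, 0.0), 1.0)  # wants 300 px, budget is 120 px
```

The budget starts at 80 px, grows by 40 px per second, and saturates at 250 px after 4.25 s, so a fast-moving ghost cannot drift arbitrarily far from where the person was last seen.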
📱 Mobile camera streaming
Any phone on the same LAN can become a camera source:
- Via React app: https://<frontend-ip>:3000/mobile
- Standalone page: https://<jetson-ip>:8000/mobile
The mobile client:
- Opens the rear camera via getUserMedia (1280×720)
- Renders to an offscreen canvas, extracts a JPEG blob
- Sends binary frames over WebSocket to /ws/camera
- Captures GPS (watchPosition, high-accuracy) and IMU (DeviceOrientationEvent) at 2 Hz
- Sends sensor data as JSON for camera calibration fusion

getUserMedia requires HTTPS, which is why SSL certificates are mandatory even on LAN.
⚠️ Edge cases & known limitations
Cross-camera prediction
| Edge case | Behavior | Mitigation |
|---|---|---|
| No homography learned yet | Path A fails silently; falls through to Path B (extrap) or Path C (world projection). Ghost appears red (Path B) or orange (Path C) instead of green. | Walk through overlapping camera FOVs to collect ≥ 4 foot-point pairs. Homography auto-learns within ~5 s of co-visibility. |
| Camera moved after calibration | Reprojection error spikes; stale $H$ produces offset ghosts. | The system monitors error and flushes the homography when it exceeds 50 px. Walk through overlap again to re-learn. |
| Person only seen by one camera ever | Path A has no source to project from; Path B has no pixel history for the target. Path C is the only option. | Path C accuracy depends on calibrated camera positions in the CoordinateTransformer (set via CAMERA_POSITIONS env). |
| Cameras with no overlapping FOV | No co-visible observations → no foot-point pairs → no homography. Path A never activates between these cameras. | Path C still works. For better accuracy, set CAMERA_POSITIONS to your physical camera extrinsics. |
Tracking & re-ID
| Edge case | Behavior | Mitigation |
|---|---|---|
| Identical clothing | HSV histograms are nearly identical; re-ID may merge two people into one world object. | The system uses spatial distance (< 2 m) AND appearance similarity (> 0.5 cosine). If two people are spatially separated, they stay separate even with identical appearance. |
| Person temporarily fully occluded | Track coasts for prediction_horizon seconds (default 5 s); confidence decays linearly. After timeout, the track is pruned. | Increase prediction_horizon if longer persistence is needed. Kalman velocity keeps the ghost moving during occlusion. |
| Crowded scenes (> 10 people) | Hungarian cost matrix grows as O(n × m); appearance feature extraction adds ~0.1 ms per detection. | Throughput may drop below target FPS. YOLOv8n NMS already limits detections. |
| Person enters from off-screen | No pixel history, no world object yet. First detection creates a new track with high measurement noise. | Kalman initializes with large uncertainty; trust builds over 5–10 consistent frames. |
Sensor fusion
| Edge case | Behavior | Mitigation |
|---|---|---|
| Mobile GPS jitter indoors | GPS accuracy can be 10–50 m indoors; Kalman receives noisy position updates. | Sensor trust scoring down-weights high-innovation sources. Trust floor (0.1) prevents complete rejection. |
| Mobile phone loses WebSocket | Virtual camera stream stops; existing tracks coast via Kalman prediction. | Tracks persist for prediction_horizon. Phone auto-reconnects (with intentional-close handling) and gets a new camera ID. |
| Clock drift between cameras | Frame timestamps may not be synchronized. Co-visibility matching uses a 0.5 s window. | The 0.5 s window is generous for typical LAN latency. NTP sync is recommended for sub-100 ms accuracy. The Kalman dt is clamped to ≥ 0 so negative skew can't corrupt covariance. |
Network & deployment
| Edge case | Behavior | Mitigation |
|---|---|---|
| Self-signed cert rejected by browser | WebSocket connection fails silently; frontend shows no feeds. | Visit https://<jetson-ip>:8000 directly and accept the certificate (once per browser session). |
| Jetson runs out of GPU memory | TensorRT engine uses ~30 MiB. With 4 cameras at 640×640, CUDA memory ≈ 200 MiB total. Orin Nano has 8 GB shared. | Monitor with tegrastats. Reduce MAX_CAMERAS or input resolution if tight. |
| Backend crash | Run via nohup in deploy script; no auto-restart. | Add a systemd unit with Restart=always for production. python scripts/restart_jetson.py brings it back manually. |
| Many viewers cause lag | The singleton pipeline runs once per tick regardless of viewers, but msgpack serialize + send scales linearly. Per-client send timeout is 2 s. | Pre-serialized snapshots minimize per-viewer cost. For > 10 viewers, consider a pub/sub layer (Redis, NATS). |
| Mid-deploy network drop | Atomic SFTP staging means previous version stays at <remote> until the swap. | If a deploy partially fails, run python scripts/deploy_jetson.py --rollback to restore the last .bak. |
🐛 Troubleshooting
WebSocket won't connect
- Visit https://<jetson-ip>:8000 in your browser and accept the self-signed certificate
- Verify REACT_APP_BACKEND_HOST in frontend/.env matches the backend IP
- Check the backend is running: curl -sk https://<jetson-ip>:8000/health
- If AUTH_ENABLED=true, ensure the client supplies ?token=<jwt> from POST /api/token
Mobile camera shows black screen
- HTTPS is required for getUserMedia; ensure SSL_ENABLED=true
- Phone must be on the same LAN as the backend
- Allow camera permission when the browser prompts
- Try the standalone page: https://<jetson-ip>:8000/mobile
Port already in use on Jetson
python scripts/restart_jetson.py
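# Or manually over SSH. Export JETSON_USER and JETSON_HOST first; ssh prompts for the password or uses your key.
# The [p] in the pattern keeps pkill -f from matching the remote shell that runs this command.
ssh "$JETSON_USER@$JETSON_HOST" \
'pkill -9 -f "[p]ython3 main.py"; sleep 2; cd /home/$USER/overwatch/backend && nohup python3 main.py > /tmp/overwatch.log 2>&1 &'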
Checking Jetson logs
python scripts/check_logs.py # Tails the last 50 lines
python scripts/check_status.py # Backend status snapshot
Bad deploy — roll back
python scripts/deploy_jetson.py --rollback
Swaps the last <remote>.bak directory back into place.
Ghost predictions not appearing on a camera
- Check homography status: look for H learned: cam0→cam1 in logs. If missing, walk through both camera FOVs simultaneously to collect correspondence pairs.
- Check world projection: Path C (orange ghost) should always work if camera positions are configured. If missing, verify the CoordinateTransformer calibration matches your physical setup.
- Check prediction horizon: if time_since_seen > prediction_horizon (default 5 s), the object is pruned. The person must be actively tracked by at least one camera.
- Check source_tracks: if Camera 1 is currently tracking the person, no prediction is generated for it (live track, not a ghost).
Ghosts flicker between green and orange
The homography is borderline — sometimes projection succeeds (green), sometimes it fails and falls through to Path C (orange):
- Homography learned from too few correspondence pairs (minimum 4, but 8+ is more stable)
- Person is near the edge of the overlap zone where reprojection error is highest
- Walk more paths through the camera overlap to improve $H$ stability
Two people merged into one ghost
Cross-camera re-ID matched two different people as the same world object. This happens with:
- Identical clothing (same HSV histogram)
- People standing < 2 m apart in world coordinates
- Temporary occlusion causing track ID swap
The system self-corrects once the people separate spatially. The appearance descriptor EMA (α = 0.3) gradually diverges.
Pydantic Config error
Configure settings with model_config = SettingsConfigDict(...) only, and do not define an inner class Config. Pydantic v2 raises an error when a model declares both, so remove the inner class if this error appears.
🧰 Tech stack
| Layer | Technology |
|---|---|
| Detection | Ultralytics YOLOv8 (nano) |
| Inference | NVIDIA TensorRT FP16 / ONNX Runtime / PyTorch |
| Tracking | DeepSORT / Hungarian (scipy) / Centroid |
| Fusion | Custom 6-state Kalman filter with adaptive noise |
| Cross-camera | Ground-plane homography via OpenCV findHomography + RANSAC |
| Backend | FastAPI + Uvicorn (ASGI) |
| Protocol | msgpack binary over WebSocket |
| Frontend | React 18 + Canvas 2D API |
| Auth (optional) | PyJWT (HS256) |
| Hardware | NVIDIA Jetson Orin Nano (JetPack 6.x, R36) |
| Deployment | paramiko SSH/SFTP automation |
| Tests + CI | pytest + GitHub Actions |
📚 References & sources
The cross-camera homography system is built on established multi-view geometry principles and inspired by several academic works and open-source implementations.
Foundational theory
| Source | Relevance |
|---|---|
| Hartley, R. & Zisserman, A. (2004). Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press. | Chapter 13: ground-plane homography between uncalibrated camera pairs. |
| Faugeras, O. (1993). Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press. | Projective geometry fundamentals used in the homography estimation pipeline. |
Research papers
| Paper | Venue | Contribution |
|---|---|---|
| Hou, Y., Zheng, L., & Gould, S. (2020). Multiview Detection with Feature Perspective Transformation | ECCV 2020 | Ground-plane projection of CNN feature maps via homography for multi-view pedestrian detection. 88.2% MODA on Wildtrack. |
| Hou, Y. & Zheng, L. (2021). MVDeTr: Multiview Detection with Shadow Transformer | ACM MM 2021 | Deformable transformer extension of MVDet. 91.5% MODA on Wildtrack. |
| Psaltis, A. et al. (2021). Tracking Grow-Finish Pigs Across Large Pens Using Multiple Cameras | CVPR 2021 Workshop | Production homography-based cross-camera tracking with DeepSORT + YOLOv4. |
| Ristani, E. et al. (2016). Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking | ECCV 2016 Workshop | Defined IDF1, IDP, IDR metrics. Established the DukeMTMC benchmark. |
| Jeon, Y. et al. (2023). Leveraging Future Trajectory Prediction for Multi-Camera People Tracking | CVPR 2023 Workshop | Spatial-temporal cross-camera graph for MCMT. |
| Chen, C. et al. (2023). ReST: A Reconfigurable Spatial-Temporal Graph Model for MCMT | ICCV 2023 | Graph-based cross-camera association that learns spatial topology from observations. |
| Fischler, M.A. & Bolles, R.C. (1981). Random Sample Consensus | CACM 24(6) | The RANSAC algorithm used in cv2.findHomography. |
Open-source implementations
| Repository | Usage |
|---|---|
| hou-yz/MVDet | Reference for get_worldcoord_from_imgcoord() and multi-view feature fusion. |
| hou-yz/MVDeTr | Reference for deformable transformer attention across multi-view projected features. |
| AIFARMS/multi-camera-pig-tracking | Direct inspiration for the homography-based cross-camera approach. |
| yuntaeJ/SCIT-MCMT-Tracking | Reference for spatial-temporal cross-camera association graphs. |
| chengche6230/ReST | Reference for reconfigurable spatial-temporal graphs in MCMT. |
| ultralytics/ultralytics | YOLOv8 detection model. |
| levan92/deep_sort_realtime | DeepSORT tracker implementation. |
Datasets referenced
| Dataset | Citation |
|---|---|
| Wildtrack | Chavdarova, T. et al. (2018). CVPR. |
| MultiviewX | Hou, Y. et al. (2020). Synthetic multi-view pedestrian dataset introduced with MVDet. |
| DukeMTMC | Ristani, E. et al. (2016). |
📄 License & trademarks
OVERWATCH is released under the MIT License — copyright © 2024–2026 Mandar Wagh. You're free to use, copy, modify, merge, publish, distribute, sublicense, and sell copies of the software, subject to the conditions in the license file. If you build something cool with it, a link back is appreciated but not required.
"Anduril," "Lattice," "Connected Warfare," and "EagleEye" are trademarks of Anduril Industries, Inc. This project is an independent community implementation inspired by publicly-shown concepts of those products. It is not affiliated with, endorsed by, or sponsored by Anduril Industries. No proprietary information, code, or assets from Anduril are used.
All other third-party trademarks (NVIDIA, Jetson, TensorRT, React, FastAPI, etc.) belong to their respective owners.
Connected sensing, at hackathon scale. 🎯
Inspired by Anduril Connected Warfare · Built on open tools.