datahaven

About datahaven

An EVM-compatible Substrate chain, powered by StorageHub and secured by EigenLayer

Platforms

Web Self-hosted

Languages

Rust

DataHaven 🫎

AI-First Decentralized Storage secured by EigenLayer: a verifiable storage network for AI training data, machine learning models, and Web3 applications.

Overview

DataHaven is a decentralized storage and retrieval network designed for applications that need verifiable, production-scale data storage. Built on StorageHub and secured by EigenLayer's restaking protocol, DataHaven separates storage from verification: providers store data off-chain while cryptographic commitments are anchored on-chain for tamper-evident verification.

Core Capabilities:

  • Verifiable Storage: Files are chunked, hashed into Merkle trees, and committed on-chain, enabling cryptographic proof that data hasn't been tampered with
  • Provider Network: Main Storage Providers (MSPs) serve data with competitive offerings, while Backup Storage Providers (BSPs) ensure redundancy through decentralized replication with on-chain slashing for failed proof challenges
  • EigenLayer Security: Validator set secured by Ethereum restaking; DataHaven validators register as EigenLayer operators with slashing for misbehavior
  • EVM Compatibility: Full Ethereum support via Frontier pallets for smart contracts and familiar Web3 tooling
  • Cross-chain Bridge: Native, trustless bridging with Ethereum via Snowbridge for tokens and messages

Architecture

DataHaven combines EigenLayer's shared security with StorageHub's decentralized storage infrastructure:

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Ethereum (L1)                                  │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │  EigenLayer AVS Contracts                                             │  │
│  │  • DataHavenServiceManager (validator lifecycle & slashing)           │  │
│  │  • RewardsRegistry (validator performance & rewards)                  │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    ↕                                        │
│                          Snowbridge Protocol                                │
│                    (trustless cross-chain messaging)                        │
└─────────────────────────────────────────────────────────────────────────────┘
                                     ↕
┌─────────────────────────────────────────────────────────────────────────────┐
│                          DataHaven (Substrate)                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │  StorageHub Pallets                     DataHaven Pallets             │  │
│  │  • file-system (file operations)        • External Validators         │  │
│  │  • providers (MSP/BSP registry)         • Native Transfer             │  │
│  │  • proofs-dealer (challenge/verify)     • Rewards                     │  │
│  │  • payment-streams (storage payments)   • Frontier (EVM)              │  │
│  │  • bucket-nfts (bucket ownership)                                     │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
                                     ↕
┌─────────────────────────────────────────────────────────────────────────────┐
│                        Storage Provider Network                             │
│  ┌─────────────────────────────┐    ┌─────────────────────────────┐         │
│  │  Main Storage Providers     │    │  Backup Storage Providers   │         │
│  │  (MSP)                      │    │  (BSP)                      │         │
│  │  • User-selected            │    │  • Network-assigned         │         │
│  │  • Serve read requests      │    │  • Replicate data           │         │
│  │  • Anchor bucket roots      │    │  • Proof challenges         │         │
│  │  • MSP Backend service      │    │  • On-chain slashing        │         │
│  └─────────────────────────────┘    └─────────────────────────────┘         │
│  ┌─────────────────────────────┐    ┌─────────────────────────────┐         │
│  │  Indexer                    │    │  Fisherman                  │         │
│  │  • Index on-chain events    │    │  • Audit storage proofs     │         │
│  │  • Query storage metadata   │    │  • Trigger challenges       │         │
│  │  • PostgreSQL backend       │    │  • Detect misbehavior       │         │
│  └─────────────────────────────┘    └─────────────────────────────┘         │
└─────────────────────────────────────────────────────────────────────────────┘

How Storage Works

  1. Upload: User selects an MSP, creates a bucket, and uploads files. Files are chunked (8KB default), hashed into Merkle trees, and the root is anchored on-chain.
  2. Replication: The MSP coordinates with BSPs to replicate data across the network based on the bucket's replication policy.
  3. Retrieval: MSP returns files with Merkle proofs that users verify against on-chain commitments.
  4. Verification: BSPs face periodic proof challenges; failure to prove data custody results in on-chain slashing via StorageHub pallets.
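
As a rough illustration of the upload step, the sketch below splits a file into 8 KB chunks, hashes each chunk into a leaf, and folds the leaves into a single Merkle root. It is not DataHaven's or StorageHub's implementation: the function names are hypothetical, the standard library's non-cryptographic `DefaultHasher` stands in for a real collision-resistant hash, and pairing an unpaired node with itself is just one common convention assumed here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Chunk size from the text above (8 KB default).
const CHUNK_SIZE: usize = 8 * 1024;

/// Stand-in 64-bit hash. NOT cryptographic; a real network would use a
/// collision-resistant hash function here (assumption of this sketch).
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Hash two child nodes into their parent node.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&left.to_be_bytes());
    buf[8..].copy_from_slice(&right.to_be_bytes());
    hash_bytes(&buf)
}

/// Split the file into fixed-size chunks and hash each chunk into a leaf.
fn leaf_hashes(file: &[u8]) -> Vec<u64> {
    file.chunks(CHUNK_SIZE).map(hash_bytes).collect()
}

/// Fold the leaves pairwise until one root remains. An unpaired node is
/// paired with itself (hypothetical convention). Returns None for no leaves.
fn merkle_root(leaves: &[u64]) -> Option<u64> {
    if leaves.is_empty() {
        return None;
    }
    let mut level = leaves.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| hash_pair(pair[0], *pair.last().unwrap()))
            .collect();
    }
    Some(level[0])
}

fn main() {
    // 20,000 bytes split into 8 KB chunks gives 3 chunks (the last one partial).
    let file = vec![7u8; 20_000];
    let leaves = leaf_hashes(&file);
    let root = merkle_root(&leaves).expect("file is not empty");
    println!("chunks: {}, root: {:016x}", leaves.len(), root);
}
```

Changing any byte of the file changes the affected leaf and therefore the root, which is what makes the anchored root usable as a tamper-evident commitment.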

Repository Structure

datahaven/
├── contracts/      # EigenLayer AVS smart contracts
│   ├── src/       # Service Manager, Rewards Registry, Slasher
│   ├── script/    # Deployment scripts
│   └── test/      # Foundry test suites
├── operator/       # Substrate-based DataHaven node
│   ├── node/      # Node implementation & chain spec
│   ├── pallets/   # Custom pallets (validators, rewards, transfers)
│   └── runtime/   # Runtime configurations (mainnet/stagenet/testnet)
├── test/           # E2E testing framework
│   ├── suites/    # Integration test scenarios
│   ├── framework/ # Test utilities and helpers
│   └── launcher/  # Network deployment automation
├── deploy/         # Kubernetes deployment charts
│   ├── charts/    # Helm charts for nodes and relayers
│   └── environments/ # Environment-specific configurations
├── tools/          # GitHub automation and release scripts
└── .github/        # CI/CD workflows

Each directory contains its own README with detailed information.

Quick Start

Prerequisites

  • Kurtosis - Network orchestration
  • Bun v1.3.2+ - TypeScript runtime
  • Docker - Container management
  • Foundry - Solidity toolkit
  • Rust - For building the operator
  • Helm - Kubernetes deployments (optional)
  • Zig - For macOS cross-compilation (macOS only)

Launch Local Network

The fastest way to get started is with the interactive CLI:

cd test
bun i                    # Install dependencies
bun cli launch           # Interactive launcher with prompts

This deploys a complete environment including:

  • Ethereum network: 2x EL clients (reth), 2x CL clients (lodestar)
  • Block explorers: Blockscout (optional), Dora consensus explorer
  • DataHaven node: Single validator with fast block times
  • Storage providers: MSP and BSP nodes for decentralized storage
  • AVS contracts: Deployed and configured on Ethereum
  • Snowbridge relayers: Bidirectional message passing

For more options and detailed instructions, see the test README.

Run Tests

cd test
bun test:e2e              # Run all integration tests
bun test:e2e:parallel     # Run with limited concurrency

Note: Setting the environment variable INJECT_CONTRACTS=true injects the contracts when the tests start, which speeds up setup.

Development Workflows

Smart Contract Development:

cd contracts
forge build               # Compile contracts
forge test                # Run contract tests

Node Development:

cd operator
cargo build --release --features fast-runtime
cargo test
./scripts/run-benchmarks.sh

After Making Changes:

cd test
bun generate:wagmi        # Regenerate contract bindings
bun generate:types        # Regenerate runtime types

Key Features

Verifiable Decentralized Storage

Production-scale storage with cryptographic guarantees:

  • Buckets: User-created containers managed by an MSP, summarized by a Merkle-Patricia trie root on-chain
  • Files: Deterministically chunked, hashed into Merkle trees, with roots serving as immutable fingerprints
  • Proofs: Merkle proofs enable verification of data integrity without trusting intermediaries
  • Audits: BSPs prove ongoing data custody via randomized proof challenges
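
To make the "Proofs" bullet concrete, here is a minimal sketch of how a Merkle proof lets a user check one chunk against a committed root without holding the rest of the file. It is illustrative only: `ProofStep`, `root_and_proof`, and `verify_chunk` are hypothetical names, the non-cryptographic `DefaultHasher` stands in for a real hash function, and a plain binary Merkle tree is used rather than the Merkle-Patricia trie mentioned for buckets.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in hash (NOT cryptographic); an assumption of this sketch.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Hash two child nodes into their parent node.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&left.to_be_bytes());
    buf[8..].copy_from_slice(&right.to_be_bytes());
    hash_bytes(&buf)
}

/// One proof step: the sibling hash and whether it sits on the left.
#[derive(Clone, Copy, Debug)]
struct ProofStep {
    sibling: u64,
    sibling_is_left: bool,
}

/// Build the root and the proof for the leaf at `index`.
/// An unpaired node is paired with itself (hypothetical convention).
fn root_and_proof(leaves: &[u64], mut index: usize) -> Option<(u64, Vec<ProofStep>)> {
    if index >= leaves.len() {
        return None;
    }
    let mut level = leaves.to_vec();
    let mut proof = Vec::new();
    while level.len() > 1 {
        let sibling_index = if index % 2 == 0 {
            (index + 1).min(level.len() - 1)
        } else {
            index - 1
        };
        proof.push(ProofStep {
            sibling: level[sibling_index],
            sibling_is_left: index % 2 == 1,
        });
        level = level
            .chunks(2)
            .map(|pair| hash_pair(pair[0], *pair.last().unwrap()))
            .collect();
        index /= 2;
    }
    Some((level[0], proof))
}

/// Recompute the root from a chunk and its proof, then compare it with
/// the committed root.
fn verify_chunk(chunk: &[u8], proof: &[ProofStep], committed_root: u64) -> bool {
    let mut node = hash_bytes(chunk);
    for step in proof {
        node = if step.sibling_is_left {
            hash_pair(step.sibling, node)
        } else {
            hash_pair(node, step.sibling)
        };
    }
    node == committed_root
}

fn main() {
    let chunks: [&[u8]; 3] = [b"chunk-0", b"chunk-1", b"chunk-2"];
    let leaves: Vec<u64> = chunks.iter().map(|c| hash_bytes(c)).collect();
    let (root, proof) = root_and_proof(&leaves, 1).expect("index in range");
    println!("genuine chunk verifies: {}", verify_chunk(chunks[1], &proof, root));
    println!("tampered chunk verifies: {}", verify_chunk(b"tampered", &proof, root));
}
```

The proof holds one sibling hash per tree level, so its size grows logarithmically with the number of chunks rather than with the file size.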

Storage Provider Network

Two-tier provider model balancing performance and reliability:

  • MSPs: User-selected providers offering data retrieval with competitive service offerings
  • BSPs: Network-assigned backup providers ensuring data redundancy and availability, with on-chain slashing for failed proof challenges
  • Fisherman: Auditing service that monitors proofs and triggers challenges for misbehavior
  • Indexer: Indexes on-chain storage events for efficient querying
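
The challenge-and-slash idea behind the BSP bullet can be modelled in a few lines. This is a toy model under stated assumptions, not the proofs-dealer pallet's logic: the `Provider` struct, `challenged_index`, `run_challenge`, and the flat `penalty` are all hypothetical, the hash is the non-cryptographic `DefaultHasher`, and the verifier compares against stored leaf hashes where a real design would check a Merkle proof against the on-chain root.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in hash (NOT cryptographic); an assumption of this sketch.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Hypothetical backup provider: the chunks it holds and its stake.
struct Provider {
    id: u64,
    chunks: Vec<Option<Vec<u8>>>, // None models a chunk the provider lost
    stake: u64,
}

/// Derive the challenged chunk index from a public seed, so the provider
/// cannot know which chunk will be requested before the seed is revealed.
fn challenged_index(seed: u64, provider_id: u64, chunk_count: usize) -> Option<usize> {
    if chunk_count == 0 {
        return None;
    }
    let mut h = DefaultHasher::new();
    h.write_u64(seed);
    h.write_u64(provider_id);
    Some((h.finish() % chunk_count as u64) as usize)
}

/// Run one challenge round. Returns true if the provider passed; otherwise
/// deducts `penalty` from its stake, saturating at zero.
fn run_challenge(p: &mut Provider, committed_leaves: &[u64], seed: u64, penalty: u64) -> bool {
    let Some(idx) = challenged_index(seed, p.id, committed_leaves.len()) else {
        return true; // nothing committed, so nothing to challenge
    };
    let passed = match p.chunks.get(idx) {
        Some(Some(chunk)) => hash_bytes(chunk) == committed_leaves[idx],
        _ => false,
    };
    if !passed {
        p.stake = p.stake.saturating_sub(penalty);
    }
    passed
}

fn main() {
    let data: Vec<Vec<u8>> = (0u8..4).map(|i| vec![i; 16]).collect();
    let leaves: Vec<u64> = data.iter().map(|c| hash_bytes(c)).collect();

    let mut honest = Provider { id: 1, chunks: data.iter().cloned().map(Some).collect(), stake: 100 };
    let mut faulty = Provider { id: 2, chunks: vec![None; 4], stake: 100 };

    for seed in 0..3u64 {
        let a = run_challenge(&mut honest, &leaves, seed, 30);
        let b = run_challenge(&mut faulty, &leaves, seed, 30);
        println!("seed {seed}: honest passed = {a}, faulty passed = {b}");
    }
    println!("stakes: honest = {}, faulty = {}", honest.stake, faulty.stake);
}
```

Because the challenged index depends on a seed the provider does not control, a provider that discards part of the data will eventually be asked for a chunk it cannot produce.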

EigenLayer Security

DataHaven validators secured through Ethereum restaking:

  • Validators register as operators via DataHavenServiceManager contract
  • Economic security through ETH restaking
  • Slashing for validator misbehavior (separate from BSP slashing, which is handled on-chain by the StorageHub pallets)
  • Performance-based validator rewards through RewardsRegistry

EVM Compatibility

Full Ethereum Virtual Machine support via Frontier pallets:

  • Deploy Solidity smart contracts
  • Use existing Ethereum tooling (MetaMask, Hardhat, etc.)
  • Compatible with ERC-20, ERC-721, and other standards

Cross-chain Communication

Trustless bridging via Snowbridge:

  • Native token transfers between Ethereum ↔ DataHaven
  • Cross-chain message passing
  • Finality proofs via BEEFY consensus
  • Three specialized relayers (beacon, BEEFY, execution)

Use Cases

DataHaven is designed for applications requiring verifiable, tamper-evident data storage:

  • AI & Machine Learning: Store training datasets, model weights, and agent configurations with cryptographic proofs of integrity, enabling federated learning and verifiable AI pipelines
  • DePIN (Decentralized Physical Infrastructure): Persistent storage for IoT sensor data, device configurations, and operational logs with provable data lineage
  • Real World Assets (RWAs): Immutable storage for asset documentation, ownership records, and compliance data with on-chain verification

Docker Images

Production images are published to DockerHub.

Build optimizations:

Build locally:

cd test
bun build:docker:operator    # Creates datahavenxyz/datahaven:local

Development Environment

VS Code Configuration

IDE configurations are excluded from version control for personalization, but these settings are recommended for optimal developer experience. Add to your .vscode/settings.json:

Rust Analyzer:

{
  "rust-analyzer.linkedProjects": ["./operator/Cargo.toml"],
  "rust-analyzer.cargo.allTargets": true,
  "rust-analyzer.procMacro.enable": false,
  "rust-analyzer.server.extraEnv": {
    "CARGO_TARGET_DIR": "target/.rust-analyzer",
    "SKIP_WASM_BUILD": 1
  },
  "rust-analyzer.diagnostics.disabled": ["unresolved-macro-call"],
  "rust-analyzer.cargo.buildScripts.enable": false
}

Optimizations:

  • Links operator/ directory as the primary Rust project
  • Disables proc macros and build scripts for faster analysis (Substrate macros are slow)
  • Uses dedicated target directory to avoid conflicts
  • Skips WASM builds during development

Solidity (Juan Blanco's extension):

{
  "solidity.formatter": "forge",
  "solidity.compileUsingRemoteVersion": "v0.8.28+commit.7893614a",
  "[solidity]": {
    "editor.defaultFormatter": "JuanBlanco.solidity"
  }
}

Note: The Solidity version must match the one set in foundry.toml.

TypeScript (Biome):

{
  "biome.lsp.bin": "test/node_modules/.bin/biome",
  "[typescript]": {
    "editor.defaultFormatter": "biomejs.biome",
    "editor.codeActionsOnSave": {
      "source.organizeImports.biome": "always"
    }
  }
}

CI/CD

Local CI Testing

Run GitHub Actions workflows locally using act:

# Run E2E workflow
act -W .github/workflows/e2e.yml -s GITHUB_TOKEN="$(gh auth token)"

# Run specific job
act -W .github/workflows/e2e.yml -j test-job-name

Automated Workflows

The repository includes GitHub Actions for:

  • E2E Testing: Full integration tests on PR and main branch
  • Contract Testing: Foundry test suites for smart contracts
  • Rust Testing: Unit and integration tests for operator
  • Docker Builds: Multi-platform image builds with caching
  • Release Automation: Version tagging and changelog generation

See .github/workflows/ for workflow definitions.

Contributing

Development Cycle

  1. Make Changes: Edit contracts, runtime, or tests
  2. Run Tests: Component-specific tests (forge test, cargo test)
  3. Regenerate Types: Update bindings if contracts/runtime changed
  4. Integration Test: Run E2E tests to verify cross-component behavior
  5. Code Quality: Format and lint (cargo fmt, forge fmt, bun fmt:fix)

Common Pitfalls

  • Type mismatches: Regenerate with bun generate:types after runtime changes
  • Contract changes not reflected: Run bun generate:wagmi after modifications
  • Kurtosis issues: Ensure Docker is running and Kurtosis engine is started
  • Slow development: Use --features fast-runtime for shorter epochs/eras (block time stays 6s)
  • Network launch appears to hang: Check Blockscout for progress; forge output can appear frozen

See CLAUDE.md for detailed development guidance.

License

GPL-3.0 - See LICENSE file for details
