M2M Vector Search
Machine-to-Memory: a vector search engine with probabilistic Gaussian Splats, online learning via feedback, and multi-backend GPU acceleration.
Quick Start • Features • Architecture • Memory • Benchmarks
Features
| Feature | Description |
|---|---|
| Gaussian Splats | Probabilistic vector representation: score(x, i) = αᵢ · exp(-κᵢ · ‖x − μᵢ‖²) |
| Online Learning | Hebbian update rules adapt α, κ, μ from user feedback (relevant / not_relevant) |
| Multi-GPU Backend | CPU, NVIDIA CUDA, and AMD Vulkan via a single API |
| Energy-Based Model | Native uncertainty quantification via energy landscape |
| HRM2 Engine | Hierarchical Routing with Mixture Models and adaptive probing |
| SOC Consolidation | Self-Organized Criticality prunes low-α splats automatically |
| Semantic Memory | Hybrid BM25 + vector search with Reciprocal Rank Fusion |
| LangChain Ready | Native Retriever interface |
Quick Start
Install
pip install m2m-vector-search
Minimal Example
from m2m import SimpleVectorDB
import numpy as np
# Create a vector database
db = SimpleVectorDB(latent_dim=128)
# Add vectors
vectors = np.random.randn(1000, 128).astype(np.float32)
db.add(vectors=vectors, ids=[f"doc_{i}" for i in range(1000)])
# Search
query = np.random.randn(128).astype(np.float32)
results = db.search(query, k=10)
for r in results:
    print(f"  {r.id}: score={r.score:.4f}")
Gaussian Splats with Online Learning
This is the core differentiator. Each vector is a Gaussian Splat with three learnable parameters:
- μ (mean): position in embedding space
- κ (concentration): how sharply the splat responds; higher κ = more precise match required
- α (amplitude): how "important" the memory is; grows with positive feedback, decays with negative
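The scoring rule built from these three parameters can be sketched in NumPy. This is a minimal illustration and not the library's implementation: the function name `gaussian_splat_scores` and the array layout are assumptions made for the example.

```python
import numpy as np

def gaussian_splat_scores(x, mu, kappa, alpha):
    """Score one query against N splats: alpha_i * exp(-kappa_i * ||x - mu_i||^2).

    Hypothetical helper for illustration only; not part of the m2m API.
    Shapes: x (d,), mu (N, d), kappa (N,), alpha (N,).
    """
    sq_dist = np.sum((mu - x) ** 2, axis=1)   # squared Euclidean distance per splat
    return alpha * np.exp(-kappa * sq_dist)

x = np.zeros(3)
mu = np.array([[0.0, 0.0, 0.0],    # exactly at the query
               [1.0, 0.0, 0.0],    # distance 1, low concentration
               [1.0, 0.0, 0.0]])   # distance 1, high concentration
kappa = np.array([1.0, 1.0, 4.0])
alpha = np.array([1.0, 1.0, 1.0])

scores = gaussian_splat_scores(x, mu, kappa, alpha)
print(scores)
```

At equal distance, the splat with the larger κ scores lower, which is the "more precise match required" behaviour described above.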
from m2m import AdvancedVectorDB
import numpy as np
db = AdvancedVectorDB(latent_dim=128, use_gaussian_splats=True)
vectors = np.random.randn(500, 128).astype(np.float32)
db.add(vectors=vectors, ids=[f"item_{i}" for i in range(500)])
# Search uses Gaussian scoring: α·exp(-κ·‖x−μ‖²)
query = np.random.randn(128).astype(np.float32)
results = db.search(query, k=10)
# Provide feedback; the system learns from it
db.feedback(
    query=query,
    relevant_ids=[results[0].id, results[1].id],
    irrelevant_ids=[results[8].id, results[9].id],
)
# Next search adapts: strong splats get promoted, weak ones demoted
results2 = db.search(query, k=10)
Update rules (Hebbian + temporal decay):
| Event | α (importance) | κ (concentration) | μ (position) |
|---|---|---|---|
| Relevant feedback | α += lr_α · α | κ += lr_κ · ‖x − μ‖⁻² | μ += lr_μ · (x − μ) (drift toward query) |
| Irrelevant feedback | α *= (1 − lr_α) | κ -= 0.5 · lr_κ | - |
| Temporal decay | α *= exp(-λ·Δt) | - | - |
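The two feedback rows of the table can be sketched for a single splat as follows. This is an illustration under stated assumptions, not the library's code: the function name `apply_feedback`, the default learning rates, the `eps` guard against division by zero, the `kappa_min` floor, and the choice to compute the κ update from the pre-update μ are all hypothetical.

```python
import numpy as np

def apply_feedback(x, mu, kappa, alpha, relevant,
                   lr_alpha=0.1, lr_kappa=0.01, lr_mu=0.05,
                   eps=1e-8, kappa_min=1e-6):
    """Apply one feedback event to one splat and return (mu, kappa, alpha).

    Hypothetical helper; learning rates, eps and kappa_min are assumed values.
    """
    if relevant:
        sq_dist = float(np.sum((x - mu) ** 2))
        alpha = alpha + lr_alpha * alpha             # alpha += lr_alpha * alpha
        kappa = kappa + lr_kappa / (sq_dist + eps)   # kappa += lr_kappa * ||x - mu||^-2
        mu = mu + lr_mu * (x - mu)                   # drift toward the query
    else:
        alpha = alpha * (1.0 - lr_alpha)             # alpha *= (1 - lr_alpha)
        kappa = max(kappa - 0.5 * lr_kappa, kappa_min)  # floor is an assumption
    return mu, kappa, alpha

x = np.array([1.0, 0.0])
mu0, kappa0, alpha0 = np.zeros(2), 1.0, 1.0

mu_pos, kappa_pos, alpha_pos = apply_feedback(x, mu0, kappa0, alpha0, relevant=True)
mu_neg, kappa_neg, alpha_neg = apply_feedback(x, mu0, kappa0, alpha0, relevant=False)
print(alpha_pos, kappa_pos, mu_pos)
print(alpha_neg, kappa_neg, mu_neg)
```

Relevant feedback raises α and κ and moves μ toward the query; irrelevant feedback lowers α and κ and leaves μ where it is.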
Architecture
┌─────────────────────────────────────────────────┐
│               REST API (FastAPI)                │
│           Collections · CRUD · Search           │
├─────────────────────────────────────────────────┤
│           SemanticMemoryDB / VectorDB           │
│     Hybrid Search · Fusion · Temporal Decay     │
├──────────┬──────────┬───────────┬───────────────┤
│ Splats   │ HRM2     │ EBM       │ SOC           │
│ (μ,κ,α)  │ Engine   │ Energy    │ Consolidate   │
├──────────┴──────────┴───────────┴───────────────┤
│            Backend Layer (pluggable)            │
├─────────┬──────────┬──────────┬─────────────────┤
│ CPU     │ CUDA     │ Vulkan   │ Transformed     │
├─────────┴──────────┴──────────┴─────────────────┤
│                  Storage Layer                  │
├─────────┬─────────────────┬─────────────────────┤
│ WAL     │ Persistence     │ GPUVectorIndex      │
└─────────┴─────────────────┴─────────────────────┘
Semantic Memory System
from m2m import SemanticMemoryDB
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
encoder = lambda text: model.encode(text, show_progress_bar=False)
mem = SemanticMemoryDB(
    encoder=encoder,
    latent_dim=384,
    fusion_method="rrf",           # Reciprocal Rank Fusion
    temporal_decay=True,           # Recent memories rank higher
    temporal_half_life_days=30.0,
    auto_categorize=True,
)
mem.store("User prefers dark mode for coding", metadata={"category": "preference"})
mem.store("We decided to use Qdrant for production", metadata={"category": "decision"})
results = mem.search("what did we decide about databases?", k=5)
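One common reading of a half-life setting such as `temporal_half_life_days=30.0` is an exponential weight that halves every 30 days. The sketch below shows that arithmetic only; the function name `decay_weight` is hypothetical, and whether `SemanticMemoryDB` applies exactly this formula is an assumption.

```python
import math

def decay_weight(age_days, half_life_days=30.0):
    """Exponential decay weight that halves every `half_life_days`.

    Hypothetical helper; assumes lambda = ln(2) / half_life.
    """
    lam = math.log(2.0) / half_life_days
    return math.exp(-lam * age_days)

w_today = decay_weight(0.0)      # a fresh memory keeps its full weight
w_month = decay_weight(30.0)     # one half-life
w_two_months = decay_weight(60.0)  # two half-lives
print(w_today, w_month, w_two_months)
```

Under this assumption a memory stored 30 days ago contributes half the weight of one stored today, and a 60-day-old memory contributes a quarter.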
Hybrid Search
| Method | Tuning Required | Best For |
|---|---|---|
| RRF | No | General-purpose (recommended) |
| Weighted | Yes | Domain-specific with known priorities |
| vector_only | No | Pure semantic search |
| bm25_only | No | Pure keyword search |
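Reciprocal Rank Fusion needs no tuning because it combines rank positions rather than raw scores: each document receives the sum of 1 / (k + rank) over the rankings it appears in. A minimal sketch follows; the function name `reciprocal_rank_fusion` is hypothetical, and k = 60 is the constant commonly used for RRF, not necessarily the value this library uses.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Hypothetical helper for illustration; not the m2m implementation.
    rankings: list of lists, each ordered best-first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_ranking = ["a", "b", "c"]      # keyword search order
vector_ranking = ["b", "c", "d"]    # vector search order

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Documents that appear in both lists ("b" and "c") rise above documents that top only one list, without any score normalization between BM25 and vector similarity.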
Benchmarks
All data below is measured. No synthetic or estimated numbers.
System: AMD Ryzen 5 3400G, 16 GB RAM, NVIDIA RTX 3090, Python 3.12
CPU Scale Progression (dim=640, k=64)
| Splats (N) | Linear Scan (ms) | M2M HRM2 (ms) | Speedup | QPS |
|---|---|---|---|---|
| 100 | 0.12 | - | - | 8,337 |
| 1,000 | 1.45 | - | - | 691 |
| 10,000 | 10.04 | - | - | 100 |
| 100,000 | 94.79 | 0.99 | 32.4x | 1,013 |
At 100K splats, HRM2 hierarchical routing achieves a 32x speedup over a linear scan.
Gaussian Scoring Overhead
The Gaussian re-ranking step (α·exp(-κ·d²)) adds ~0.1 ms on top of candidate retrieval and promotes high-α splats in the ranking.
Development
git clone https://github.com/schwabauerbriantomas-gif/m2m-vector-search.git
cd m2m-vector-search
pip install -e ".[all]"
# Run tests
pytest tests/ -v # 394 tests
# Code quality
black src/ tests/
flake8 src/ tests/
License
GNU Affero General Public License v3.0; see LICENSE for details.