Dakera — Self-Hosted Persistent Memory for AI Agents
URL: https://dakera.ai/
Type: Self-hosted infrastructure / open-core (SDKs MIT, engine proprietary binary)
Status: Public alpha; managed cloud offering pending (waitlist)
Positioning: Single-binary agent memory infrastructure; “compounding intelligence” over time
Core Argument
Most agent memory setups require 3–5 separate services (vector store, embedding API, knowledge graph, session store). Dakera collapses these into a single 44MB Rust binary with zero external dependencies — including built-in embeddings via Candle, so no OpenAI API call is required. Self-hosting means no vendor lock-in and no data leaving your infrastructure.
Architecture: Five Rust Crates, One Binary
| Crate | Role |
|---|---|
| dakera-api | REST/gRPC layer; auth, rate limiting, observability |
| dakera-engine | Indexing algorithms (HNSW, IVF, SPFresh, BM25, hybrid); Raft consensus clustering |
| dakera-inference | Candle-based ML inference for embeddings — no Python runtime |
| dakera-storage | Three-tier persistence: memory → filesystem (mmap+LSM) → S3/MinIO; WAL durability; AES-256-GCM encryption |
| dakera-common | Shared types, validation, telemetry |
Performance claims (vendor-reported): ~27.4M inserts/sec peak; 118µs p50 query latency; sub-10ms end-to-end queries.
Four Memory Types (Agent-Specific)
| Type | Description |
|---|---|
| Episodic | Event and conversation history |
| Semantic | Factual knowledge and user preferences |
| Procedural | Learned behaviors and workflows |
| Working | Short-term, in-session context |
All types include automatic importance decay — recent and frequently accessed memories rank higher.
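Dakera does not publish its decay formula, but the behavior described — recency and access frequency pushing scores up — is commonly modeled as exponential time decay with a frequency boost. A minimal sketch of that idea (function name, half-life, and boost term are all assumptions, not Dakera's implementation):

```python
import math

def importance(base, created_at, access_count, now, half_life=7 * 86400):
    """Illustrative importance score: exponential decay by age, boosted
    by access frequency. Timestamps are in seconds; half_life defaults
    to one week (an arbitrary choice for the sketch)."""
    age = now - created_at
    decay = 0.5 ** (age / half_life)          # halves every half_life seconds
    boost = 1 + math.log1p(access_count)      # diminishing returns on accesses
    return base * decay * boost
```

Under this model a week-old memory scores roughly half of a fresh one, and frequent access can offset age — matching the stated ranking behavior.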
Retrieval
Hybrid: HNSW vector search + BM25 keyword + temporal re-ranking. Built-in embedding models: MiniLM-L6 (384-dim), BGE-Small, E5-Small (via Candle, no external API). Benchmark: 87.6% on LoCoMo.
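Dakera does not document how the three signals are combined, but a common pattern for hybrid retrieval is normalizing each signal and taking a weighted sum, with recency folded in as a decay term. A sketch of that fusion step (weights and half-life are illustrative assumptions):

```python
def minmax(scores):
    """Scale a score list into [0, 1]; constant lists map to all-ones."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(vec_scores, bm25_scores, ages_days,
         w_vec=0.6, w_kw=0.3, w_rec=0.1, half_life_days=7.0):
    """Weighted-sum fusion of vector similarity, BM25 keyword score,
    and a recency decay term, per candidate document."""
    v = minmax(vec_scores)
    k = minmax(bm25_scores)
    r = [0.5 ** (age / half_life_days) for age in ages_days]
    return [w_vec * vi + w_kw * ki + w_rec * ri
            for vi, ki, ri in zip(v, k, r)]
```

In a real pipeline the vector scores would come from HNSW nearest-neighbor search and the keyword scores from a BM25 index; only the combination logic is shown here.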
Knowledge Graph
Automatic entity extraction and relationship mapping — adds a graph layer on top of vector retrieval for structured reasoning about entities and their connections.
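The graph layer presumably stores extracted entities and relationships as triples that can be traversed alongside vector retrieval. A minimal sketch of that data shape (class and method names are assumptions for illustration, not Dakera's API):

```python
from collections import defaultdict

class EntityGraph:
    """Toy subject-predicate-object store to illustrate the graph layer."""

    def __init__(self):
        # subject -> list of (predicate, object) edges
        self.edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        self.edges[subject].append((predicate, obj))

    def neighbors(self, subject, predicate=None):
        """Objects connected to subject, optionally filtered by predicate."""
        return [o for p, o in self.edges[subject]
                if predicate is None or p == predicate]
```

Structured queries like "what does this user work on" become edge lookups, which pure vector similarity cannot answer reliably.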
Integration Surface
- MCP native: 83 callable tools for Claude, Cursor, Windsurf
- Framework packages: langchain-dakera, llama-index-dakera, crewai-dakera, autogen-dakera
- SDKs: Python, TypeScript, Go, Rust
- Protocols: REST (JSON/CBOR), gRPC, MCP
Deployment
Single Docker image or binary on Linux/Kubernetes. API key required at init. Claimed “deployable in under a minute.” Clustering via Raft consensus + consistent hashing for horizontal scaling.
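Consistent hashing is a standard technique for spreading keys across cluster nodes so that adding or removing a node relocates only a fraction of the data. A generic sketch of the idea (this is textbook consistent hashing, not Dakera's actual placement code):

```python
import bisect
import hashlib

class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` positions on the ring to
        # smooth out the key distribution.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        """First ring position clockwise from the key's hash owns the key."""
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

Raft then handles agreement on cluster membership and replicated writes; the ring only decides which node owns which key.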
Pricing
- Self-hosted: Free, no usage fees, no telemetry
- Dakera Cloud: Managed hosting + SLA + monitoring — waitlist only; “founder pricing” lock-in available
Key Takeaways
- Single-binary approach eliminates the most common reason agent memory setups fail in production: operational complexity of coordinating multiple services
- Built-in Candle inference is the decisive privacy/cost win vs. cloud memory APIs — embeddings never leave the server
- Open-core licensing (MIT SDKs, proprietary engine) is a common VC-backed infra play: community adoption via SDKs, monetization via cloud
- Four agent-specific memory types with decay contrast with Mem0’s ADD-only model — decay means stale facts eventually stop surfacing, which is architecturally sounder for long-lived agents
- Knowledge graph layer on top of vector search mirrors the ontology-aware approach of Supermemory but self-hosted
- Still alpha — benchmark claims and throughput numbers should be verified against independent evaluations before production commitment