Dakera — Self-Hosted Persistent Memory for AI Agents

URL: https://dakera.ai/
Type: Self-hosted infrastructure / open-core (SDKs MIT, engine proprietary binary)
Status: Public alpha; managed cloud offering pending (waitlist)
Positioning: Single-binary agent memory infrastructure; “compounding intelligence” over time

Core Argument

Most agent memory setups require 3–5 separate services (vector store, embedding API, knowledge graph, session store). Dakera collapses these into a single 44MB Rust binary with zero external dependencies — including built-in embeddings via Candle, so no OpenAI API call is required. Self-hosting means no vendor lock-in and no data ever leaves your infrastructure.
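A minimal sketch of what that consolidation looks like from the client side: one process accepts raw text, embeds it server-side, and serves search. The endpoint paths, header name, port, and payload fields below are assumptions for illustration, not Dakera’s documented API.

```python
# Hypothetical client calls against a single Dakera binary.
# Endpoints, port, and field names are assumed, not from Dakera's docs.
import requests

BASE = "http://localhost:8080"       # assumed default port
HEADERS = {"x-api-key": "YOUR_KEY"}  # API key required at init (header name assumed)

# Store a memory: the server embeds the text itself via Candle,
# so there is no separate call to OpenAI or any embedding service.
requests.post(
    f"{BASE}/v1/memories",
    json={"agent_id": "agent-1", "type": "episodic",
          "text": "User prefers concise answers."},
    headers=HEADERS,
).raise_for_status()

# Query: the same process runs vector + keyword retrieval and returns ranked hits.
hits = requests.post(
    f"{BASE}/v1/search",
    json={"agent_id": "agent-1", "query": "answer style", "top_k": 5},
    headers=HEADERS,
).json()
```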

Architecture: Five Rust Crates, One Binary

Crate | Role
dakera-api | REST/gRPC layer; auth, rate limiting, observability
dakera-engine | Indexing algorithms (HNSW, IVF, SPFresh, BM25, plus a hybrid mode); Raft consensus clustering
dakera-inference | Candle-based ML inference for embeddings — no Python runtime
dakera-storage | Three-tier persistence: memory → filesystem (mmap+LSM) → S3/MinIO; WAL durability; AES-256-GCM encryption (read path sketched below)
dakera-common | Shared types, validation, telemetry
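The tiering in dakera-storage implies a promote-on-access read path. A conceptual sketch of how such a hierarchy behaves; this illustrates the stated tiers, not Dakera’s actual internals:

```python
# Conceptual three-tier read path (RAM -> local LSM -> object store).
# Illustrative only; not Dakera's implementation.
from typing import Optional

class TieredStore:
    def __init__(self, lsm_store, object_store):
        self.ram: dict[bytes, bytes] = {}  # tier 1: in-memory hot set
        self.lsm = lsm_store               # tier 2: mmap'd LSM tree on local disk
        self.s3 = object_store             # tier 3: S3/MinIO cold tier

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.ram:                # hottest tier first
            return self.ram[key]
        value = self.lsm.get(key) or self.s3.get(key)
        if value is not None:
            self.ram[key] = value          # promote on access
        return value
```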

Performance claims: ~27.4M inserts/sec peak; p50 query latency 118µs; sub-10ms end-to-end queries.

Four Memory Types (Agent-Specific)

Type | Description
Episodic | Event and conversation history
Semantic | Factual knowledge and user preferences
Procedural | Learned behaviors and workflows
Working | Short-term, in-session context

All types include automatic importance decay — recent and frequently accessed memories rank higher.
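Dakera does not publish its decay function. As a rough illustration of the behavior described, a retrieval score could combine similarity with exponential recency decay and an access-frequency boost; the formula and half-life below are assumptions, not Dakera’s:

```python
# Illustrative importance-decay scoring; formula and parameters are assumed.
import math
import time

def decayed_score(similarity: float, created_at: float, access_count: int,
                  half_life_s: float = 7 * 24 * 3600) -> float:
    """Recent, frequently accessed memories rank higher; stale ones fade."""
    age_s = time.time() - created_at
    recency = 0.5 ** (age_s / half_life_s)  # importance halves every week
    frequency = math.log1p(access_count)    # diminishing boost for re-access
    return similarity * recency * (1.0 + frequency)
```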

Retrieval

Hybrid retrieval: HNSW vector search + BM25 keyword matching + temporal re-ranking. Built-in embedding models: MiniLM-L6 (384-dim), BGE-Small, E5-Small (via Candle, no external API). Benchmark (vendor-reported): 87.6% on LoCoMo.
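How the three signals are fused is not specified. A common pattern is to min-max normalize each score list and take a weighted sum; the weights and scheme below are assumptions (reciprocal rank fusion would be an equally plausible guess):

```python
# Assumed weighted-sum fusion over normalized scores; Dakera's actual
# fusion scheme is not documented.
def normalize(scores: dict[str, float]) -> dict[str, float]:
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(vector: dict[str, float], bm25: dict[str, float],
                recency: dict[str, float],
                w_vec: float = 0.6, w_kw: float = 0.3, w_time: float = 0.1):
    vector, bm25, recency = normalize(vector), normalize(bm25), normalize(recency)
    docs = set(vector) | set(bm25) | set(recency)
    fused = {d: w_vec * vector.get(d, 0.0)
                + w_kw * bm25.get(d, 0.0)
                + w_time * recency.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```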

Knowledge Graph

Automatic entity extraction and relationship mapping — adds a graph layer on top of vector retrieval for structured reasoning about entities and their connections.
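Conceptually, each stored memory yields entity/relationship triples that can be traversed alongside vector hits. A toy sketch of that graph layer; the triple schema is an assumption:

```python
# Toy graph layer over extracted (subject, relation, object) triples.
# The triple schema and extraction output shape are assumptions.
from collections import defaultdict

triples = [
    ("alice", "works_at", "Acme"),
    ("alice", "prefers", "dark_mode"),
    ("Acme", "located_in", "Berlin"),
]

graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def neighbors(entity: str) -> list[tuple[str, str]]:
    """Structured follow-up to a vector hit: what is known about this entity?"""
    return graph[entity]

print(neighbors("alice"))  # [('works_at', 'Acme'), ('prefers', 'dark_mode')]
```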

Integration Surface

  • MCP native: 83 callable tools for Claude, Cursor, Windsurf
  • Framework packages: langchain-dakera, llama-index-dakera, crewai-dakera, autogen-dakera
  • SDKs: Python, TypeScript, Go, Rust (hypothetical Python usage sketched after this list)
  • Protocols: REST (JSON/CBOR), gRPC, MCP
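For the Python SDK, a quickstart might look like the following. The package name matches the listing above, but every class and method name here is a guess, not documented API:

```python
# Hypothetical Python SDK usage; Client, remember, and recall are assumed names.
from dakera import Client  # assumed import path

client = Client(base_url="http://localhost:8080", api_key="YOUR_KEY")

client.remember(agent_id="agent-1", type="semantic",
                text="User's deployment target is Kubernetes.")

for hit in client.recall(agent_id="agent-1", query="deployment target", top_k=3):
    print(hit.text, hit.score)
```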

Deployment

Single Docker image or binary on Linux/Kubernetes. API key required at init. Claimed “deployable in under a minute.” Clustering via Raft consensus + consistent hashing for horizontal scaling.

Pricing

  • Self-hosted: Free, no usage fees, no telemetry
  • Dakera Cloud: Managed hosting + SLA + monitoring — waitlist only; “founder pricing” can be locked in now

Key Takeaways

  • Single-binary approach eliminates the most common reason agent memory setups fail in production: operational complexity of coordinating multiple services
  • Built-in Candle inference is the decisive privacy/cost win vs. cloud memory APIs — embeddings never leave the server
  • Open-core licensing (MIT SDKs, proprietary engine) is a common VC-backed infra play: community adoption via SDKs, monetization via cloud
  • Four agent-specific memory types with decay contrast with Mem0’s ADD-only model — decay means stale facts eventually stop surfacing, which is architecturally sounder for long-lived agents
  • Knowledge graph layer on top of vector search mirrors the ontology-aware approach of Supermemory but self-hosted
  • Still alpha — benchmark claims and throughput numbers should be verified against independent evaluations before production commitment