Dakera — Self-Hosted Persistent Memory for AI Agents

URL: https://dakera.ai/
Type: Self-hosted infrastructure / open-core (SDKs MIT, engine proprietary binary)
Status: Public alpha; managed cloud offering pending (waitlist)
Positioning: Single-binary agent memory infrastructure; “compounding intelligence” over time

Core Argument

Most agent memory setups require 3–5 separate services (vector store, embedding API, knowledge graph, session store). Dakera collapses these into a single 44MB Rust binary with zero external dependencies — including built-in embeddings via Candle, so no OpenAI API call is required. Self-hosting means no vendor lock-in and no data ever leaves your infrastructure.
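A minimal sketch of what that consolidation looks like from the client side: one process accepts raw text, embeds it server-side, and serves search. The endpoint paths, header name, port, and payload fields below are assumptions for illustration, not Dakera’s documented API.

```python
# Hypothetical client calls against a single Dakera binary.
# Endpoints, port, and field names are assumed, not from Dakera's docs.
import requests

BASE = "http://localhost:8080"       # assumed default port
HEADERS = {"x-api-key": "YOUR_KEY"}  # API key required at init (header name assumed)

# Store a memory: the server embeds the text itself via Candle,
# so there is no separate call to OpenAI or any embedding service.
requests.post(
    f"{BASE}/v1/memories",
    json={"agent_id": "agent-1", "type": "episodic",
          "text": "User prefers concise answers."},
    headers=HEADERS,
).raise_for_status()

# Query: the same process runs vector + keyword retrieval and returns ranked hits.
hits = requests.post(
    f"{BASE}/v1/search",
    json={"agent_id": "agent-1", "query": "answer style", "top_k": 5},
    headers=HEADERS,
).json()
```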

Architecture: Five Rust Crates, One Binary

Crate | Role
dakera-api | REST/gRPC layer; auth, rate limiting, observability
dakera-engine | Indexing algorithms (HNSW, IVF, SPFresh, BM25, plus a hybrid mode); Raft consensus clustering
dakera-inference | Candle-based ML inference for embeddings — no Python runtime
dakera-storage | Three-tier persistence: memory → filesystem (mmap+LSM) → S3/MinIO; WAL durability; AES-256-GCM encryption (read path sketched below)
dakera-common | Shared types, validation, telemetry
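The tiering in dakera-storage implies a promote-on-access read path. A conceptual sketch of how such a hierarchy behaves; this illustrates the stated tiers, not Dakera’s actual internals:

```python
# Conceptual three-tier read path (RAM -> local LSM -> object store).
# Illustrative only; not Dakera's implementation.
from typing import Optional

class TieredStore:
    def __init__(self, lsm_store, object_store):
        self.ram: dict[bytes, bytes] = {}  # tier 1: in-memory hot set
        self.lsm = lsm_store               # tier 2: mmap'd LSM tree on local disk
        self.s3 = object_store             # tier 3: S3/MinIO cold tier

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.ram:                # hottest tier first
            return self.ram[key]
        value = self.lsm.get(key) or self.s3.get(key)
        if value is not None:
            self.ram[key] = value          # promote on access
        return value
```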

Performance claims: ~27.4M inserts/sec peak; p50 query latency 118µs; sub-10ms end-to-end queries.

Four Memory Types (Agent-Specific)

Type | Description
Episodic | Event and conversation history
Semantic | Factual knowledge and user preferences
Procedural | Learned behaviors and workflows
Working | Short-term, in-session context

All types include automatic importance decay — recent and frequently accessed memories rank higher.
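Dakera does not publish its decay function. As a rough illustration of the behavior described, a retrieval score could combine similarity with exponential recency decay and an access-frequency boost; the formula and half-life below are assumptions, not Dakera’s:

```python
# Illustrative importance-decay scoring; formula and parameters are assumed.
import math
import time

def decayed_score(similarity: float, created_at: float, access_count: int,
                  half_life_s: float = 7 * 24 * 3600) -> float:
    """Recent, frequently accessed memories rank higher; stale ones fade."""
    age_s = time.time() - created_at
    recency = 0.5 ** (age_s / half_life_s)  # importance halves every week
    frequency = math.log1p(access_count)    # diminishing boost for re-access
    return similarity * recency * (1.0 + frequency)
```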

Retrieval

Hybrid retrieval: HNSW vector search + BM25 keyword matching + temporal re-ranking. Built-in embedding models: MiniLM-L6 (384-dim), BGE-Small, E5-Small (via Candle, no external API). Benchmark (vendor-reported): 87.6% on LoCoMo.
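How the three signals are fused is not specified. A common pattern is to min-max normalize each score list and take a weighted sum; the weights and scheme below are assumptions (reciprocal rank fusion would be an equally plausible guess):

```python
# Assumed weighted-sum fusion over normalized scores; Dakera's actual
# fusion scheme is not documented.
def normalize(scores: dict[str, float]) -> dict[str, float]:
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(vector: dict[str, float], bm25: dict[str, float],
                recency: dict[str, float],
                w_vec: float = 0.6, w_kw: float = 0.3, w_time: float = 0.1):
    vector, bm25, recency = normalize(vector), normalize(bm25), normalize(recency)
    docs = set(vector) | set(bm25) | set(recency)
    fused = {d: w_vec * vector.get(d, 0.0)
                + w_kw * bm25.get(d, 0.0)
                + w_time * recency.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```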

Knowledge Graph

Automatic entity extraction and relationship mapping — adds a graph layer on top of vector retrieval for structured reasoning about entities and their connections.
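Conceptually, each stored memory yields entity/relationship triples that can be traversed alongside vector hits. A toy sketch of that graph layer; the triple schema is an assumption:

```python
# Toy graph layer over extracted (subject, relation, object) triples.
# The triple schema and extraction output shape are assumptions.
from collections import defaultdict

triples = [
    ("alice", "works_at", "Acme"),
    ("alice", "prefers", "dark_mode"),
    ("Acme", "located_in", "Berlin"),
]

graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def neighbors(entity: str) -> list[tuple[str, str]]:
    """Structured follow-up to a vector hit: what is known about this entity?"""
    return graph[entity]

print(neighbors("alice"))  # [('works_at', 'Acme'), ('prefers', 'dark_mode')]
```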

Integration Surface

  • MCP native: 83 callable tools for Claude, Cursor, Windsurf
  • Framework packages: langchain-dakera, llama-index-dakera, crewai-dakera, autogen-dakera
  • SDKs: Python, TypeScript, Go, Rust (hypothetical Python usage sketched after this list)
  • Protocols: REST (JSON/CBOR), gRPC, MCP
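For the Python SDK, a quickstart might look like the following. The package name matches the listing above, but every class and method name here is a guess, not documented API:

```python
# Hypothetical Python SDK usage; Client, remember, and recall are assumed names.
from dakera import Client  # assumed import path

client = Client(base_url="http://localhost:8080", api_key="YOUR_KEY")

client.remember(agent_id="agent-1", type="semantic",
                text="User's deployment target is Kubernetes.")

for hit in client.recall(agent_id="agent-1", query="deployment target", top_k=3):
    print(hit.text, hit.score)
```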

Deployment

Single Docker image or binary on Linux/Kubernetes. API key required at init. Claimed “deployable in under a minute.” Clustering via Raft consensus + consistent hashing for horizontal scaling.

Pricing

  • Self-hosted: Free, no usage fees, no telemetry
  • Dakera Cloud: Managed hosting + SLA + monitoring — waitlist only; “founder pricing” can be locked in now

Key Takeaways

  • Single-binary approach eliminates the most common reason agent memory setups fail in production: operational complexity of coordinating multiple services
  • Built-in Candle inference is the decisive privacy/cost win vs. cloud memory APIs — embeddings never leave the server
  • Open-core licensing (MIT SDKs, proprietary engine) is a common VC-backed infra play: community adoption via SDKs, monetization via cloud
  • Four agent-specific memory types with decay contrast with Mem0’s ADD-only model — decay means stale facts eventually stop surfacing, which is architecturally sounder for long-lived agents
  • Knowledge graph layer on top of vector search mirrors the ontology-aware approach of Supermemory but self-hosted
  • Still alpha — benchmark claims and throughput numbers should be verified against independent evaluations before production commitment