Supermemory — Memory Layer for AI Agents
URL: https://supermemory.ai/
Type: Managed cloud platform + API / developer SDK
Positioning: “Long-term and short-term memory and context infrastructure for AI agents”
Core Argument
AI agents lack persistent context across sessions and users. Supermemory provides a hosted memory API that builds a semantic graph on top of any entity (user, document, project, org) — enabling agents to understand and recall user intent, preferences, and history without the developer building storage infrastructure from scratch.
Architecture
Storage engine: Custom vector graph engine with ontology-aware edges — goes beyond standard cosine similarity by encoding semantic relationships between entities as typed graph edges.
Retrieval: Hybrid search combining vector embeddings + keyword indexing; sub-300ms response time for real-time agent use.
Shared context pool: All three memory types (below) pull from the same pool when scoped to the same containerTag (user/entity ID).
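The architecture above can be sketched in miniature: nodes scoped by a containerTag, typed edges between them, and hybrid retrieval (vector similarity plus keyword match) followed by one hop of edge expansion. All names here are hypothetical illustrations of the described design, not the actual Supermemory engine.

```typescript
// Minimal in-memory model of an ontology-aware memory graph (hypothetical;
// not Supermemory's real implementation). Edges carry a semantic type that
// encodes *why* two memories relate, not just that they are similar.
type EdgeType = "prefers" | "works_on" | "mentions";

interface MemoryNode {
  id: string;
  text: string;
  embedding: number[];
  containerTag: string; // scopes the node to a user/document/org
}

interface TypedEdge {
  from: string;
  to: string;
  type: EdgeType;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class MemoryGraph {
  nodes = new Map<string, MemoryNode>();
  edges: TypedEdge[] = [];

  add(node: MemoryNode): void {
    this.nodes.set(node.id, node);
  }

  link(from: string, to: string, type: EdgeType): void {
    this.edges.push({ from, to, type });
  }

  // Hybrid retrieval: vector similarity + keyword bonus, scoped to one
  // containerTag, then a single hop of typed-edge expansion so related
  // but embedding-dissimilar memories surface too.
  search(containerTag: string, queryVec: number[], keyword: string): MemoryNode[] {
    const scoped = [...this.nodes.values()].filter(n => n.containerTag === containerTag);
    const scored = scoped
      .map(n => ({
        n,
        score:
          cosine(n.embedding, queryVec) +
          (n.text.toLowerCase().includes(keyword.toLowerCase()) ? 0.5 : 0),
      }))
      .sort((a, b) => b.score - a.score);
    const top = scored.slice(0, 1).map(s => s.n);
    const expanded = this.edges
      .filter(e => top.some(t => t.id === e.from))
      .map(e => this.nodes.get(e.to)!)
      .filter(n => n && n.containerTag === containerTag);
    return [...top, ...expanded];
  }
}
```

The edge-expansion step is the point of contrast with a plain vector store: a "theme preference" memory can be returned for a "dark mode" query because a typed edge links them, even if their embeddings are far apart.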
Three Memory Types
| Type | Description |
|---|---|
| Memory API | Extracts and evolves user-specific facts in real-time; handles knowledge updates, temporal drift, and forgetting |
| User Profiles | Combines static (always-known) and dynamic (episodic, recent conversation) data into a user model |
| RAG System | Document-grounded retrieval with advanced metadata filtering and contextual chunking |
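The Memory API row describes behavior worth making concrete: facts evolve rather than accumulate, newer observations supersede stale ones, and unrefreshed facts are eventually forgotten. The sketch below is a hypothetical model of that behavior, not Supermemory's actual fact engine.

```typescript
// Hypothetical model of evolving user facts: an upsert keyed on
// (subject, attribute) handles temporal drift, and a TTL sweep models
// forgetting. Illustrative only — not the real Memory API internals.
interface Fact {
  subject: string;
  attribute: string;
  value: string;
  observedAt: number; // unix ms
}

class FactStore {
  private facts: Fact[] = [];

  // A newer observation about the same (subject, attribute) replaces the
  // stale value instead of adding a contradiction.
  observe(fact: Fact): void {
    const i = this.facts.findIndex(
      f => f.subject === fact.subject && f.attribute === fact.attribute
    );
    if (i === -1) this.facts.push(fact);
    else if (fact.observedAt >= this.facts[i].observedAt) this.facts[i] = fact;
  }

  // "Forgetting": drop facts not re-observed since the cutoff.
  forgetOlderThan(cutoff: number): void {
    this.facts = this.facts.filter(f => f.observedAt >= cutoff);
  }

  get(subject: string, attribute: string): string | undefined {
    return this.facts.find(
      f => f.subject === subject && f.attribute === attribute
    )?.value;
  }
}
```

A User Profile in this model is then just the static facts plus the most recent episodic ones for a given subject, all read from the same store.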
Data Ingestion
Multi-format processing: text, conversations, PDFs, images, documents, videos. Pre-built connectors for:
- Notion, Slack, Gmail, Google Drive, S3
Integration Surface
- TypeScript + Python SDKs
- REST API (OpenAPI spec available)
- MCP server integration
- Developer console for API key management
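For the REST surface, a request might be assembled roughly as below. The endpoint path, header names, and body shape are assumptions for illustration only — the published OpenAPI spec is the authority on the actual contract.

```typescript
// Hedged sketch of building a scoped search request against the REST API.
// The /v3/search path and the { q, containerTags } body are assumed, not
// documented here — verify against the OpenAPI spec before use.
interface SearchRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildSearchRequest(
  apiKey: string,
  containerTag: string,
  query: string
): SearchRequest {
  return {
    url: "https://api.supermemory.ai/v3/search", // assumed endpoint
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ q: query, containerTags: [containerTag] }),
  };
}
```

The resulting object can be passed straight to `fetch(req.url, req)`; swapping the containerTag is all it takes to retarget the same call at a different user, document, or org scope.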
Use Cases
- Enterprise API backends needing persistent user context
- Developer plugins requiring memory across sessions
- Personal productivity apps (persistent digital memory)
- Any agent that must understand a specific user over time
Key Takeaways
- Ontology-aware graph edges are the core differentiator vs. plain vector stores — they encode why two pieces of memory are related, not just that they are similar
- Three-mode memory API (facts / profiles / RAG) covers the main retrieval patterns in one platform
- Cloud-only; no self-hosted option documented — same privacy trade-off as mem0-memory-layer, contrasted with local-first link-local-llm-memory
- containerTag scoping model is a clean abstraction: one API handles per-user, per-document, or per-org memory with the same calls
- Sub-300ms retrieval makes it viable for synchronous, low-latency agent pipelines