Two memory systems.
One seamless experience.
Memory Engine handles automatic recall. Memory Vault stores your curated knowledge. Together they give every AI model complete context.
Automatic recall from conversations. Searches past conversations using semantic similarity. No exact keywords needed. Finds relevant context from days or weeks ago.
Persistent facts and preferences. A curated knowledge base of saved facts, preferences, and instructions. Explicitly saved. Available across all future conversations.
Anuma Memory Engine
Automatic recall from every conversation you have ever had.
Semantic search.
Every message is embedded as a high-dimensional vector capturing its meaning. Queries find the most similar past messages using cosine similarity.
No exact keyword matches needed. A question about “deployment issues” surfaces past conversations about “CI pipeline failures.”
Query: “deployment issues” → embed & compare vectors → match (0.87): “CI pipeline failures”
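The idea can be sketched in Python. The `embed` function below is a stand-in trigram hash, not Anuma's actual embedding model; a real deployment would call an embedding API, but the cosine-similarity scoring and thresholding work the same way:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: a real system would call an embedding model.
    # Here character trigrams are hashed into a small fixed-size vector.
    vec = [0.0] * 64
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[hash(padded[i:i + 3]) % 64] += 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query: str, messages: list[str], top_k: int = 8, threshold: float = 0.3):
    # Embed the query, score every stored message, keep matches above
    # the threshold, and return the top_k by similarity. No keywords.
    q = embed(query)
    scored = [(cosine_similarity(q, embed(m)), m) for m in messages]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)[:top_k]
```

The `top_k = 8` and `threshold = 0.3` defaults mirror the Memory Engine search defaults listed in the comparison table further down.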
Smart chunking.
Long messages are automatically split into overlapping segments using sentence-boundary detection. Each chunk targets roughly 400 characters with a 50-character overlap.
Each chunk is embedded independently. Relevant fragments inside long messages still surface in search.
Input: long message (1,200 characters) → Output: 3 chunks, ~400 chars each with 50-char overlap
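A minimal sketch of this kind of sentence-boundary chunking, using the ~400-character target and 50-character overlap quoted above (the splitting regex and packing logic are illustrative, not Anuma's exact implementation):

```python
import re

CHUNK_TARGET = 400   # target size in characters, per the text above
CHUNK_OVERLAP = 50   # characters carried over between adjacent chunks

def chunk_message(text: str) -> list[str]:
    # Split on sentence boundaries (., !, ? followed by whitespace),
    # then pack sentences into ~400-character chunks with a 50-character
    # overlap so a thought spanning a boundary appears in both chunks.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > CHUNK_TARGET:
            chunks.append(current)
            current = current[-CHUNK_OVERLAP:]  # seed next chunk with overlap
        current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be embedded independently, so a relevant fragment deep inside a long message still surfaces in search.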
Diverse results.
Round-robin deduplication ensures results come from multiple conversations, not just one. The system over-fetches 3x to 9x as many candidates as requested before deduplication, producing a diverse result pool.
Example: 72 candidates fetched at 9x → round-robin deduplication → 8 results from 8 different conversations
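One way to implement round-robin deduplication over an over-fetched candidate pool (the `(score, conversation_id, message)` tuple shape here is assumed for illustration):

```python
def round_robin_dedupe(candidates, top_k=8):
    # candidates: (score, conversation_id, message) tuples from the
    # over-fetched pool (3x to 9x the requested count).
    # Group by conversation, best-scoring first within each group.
    by_convo = {}
    for item in sorted(candidates, reverse=True):
        by_convo.setdefault(item[1], []).append(item)
    # Take one result per conversation per round until top_k is reached,
    # so the final list spans as many conversations as possible.
    results = []
    while len(results) < top_k and any(by_convo.values()):
        for convo_id in list(by_convo):
            if by_convo[convo_id] and len(results) < top_k:
                results.append(by_convo[convo_id].pop(0))
    return results
```

With three conversations in the pool and `top_k=3`, each conversation contributes exactly one result, even if a single conversation holds all the highest-scoring chunks.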
Context expansion.
Finding a single chunk is not enough. The agent retrieves surrounding messages for full context; by default, it returns the full conversation session around each match.
Results are grouped by conversation with relevance scores and timestamps.
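A sketch of the expansion step, assuming each match carries an index into its conversation. With no window argument it returns the whole session, matching the default described above; the optional windowed variant is an assumption for illustration:

```python
def expand_to_session(match_index, conversation, window=None):
    # conversation: ordered list of messages; match_index points at the hit.
    # window=None mirrors the default described above: return the whole
    # session. A numeric window returns that many messages on each side.
    if window is None:
        return conversation
    start = max(0, match_index - window)
    return conversation[start:match_index + window + 1]
```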
Anuma Memory Vault
A knowledge base that grows with you.
Save, update, and organize.
The agent creates, updates, and organizes memories into folders. Each memory is scoped for access control. Mark memories as private or shared.
Memories are filterable by folder or scope. Unfiled entries are queryable separately.
Semantic search with caching.
Embedding-based retrieval by meaning, not keywords. An LRU cache holds up to 5,000 vectors, evicting the least-recently-accessed entries when full.
New memories are embedded eagerly and cached immediately. The system gets faster over time.
Vector capacity: 3,412 / 5,000 vectors (68%) · Avg retrieval: 12ms · LRU cache limit: 5,000 entries
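An LRU cache with this behavior can be sketched with Python's `OrderedDict`. The 5,000-vector capacity comes from the text above; everything else here is illustrative:

```python
from collections import OrderedDict

class EmbeddingCache:
    # LRU cache for embedding vectors: once capacity is reached, the
    # least-recently-accessed entry is evicted to make room.
    def __init__(self, capacity: int = 5000):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None  # cache miss: caller re-embeds and calls put()
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def put(self, key, vector):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = vector
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least-recently-used
```

Embedding a new memory eagerly and calling `put` right away is what makes repeat lookups fast: the vector is already in cache the first time it is searched.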
Supersession detection.
When two memories are semantically similar (70%+ cosine similarity) and saved 30+ days apart, the newer one supersedes the older in search ranking. Greedy pairing prevents cascading adjustments.
Original memories stay intact. Ranking is adjusted at query time only.
Jan 2: “Prefers light mode” → Feb 16: “Prefers dark mode” (supersedes the older memory)
User control.
Confirmation callbacks let users approve what the agent saves. Multi-user support with user-scoped storage enforced at the database layer.
Bulk delete available for full data cleanup.
Agent wants to save: “User prefers dark mode in all editors” (user-scoped · only you can access)
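The confirmation flow can be sketched as a callback gate: nothing reaches the store unless the user approves. The function signature and list-based vault here are illustrative, not Anuma's API:

```python
def save_with_confirmation(memory_text, vault, confirm):
    # confirm: user-supplied callback; it is shown the proposed memory
    # text and returns True to approve or False to reject the save.
    if not confirm(memory_text):
        return False  # rejected: nothing is stored
    vault.append(memory_text)
    return True
```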
Better together.
Memory Engine and Memory Vault share the same embedding infrastructure. They serve different roles but work side by side.
1. A user mentions they prefer dark mode in a conversation.
2. Memory Engine indexes the conversation automatically.
3. The agent uses Memory Vault to save “User prefers dark mode” as a persistent fact.
4. In future conversations, the Vault retrieves the preference instantly, without searching history.
| | Memory Engine | Memory Vault |
|---|---|---|
| Source | Conversation history | Explicitly saved memories |
| Updates | Automatic | Save/update via agent |
| Organization | By conversation | By folder and scope |
| Search default | 8 results, 0.3 threshold | 5 results, 0.1 threshold |
| Chunking | Sentence-based, ~400 chars | Whole memory as single unit |
| Staleness handling | Round-robin dedup | Supersession detection |
| Caching | Optional per-embedding | LRU cache (5,000 entries) |
| Best for | Recalling past discussions | Persistent facts and preferences |
Memory questions.
Everything you need to know about how Anuma remembers.
What is the difference between Memory Engine and Memory Vault?
Memory Engine automatically indexes and searches past conversations. Memory Vault stores specific facts and preferences that are explicitly saved.
Do I control what gets saved?
Memory Engine indexes everything automatically. For Memory Vault, the AI suggests what to save, but you can approve or reject each memory.
How does semantic search work?
Every message and memory is converted into a vector that captures its meaning. Search finds the most similar vectors using cosine similarity, so you do not need exact keyword matches.
Can I organize my memories?
Yes. Memory Vault supports folders and scopes. You can filter searches by folder, mark memories as private or shared, and move entries between folders.
What happens when my preferences change?
Memory Vault uses supersession detection. When a newer memory is semantically similar to an older one and enough time has passed, the newer version is automatically prioritized in search results.
Is my memory encrypted?
Yes. All memory content is encrypted with your wallet-derived key using AES-GCM-256. Memory content, evidence, and keys are encrypted. Metadata like type and confidence scores are not.
Can I delete my memories?
Yes. You can delete individual memories or do a full bulk delete from your account. Deletion is permanent.
Does memory work across different AI models?
Yes. Your memory loads into every conversation regardless of which model you use. Memory Engine and Memory Vault work with ChatGPT, Claude, Gemini, and all other models on Anuma.
Is there a limit to how much can be remembered?
There is no hard limit. The Memory Vault cache holds 5,000 vectors for fast access. Memories beyond the cache are still searchable but may take slightly longer on first access.
AI that never forgets.
Memory Engine and Memory Vault give every conversation complete context.