Right context.
For every request.
Prompts, summaries, memory, and files assembled automatically. No prompt engineering required.
System Prompt Assembly
Built dynamically from seven prioritized sections. Personality, date, preferences, style, sentiment, language, and formatting.
Progressive Summarization
Older messages compress into a running summary. Recent exchanges stay verbatim. Full context, always within token budget.
Context Injection
Memory, search results, and files are injected as system messages before your prompt reaches the model. No manual engineering required.
History Windowing
A configurable cap on conversation history prevents overflow. Defaults to the 50 most recent messages. Works alongside summarization.
Extended Thinking
For supported models, Anuma captures internal reasoning separately from the response. Thinking budgets and effort levels are configurable.
File Processing
PDFs, Excel, Word, and ZIPs preprocessed automatically. Scanned PDFs fall back to vision. Extraction runs before the model sees anything.
Dynamic system prompts.
The system prompt is not static. Anuma builds it dynamically from up to seven prioritized sections. Each section has a priority that controls the order in which it renders.
The base section carries core personality instructions. Anuma layers date and time, user preferences, communication style, sentiment, language, and platform formatting on top.
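The assembly step can be sketched in Python. The seven section names come from this page; the priority values, example contents, and the `build_system_prompt` helper are illustrative assumptions, not Anuma's implementation.

```python
# Illustrative sketch: render available sections in priority order.
# Section names are from this page; priorities and contents are hypothetical.

def build_system_prompt(sections: dict[str, tuple[int, str]]) -> str:
    """Join non-empty sections, lowest priority number first."""
    ordered = sorted(sections.items(), key=lambda item: item[1][0])
    return "\n\n".join(text for _, (_, text) in ordered if text)

sections = {
    "personality": (0, "You are Anuma, a helpful assistant."),
    "datetime":    (1, "Current date: 2025-03-01."),
    "preferences": (2, "User prefers metric units."),
    "style":       (3, "Match a concise, casual tone."),
    "sentiment":   (4, ""),                     # empty sections are dropped
    "language":    (5, "Respond in English."),
    "formatting":  (6, "Use Markdown."),
}

prompt = build_system_prompt(sections)
```

Dropping empty sections rather than failing mirrors the pipeline's graceful section dropping: a missing layer shortens the prompt instead of breaking it.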
Smart summarization.
Long conversations accumulate tokens fast. Instead of truncating history or hitting context limits, Anuma progressively summarizes older messages while keeping recent ones intact.
The last two exchanges always stay word-for-word. A dedicated lower-cost model runs summarization. When the summary itself grows too large, a compaction pass cuts it in half.
Raw history
... 112 more messages
Compressed
Summary (118 msgs)
Project launched Mar 1. Budget $12K. Team of 4. Timeline: 8 weeks. Key decisions: React frontend, Cloudflare deploy...
Verbatim (last 2)
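The summarization flow above can be sketched as follows. The budget value and both helper bodies are placeholders: `summarize` stands in for the dedicated lower-cost model call, and `compact` for the halving pass; treating "two exchanges" as four messages (two user/assistant pairs) is also an assumption.

```python
# Sketch of progressive summarization: older messages fold into a running
# summary, the most recent exchanges stay verbatim, and the summary is
# halved once it outgrows its budget.

SUMMARY_BUDGET = 200  # characters here for simplicity; real budgets are in tokens

def summarize(summary: str, messages: list[str]) -> str:
    # Stand-in for the lower-cost model call that folds messages into the summary.
    return (summary + " " + " | ".join(messages)).strip()

def compact(summary: str) -> str:
    # Stand-in for the compaction pass that cuts the summary in half.
    return summary[: len(summary) // 2]

def update_context(summary: str, history: list[str]) -> tuple[str, list[str]]:
    verbatim = history[-4:]          # last two exchanges (user + assistant pairs)
    older = history[:-4]
    if older:
        summary = summarize(summary, older)
    if len(summary) > SUMMARY_BUDGET:
        summary = compact(summary)
    return summary, verbatim
```

Each turn reuses the previous summary as input, so history compresses incrementally instead of being re-summarized from scratch.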
Three layers of context.
On each turn, Anuma injects up to three types of context as system messages before your prompt reaches the model. Memory results from past conversations and your saved knowledge base. Search results from web and document queries with source metadata for citations.
Processed file contents from PDFs, spreadsheets, and documents. Each layer is added only when relevant. The injection order is fixed to keep context predictable across every request.
Summary
Compressed history
Memory
Engine + Vault
Search
Web + documents
Files
PDF, Excel, Word
Conversation history
Windowed to last 50
Your message
Current turn
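The fixed-order injection above can be sketched as a small helper. The layer names come from this page; the message shape and the `inject_context` function are illustrative assumptions.

```python
# Sketch of fixed-order context injection: each layer becomes a system
# message only when it has content, and the order never changes.

INJECTION_ORDER = ["memory", "search", "files"]

def inject_context(layers: dict[str, str]) -> list[dict[str, str]]:
    messages = []
    for name in INJECTION_ORDER:
        content = layers.get(name, "")
        if content:                  # layers are added only when relevant
            messages.append({"role": "system", "content": content})
    return messages
```

Because the order is fixed rather than relevance-ranked, the model sees layers in the same positions on every request, which keeps context predictable.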
Adapts to how you communicate.
Anuma analyzes your last 10 messages to build a style profile covering tone, length, vocabulary, and patterns. The first profile generates after 5 messages and refreshes every 20.
Each profile is cached and capped at 200 characters. You can opt out anytime.
Explain the new caching layer.
Without style analysis
The new caching layer introduces a multi-tiered approach to data storage that leverages both in-memory and disk-based caching mechanisms. The system utilizes an LRU eviction policy with configurable TTL values...
Long, formal, verbose
With style analysis
New caching layer:
- LRU with configurable TTL
- In-memory + disk tiers
- 3x faster reads on cache hit
- Auto-evicts at 80% capacity
Matched: casual, concise, bullet points
Files processed before the model sees them.
PDFs are extracted to text first. Scanned pages with no text convert to images for vision model fallback. Excel files become structured JSON preserving sheet names. Word documents become plain text. ZIP archives unpack recursively.
Text extraction first. Scanned pages fall back to vision model.
Converted to structured JSON preserving sheet names and row data.
Extracted to plain text. Full document content preserved.
Unpacked recursively. Each file inside processed by type.
10MB
Size limit
30s
Timeout
Auto
Detection
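The type-based dispatch above, including recursive ZIP unpacking and the 10MB size limit, can be sketched as follows. The handler descriptions are placeholders standing in for real extractors, not Anuma's implementation.

```python
# Sketch of file preprocessing dispatch: each format routes to an extractor,
# ZIP archives recurse, and oversized files are skipped.
import io
import zipfile

MAX_BYTES = 10 * 1024 * 1024  # 10MB size limit

def process_file(name: str, data: bytes) -> list[tuple[str, str]]:
    if len(data) > MAX_BYTES:
        return [(name, "skipped: over size limit")]
    ext = name.rsplit(".", 1)[-1].lower()
    if ext == "zip":
        results = []
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for inner in zf.namelist():
                results += process_file(inner, zf.read(inner))  # recurse
        return results
    handlers = {
        "pdf":  "text extraction, vision fallback for scans",
        "xlsx": "structured JSON with sheet names",
        "docx": "plain text",
    }
    return [(name, handlers.get(ext, "unsupported"))]
```

Running extraction before the model call means the model only ever sees text or structured data, never raw binaries.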
Context travels across models.
Switch from Claude to GPT to Gemini mid-conversation. Your context follows. Anuma rebuilds the full context window for each model using the same memory, summaries, and conversation history.
Each model receives an optimized payload. Token limits, prompt formats, and capability differences are handled automatically. You get continuity without copy-pasting or re-explaining.
Without Anuma
With Anuma
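Rebuilding one shared context for different model limits can be sketched as below. The model names echo this page, but the caps and payload shape are illustrative assumptions, not the providers' actual limits or APIs.

```python
# Sketch of per-model payload rebuild: the same summary and history are
# re-windowed to each model's cap. Caps are illustrative, counted in
# messages for simplicity (real limits are in tokens).

MODEL_LIMITS = {"claude": 50, "gpt": 30, "gemini": 40}

def build_payload(model: str, system: str, summary: str,
                  history: list[dict]) -> dict:
    cap = MODEL_LIMITS[model]
    summary_msgs = [{"role": "system", "content": summary}] if summary else []
    return {
        "model": model,
        "system": system,
        "messages": summary_msgs + history[-cap:],  # window to the model's cap
    }
```

Because the summary carries everything outside the window, switching to a model with a smaller cap loses recency detail but not the conversation's substance.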
Context pipeline at a glance.
| | System Prompt | Summarization | Context Injection | History Window | Extended Thinking |
|---|---|---|---|---|---|
| Runs on | Every turn | When tokens exceed budget | When context provided | Every turn | When model supports it |
| Default | Base prompt + date | Disabled (opt-in) | No injection | Last 50 messages | Model default |
| Cost impact | Prompt tokens | Separate lower-cost model call | System message tokens | Limits history tokens | Budget-controlled |
| Persistence | Rebuilt each turn | Cached in database | Per-request only | None | Stored with message |
| Failure mode | Graceful section dropping | Falls back to verbatim | Skipped if absent | Hard slice | Captured if available |
Context questions.
Everything you need to know about how Anuma builds context.
Anuma runs a full context pipeline on every turn. It assembles the system prompt from prioritized sections, injects relevant memory and search results, includes processed file contents, and appends conversation history. Each layer is added automatically based on what is available and relevant.
Conversation history is the messages in your current session. Memory is long-term knowledge that persists across all conversations and models. Memory Engine indexes past conversations for semantic search. Memory Vault stores facts you explicitly save. Both are injected into context automatically.
Yes. Anuma rebuilds the full context window for the new model using the same memory, conversation summaries, and history. Each model receives a payload optimized for its token limits and prompt format. No information is lost.
Memory injection is consistent across models. Every model receives the same relevant memory context on each turn. You can control what is stored in your Memory Vault and set entries to private, but all active models access the same pool of context.
Yes. You can edit, delete, or set individual memory entries to private at any time. Private entries are excluded from context injection. You have full control over what the model sees.
Multiple safeguards. History windowing caps messages at 50 by default. Summarization compresses older messages into a running summary. System prompt sections drop by priority. Summary compaction triggers at 80% capacity. Every failure mode falls back gracefully.
No. Memory retrieval uses semantic search with a cached vector index. Results are injected as a single system message. The added token cost is minimal compared to conversation history. Memory lookup typically takes milliseconds.
A small fraction. Memory results are concise and relevant. The system prompt, memory injection, and search results combined typically use less than 15% of the available context window. The majority is reserved for conversation history and your current message.
No. The context pipeline runs automatically on every turn. System prompt assembly, summarization, memory injection, and file processing all happen without configuration. You can customize settings if you want, but the defaults work out of the box.
Every model gets the full picture.
The context pipeline assembles system prompts, memory, summaries, and files on every turn.