Right context.
For every request.
Prompts, summaries, memory, and files assembled automatically. No prompt engineering required.
System Prompt Assembly
Built dynamically from seven prioritized sections. Personality, date, preferences, style, sentiment, language, and formatting.
Progressive Summarization
Older messages compress into a running summary. Recent exchanges stay verbatim. Full context, always within token budget.
Context Injection
Memory, search results, and files are injected as system messages before your prompt reaches the model. No manual engineering required.
History Windowing
A configurable cap on conversation history prevents overflow. Defaults to the 50 most recent messages. Works alongside summarization.
Extended Thinking
For supported models, Anuma captures internal reasoning separately from the response. Thinking budgets and effort levels are configurable.
File Processing
PDFs, Excel, Word, and ZIPs preprocessed automatically. Scanned PDFs fall back to vision. Extraction runs before the model sees anything.
Dynamic system prompts.
The system prompt is not static. Anuma builds it dynamically from up to seven prioritized sections. Each section has a priority that controls the order in which it renders.
The base section carries core personality instructions. Anuma layers date and time, user preferences, communication style, sentiment, language, and platform formatting on top.
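The assembly step can be sketched in Python. The seven section names come from this page; the priority values, example contents, and the `build_system_prompt` helper are illustrative assumptions, not Anuma's implementation.

```python
# Illustrative sketch: render available sections in priority order.
# Section names are from this page; priorities and contents are hypothetical.

def build_system_prompt(sections: dict[str, tuple[int, str]]) -> str:
    """Join non-empty sections, lowest priority number first."""
    ordered = sorted(sections.items(), key=lambda item: item[1][0])
    return "\n\n".join(text for _, (_, text) in ordered if text)

sections = {
    "personality": (0, "You are Anuma, a helpful assistant."),
    "datetime":    (1, "Current date: 2025-03-01."),
    "preferences": (2, "User prefers metric units."),
    "style":       (3, "Match a concise, casual tone."),
    "sentiment":   (4, ""),                     # empty sections are dropped
    "language":    (5, "Respond in English."),
    "formatting":  (6, "Use Markdown."),
}

prompt = build_system_prompt(sections)
```

Dropping empty sections rather than failing mirrors the pipeline's graceful section dropping: a missing layer shortens the prompt instead of breaking it.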
Smart summarization.
Long conversations accumulate tokens fast. Instead of truncating history or hitting context limits, Anuma progressively summarizes older messages while keeping recent ones intact.
The last two exchanges always stay word-for-word. A dedicated lower-cost model runs summarization. When the summary itself grows too large, a compaction pass cuts it in half.
Raw history
... 112 more messages
Compressed
Summary (118 msgs)
Project launched Mar 1. Budget $12K. Team of 4. Timeline: 8 weeks. Key decisions: React frontend, Cloudflare deploy...
Verbatim (last 2)
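The summarization flow above can be sketched as follows. The budget value and both helper bodies are placeholders: `summarize` stands in for the dedicated lower-cost model call, and `compact` for the halving pass; treating "two exchanges" as four messages (two user/assistant pairs) is also an assumption.

```python
# Sketch of progressive summarization: older messages fold into a running
# summary, the most recent exchanges stay verbatim, and the summary is
# halved once it outgrows its budget.

SUMMARY_BUDGET = 200  # characters here for simplicity; real budgets are in tokens

def summarize(summary: str, messages: list[str]) -> str:
    # Stand-in for the lower-cost model call that folds messages into the summary.
    return (summary + " " + " | ".join(messages)).strip()

def compact(summary: str) -> str:
    # Stand-in for the compaction pass that cuts the summary in half.
    return summary[: len(summary) // 2]

def update_context(summary: str, history: list[str]) -> tuple[str, list[str]]:
    verbatim = history[-4:]          # last two exchanges (user + assistant pairs)
    older = history[:-4]
    if older:
        summary = summarize(summary, older)
    if len(summary) > SUMMARY_BUDGET:
        summary = compact(summary)
    return summary, verbatim
```

Each turn reuses the previous summary as input, so history compresses incrementally instead of being re-summarized from scratch.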
Three layers of context.
On each turn, Anuma injects up to three types of context as system messages before your prompt reaches the model. Memory results from past conversations and your saved knowledge base. Search results from web and document queries with source metadata for citations.
Processed file contents from PDFs, spreadsheets, and documents. Each layer is added only when relevant. The injection order is fixed to keep context predictable across every request.
Summary
Compressed history
Memory
Engine + Vault
Search
Web + documents
Files
PDF, Excel, Word
Conversation history
Windowed to last 50
Your message
Current turn
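The fixed-order injection above can be sketched as a small helper. The layer names come from this page; the message shape and the `inject_context` function are illustrative assumptions.

```python
# Sketch of fixed-order context injection: each layer becomes a system
# message only when it has content, and the order never changes.

INJECTION_ORDER = ["memory", "search", "files"]

def inject_context(layers: dict[str, str]) -> list[dict[str, str]]:
    messages = []
    for name in INJECTION_ORDER:
        content = layers.get(name, "")
        if content:                  # layers are added only when relevant
            messages.append({"role": "system", "content": content})
    return messages
```

Because the order is fixed rather than relevance-ranked, the model sees layers in the same positions on every request, which keeps context predictable.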
Adapts to how you communicate.
Anuma analyzes your last 10 messages to build a style profile covering tone, length, vocabulary, and patterns. The first profile generates after 5 messages and refreshes every 20.
Each profile is cached and capped at 200 characters. You can opt out anytime.
Explain the new caching layer.
Without style analysis
The new caching layer introduces a multi-tiered approach to data storage that leverages both in-memory and disk-based caching mechanisms. The system utilizes an LRU eviction policy with configurable TTL values...
Long, formal, verbose
With style analysis
New caching layer:
- LRU with configurable TTL
- In-memory + disk tiers
- 3x faster reads on cache hit
- Auto-evicts at 80% capacity
Matched: casual, concise, bullet points
Files processed before the model sees them.
PDFs are extracted to text first. Scanned pages with no text convert to images for vision model fallback. Excel files become structured JSON preserving sheet names. Word documents become plain text. ZIP archives unpack recursively.
Text extraction first. Scanned pages fall back to vision model.
Converted to structured JSON preserving sheet names and row data.
Extracted to plain text. Full document content preserved.
Unpacked recursively. Each file inside processed by type.
10MB
Size limit
30s
Timeout
Auto
Detection
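The type-based dispatch above, including recursive ZIP unpacking and the 10MB size limit, can be sketched as follows. The handler descriptions are placeholders standing in for real extractors, not Anuma's implementation.

```python
# Sketch of file preprocessing dispatch: each format routes to an extractor,
# ZIP archives recurse, and oversized files are skipped.
import io
import zipfile

MAX_BYTES = 10 * 1024 * 1024  # 10MB size limit

def process_file(name: str, data: bytes) -> list[tuple[str, str]]:
    if len(data) > MAX_BYTES:
        return [(name, "skipped: over size limit")]
    ext = name.rsplit(".", 1)[-1].lower()
    if ext == "zip":
        results = []
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for inner in zf.namelist():
                results += process_file(inner, zf.read(inner))  # recurse
        return results
    handlers = {
        "pdf":  "text extraction, vision fallback for scans",
        "xlsx": "structured JSON with sheet names",
        "docx": "plain text",
    }
    return [(name, handlers.get(ext, "unsupported"))]
```

Running extraction before the model call means the model only ever sees text or structured data, never raw binaries.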
Context travels across models.
Switch from Claude to GPT to Gemini mid-conversation. Your context follows. Anuma rebuilds the full context window for each model using the same memory, summaries, and conversation history.
Each model receives an optimized payload. Token limits, prompt formats, and capability differences are handled automatically. You get continuity without copy-pasting or re-explaining.
Without Anuma
With Anuma
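Rebuilding one shared context for different model limits can be sketched as below. The model names echo this page, but the caps and payload shape are illustrative assumptions, not the providers' actual limits or APIs.

```python
# Sketch of per-model payload rebuild: the same summary and history are
# re-windowed to each model's cap. Caps are illustrative, counted in
# messages for simplicity (real limits are in tokens).

MODEL_LIMITS = {"claude": 50, "gpt": 30, "gemini": 40}

def build_payload(model: str, system: str, summary: str,
                  history: list[dict]) -> dict:
    cap = MODEL_LIMITS[model]
    summary_msgs = [{"role": "system", "content": summary}] if summary else []
    return {
        "model": model,
        "system": system,
        "messages": summary_msgs + history[-cap:],  # window to the model's cap
    }
```

Because the summary carries everything outside the window, switching to a model with a smaller cap loses recency detail but not the conversation's substance.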
Context pipeline at a glance.
| | System Prompt | Summarization | Context Injection | History Window | Extended Thinking |
|---|---|---|---|---|---|
| Runs on | Every turn | When tokens exceed budget | When context provided | Every turn | When model supports it |
| Default | Base prompt + date | Disabled (opt-in) | No injection | Last 50 messages | Model default |
| Cost impact | Prompt tokens | Separate lower-cost model call | System message tokens | Limits history tokens | Budget-controlled |
| Persistence | Rebuilt each turn | Cached in database | Per-request only | None | Stored with message |
| Failure mode | Graceful section dropping | Falls back to verbatim | Skipped if absent | Hard slice | Captured if available |
Context questions.
Everything you need to know about how Anuma builds context.
Anuma runs a full context pipeline on every turn. It assembles the system prompt from prioritized sections, injects relevant memory and search results, includes processed file contents, and appends conversation history. Each layer is added automatically based on what is available and relevant.
Conversation history is the messages in your current session. Memory is long-term knowledge that persists across all conversations and models. Memory Engine indexes past conversations for semantic search. Memory Vault stores facts you explicitly save. Both are injected into context automatically.
Yes. Anuma rebuilds the full context window for the new model using the same memory, conversation summaries, and history. Each model receives a payload optimized for its token limits and prompt format. No information is lost.
Memory injection is consistent across models. Every model receives the same relevant memory context on each turn. You can control what is stored in your Memory Vault and set entries to private, but all active models access the same pool of context.
Yes. You can edit, delete, or set individual memory entries to private at any time. Private entries are excluded from context injection. You have full control over what the model sees.
Multiple safeguards. History windowing caps messages at 50 by default. Summarization compresses older messages into a running summary. System prompt sections drop by priority. Summary compaction triggers at 80% capacity. Every failure mode falls back gracefully.
No. Memory retrieval uses semantic search with a cached vector index. Results are injected as a single system message. The added token cost is minimal compared to conversation history. Memory lookup typically takes milliseconds.
A small fraction. Memory results are concise and relevant. The system prompt, memory injection, and search results combined typically use less than 15% of the available context window. The majority is reserved for conversation history and your current message.
No. The context pipeline runs automatically on every turn. System prompt assembly, summarization, memory injection, and file processing all happen without configuration. You can customize settings if you want, but the defaults work out of the box.
Every model gets the full picture.
The context pipeline assembles system prompts, memory, summaries, and files on every turn.