Skip to content

chopratejas/headroom

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

668 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Headroom

Compress everything your AI agent reads. Same answers, fraction of the tokens.

CI PyPI npm Model: Kompress-base Tokens saved: 60B+ License: Apache 2.0 Docs

Headroom in action

Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal — losslessly, locally, and without touching accuracy.

100 logs. One FATAL error buried at position 67. Both runs found it. Baseline 10,144 tokens → Headroom 1,260 tokens87% fewer, identical answer. python examples/needle_in_haystack_test.py


Quick start

Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM.

Wrap your coding agent — one command:

pip install "headroom-ai[all]"

headroom wrap claude      # Claude Code
headroom wrap codex       # Codex
headroom wrap cursor      # Cursor
headroom wrap aider       # Aider
headroom wrap copilot     # GitHub Copilot CLI

Drop it into your own code — Python or TypeScript:

from headroom import compress

result = compress(messages, model="claude-sonnet-4-5")
response = client.messages.create(model="claude-sonnet-4-5", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'gpt-4o' });

Or run it as a proxy — zero code changes, any language:

headroom proxy --port 8787
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app

Why Headroom

  • Accuracy-preserving. GSM8K 0.870 → 0.870 (±0.000). TruthfulQA +0.030. SQuAD v2 and BFCL both 97% accuracy after compression. Validated on public OSS benchmarks you can rerun yourself.
  • Runs on your machine. No cloud API, no data egress. Compression latency is milliseconds — faster end-to-end for Sonnet / Opus / GPT-4 class models than a hosted service round-trip.
  • Kompress-base on HuggingFace. Our open-source text compressor, fine-tuned on real agentic traces — tool outputs, logs, RAG chunks, code. Install with pip install "headroom-ai[ml]".
  • Cross-agent memory and learning. Claude Code saves a fact, Codex reads it back. headroom learn mines failed sessions and writes corrections straight to CLAUDE.md / AGENTS.md / GEMINI.md — reliability compounds over time.
  • Reversible (CCR). Compression is not deletion. The model can always call headroom_retrieve to pull the original bytes. Nothing is thrown away.

Bundles the RTK binary for shell-output rewriting — full attribution below.


How it fits

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ───────────────────────────────────────────────   │
    │  CacheAligner  →  ContentRouter  →  CCR             │
    │                    ├─ SmartCrusher   (JSON)         │
    │                    ├─ CodeCompressor (AST)          │
    │                    └─ Kompress-base  (text, HF)     │
    │                                                     │
    │  Cross-agent memory  ·  headroom learn  ·  MCP      │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

Architecture · CCR reversible compression · Kompress-base model card


Proof

Savings on real agent workloads:

Workload Before After Savings
Code search (100 results) 17,765 1,408 92%
SRE incident debugging 65,694 5,118 92%
GitHub issue triage 54,174 14,761 73%
Codebase exploration 78,502 41,254 47%

Accuracy preserved on standard benchmarks:

Benchmark Category N Baseline Headroom Delta
GSM8K Math 100 0.870 0.870 ±0.000
TruthfulQA Factual 100 0.530 0.560 +0.030
SQuAD v2 QA 100 97% 19% compression
BFCL Tools 100 97% 32% compression

Reproduce:

python -m headroom.evals suite --tier 1

Community, live:

Full benchmarks & methodology


Built for coding agents

Agent One-command wrap Notes
Claude Code headroom wrap claude --memory for cross-agent memory, --code-graph for codebase intel
Codex headroom wrap codex --memory Shares the same memory store as Claude
Cursor headroom wrap cursor Prints Cursor config — paste once, done
Aider headroom wrap aider Starts proxy, launches Aider
Copilot CLI headroom wrap copilot Starts proxy, launches Copilot
OpenClaw headroom wrap openclaw Installs Headroom as ContextEngine plugin

MCP-native too — headroom mcp install exposes headroom_compress, headroom_retrieve, and headroom_stats to any MCP client.

headroom learn in action

Integrations

Drop Headroom into any stack
Your setup Hook in with
Any Python app compress(messages, model=…)
Any TypeScript app await compress(messages, { model })
Anthropic / OpenAI SDK withHeadroom(new Anthropic()) · withHeadroom(new OpenAI())
Vercel AI SDK wrapLanguageModel({ model, middleware: headroomMiddleware() })
LiteLLM litellm.callbacks = [HeadroomCallback()]
LangChain HeadroomChatModel(your_llm)
Agno HeadroomAgnoModel(your_model)
Strands Strands guide
ASGI apps app.add_middleware(CompressionMiddleware)
Multi-agent SharedContext().put / .get
MCP clients headroom mcp install
What's inside
  • SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.
  • CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.
  • Kompress-base — our HuggingFace model, trained on agentic traces.
  • Image compression — 40–90% reduction via trained ML router.
  • CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
  • IntelligentContext — score-based context fitting with learned importance.
  • CCR — reversible compression; LLM retrieves originals on demand.
  • Cross-agent memory — shared store, agent provenance, auto-dedup.
  • SharedContext — compressed context passing across multi-agent workflows.
  • headroom learn — plugin-based failure mining for Claude, Codex, Gemini.

Install

pip install "headroom-ai[all]"          # Python, everything
npm  install headroom-ai                # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy], [mcp], [ml] (Kompress-base), [agno], [langchain], [evals]. Requires Python 3.10+.

Installation guide — Docker tags, persistent service, PowerShell, devcontainers.


Documentation

Start here Go deeper
Quickstart Architecture
Proxy How compression works
MCP tools CCR — reversible compression
Memory Cache optimization
Failure learning Benchmarks
Configuration Limitations

Compared to

Headroom runs locally, covers every content type (not just CLI or text), works with every major framework, and is reversible.

Scope Deploy Local Reversible
Headroom All context — tools, RAG, logs, files, history Proxy · library · middleware · MCP Yes Yes
RTK CLI command outputs CLI wrapper Yes No
Compresr, Token Co. Text sent to their API Hosted API call No No
OpenAI Compaction Conversation history Provider-native No No

Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting — git showgit show --short, noisy ls → scoped, chatty installers → summarized. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it.


Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest

Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.


Community

License

Apache 2.0 — see LICENSE.