Headroom compression measures 0% savings in current OSS/unlicensed deployment #100

Description

@lsfera

Background

#84's Phase 1 live validation (see PR #96, comment) confirmed the headroom
integration is wired correctly end-to-end: both the one-shot prompt
compression (context-compressor.ts) and the full-session proxy routing
(the ANTHROPIC_BASE_URL entry in sandbox-runner.ts's sandboxEnv) reach the
headroom service, and its stats are tracked accurately.
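
For readers unfamiliar with the proxy-routing path, the idea can be sketched roughly as below. This is a minimal illustration, not the code in sandbox-runner.ts: the helper name `withHeadroomProxy`, the base environment, and the `http://headroom:8787/` address are all hypothetical. Only the ANTHROPIC_BASE_URL variable name comes from this issue.

```typescript
// Hypothetical helper: sandbox-runner.ts's real implementation is not shown
// in this issue. It returns a copy of the sandbox environment with
// ANTHROPIC_BASE_URL pointing at the headroom proxy.
function withHeadroomProxy(
  baseEnv: Record<string, string>,
  headroomUrl: string,
): Record<string, string> {
  // Strip trailing slashes so clients that append "/v1/..." do not end up
  // with a double slash in the request path.
  const normalized = headroomUrl.replace(/\/+$/, "");
  return { ...baseEnv, ANTHROPIC_BASE_URL: normalized };
}

// Assumed address; substitute whatever host/port the headroom service uses.
const sandboxEnv = withHeadroomProxy(
  { PATH: "/usr/bin" },
  "http://headroom:8787/",
);
```

Any API client inside the sandbox that honors ANTHROPIC_BASE_URL would then send its requests through the proxy rather than directly to Anthropic.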

But measured compression savings across every live test run so far (2 direct
compress() calls plus 14 real proxied API requests) are 0%:

compressions_by_strategy: {}          (empty — never fired)
compression_cache.total_tokens_saved: 0
cost.total_tokens_saved: 0            (of $0.627 total spend)
agent_usage.totals.savings_percent: 0.0

The only nonzero discount anywhere is Anthropic's own native prefix-caching
(prefix_cache.discount_usd: $0.4988) — unrelated to headroom.
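
To keep headroom-attributable savings from being conflated with the native prefix-cache discount in future runs, a check along these lines could be applied to the stats payload. It is a sketch under assumptions: the nested object shape is inferred from the dotted field names above, the value type inside compressions_by_strategy is unknown, and `summarizeSavings` is a hypothetical helper, not part of headroom.

```typescript
// Assumed shape: nesting is inferred from the dotted field names in the
// stats dump; the real headroom stats payload may differ.
interface HeadroomStats {
  compressions_by_strategy: Record<string, unknown>;
  compression_cache: { total_tokens_saved: number };
  cost: { total_tokens_saved: number };
  agent_usage: { totals: { savings_percent: number } };
  prefix_cache: { discount_usd: number };
}

interface SavingsSummary {
  compressionFired: boolean;
  headroomTokensSaved: number;
  savingsPercent: number;
  nativePrefixCacheDiscountUsd: number;
}

// Hypothetical helper: reports headroom savings separately from Anthropic's
// native prefix-cache discount.
function summarizeSavings(stats: HeadroomStats): SavingsSummary {
  const strategyCount = Object.keys(stats.compressions_by_strategy).length;
  // Whether the two counters overlap is unknown, so take the larger one
  // instead of summing them.
  const headroomTokensSaved = Math.max(
    stats.compression_cache.total_tokens_saved,
    stats.cost.total_tokens_saved,
  );
  return {
    compressionFired: strategyCount > 0 || headroomTokensSaved > 0,
    headroomTokensSaved,
    savingsPercent: stats.agent_usage.totals.savings_percent,
    nativePrefixCacheDiscountUsd: stats.prefix_cache.discount_usd,
  };
}

// The values reported in this issue.
const observed: HeadroomStats = {
  compressions_by_strategy: {},
  compression_cache: { total_tokens_saved: 0 },
  cost: { total_tokens_saved: 0 },
  agent_usage: { totals: { savings_percent: 0.0 } },
  prefix_cache: { discount_usd: 0.4988 },
};
const summary = summarizeSavings(observed);
```

With the numbers above, `summary.compressionFired` is false while the native prefix-cache discount is still reported, which matches the reading that headroom itself saved nothing.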

Likely root cause

From the headroom container's own startup banner:

License:      OSS (no license key)
Code-Aware:   DISABLED  (install headroom-ai[code] to enable)

Headroom's compression is turn-based and staleness-driven
(HEADROOM_COMPRESSION_STABLE_AFTER_TURN / HEADROOM_STALE_READ_COMPRESS_AFTER_TURNS),
and the code-aware strategies that do most of the actual compression work
are disabled in this deployment. The banner ties that to the missing
headroom-ai[code] extra; whether a license key is also required is not yet
confirmed. Our test sessions were short, one-shot or small multi-turn runs of
at most 14 turns, so they were plausibly too short to cross whatever
staleness threshold triggers compression, on top of code-aware being
unavailable at all.

Open questions to resolve before further investment here

  • Is a Headroom Cloud / licensed key available or worth acquiring, and would
    it actually change compressions_by_strategy from empty?
  • Does tuning HEADROOM_COMPRESSION_STABLE_AFTER_TURN /
    HEADROOM_STALE_READ_COMPRESS_AFTER_TURNS down (to trigger compression
    sooner in shorter sessions) produce nonzero savings on the OSS tier alone,
    without a license?
  • If neither moves the needle, is the one-shot context-compressor.ts
    prompt-compression path (which by construction never has multi-turn
    history to compress) worth keeping at all, versus relying solely on the
    proxy-routing path for any future compression gains?
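
One way to approach the second question is a small sweep over the two thresholds, rerunning the same test session under each combination and recording the tokens saved. The sketch below only builds the configuration matrix and filters results. The env var names come from this issue; the candidate values, the assumption that both variables take integers, and the names `buildSweep` and `configsWithSavings` are illustrative.

```typescript
// Env var names are from this issue. That they accept integer turn counts
// is an assumption.
type HeadroomEnv = {
  HEADROOM_COMPRESSION_STABLE_AFTER_TURN: string;
  HEADROOM_STALE_READ_COMPRESS_AFTER_TURNS: string;
};

// Hypothetical helper: builds every combination of the two thresholds.
function buildSweep(
  stableAfter: number[],
  staleReadAfter: number[],
): HeadroomEnv[] {
  const configs: HeadroomEnv[] = [];
  for (const stable of stableAfter) {
    for (const stale of staleReadAfter) {
      configs.push({
        HEADROOM_COMPRESSION_STABLE_AFTER_TURN: String(stable),
        HEADROOM_STALE_READ_COMPRESS_AFTER_TURNS: String(stale),
      });
    }
  }
  return configs;
}

interface SweepResult {
  env: HeadroomEnv;
  tokensSaved: number; // e.g. compression_cache.total_tokens_saved after the run
}

// Keeps only the configurations under which compression saved anything.
function configsWithSavings(results: SweepResult[]): HeadroomEnv[] {
  return results.filter((r) => r.tokensSaved > 0).map((r) => r.env);
}

// Illustrative values only, chosen to sit well below the 14-turn sessions.
const sweep = buildSweep([1, 2, 4], [1, 2, 4]);
```

If `configsWithSavings` comes back empty for every combination on the OSS tier, that would suggest the thresholds are not the limiting factor.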

This is a product/spend decision (whether to acquire a license) as much as
an engineering one, so the issue is left without the ready-for-agent label
pending a human call on scope and budget.

Metadata

Assignees

No one assigned

    Labels

    question (Further information is requested)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
