feat(compiler): opt-in top-K retrieval for concepts-plan (bound plan prompt as the KB grows) #92
The concepts-plan step injects every existing concept/entity brief, so the prompt grows O(N) with the KB (~2k->18k tokens at a few hundred concepts), hurting speed/cost and the model's ability to reconcile against the right existing pages (-> near-duplicate concepts). Add an opt-in top-K relevance filter (openkb/retrieval.py, dependency-free TF-IDF cosine over brief lines, query = doc summary). Off by default (concepts_plan_retrieval / concepts_plan_retrieval_k in config), so behaviour is unchanged unless enabled. The select_relevant_briefs() interface is swappable for an embedding-based ranker later. Prototype for the O(N)->O(K) plan-context scaling discussion.
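Below is a minimal, dependency-free sketch of the ranking idea in pure Python. It assumes each brief is one line of text and the doc summary is the query. The signature of select_relevant_briefs here is hypothetical. The smoothed IDF, log((1 + n) / (1 + df)) + 1, is one common variant, and openkb/retrieval.py may tokenize and weight terms differently.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Lowercased alphanumeric runs; a deliberately simple tokenizer (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def select_relevant_briefs(briefs: list[str], query: str, k: int) -> list[str]:
    """Return up to k briefs ranked by TF-IDF cosine similarity to `query`.

    Hypothetical signature for illustration; not the actual openkb API.
    """
    if k <= 0:
        return []
    if k >= len(briefs):
        return list(briefs)  # nothing to filter

    docs = [Counter(_tokens(b)) for b in briefs]
    n = len(docs)
    # Document frequency: in how many briefs each term appears.
    df = Counter(term for counts in docs for term in counts)
    # Smoothed IDF: rarer terms weigh more, and no term gets zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def weigh(counts: Counter) -> dict[str, float]:
        # Query terms absent from every brief cannot match, so they are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weigh(Counter(_tokens(query)))
    scores = [cosine(weigh(counts), q) for counts in docs]
    # sorted() is stable, so equal scores keep the original brief order.
    ranked = sorted(range(n), key=lambda i: -scores[i])
    return [briefs[i] for i in ranked[:k]]
```

Take the briefs "transformer attention mechanism", "gradient descent optimizer", "attention heads in transformers" and "database indexing btree", queried with "how attention works in a transformer". With k=2 the sketch keeps the first and third briefs, in that order. The other two share no terms with the query.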
Benchmark on a real 335-concept KB (ground truth = summary concept links): TF-IDF recall@40=0.90 vs embeddings 0.79. Dependency-free TF-IDF wins on this LLM-generated corpus (high lexical overlap) and is free per-doc, so it stays the default; embedding ranker kept as an option for higher-drift corpora.
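The opt-in gate and the swappable ranker interface can be sketched as follows. The config keys concepts_plan_retrieval and concepts_plan_retrieval_k come from this PR. The helper briefs_for_plan, the Embed and Ranker types and this exact select_relevant_briefs_embed signature are assumptions for illustration. The embedder is injected as a plain function, so the module itself needs no provider SDK.

```python
import math
from functools import partial
from typing import Callable, Sequence

# Injected embedding provider (hypothetical type): one vector per input text.
Embed = Callable[[Sequence[str]], list[list[float]]]
# Any ranker with this shape can be swapped in: (briefs, query, k) -> kept briefs.
Ranker = Callable[[list[str], str, int], list[str]]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_relevant_briefs_embed(briefs: list[str], query: str, k: int, embed: Embed) -> list[str]:
    """Rank briefs by embedding cosine similarity to the query (sketch)."""
    if k <= 0:
        return []
    if k >= len(briefs):
        return list(briefs)
    vectors = embed([*briefs, query])  # one batched call; the query vector is last
    q = vectors[-1]
    scores = [_cosine(v, q) for v in vectors[:-1]]
    ranked = sorted(range(len(briefs)), key=lambda i: -scores[i])
    return [briefs[i] for i in ranked[:k]]


def briefs_for_plan(briefs: list[str], summary: str, config: dict, ranker: Ranker) -> list[str]:
    """Hypothetical gate: return the briefs untouched unless retrieval is enabled."""
    if not config.get("concepts_plan_retrieval", False):
        return briefs  # default off: the same list as before, so the prompt is unchanged
    k = int(config["concepts_plan_retrieval_k"])  # assumed to be set when enabled
    return ranker(briefs, summary, k)


def toy_embed(texts: Sequence[str]) -> list[list[float]]:
    # Stand-in for a real provider: a 2-d "embedding" from keyword counts.
    return [[float(t.count("cat")), float(t.count("dog"))] for t in texts]


# Bind the provider once; the result fits the Ranker shape.
embed_ranker = partial(select_relevant_briefs_embed, embed=toy_embed)
```

Because briefs_for_plan only sees a Ranker, switching between the TF-IDF ranker and an embedding-backed one is just a different callable. The rest of the plan step stays the same.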
Took a close look: the approach is sound and the key safety property checks out, so this is good to land with one cleanup. Verified: filtering the briefs does not shrink the wikilink whitelist. The top-K filter only narrows the plan-step context (…). One ask before merge: the embedding ranker is dead code on the prod path. Minor: …
Nice work. The diagnosis and the opt-in/default-off rollout are exactly right.
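The safety property from the review holds because the plan-step context and the wikilink whitelist are built from different sets. Filtering narrows the context but never the whitelist. Below is a minimal sketch of that separation. build_plan_inputs, the slug-keyed dict and the ranker signature are all hypothetical.

```python
from typing import Callable

# Hypothetical ranker: (slug -> brief, query, k) -> slugs to show in the prompt.
SlugRanker = Callable[[dict[str, str], str, int], list[str]]


def build_plan_inputs(all_briefs: dict[str, str], summary: str, k: int, rank: SlugRanker):
    """Return (prompt context lines, wikilink whitelist) for one doc."""
    # Whitelist: every existing page, computed before any filtering, so a link
    # to a page outside the top-K window still validates.
    whitelist = frozenset(all_briefs)
    # Context: only the top-K briefs are rendered into the plan prompt.
    selected = rank(all_briefs, summary, k)
    context = [f"{slug}: {all_briefs[slug]}" for slug in selected]
    return context, whitelist


def first_k(briefs: dict[str, str], query: str, k: int) -> list[str]:
    # Toy ranker for the example: keep the first k slugs in insertion order.
    return list(briefs)[:k]


pages = {
    "concepts/attention": "Weighting inputs by learned relevance.",
    "concepts/tokenizer": "Splitting text into model tokens.",
    "entities/openai": "AI research company.",
}
context, whitelist = build_plan_inputs(pages, "a doc about attention", 1, first_k)
```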
Motivation
The concepts-plan step injects every existing concept/entity brief into the prompt (_read_concept_briefs/_read_entity_briefs read the whole concepts/ and entities/ dirs), so the plan prompt grows O(N) with the KB. On a 165-doc KB this was observed climbing from ~2k tokens early on to ~15–18k tokens at a few hundred concepts. That raises cost/latency on every subsequent doc and degrades the model's ability to reconcile a new doc against the right existing pages (it starts creating near-duplicate concepts).
What this does (opt-in, default off)
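Adds openkb/retrieval.py and a small block in _compile_concepts that, only when enabled, keeps the top-K briefs most relevant to the current doc's summary instead of all of them. There are two config keys (default off → behaviour byte-identical): concepts_plan_retrieval turns the filter on, and concepts_plan_retrieval_k sets K. The default ranker is a dependency-free TF-IDF cosine over brief lines, queried with the doc summary. An embedding ranker (select_relevant_briefs_embed, provider injected, no SDK dependency in the module) is included for higher-drift corpora / future hybrid use.
Benchmark (recall@K vs. ground-truth concept links)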
On a real 335-concept / 489-entity KB, using each summary's [[concepts/X]] links as ground truth and the summary as the query (scripts/bench_retrieval.py), TF-IDF reaches recall@40 = 0.90 vs 0.79 for the embedding ranker. TF-IDF wins here (LLM-generated briefs share heavy lexical overlap with summaries) and is free per-doc, so it's the default. K=40 recovers ~90% of the relevant concepts at 12% of the full-inject prompt size.
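The benchmark metric can be sketched as follows. The sketch assumes ground truth is the set of [[concepts/X]] link targets in each summary, with alias (|label) or anchor (#section) suffixes stripped. It also assumes the score is averaged per summary, skipping summaries that link no concepts. The helper names are hypothetical, and scripts/bench_retrieval.py may parse links or aggregate differently.

```python
import re
from typing import Iterable, Optional

# Captures X in [[concepts/X]], stopping at "]", "|" (alias) or "#" (anchor).
CONCEPT_LINK = re.compile(r"\[\[concepts/([^\]|#]+)")


def concept_links(summary: str) -> set[str]:
    """Ground-truth concept names linked from a summary."""
    return {name.strip() for name in CONCEPT_LINK.findall(summary)}


def recall_at_k(relevant: set[str], retrieved: list[str]) -> Optional[float]:
    """Fraction of relevant concepts present among the K retrieved ones."""
    if not relevant:
        return None  # undefined when the summary links no concepts
    return len(relevant & set(retrieved)) / len(relevant)


def mean_recall(runs: Iterable[tuple[set[str], list[str]]]) -> float:
    """Macro-average recall@K over summaries that have ground truth."""
    scores = [r for r in (recall_at_k(rel, ret) for rel, ret in runs) if r is not None]
    return sum(scores) / len(scores) if scores else 0.0
```

In this setup, retrieved holds the concept names the ranker keeps for one summary at a given K (for example K=40). mean_recall then aggregates over all summaries in the KB.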
Tradeoff
At K=40, ~10% of relevant existing concepts may fall outside the window for a given doc (small risk of a duplicate concept) in exchange for a bounded plan prompt as the KB scales. Off by default; tune ..._k higher to trade prompt size for recall.
Tests:
tests/test_retrieval.py (ranker behaviour) + full suite green.