UnloadChunk: bulk probe extraction for chunk batches #782

Open
DAlperin wants to merge 3 commits into TimelyDataflow:master-next from DAlperin:unload-chunk

DAlperin commented Jul 3, 2026

UnloadChunk is the bulk-read capability for chunks whose bodies may not be resident: probe hits are copied into caller-owned staging, and no borrow of chunk contents crosses the trait boundary. The cursor path cannot make this guarantee: Cursor::key<'a>(&self, storage: &'a Storage) -> Key<'a> lets the caller hold borrows for the batch's lifetime, so the only sound paged read under it is ColChunk's fetch-and-cache-forever OnceCell. UnloadChunk is the read surface where eviction is possible, because any fetch or pin a paged implementor needs is scoped inside a single method call.
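To make the "scoped inside a single method call" point concrete, here is a minimal sketch of the shape such a trait could take. Everything in it is an assumption for illustration: `UnloadChunkSketch`, `PagedChunk`, the `(u64, i64)` update type, and the single `extract_into` signature are hypothetical and are not the trait as written in this PR. Probes are assumed sorted.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the trait: hits are copied into caller-owned
/// staging, so no borrow of chunk contents outlives the call.
trait UnloadChunkSketch {
    type Staging;
    type Probes: ?Sized;
    fn extract_into(&self, probes: &Self::Probes, staging: &mut Self::Staging);
}

/// A "paged" chunk whose body is not resident.
struct PagedChunk {
    spilled: Vec<(u64, i64)>, // stands in for bytes on disk, sorted by key
    loads: Cell<usize>,       // counts how often the body was fetched
}

impl PagedChunk {
    /// Simulated fetch: returns an owned body; nothing is cached on `self`.
    fn load(&self) -> Vec<(u64, i64)> {
        self.loads.set(self.loads.get() + 1);
        self.spilled.clone()
    }
}

impl UnloadChunkSketch for PagedChunk {
    type Staging = Vec<(u64, i64)>;
    type Probes = [u64];

    fn extract_into(&self, probes: &[u64], staging: &mut Vec<(u64, i64)>) {
        let body = self.load(); // fetched for this call only
        for &(key, diff) in &body {
            if probes.binary_search(&key).is_ok() {
                staging.push((key, diff)); // owned copy into caller staging
            }
        }
        // `body` is dropped here, so the chunk is free to stay evicted.
    }
}
```

Because the signature returns nothing borrowed from the chunk, the implementor can release the loaded body as soon as the call returns, which a `Key<'a>`-returning cursor cannot do.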

The trait keeps Chunk's discipline: key/val/time/diff opinions stay off it. Staging and Probes are opaque associated types, where staging is the chunk family's resident layout (the trie for columnar, the row vec for vec). The one comparison the batch driver needs is delegated to the chunk via locate, which is answered from resident metadata, just as bounds() is on the cursor path. Chunk boundaries are carried as a protocol rather than an index: a chunk consumes every probe strictly below its last key, and it extracts but does not consume a probe equal to its last key. The driver then re-offers that probe to the next chunk, and staging's append stitches the straddling continuation. Times are copied verbatim; since-advancement stays with the consumer, exactly as on the cursor path.
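The boundary protocol described above can be sketched over plain sorted vectors. This is an illustrative simplification, not the PR's driver: `Update`, `extract_chunk`, and `extract_chain` are hypothetical names, updates are reduced to `(key, diff)`, and the loop visits chunks linearly where the real driver gallops by locate and opens only the chunks a probe touches.

```rust
type Update = (u64, i64); // (key, diff); a hypothetical simplification

/// Extract hits for sorted `probes` from one sorted chunk and return how many
/// probes were consumed. A probe strictly below the chunk's last key is
/// consumed; a probe equal to the last key is extracted but NOT consumed.
fn extract_chunk(chunk: &[Update], probes: &[u64], staging: &mut Vec<Update>) -> usize {
    let Some(&(last_key, _)) = chunk.last() else { return 0 };
    let mut consumed = 0;
    for &p in probes {
        if p > last_key {
            break; // belongs to a later chunk
        }
        // Copy the whole run of updates for key `p` as one slice.
        let lo = chunk.partition_point(|u| u.0 < p);
        let hi = chunk.partition_point(|u| u.0 <= p);
        staging.extend_from_slice(&chunk[lo..hi]);
        if p < last_key {
            consumed += 1; // p == last_key stays on offer for the next chunk
        }
    }
    consumed
}

/// Drive a chain of chunks, re-offering any unconsumed probe to the next one.
fn extract_chain(chunks: &[Vec<Update>], mut probes: &[u64], staging: &mut Vec<Update>) {
    for chunk in chunks {
        if probes.is_empty() {
            break;
        }
        let n = extract_chunk(chunk, probes, staging);
        probes = &probes[n..];
    }
}
```

With a key that spans several chunks, each chunk contributes its part of the key's updates and leaves the probe on offer, so the next chunk appends the continuation. In this flat layout appending is enough; a trie staging would additionally have to merge the repeated key, which is the stitching the description refers to.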

Contents:

  • trace/chunk: the UnloadChunk trait, a provided ChunkBatch driver (gallop the chunk list by locate, open only the chunks a probe touches, fetch_into as the scan path), and the VecChunk implementation.
  • columnar: UpdatesBuilder::meld_keys, melding a key-index range of a borrowed UpdatesView by bulk column-range copies (meld now delegates to it). Because it works on views, one primitive serves typed tries and wire bytes: ColChunk's paged arm reads a zero-copy view over the loaded bytes (spill::read) for the scope of the call — no trie rebuild, and the cache stays unpopulated.
  • Tests: filter-oracle property tests over random chains for vec and col, resident and paged (asserting caches stay empty); exhaustive (chunk cut × probe placement) boundary coverage including a key spanning several chunks; a meld_keys property test against filtered form; paged extraction and scan round-trips with spill stats.
  • chunk_bench: extraction vs the straddle cursor, both consuming into owned staging. At 1M updates: dense String probes take 260ms on the cursor vs 12.9ms with extraction (about 20x), sparse (1%) probes are about 2x faster than the cursor, and fat 4x256B values sit at the pure-meld copy floor (fetch_into parity, so the probe machinery costs nothing). Paged cold reads widen every gap, since the cursor decodes-and-caches while extraction views the spilled bytes.

This is the counterpart of #781 one layer down. That PR's backend presentations (present0, present_input) have exactly this contract — sorted probe keys in, an owned restricted run out, nothing borrowed across the boundary — and leave their implementation to the backend; extract_into and meld_keys are the machinery a chunked, spillable backend would implement them with. Both sit on the same two seams: tactics as the consumption dispatch point (#773), and cursor-less chunks as full trace citizens (#778).

🤖 Generated with Claude Code

DAlperin and others added 3 commits July 3, 2026 02:51
UnloadChunk is the read surface for chunks whose bodies may not be
resident: probe hits are copied into caller-owned staging, and no borrow
of chunk contents crosses the trait boundary. Key/val/time/diff opinions
stay off the trait (Staging and Probes are opaque); the one comparison
the batch driver needs is delegated to the chunk via locate(), answered
from resident metadata like bounds() on the cursor path.

ChunkBatch gains a provided driver: gallop the chunk list by locate,
open only the chunks a probe touches, and re-offer a probe left
unconsumed at a chunk's last key to the next chunk, where staging's
append stitches the straddling continuation (the consume-index
protocol). fetch_into is the scan path.

VecChunk is the worked implementation. Tests: a random-chain property
test against the filter oracle, exhaustive (chunk cut x probe placement)
boundary coverage including a key spanning several chunks, and a fetch
round-trip.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… builder

UpdatesBuilder gains meld_keys: meld a key-index range of a borrowed
UpdatesView -- the keys with their complete val/time/diff subtrees -- by
bulk column-range copies, with the same boundary-stitching semantics as
meld (which now delegates to it over the full range). The view may sit
over typed columns or directly over wire bytes, so one primitive serves
resident and spilled sources.
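As a rough illustration of melding a key-index range by column-range copies, the sketch below uses a two-layer layout (keys over diffs) instead of the full key/val/time/diff trie. The `Run` type, its fields, and this `meld_keys` signature are assumptions made for the example and do not reproduce UpdatesBuilder or UpdatesView.

```rust
use std::ops::Range;

/// Hypothetical two-layer columnar run: `keys[i]` owns `diffs[offs[i]..offs[i + 1]]`.
#[derive(Debug, Default, PartialEq)]
struct Run {
    keys: Vec<u64>,
    offs: Vec<usize>, // when non-empty: starts at 0, has keys.len() + 1 entries
    diffs: Vec<i64>,
}

impl Run {
    /// Append keys `range` of `src`, with their complete subtrees, using one
    /// slice copy per column. Assumes `range.end <= src.keys.len()`.
    fn meld_keys(&mut self, src: &Run, range: Range<usize>) {
        if range.is_empty() {
            return;
        }
        if self.offs.is_empty() {
            self.offs.push(0);
        }
        let (lo, hi) = (src.offs[range.start], src.offs[range.end]);
        let base = self.diffs.len();
        // Boundary stitch: if our last key equals the first incoming key,
        // extend that key instead of writing it twice.
        let stitch = self.keys.last() == Some(&src.keys[range.start]);
        if stitch {
            self.offs.pop(); // the last key's end bound is rewritten below
        }
        let skip = usize::from(stitch);
        self.keys.extend_from_slice(&src.keys[range.start + skip..range.end]);
        self.diffs.extend_from_slice(&src.diffs[lo..hi]);
        // Rebase the copied end bounds onto our own diff column.
        self.offs
            .extend(src.offs[range.start + 1..=range.end].iter().map(|&o| base + o - lo));
    }
}
```

The cost is proportional to the bytes copied rather than to the number of updates visited one at a time, which is the property the bulk path relies on. In the real trie, stitching a straddling key would also have to reconcile the val and time layers; this sketch only concatenates the leaf column.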

ColChunk's extraction gallops the probe column against the key column
and melds maximal runs of consecutive hits as single range copies. On
the paged arm a fetch is scoped to the call and reads a zero-copy view
over the loaded bytes (spill::read) -- no trie rebuild, and the OnceCell
cache stays unpopulated, unlike the cursor path's fetch-and-cache-
forever. locate answers from resident metadata (ChunkMeta when paged).
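The gallop-and-run step can be sketched as follows, assuming a sorted key column with distinct keys and sorted probes. `gallop` and `hit_runs` are hypothetical helper names, and `u64` keys stand in for whatever the columnar key type is; each returned range is what would then be handed to a range meld as one copy.

```rust
use std::ops::Range;

/// Exponential ("galloping") search: the first index `i >= from` with
/// `keys[i] >= target`, or `keys.len()` if there is none.
fn gallop(keys: &[u64], from: usize, target: u64) -> usize {
    let (mut lo, mut hi, mut step) = (from, from, 1);
    while hi < keys.len() && keys[hi] < target {
        lo = hi + 1; // everything up to `hi` is known to be below target
        hi += step;
        step *= 2;
    }
    let hi = hi.min(keys.len());
    lo + keys[lo..hi].partition_point(|&k| k < target)
}

/// Key-index ranges hit by `probes`, with consecutive hits merged into
/// maximal runs.
fn hit_runs(keys: &[u64], probes: &[u64]) -> Vec<Range<usize>> {
    let mut runs: Vec<Range<usize>> = Vec::new();
    let mut pos = 0;
    for &p in probes {
        pos = gallop(keys, pos, p);
        if pos == keys.len() {
            break; // no later probe can hit either
        }
        if keys[pos] == p {
            let extends = runs.last().map_or(false, |r| r.end == pos);
            if extends {
                runs.last_mut().unwrap().end = pos + 1;
            } else {
                runs.push(pos..pos + 1);
            }
            pos += 1;
        }
    }
    runs
}
```

Dense probes collapse into a few long runs, so the number of copies tracks the number of gaps between hits and not the number of probes.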

Tests: a filter-oracle property test over random chains for resident
and paged chunks (asserting caches stay empty), a straddle fixture, a
paged extraction + scan test with spill stats, and a meld_keys property
test against filtered form.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

A read-side section for the UnloadChunk path: both paths consume the
probed updates of a columnar ChunkBatch into owned staging -- the
straddle cursor by per-probe seeks and owned copies, extraction by one
extract_into over trie staging. Shapes: dense String-keyed probes (the
design doc's headline case), sparse (1%) u64 probes, dense u64 probes,
and fat 4x256B values; the first two also run against paged batches
read cold, where the cursor path decodes-and-caches and extraction
views the spilled bytes.

At 1M updates on an M-series laptop: dense String probes 260ms cursor
vs 12.9ms extraction (20x; the design doc's 21ms-class admission),
sparse probes about 2x faster than the cursor, fat values 3.5x with
extract_into at the pure-meld copy floor (fetch_into parity, so the
probe machinery costs nothing). Paged cold reads widen every gap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>