UnloadChunk: bulk probe extraction for chunk batches by DAlperin · Pull Request #782 · TimelyDataflow/differential-dataflow

DAlperin · 2026-07-03T18:39:27Z

The bulk-read capability for chunks whose bodies may not be resident: probe hits are copied into caller-owned staging, and no borrow of chunk contents crosses the trait boundary. The cursor path cannot say this — Cursor::key<'a>(&self, storage: &'a Storage) -> Key<'a> lets the caller hold borrows for the batch's lifetime, so the only sound paged read under it is ColChunk's fetch-and-cache-forever OnceCell. UnloadChunk is the read surface where eviction is possible: any fetch or pin a paged implementor needs is scoped inside a single method call.
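To make the "no borrow crosses the trait boundary" point concrete, here is a minimal sketch under stated assumptions. `ToyUnload`, `PagedChunk`, and the 16-byte record encoding are hypothetical stand-ins, not the PR's definitions, and the real `extract_into` signature may differ. The method returns nothing that borrows from the chunk, so any view over the loaded bytes ends when the call returns, and only owned copies reach the caller's staging.

```rust
use std::convert::TryInto;

/// Hypothetical trait shape (an assumption for illustration only).
/// `Staging` and `Probes` are opaque to the caller, and no method
/// returns a reference tied to the chunk's contents.
trait ToyUnload {
    type Staging: Default;
    type Probes: ?Sized;
    fn extract_into(&self, probes: &Self::Probes, staging: &mut Self::Staging);
}

/// Stand-in for a chunk whose body lives only as spilled bytes:
/// each record is a little-endian (u64 key, u64 val) pair.
struct PagedChunk {
    spilled: Vec<u8>,
}

impl PagedChunk {
    fn spill(rows: &[(u64, u64)]) -> Self {
        let mut spilled = Vec::with_capacity(rows.len() * 16);
        for (k, v) in rows {
            spilled.extend_from_slice(&k.to_le_bytes());
            spilled.extend_from_slice(&v.to_le_bytes());
        }
        PagedChunk { spilled }
    }
}

impl ToyUnload for PagedChunk {
    type Staging = Vec<(u64, u64)>;
    type Probes = [u64]; // sorted probe keys

    fn extract_into(&self, probes: &[u64], staging: &mut Vec<(u64, u64)>) {
        // The "view" over the bytes exists only inside this call; nothing
        // is decoded into a cache, and hits are copied out as owned rows.
        for rec in self.spilled.chunks_exact(16) {
            let k = u64::from_le_bytes(rec[..8].try_into().unwrap());
            if probes.binary_search(&k).is_ok() {
                let v = u64::from_le_bytes(rec[8..].try_into().unwrap());
                staging.push((k, v));
            }
        }
    }
}
```

Because the signature carries no lifetime linking the output to `&self`, an implementor is free to drop or evict whatever it fetched as soon as the call returns, which is the property the cursor signature cannot express.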

The trait keeps Chunk's discipline: key/val/time/diff opinions stay off it (Staging and Probes are opaque associated types — staging is the chunk family's resident layout, the trie for columnar and the row vec for vec), and the one comparison the batch driver needs is delegated to the chunk via locate, answered from resident metadata like bounds() on the cursor path. Chunk boundaries are carried as a protocol rather than an index: a chunk consumes every probe strictly below its last key and extracts-but-does-not-consume a probe equal to it, so the driver re-offers that probe to the next chunk and staging's append stitches the straddling continuation. Times are copied verbatim; since-advancement stays with the consumer, exactly as on the cursor path.
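The boundary protocol can be sketched as follows. This is an illustration under assumptions, not the PR's driver: `ToyChunk` and `extract_all` are hypothetical names, rows are plain `(key, val)` pairs, and the driver walks chunks linearly instead of galloping by `locate`. What it does show is the consume rule: a probe strictly below a chunk's last key is consumed, a probe equal to the last key is extracted but left unconsumed, and the driver re-offers it to the next chunk.

```rust
/// Hypothetical stand-in for a chunk: rows sorted by key. A key may
/// continue into the following chunk(s).
struct ToyChunk {
    rows: Vec<(u64, &'static str)>,
}

impl ToyChunk {
    /// Copies hits for `probes` (sorted, distinct) into `staging` and
    /// returns how many probes were consumed.
    fn extract_into(&self, probes: &[u64], staging: &mut Vec<(u64, &'static str)>) -> usize {
        let Some(&(last, _)) = self.rows.last() else { return 0 };
        let mut consumed = 0;
        for &p in probes {
            if p > last {
                break; // beyond this chunk: leave for later chunks
            }
            staging.extend(self.rows.iter().filter(|(k, _)| *k == p).copied());
            if p == last {
                break; // extracted but NOT consumed: the key may straddle
            }
            consumed += 1; // strictly below the last key: done with it
        }
        consumed
    }
}

/// Toy driver: re-offers the first unconsumed probe to the next chunk,
/// so appending to `staging` stitches a straddling key back together.
fn extract_all(chunks: &[ToyChunk], probes: &[u64]) -> Vec<(u64, &'static str)> {
    let mut staging = Vec::new();
    let mut next = 0;
    for chunk in chunks {
        if next == probes.len() {
            break;
        }
        next += chunk.extract_into(&probes[next..], &mut staging);
    }
    staging
}
```

With chunks `[(1,a),(3,b),(5,c)]`, `[(5,d),(5,e)]`, `[(5,f),(7,g)]` and probes `[3, 5, 8]`, key 5 is offered to all three chunks and its rows land contiguously in staging; no chunk needs an index of where its neighbours begin.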

Contents:

trace/chunk: the UnloadChunk trait, a provided ChunkBatch driver (gallop the chunk list by locate, open only the chunks a probe touches, fetch_into as the scan path), and the VecChunk implementation.
columnar: UpdatesBuilder::meld_keys, melding a key-index range of a borrowed UpdatesView by bulk column-range copies (meld now delegates to it). Because it works on views, one primitive serves typed tries and wire bytes: ColChunk's paged arm reads a zero-copy view over the loaded bytes (spill::read) for the scope of the call — no trie rebuild, and the cache stays unpopulated.
Tests: filter-oracle property tests over random chains for vec and col, resident and paged (asserting caches stay empty); exhaustive (chunk cut × probe placement) boundary coverage including a key spanning several chunks; a meld_keys property test against filtered form; paged extraction and scan round-trips with spill stats.
chunk_bench: extraction vs the straddle cursor, both consuming into owned staging. At 1M updates: dense String probes 260ms cursor vs 12.9ms extraction, sparse (1%) probes at 2x better than cursor, fat 4x256B values at the pure-meld copy floor (fetch_into parity — the probe machinery costs nothing). Paged cold reads widen every gap, since the cursor decodes-and-caches while extraction views the spilled bytes.

This is the counterpart of #781 one layer down. That PR's backend presentations (present0, present_input) have exactly this contract — sorted probe keys in, an owned restricted run out, nothing borrowed across the boundary — and leave their implementation to the backend; extract_into and meld_keys are the machinery a chunked, spillable backend would implement them with. Both sit on the same two seams: tactics as the consumption dispatch point (#773), and cursor-less chunks as full trace citizens (#778).

🤖 Generated with Claude Code

UnloadChunk is the read surface for chunks whose bodies may not be resident: probe hits are copied into caller-owned staging, and no borrow of chunk contents crosses the trait boundary. Key/val/time/diff opinions stay off the trait (Staging and Probes are opaque); the one comparison the batch driver needs is delegated to the chunk via locate(), answered from resident metadata like bounds() on the cursor path. ChunkBatch gains a provided driver: gallop the chunk list by locate, open only the chunks a probe touches, and re-offer a probe left unconsumed at a chunk's last key to the next chunk, where staging's append stitches the straddling continuation (the consume-index protocol). fetch_into is the scan path. VecChunk is the worked implementation. Tests: a random-chain property test against the filter oracle, exhaustive (chunk cut x probe placement) boundary coverage including a key spanning several chunks, and a fetch round-trip. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… builder UpdatesBuilder gains meld_keys: meld a key-index range of a borrowed UpdatesView -- the keys with their complete val/time/diff subtrees -- by bulk column-range copies, with the same boundary-stitching semantics as meld (which now delegates to it over the full range). The view may sit over typed columns or directly over wire bytes, so one primitive serves resident and spilled sources. ColChunk's extraction gallops the probe column against the key column and melds maximal runs of consecutive hits as single range copies. On the paged arm a fetch is scoped to the call and reads a zero-copy view over the loaded bytes (spill::read) -- no trie rebuild, and the OnceCell cache stays unpopulated, unlike the cursor path's fetch-and-cache-forever. locate answers from resident metadata (ChunkMeta when paged). Tests: a filter-oracle property test over random chains for resident and paged chunks (asserting caches stay empty), a straddle fixture, a paged extraction + scan test with spill stats, and a meld_keys property test against filtered form. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
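The "gallop, then meld maximal runs" step can be sketched as below. This is an assumption-laden illustration, not ColChunk's code: `gallop` and `hit_runs` are hypothetical helpers over plain `u64` slices, and each returned key-index range is what one range-copy call such as meld_keys would be handed.

```rust
use std::ops::Range;

/// Exponential ("galloping") search: the first index `i >= from` with
/// `keys[i] >= target`, or `keys.len()` if there is none.
fn gallop(keys: &[u64], from: usize, target: u64) -> usize {
    let mut lo = from.min(keys.len());
    let mut step = 1;
    // Invariant: every key in [from, lo) is < target.
    while lo + step <= keys.len() && keys[lo + step - 1] < target {
        lo += step;
        step *= 2;
    }
    let hi = (lo + step).min(keys.len());
    // Finish with a binary search inside the bracketed window.
    lo + keys[lo..hi].partition_point(|k| *k < target)
}

/// Key-index ranges covering the probe hits, where each range is a
/// maximal run of consecutive hit keys. `keys` and `probes` are sorted
/// and distinct.
fn hit_runs(keys: &[u64], probes: &[u64]) -> Vec<Range<usize>> {
    let mut runs: Vec<Range<usize>> = Vec::new();
    let mut pos = 0;
    for &p in probes {
        pos = gallop(keys, pos, p);
        if pos == keys.len() {
            break; // all remaining probes are beyond the last key
        }
        if keys[pos] == p {
            let extends = runs.last().map_or(false, |r| r.end == pos);
            if extends {
                runs.last_mut().unwrap().end = pos + 1; // grow the current run
            } else {
                runs.push(pos..pos + 1); // start a new run
            }
            pos += 1;
        }
    }
    runs
}
```

For keys `[10, 20, 30, 40, 50, 60]` and probes `[20, 30, 35, 50, 60, 70]` this yields the ranges `1..3` and `4..6`: four hits become two range copies, which is where dense probe sets gain over per-probe seeks.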

A read-side section for the UnloadChunk path: both paths consume the probed updates of a columnar ChunkBatch into owned staging -- the straddle cursor by per-probe seeks and owned copies, extraction by one extract_into over trie staging. Shapes: dense String-keyed probes (the design doc's headline case), sparse (1%) u64 probes, dense u64 probes, and fat 4x256B values; the first two also run against paged batches read cold, where the cursor path decodes-and-caches and extraction views the spilled bytes. At 1M updates on an M-series laptop: dense String probes 260ms cursor vs 12.9ms extraction (20x; the design doc's 21ms-class admission), sparse probes 2x better than the cursor, fat values 3.5x with extract_into at the pure-meld copy floor (fetch_into parity, so the probe machinery costs nothing). Paged cold reads widen every gap. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

DAlperin and others added 3 commits July 3, 2026 02:51

DAlperin wants to merge 3 commits into TimelyDataflow:master-next from DAlperin:unload-chunk
