feat: expose staged index segment transactions #3

Draft
ragnorc wants to merge 179 commits into main from ragnorc/two-phase-index-segments

ragnorc commented May 16, 2026

Summary

  • Add DatasetIndexExt::build_existing_index_segments_transaction to prepare an Operation::CreateIndex transaction for existing index segments without committing.
  • Refactor commit_existing_index_segments to reuse the staged transaction path.
  • Add coverage that verifies staging does not publish the index and the returned transaction can be committed via CommitBuilder.
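
The split between staging and committing described above can be sketched with a toy model. This is a minimal sketch under stated assumptions, not the Lance API: `ToyDataset`, `StagedIndexTransaction`, `stage_index_segments`, and `commit_index_segments` are hypothetical names, and the real transaction type and `CommitBuilder` carry far more state.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a staged `Operation::CreateIndex` transaction.
#[derive(Debug, Clone, PartialEq)]
pub struct StagedIndexTransaction {
    pub index_name: String,
    pub column: String,
    pub segment_ids: Vec<u64>,
}

/// Toy dataset: a version counter plus the set of published indices.
#[derive(Debug, Default)]
pub struct ToyDataset {
    pub version: u64,
    pub published: BTreeMap<String, Vec<u64>>,
}

impl ToyDataset {
    /// Phase 1: describe the change. Takes `&self`, so nothing can be published here.
    pub fn stage_index_segments(
        &self,
        index_name: &str,
        column: &str,
        segment_ids: &[u64],
    ) -> StagedIndexTransaction {
        StagedIndexTransaction {
            index_name: index_name.to_string(),
            column: column.to_string(),
            segment_ids: segment_ids.to_vec(),
        }
    }

    /// Phase 2: publish a previously staged transaction and bump the version.
    pub fn commit(&mut self, txn: StagedIndexTransaction) -> u64 {
        self.published.insert(txn.index_name, txn.segment_ids);
        self.version += 1;
        self.version
    }

    /// One-shot path that reuses the staging step, as the refactor describes.
    pub fn commit_index_segments(
        &mut self,
        index_name: &str,
        column: &str,
        segment_ids: &[u64],
    ) -> u64 {
        let txn = self.stage_index_segments(index_name, column, segment_ids);
        self.commit(txn)
    }
}
```

Staging through a shared reference makes the "does not publish" property visible in the signature; the toy omits conflict detection between staging and commit.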

Tests

  • cargo fmt --all
  • cargo test -p lance existing_index_segments -- --nocapture
  • cargo test -p lance test_vector_execute_uncommitted_segments_commit_without_staging -- --nocapture
  • External crate cargo check against the new public API

Fixes lance-format#6666

Summary by cubic

Adds a staged transaction builder to publish existing index segments without auto-committing, enabling two-phase index creation; refactors the existing commit path to use it. Also pulls in recent search and tokenizer improvements from main.

  • New Features

    • DatasetIndexExt::build_existing_index_segments_transaction(index_name, column, segments) -> Transaction for staging, then commit via CommitBuilder (and commit_existing_index_segments now uses this path).
    • Search upgrades: batch KNN queries with a query_index column; fast_search for scalar indexes to scan only indexed fragments; ICU-based FTS tokenizer via base_tokenizer="icu".
  • Bug Fixes

    • Correctness: apply LIMIT after refine filters in filtered reads; fix stale reads in mem_wal via dedup-on-scan and deletion vectors; stable-row-id deletion masks honor fragment bitmaps; coerce expressions before simplification to avoid UDF panics.
    • Performance/maintenance: avoid extra stat calls on OpenDAL ranged reads; clean up IVF shuffler temp dirs.

Written for commit 53aa593. Summary will update on new commits.

beinan and others added 29 commits June 3, 2026 07:04
)

## Issue
In distributed writes, workers can create uncommitted fragments and
defer the final commit until all fragments are ready. The Java
`WriteFragmentBuilder` only exposed `APPEND` mode without a way to pass
the target dataset schema, so lance-core had to open the existing
dataset to infer schema and field IDs before writing each fragment.

That dataset open is unnecessary when the caller already has the target
schema, and it becomes expensive for datasets with very large fragment
counts because opening the dataset has to load/read manifest metadata.
This shows up as fragment writing getting slower as the dataset grows,
even before the final commit step.

## Summary
- Add `WriteFragmentBuilder.schema(Schema)` so Java distributed writers
can pass the target dataset schema when creating uncommitted fragments.
- Pass the optional schema through Arrow FFI/JNI into
`FragmentCreateBuilder.schema(...)`, avoiding the append-mode dataset
open used only for schema inference.
- Preserve current base path / object store write parameters when the
schema override path is used.
- Add Java coverage for append fragment writes with a schema override.

## Benefits
- Lets Java callers avoid an expensive dataset open per fragment write
in distributed append workflows.
- Keeps Lance field IDs from the target dataset schema instead of
inferring from the incoming Arrow batches.
- Makes the Java API match the underlying Rust `FragmentCreateBuilder`
capability.
- Reduces write-time overhead for datasets with high fragment counts,
especially when many workers are writing fragments concurrently.

## Testing
- `cargo check --manifest-path /tmp/lance-write-fragment-schema-pr/java/lance-jni/Cargo.toml`
- `./mvnw -Dtest=org.lance.FragmentTest#testWriteFragmentWithSchemaOverride test`

---------

Co-authored-by: Beinan Wang <beinanwang@microsoft.com>
Adds Bitmap support to the existing segment-based distributed index
workflow.

Callers can now build staged Bitmap roots with
`create_index_uncommitted(..., index_type="BITMAP", fragment_ids=...)`,
finalize them through
`create_index_segment_builder().with_index_type("BITMAP").with_segments(...).build_all()`,
and publish them with `commit_existing_index_segments(...)`.

For Bitmap, `execute_uncommitted` now writes canonical
`bitmap_page_lookup.lance` segment roots directly. The old public Python
Bitmap shard workflow through `create_scalar_index(...,
fragment_ids=...)` and `merge_index_metadata(..., "BITMAP")` is no
longer exposed; callers should use the segment workflow instead.

Relates to OSS-971 and OSS-972.
lance-format#7057)

In the OnePartitionMultipleThreads path, io_metrics.record was only
called inside inspect_ok on the output batch stream. When a filter
produces zero matching rows, no batches flow through and inspect_ok
never fires, leaving bytes_read/iops/requests at 0 despite I/O having
occurred. Fix by also recording a final snapshot in the finally handler.
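
The failure mode and the fix can be illustrated with a drop guard, which plays the role of the "finally handler" in this sketch. The types here (`IoMetrics`, `RecordOnDrop`, `filtered_read`) are hypothetical and synchronous; the actual code works on an async batch stream.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical I/O counters; the real metrics type is not shown in the source.
#[derive(Default)]
pub struct IoMetrics {
    pub bytes_read: Cell<u64>,
}

/// Records a final snapshot when dropped, even if no batch was ever yielded.
pub struct RecordOnDrop {
    metrics: Rc<IoMetrics>,
    bytes_seen: Rc<Cell<u64>>,
}

impl Drop for RecordOnDrop {
    fn drop(&mut self) {
        self.metrics.bytes_read.set(self.bytes_seen.get());
    }
}

/// Simulates a filtered read: I/O always happens, but rows only flow if they match.
pub fn filtered_read(
    rows: &[i32],
    keep: impl Fn(i32) -> bool,
    metrics: Rc<IoMetrics>,
) -> Vec<i32> {
    let bytes_seen = Rc::new(Cell::new(0u64));
    let _guard = RecordOnDrop {
        metrics,
        bytes_seen: Rc::clone(&bytes_seen),
    };
    // The read costs I/O regardless of how many rows survive the filter.
    bytes_seen.set((rows.len() * std::mem::size_of::<i32>()) as u64);
    rows.iter().copied().filter(|r| keep(*r)).collect()
}
```

With a filter that rejects every row, the output is empty but the guard still reports the bytes that were read.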

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Closes lance-format#6746. Loading an HNSW partition no longer reconstructs a
per-node `Vec<GraphBuilderNode>` / `Vec<OrderedNode>` graph. The loaded
graph is now backed directly by the on-disk Arrow buffers, with neighbor
adjacency served as zero-copy `&[u32]` slices straight out of the
`__neighbors` `ListArray` value buffer. This unblocks a future zero-copy
`CacheCodec` (lance-format#6745).

## Motivation

Per lance-format#6746, loading an HNSW partition required expensive per-node
reconstruction, which makes a zero-copy IPC `CacheCodec` (lance-format#6745)
infeasible. The fix is to keep the Arrow data and offsets as the graph's
backing store while preserving current search behavior and performance.

## What changed

- `HnswCore` now holds an `HnswGraph` enum instead of
`Arc<Vec<GraphBuilderNode>>`: `Built` (in-memory, produced by the online
builder / `index_vectors` — build path untouched) or `Loaded`
(Arrow-backed, search-only).
- `LoadedHnswGraph` retains the full `RecordBatch` plus per-level
zero-copy `ListArray` neighbor views and a tiny per-upper-level `id ->
row` lookup; the geometrically-shrinking upper levels keep these maps
negligible.
- Level 0 uses a `Dense` lookup (`row == __vector_id`, asserted in
debug); upper levels use a `Sparse` map keyed by `__vector_id` value,
exactly mirroring the old per-node `load` — including the known
`level_offsets` quirk where the entry-point node is written by
`to_batch` at every level but counted only at level 0, so upper-level
slices are off-by-one and duplicate ids resolve last-write-wins.
- The search loop is single-sourced across both backends via a local
macro, keeping the existing `Graph` / `BorrowingGraph` seam; search is
unchanged.
- `to_batch()` on a loaded graph is a verbatim passthrough (re-stamped
metadata only), so the IVF partition cache (`ivf/partition_serde.rs`,
which re-serializes loaded indices) round-trips losslessly and lance-format#6745 can
write/read it through `lance_arrow::ipc` without rebuilding the graph.
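
The zero-copy adjacency idea can be sketched with plain vectors standing in for the Arrow buffers. This is an illustration only: `LevelNeighbors` is a hypothetical type, and the real code borrows from a `ListArray` rather than owning `Vec`s.

```rust
/// Hypothetical flattened adjacency for one HNSW level, mimicking a list-array layout:
/// `offsets[i]..offsets[i + 1]` delimits the neighbors of row `i` inside `values`.
pub struct LevelNeighbors {
    offsets: Vec<i32>,
    values: Vec<u32>,
}

impl LevelNeighbors {
    /// Flatten per-node neighbor lists into one offsets buffer and one values buffer.
    pub fn new(lists: &[Vec<u32>]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for list in lists {
            values.extend_from_slice(list);
            offsets.push(values.len() as i32);
        }
        Self { offsets, values }
    }

    /// Borrow the neighbor ids of `row` without copying or allocating.
    pub fn neighbors(&self, row: usize) -> &[u32] {
        let start = self.offsets[row] as usize;
        let end = self.offsets[row + 1] as usize;
        &self.values[start..end]
    }
}
```

Because `neighbors` returns a slice into the shared values buffer, loading a graph needs no per-node allocation, which is the property the load benchmark measures.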

## Correctness & compatibility

- Loaded-graph search is bit-identical to the in-memory build across L2
/ Dot / Hamming and graph sizes (single node, pair, multi-level 2048).
- Old `load` semantics are preserved bit-for-bit, including duplicate-id
last-write-wins across a misaligned slice boundary; `build -> to_batch
-> load -> to_batch` is byte-stable (`b1 == b2`).
- No public API signature change. `HNSW::nodes()` now panics on a loaded
graph (documented; `GraphBuilderNode` is internal API and there are no
in-tree callers).

## Benchmarks

`criterion --quick`, 100000×128, L2, k=100, ef=300
(`rust/lance-index/benches/hnsw.rs`). The "before" `load_hnsw` was
measured by running this same bench against the parent commit's
reconstruction-based `builder.rs` (only that file swapped), so it is a
like-for-like `HNSW::load` comparison.

| Benchmark | Before (reconstruction load) | After (Arrow-backed) | Δ |
| --- | --- | --- | --- |
| `load_hnsw(100000x128)` | ~127 ms | ~90.8 µs | ~1,400× faster |
| `search_hnsw100000x128` (built, baseline) | ~700.7 µs | ~700.7 µs | unchanged |
| `search_hnsw_loaded100000x128` | n/a | ~690.4 µs | on par with built (within noise) |

Load drops from ~127 ms (allocating 100k `GraphBuilderNode`s + per-node
`OrderedNode` adjacency) to ~91 µs (batch slice + tiny upper-level
sparse maps), while search on the Arrow-backed graph stays on par with
the in-memory build. Numbers are `--quick`/indicative; the
~3-orders-of-magnitude load delta is well outside noise. Re-run a full
`cargo bench` before merge for headline figures.

## Tests

All in `rust/lance-index/src/vector/hnsw/builder.rs`:

- `test_loaded_search_parity_and_recall` (rstest: L2 single / L2 pair /
L2 2048 / Dot 2048) — built vs loaded parity plus recall ≥ 0.5.
- `test_loaded_level_offsets_misalignment_invariant` — pins the
entry-point-written-at-every-level surplus (`batch.num_rows() >
sum(level_count)`), the Dense level-0 precondition, and loaded↔built
search parity despite the misalignment.
- `test_loaded_empty_index` — 0-row `to_batch` → `load` → empty graph
round-trip.
- `test_to_batch_roundtrip_loaded` — the IVF partition-cache path:
`to_batch` on a loaded index is byte-stable and reloads/searches
identically.
- `test_loaded_graph_is_arrow_backed` — loaded graph is strictly lighter
than the built representation.
- Pre-existing `test_builder_write_load` (2048, L2, file round-trip) and
`test_builder_write_load_binary_hamming` (256, Hamming) continue to pass
unchanged.

---------

Co-authored-by: Vova Kolmakov <wombatukun@apache.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce-format#7079)

## Problem

Commit 4de5ce6 ("feat(index): serializable cache for Bitmap and
LabelList scalar indices lance-format#6874") introduced a performance regression in
`BitmapIndexPlugin::get_from_cache`. Every warm-cache hit against a
bitmap scalar index now pays O(N log N) cost where N is the number of
unique values in the column, instead of O(1).

The regression: the new implementation stored only the serializable
`BitmapIndexState` (an Arrow `RecordBatch`) in the cache and
reconstructed the full `BTreeMap<OrderableScalarValue, usize>` on every
cache hit by calling `parse_lookup_batch`. For a column with 10M unique
values this rebuilds the map on every query — including `IS NULL`, whose
actual bitmap lookup is `(*self.null_map).clone()` and is otherwise
O(1).

`parse_lookup_batch` is expensive because:
1. It calls `ScalarValue::try_from_array` for every row — one heap
allocation per unique value.
2. It inserts into a `BTreeMap` — O(log N) comparisons per insert, O(N
log N) total.

## Fix

**`BitmapIndex.index_map`**: Changed from
`BTreeMap<OrderableScalarValue, usize>` to
`Arc<BTreeMap<OrderableScalarValue, usize>>`. The map is immutable after
construction, so sharing it behind an `Arc` is safe, and cloning is
O(1).

**`BitmapIndexState`**: Added an `index_map: Arc<BTreeMap<...>>` field
that is **not serialized** — the wire format is unchanged. It is
populated eagerly:
- `from_index` (called by `put_in_cache`): `Arc::clone`s the map from
the live `BitmapIndex` — O(1).
- `deserialize` (disk-backed cache backends): calls `parse_lookup_batch`
once at deserialization time, which is already paying disk I/O cost.

**`into_bitmap_index`**: Now takes `&self` and simply `Arc::clone`s
`self.index_map` — always O(1), no reconstruction.

**`get_from_cache`**: The intermediate `(*state).clone()` is removed
since `into_bitmap_index` no longer consumes `self`.

`LabelListIndex` had the same dual-entry patch applied in a prior
iteration; that is also reverted to the original single-entry approach
(its `BitmapIndexState` path is unchanged by this PR).
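
A simplified sketch of the sharing pattern in this fix is shown below. The names `LiveIndex`, `CachedState`, and `to_live_index` are hypothetical, and keys are plain `i64` instead of `OrderableScalarValue`.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Simplified stand-in for the live index: the lookup map lives behind an `Arc`.
pub struct LiveIndex {
    pub index_map: Arc<BTreeMap<i64, usize>>,
}

/// Simplified stand-in for the cached state; the map field would not be serialized.
pub struct CachedState {
    index_map: Arc<BTreeMap<i64, usize>>,
}

impl CachedState {
    /// Cache insert: share the already-built map instead of storing only raw batches.
    pub fn from_index(index: &LiveIndex) -> Self {
        Self {
            index_map: Arc::clone(&index.index_map),
        }
    }

    /// Cache hit: takes `&self` and only bumps a reference count, so it is O(1).
    pub fn to_live_index(&self) -> LiveIndex {
        LiveIndex {
            index_map: Arc::clone(&self.index_map),
        }
    }
}
```

Every cache hit hands back a pointer to the same map, so the cost no longer depends on the number of unique values.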

## Test

Added `test_bitmap_cache_fast_path` to `bitmap.rs`:
- Creates a high-cardinality bitmap index (1 000 unique integers + 5
null rows)
- Calls `put_in_cache`, then `get_from_cache`
- Asserts `get_from_cache` returns `Some`
- Runs `IS NULL` and asserts the correct 5 null rows are returned

To measure the end-to-end impact, run the `bitmap / is_null / warm` case
in `python/python/ci_benchmarks/benchmarks/test_count_rows.py` — latency
should be close to `btree / is_null / warm`.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove redundant trailing slashes and update pylance pre-release pip
install command in README.md
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e-format#7066)

## What

Adds within-tier PK dedup to `LsmFtsSearchPlanner` so an FTS query over
the LSM tiers never surfaces the same primary key twice.

Previously the planner unioned per-source FTS results with **no**
cross-source dedup, so a PK present in multiple tiers — or updated
within the active memtable — surfaced more than once. This ports the
dedup the vector planner (`LsmVectorSearchPlanner`) already does:

- **Flushed sources**: `PkHashFilterExec` block-list
(`compute_source_block_lists`) drops rows superseded by a newer
generation.
- **Active memtable**: emit `_rowid` and wrap in
`WithinSourceDedupExec(KeepMaxRowAddr)` to collapse duplicate-PK appends
— the FTS inverted index is append-only, so an in-memtable update leaves
both versions searchable.
- `with_overfetch_factor` builder so a blocked source fetches `ceil(k *
factor)` and still yields `k` live rows after the block-list filter.
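
The two mechanisms in the list above, over-fetching and within-source dedup, can be sketched as follows. This is a hedged illustration: the `Hit` tuple and both function names are hypothetical, and it assumes that a larger row address means a newer version of the same primary key.

```rust
use std::collections::HashMap;

/// Rows to request from a blocked source so that `k` live rows can remain after filtering.
pub fn overfetch_limit(k: usize, factor: f64) -> usize {
    ((k as f64) * factor).ceil() as usize
}

/// Hypothetical search hit: (primary key, row address, score).
pub type Hit = (u64, u64, f32);

/// Keep only the hit with the largest row address per primary key.
pub fn keep_max_row_addr(hits: &[Hit]) -> Vec<Hit> {
    let mut newest: HashMap<u64, Hit> = HashMap::new();
    for &hit in hits {
        newest
            .entry(hit.0)
            .and_modify(|cur| {
                if hit.1 > cur.1 {
                    *cur = hit;
                }
            })
            .or_insert(hit);
    }
    let mut out: Vec<Hit> = newest.into_values().collect();
    // Sort by primary key so the result is deterministic.
    out.sort_by_key(|h| h.0);
    out
}
```

Note that the dedup only sees rows the index returned, which is why the predicate-crossing case described next can still leak a stale version.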

## Known limitation (follow-up)

A *predicate-crossing* update within the active memtable — where the
newest version no longer matches the query — can still leak the stale
version, because `WithinSourceDedupExec` only dedups among rows the
index returned. This is the same gap the vector active arm already
documents ("a fresh version evicted from the over-fetched top-k still
leaks"). The fix — a predicate-independent newest-per-PK recency filter
over the active memtable, shared by the vector + FTS arms — is a
separate PR.

## Context

Enables FTS over the WAL fresh tier in sophon (lancedb/sophon#6146).
Draft pending that integration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A single-term match query runs ~3x slower as an OR than as the logically
identical AND, returning byte-identical top-k. The conjunctive WAND path
skips blocks whose block-max cannot reach the top-k threshold; the
disjunctive path had the machinery but never used it, so it scored every
posting entry one at a time.

This adds the same skip to the union path: when the block-max upper
bound over every iterator overlapping the current window cannot beat the
threshold, it advances to the next block boundary instead of scoring
each document. The bound includes the `head` iterators (later documents
still inside the window), so a skip only fires when no document in the
window can qualify and results are unchanged.
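
The skip condition can be sketched in a few lines. This is a simplified model under stated assumptions: `BlockCursor` and `skip_target` are hypothetical names, and every cursor passed in is assumed to overlap the current window (including the `head` iterators).

```rust
/// One posting-list cursor: an upper bound on scores in its current block
/// and the last document id covered by that block.
pub struct BlockCursor {
    pub block_max_score: f32,
    pub block_last_doc: u64,
}

/// If no document in the current window can beat `threshold`, return the first
/// document id after the nearest block boundary; otherwise return `None`.
pub fn skip_target(cursors: &[BlockCursor], threshold: f32) -> Option<u64> {
    // In a disjunction a document can match every term, so the bound is the sum.
    let upper_bound: f32 = cursors.iter().map(|c| c.block_max_score).sum();
    if upper_bound > threshold {
        return None;
    }
    cursors
        .iter()
        .map(|c| c.block_last_doc)
        .min()
        .map(|last| last + 1)
}
```

Jumping only to the nearest boundary keeps the bound valid, because every cursor is still inside the block whose maximum was used.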

Single-term OR drops to AND-level latency (20x+ faster on common terms
in a Zipf corpus). Phrase queries run on the AND path and are
unaffected.

Covered by `test_or_single_term_block_skip_matches_and`, which asserts
OR and AND return the same top-k and that pruning skips a block.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ce-format#7062)

Each partition ran WAND from a cold threshold and built its own local
top-k, so a common term paid full block-max work in every partition.
Share an Arc<AtomicU32> floor across a query's partitions: each
publishes its local k-th (fetch_max) and reads it back as a pruning
floor. The k-th of the union is >= any single partition's k-th, so the
shared value is a lower bound on the global k-th and never drops a real
top-k doc.
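
A minimal sketch of the shared floor follows. The source names `Arc<AtomicU32>` and `fetch_max`; storing the score as `f32::to_bits` is an assumption of this sketch, and it is only order-preserving for non-negative, finite scores. The function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Publish a partition's local k-th best score and return the current shared floor.
/// Assumes non-negative finite scores, whose bit patterns sort like their values.
pub fn publish_and_read(floor: &AtomicU32, local_kth: f32) -> f32 {
    let bits = local_kth.to_bits();
    let previous = floor.fetch_max(bits, Ordering::Relaxed);
    f32::from_bits(previous.max(bits))
}

/// Run one thread per partition; each publishes its local k-th score.
/// Returns the final floor, which equals the largest published value.
pub fn run_partitions(local_kths: Vec<f32>) -> f32 {
    let floor = Arc::new(AtomicU32::new(0f32.to_bits()));
    let handles: Vec<_> = local_kths
        .into_iter()
        .map(|kth| {
            let floor = Arc::clone(&floor);
            thread::spawn(move || publish_and_read(&floor, kth))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    f32::from_bits(floor.load(Ordering::Relaxed))
}
```

Since the floor only ever rises to a value some partition has actually reached, pruning below it cannot discard a document that belongs in the global top-k.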

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lance-format#7014)

Closes: lance-format#7012 

## What

Distributed `IVF_RQ` builds work in the Rust engine (lance-format#6359) but could
not be driven from Python because the RaBitQ rotation could not be
pinned across workers. Each per-fragment build generated its own random
rotation, so segments rotated vectors differently, their binary codes
were not comparable, and merging corrupted the index.

This adds a way to mint one rotation, broadcast it, and reuse it in
every per-fragment build, mirroring how `pq_codebook` is injected.

## Changes

- Add `build_rq_rotation(dimension, num_bits=1, rotation_type="fast",
dtype="float32")` that returns one rotation as a JSON string.
- Add an `rq_rotation` parameter to `create_index_uncommitted`, parsed
into a new transient `RQBuildParams.rotation` field and consumed by
`RabitQuantizer::build`.
- `build()` reuses the supplied rotation instead of generating a random
one, after validating `num_bits`, `code_dim`, and the signs length.
  
## Notes

- Only the fast rotation is supported because its sign vector is JSON
serializable.
- The matrix rotation keeps a dense matrix in a binary buffer that the
JSON wire format drops, so it is rejected with a clear error.
- The params proto, the segment builder, and the merge and commit paths
are unchanged.

## Tests

- Rust unit tests for shared-rotation reuse, identical codes across
builds, mismatch and bad-input rejection, and the matrix-via-JSON
rejection.
- A Python integration test that builds two `IVF_RQ` segments on
separate fragments with one shared rotation, merges, commits, and
queries.
## What changed

Add Volcengine TOS (`tos://`) object store support through OpenDAL.

- Register a TOS object store provider for `tos://bucket/path`.
- Add the `tos` feature to `lance-io` and enable it by default through
`rust/lance`.
- Support `TOS_` / `VOLCENGINE_` environment variables and `tos_*`
storage options.
- Document TOS configuration.

## Testing

Validated against a real Volcengine TOS object store.
KMeans redos created one RNG before the redo loop, but each random
initialization cloned that same initial RNG state. As a result, redos
greater than one could repeatedly start from the same randomly selected
centroids instead of exploring distinct initializations.

This reuses the RNG mutably across redo attempts so each random
initialization consumes the advanced RNG state.
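
The difference between cloning and reusing the generator can be shown with a tiny deterministic RNG. This is a sketch only: `XorShift64` and both helper functions are hypothetical, and the real code picks centroids rather than single indices.

```rust
/// Tiny deterministic xorshift generator, used only for illustration.
#[derive(Clone)]
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Buggy pattern: every redo clones the same initial state, so all picks are equal.
pub fn init_indices_cloned(rng: &XorShift64, redos: usize, n: u64) -> Vec<u64> {
    (0..redos).map(|_| rng.clone().next_u64() % n).collect()
}

/// Fixed pattern: the generator is borrowed mutably, so each redo advances it.
pub fn init_indices_shared(rng: &mut XorShift64, redos: usize, n: u64) -> Vec<u64> {
    (0..redos).map(|_| rng.next_u64() % n).collect()
}
```

With the cloned variant, raising the redo count only repeats the first initialization; with the shared variant, each attempt draws from fresh generator state.
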
Fix Python target partition size inference to clamp the derived IVF
partition count to `1..=4096`, matching the Rust path.

The previous helper used `4096` as a lower bound, which produced
oversized partition counts for small datasets and missed the upper bound
for large datasets.
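
The bound error is easy to see side by side. In this sketch the derivation `num_rows / target_partition_size` and both function names are assumptions; only the `1..=4096` range and the "4096 as a lower bound" bug come from the description above.

```rust
pub const MAX_PARTITIONS: usize = 4096;

/// Fixed behavior: clamp the derived partition count to `1..=4096`.
pub fn num_partitions(num_rows: usize, target_partition_size: usize) -> usize {
    (num_rows / target_partition_size.max(1)).clamp(1, MAX_PARTITIONS)
}

/// Previous behavior as described: 4096 was applied as a lower bound only.
pub fn num_partitions_buggy(num_rows: usize, target_partition_size: usize) -> usize {
    (num_rows / target_partition_size.max(1)).max(MAX_PARTITIONS)
}
```

For 10,000 rows and a target size of 1,000, the fixed helper yields 10 partitions while the buggy one yields 4096; for 100,000,000 rows the buggy one exceeds the cap.
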
## Fix three flaky tests

### 1. Fix flaky `test_simple_index_nearest_centroid::case_1_f16` (rust)

The `test_simple_index_nearest_centroid::case_1_f16` test was flaky because
`HNSW` approximate search with ef=15 could not reliably find the nearest
centroid when querying with 42.1f32 against f16-precision centroids.

The `f16` cast to `f32` introduces subtle precision differences that
alter the `HNSW` graph structure, causing the search to follow incorrect
paths and return ID 45 instead of 42.

Fix by using an exact match query value (42.0f32) for the f16 case,
ensuring zero distance to the target centroid so `HNSW` always finds it.
The f32 case retains the original 42.1f32 query.

Fixes flaky test introduced in a57ec81.

### 2. Fix flaky
`test_create_inverted_index_progress_callback_error_after_completion_is_ignored`
(python)

The test was failing because the `complete:write_metadata` progress
event was being dispatched (and its callback error propagated) during
the pump loop **before** the future completed — the future still had
commit work to do after the builder emitted
`stage_complete("write_metadata")`.

The `block_on_pumping` function only ignores callback errors in the
final pump **after** the future resolves. But since the event arrives in
the channel before the commit step finishes, it gets drained in the loop
where errors propagate.

Fix by making `IndexProgressDispatcher::drain()` tolerate callback
errors on `Complete`-type events. Complete events are purely
informational — the stage's actual work is already done, so a callback
failure should never abort the operation. `Start` and `Progress` events
still propagate errors normally, preserving the "error before completion
propagates" semantics.
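
The tolerant drain can be sketched as below. This is an illustration under assumptions: `ProgressEvent` and this free-standing `drain` are hypothetical simplifications of `IndexProgressDispatcher::drain()`, which reads from a channel.

```rust
/// Hypothetical progress events, keyed by stage name.
#[derive(Debug, Clone, PartialEq)]
pub enum ProgressEvent {
    Start(String),
    Progress(String, u64),
    Complete(String),
}

/// Deliver events to `callback`. Errors on `Complete` are swallowed because the
/// stage's work is already done; errors on `Start`/`Progress` abort the drain.
/// Returns how many callbacks succeeded.
pub fn drain<F>(events: Vec<ProgressEvent>, mut callback: F) -> Result<usize, String>
where
    F: FnMut(&ProgressEvent) -> Result<(), String>,
{
    let mut delivered = 0;
    for event in &events {
        match (callback(event), event) {
            (Ok(()), _) => delivered += 1,
            (Err(_), ProgressEvent::Complete(_)) => {}
            (Err(e), _) => return Err(e),
        }
    }
    Ok(delivered)
}
```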

### 3. Fix flaky
`test_list_acquires_token_before_starting_underlying_stream` (rust)

The test was flaky on Windows CI because it relies on real-time
assertions with a 5ms timeout, but Windows system timer resolution
(~15.6ms) makes such tight timing unreliable.

The root cause is that `TokenBucketState` used `std::time::Instant`
which is not controllable in tests. When the token bucket has no
available tokens, the test asserts that `stream.next()` should block
(timeout after 5ms), but on Windows the elapsed time measurement is too
coarse.

Fix by:
- Switching `TokenBucketState.last_refill` from `std::time::Instant` to
`tokio::time::Instant`
- Adding `#[tokio::test(start_paused = true)]` to the two
timing-sensitive list throttle tests
- Adding `tokio = { workspace = true, features = ["test-util"] }` to
dev-dependencies

With `start_paused = true`, tokio fully controls time advancement,
making the tests deterministic regardless of OS timer resolution.
FileWriter and IndexWriter now return write summaries from finish so
callers can access the final object size without issuing an extra size
lookup. This lets dataset fragment metadata and index writer callers
propagate file sizes directly from the completed write path while
keeping Python's existing LanceFileWriter.finish row-count behavior.
…ch (lance-format#7026)

## Motivation

Enable exact substring search at scale for AI pretraining data
decontamination — detecting benchmark contamination in trillion-row text
corpora, following the [Infini-gram Mini
paper](https://arxiv.org/abs/2506.12229).

## Summary
- Implement FM-Index following the Infini-gram Mini paper architecture
for exact substring search
- Huffman-shaped wavelet tree for entropy-compressed BWT rank queries
(~0.26N bytes)
- Sampled suffix array (D=32) with LF-mapping locate for document
resolution (~0.25N bytes)
- Partitioned index (10K docs/partition) with blocked storage (32KB
blocks) and lazy loading
- Wire up `IndexType::FMIndex` in Lance's `create_index` and query paths
(`contains()` filter)
- Index size ~0.95x of text (paper claims 0.44x; gap is Lance row
overhead per block)
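
For readers unfamiliar with FM-Index counting, here is a deliberately naive sketch of backward search over a single text. It is not the implementation in this change: `FmIndex` here is a hypothetical toy that sorts suffixes directly and computes rank by linear scan, where the real index uses a wavelet tree and a sampled suffix array. It assumes the text contains no zero byte, which is used as the sentinel.

```rust
/// Naive FM-index over one text, for illustration only.
pub struct FmIndex {
    bwt: Vec<u8>,
    /// `c[b]` is the number of symbols in the text strictly smaller than byte `b`.
    c: [usize; 256],
}

impl FmIndex {
    pub fn new(text: &[u8]) -> Self {
        // Append a sentinel (0) that sorts before every other symbol.
        let mut t = text.to_vec();
        t.push(0);
        let n = t.len();
        // Suffix array by direct comparison; fine for tiny inputs only.
        let mut sa: Vec<usize> = (0..n).collect();
        sa.sort_by(|&a, &b| t[a..].cmp(&t[b..]));
        // BWT: the symbol preceding each sorted suffix (cyclically).
        let bwt: Vec<u8> = sa.iter().map(|&i| t[(i + n - 1) % n]).collect();
        let mut counts = [0usize; 256];
        for &b in &t {
            counts[b as usize] += 1;
        }
        let mut c = [0usize; 256];
        let mut sum = 0;
        for b in 0..256 {
            c[b] = sum;
            sum += counts[b];
        }
        Self { bwt, c }
    }

    /// Occurrences of `b` in `bwt[..i]` (linear scan in this sketch).
    fn rank(&self, b: u8, i: usize) -> usize {
        self.bwt[..i].iter().filter(|&&x| x == b).count()
    }

    /// Count occurrences of a non-empty `pattern` by backward search.
    pub fn count(&self, pattern: &[u8]) -> usize {
        let (mut lo, mut hi) = (0usize, self.bwt.len());
        for &b in pattern.iter().rev() {
            lo = self.c[b as usize] + self.rank(b, lo);
            hi = self.c[b as usize] + self.rank(b, hi);
            if lo >= hi {
                return 0;
            }
        }
        hi - lo
    }
}
```

Each pattern byte narrows a suffix-array interval, so the number of rank queries depends on the pattern length rather than the text length.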

## Benchmark (100K gitlake source code files, 1.59 GB text)

| Metric | FM-Index | N-Gram |
|--------|----------|--------|
| Index size | 1,513 MB (0.95x) | 84 MB (0.05x) |
| Build time | 132s | 9s |
| Short queries (e.g. `fn `) | 9034ms/q | 448ms/q |
| Medium queries (e.g. `fn main()`) | **29ms/q** | 480ms/q |
| Long queries (~80 chars) | **34ms/q** | 206ms/q |

FM-Index is 17x faster than N-Gram on medium queries and returns exact
results (N-Gram returns approximate candidates needing recheck). N-Gram
cannot find queries shorter than 3 characters (e.g. `fn ` returns 0).

## Test plan
- [x] 9 unit tests covering search, locate, wavelet access,
serialization, multi-document
- [x] End-to-end benchmark through Lance dataset API
(`dataset.create_index`, `dataset.count_rows(filter)`)
- [x] Verified correct match counts against full-scan baseline on real
source code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Beinan Wang <beinanwang@microsoft.com>
Bumps [idna](https://github.com/kjd/idna) from 3.10 to 3.15.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/kjd/idna/blob/master/HISTORY.md">idna's
changelog</a>.</em></p>
<blockquote>
<h2>3.15 (2026-05-12)</h2>
<ul>
<li>Enforce DNS-length cap on individual labels early in
<code>check_label</code>,
short-circuiting contextual-rule processing for oversized input
while staying compatible with UTS 46 usage.</li>
<li>Tidy core helpers: hoist bidi category sets to module-level
frozensets (avoiding per-codepoint list construction), simplify
length checks, and reuse the shared <code>_unicode_dots_re</code> from
<code>idna.core</code> in the codec module.</li>
<li>Use <code>raise ... from err</code> for proper exception chaining
and
switch internal string formatting to f-strings.</li>
<li>Allow <code>flit_core</code> 4.x in the build backend.</li>
<li>Expand the ruff lint set (flake8-bugbear, flake8-simplify,
pyupgrade, perflint) and apply the surfaced fixes; pin lint CI
to Python 3.14.</li>
<li>Add Dependabot configuration for GitHub Actions.</li>
<li>Convert README and HISTORY from reStructuredText to Markdown.</li>
<li>Reference CVE-2026-45409 for the 3.14 advisory in place of the
initial GHSA identifier.</li>
</ul>
<p>Thanks to Felix Yan, Stan Ulbrych, and metsw24-max for
contributions to this release.</p>
<h2>3.14 (2026-05-10)</h2>
<ul>
<li>Removed opportunity to process long inputs into quadratic
time by rejecting oversize inputs up-front. Closes a bypass
of the CVE-2024-3651 mitigation. [CVE-2026-45409]</li>
</ul>
<p>Thanks to Stan Ulbrych for reporting the issue.</p>
<h2>3.13 (2026-04-22)</h2>
<ul>
<li>Correct classification error for codepoint U+A7F1</li>
</ul>
<h2>3.12 (2026-04-21)</h2>
<ul>
<li>Update to Unicode 17.0.0.</li>
<li>Issue a deprecation warning for the transitional argument.</li>
<li>Added lazy-loading to provide some performance improvements.</li>
<li>Removed vestiges of code related to Python 2 support, including
segmentation of data structures specific to Jython.</li>
</ul>
<p>Thanks to Rodrigo Nogueira for contributions to this release.</p>
<h2>3.11 (2025-10-12)</h2>
<ul>
<li>Update to Unicode 16.0.0, including significant changes to UTS46
processing. As a result of Unicode ending support for it, transitional
processing no longer has an effect and returns the same result.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/kjd/idna/commit/af30a092e158181d0b35ac66dfa813788126bdd8"><code>af30a09</code></a>
Release 3.15</li>
<li><a
href="https://github.com/kjd/idna/commit/30314d4628744ca14cf2b5820564e5127a9f86f2"><code>30314d4</code></a>
Pre-release 3.15rc0</li>
<li><a
href="https://github.com/kjd/idna/commit/05d4b219aa9eddc47371fcbd2000f0301016f3e9"><code>05d4b21</code></a>
Merge pull request <a
href="https://redirect.github.com/kjd/idna/issues/237">#237</a> from
kjd/convert-docs-to-markdown</li>
<li><a
href="https://github.com/kjd/idna/commit/2987fdba1962bbb2358399e0084ba062b98a0bee"><code>2987fdb</code></a>
Convert README and HISTORY from reStructuredText to Markdown</li>
<li><a
href="https://github.com/kjd/idna/commit/59fa8002d514bf4a5ce7b58f67b9ec587d53fa9c"><code>59fa800</code></a>
Merge pull request <a
href="https://redirect.github.com/kjd/idna/issues/236">#236</a> from
kjd/dependabot/github_actions/actions-f3e34333ea</li>
<li><a
href="https://github.com/kjd/idna/commit/def69834ced5d4b3c50439d8b99c4c856ec19ca2"><code>def6983</code></a>
Merge branch 'master' into
dependabot/github_actions/actions-f3e34333ea</li>
<li><a
href="https://github.com/kjd/idna/commit/bbd8004a797185d8c56bb555cd5c88fde05e0631"><code>bbd8004</code></a>
Merge pull request <a
href="https://redirect.github.com/kjd/idna/issues/234">#234</a> from
StanFromIreland/patch-1</li>
<li><a
href="https://github.com/kjd/idna/commit/edd07c05024344a6ccb517414ccb36683aee99fc"><code>edd07c0</code></a>
Bump github/codeql-action from 3.35.2 to 4.35.2 in the actions
group</li>
<li><a
href="https://github.com/kjd/idna/commit/5557db030c11bdec50d62aa5f631d705d33ba123"><code>5557db0</code></a>
Merge branch 'master' into patch-1</li>
<li><a
href="https://github.com/kjd/idna/commit/f11746cf4981d25123ef7830d3ee60f07de8ae3d"><code>f11746c</code></a>
Merge pull request <a
href="https://redirect.github.com/kjd/idna/issues/235">#235</a> from
StanFromIreland/patch-2</li>
<li>Additional commits viewable in <a
href="https://github.com/kjd/idna/compare/v3.10...v3.15">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ance-format#7093)

ProductQuantizer stores `l2_targets: Option<Vec<L2Prepared>>` — a
pre-transposed SoA copy of the PQ codebook (~768 KB for a 768-dim, 48
sub-vector, 256-centroid index). Every call to `PQIndex::load()` (one
per cached IVF partition) clones the entire ProductQuantizer,
deep-copying this Vec. The data is logically identical across all
partitions (derived from the same global codebook) but was physically
duplicated N times.

With 8000 partitions fully warmed, this wastes ~6 GB (768 KB × 8000). We
measured this empirically: lance 4.0.1 used 2.9× more memory per cached
partition than lance 0.39 (1894 KB vs 443 KB), with the excess tracing
directly to l2_targets.

Fix: change the field type to `Option<Arc<Vec<L2Prepared>>>`. Cloning a
ProductQuantizer now bumps a reference count instead of copying
megabytes; all partitions share one allocation.

Changes (5 lines in one file):
- Field type: `Option<Vec<L2Prepared>>` → `Option<Arc<Vec<L2Prepared>>>`
- `build_l2_targets` return type updated to match
- Construction: wrap the collected `Vec` with `Arc::new`
- `DeepSizeOf`: deref the `Arc` before iterating, `(**v).iter()`
- `build_l2_distance_table`: `targets.as_slice()` to pass `&[L2Prepared]`
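
The memory arithmetic and the effect of the `Arc` can be checked with a small sketch. It assumes `f32` codebook entries and uses a hypothetical `Quantizer` with a flat `Vec<f32>` in place of `Vec<L2Prepared>`.

```rust
use std::sync::Arc;

/// Bytes in one f32 codebook copy: dimension × centroids × 4 bytes.
pub fn codebook_bytes(dimension: usize, num_centroids: usize) -> usize {
    dimension * num_centroids * std::mem::size_of::<f32>()
}

/// Hypothetical stand-in for the quantizer; cloning it only bumps the `Arc` count.
#[derive(Clone)]
pub struct Quantizer {
    pub l2_targets: Option<Arc<Vec<f32>>>,
}

/// Simulate loading `n` cached partitions, each holding its own quantizer clone.
pub fn load_partitions(quantizer: &Quantizer, n: usize) -> Vec<Quantizer> {
    (0..n).map(|_| quantizer.clone()).collect()
}
```

For a 768-dimensional index with 256 centroids, `codebook_bytes(768, 256)` is 786,432 bytes (768 KB); 8000 independent copies come to roughly 6.3 × 10^9 bytes, whereas with the `Arc` all partitions point at one allocation.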

---------

Co-authored-by: Yu-Ju Huang <yuju.huang@databricks.com>
…-format#7087)

## Summary

`FilteredReadExec`, `LanceScanExec`, and `ANNIvfPartitionExec` each
managed their own concurrency using process-wide CPU counts
(`get_num_compute_intensive_cpus()`, `io_parallelism`), ignoring
DataFusion's `target_partitions` session config. This makes it
impossible to constrain query CPU usage in multi-tenant scenarios even
when the caller sets `target_partitions` on the session.

Changes:
- **`FilteredReadExec`**: cap `OnePartitionMultipleThreads` num_threads
with `target_partitions` in `obtain_stream()`
- **`LanceScanExec`**: add `parallelism_cap: Option<usize>` to
`LanceScanConfig`; set it from `target_partitions` in `execute()`; apply
it to the CPU decode `try_buffered` (v2) and `batch_readahead` (v1) —
IO-bound paths (`frag_parallelism`, `fragment_readahead`) are
intentionally not capped
- **`ANNIvfPartitionExec`**: cap delta index fan-out `.buffered()` with
`target_partitions`
- **`ANNIvfSubIndexExec`**: thread `target_partitions` through
`initial_search`/`late_search` into `effective_query_parallelism()`,
where it caps `get_num_compute_intensive_cpus()` before computing
partition search parallelism

## Does this change default parallelism?

No. `get_num_compute_intensive_cpus()` returns `num_cpus::get() -
IO_CORE_RESERVATION` (default reservation = 2), where `num_cpus::get()`
on Linux already reads cgroup CPU limits. DataFusion's default
`target_partitions` is `available_parallelism()`, which is also
cgroup-aware and returns the same logical CPU count. Since `cpus - 2 ≤
cpus`, `min(get_num_compute_intensive_cpus(), target_partitions)` equals
`get_num_compute_intensive_cpus()` — the existing value — in all
configurations.

The cap only takes effect when a caller explicitly lowers
`target_partitions` below the default, which is exactly the multi-tenant
use case this change is intended to support.
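
The capping rule described above can be sketched as a small pure function. `capped_parallelism` is a hypothetical name, `compute_cpus` stands in for the result of `get_num_compute_intensive_cpus()`, and the final clamp to 1 is an assumption of this sketch rather than something stated in the PR:

```rust
/// Cap CPU-bound parallelism by the session's `target_partitions`.
/// `compute_cpus` stands in for `get_num_compute_intensive_cpus()`.
fn capped_parallelism(compute_cpus: usize, target_partitions: Option<usize>) -> usize {
    let capped = match target_partitions {
        // A caller-provided limit can only lower the value, never raise it.
        Some(tp) => compute_cpus.min(tp),
        None => compute_cpus,
    };
    // Assumption of this sketch: never return 0, so a buffered stream
    // always has room for at least one in-flight task.
    capped.max(1)
}
```

On a 16-CPU host with the default reservation of 2, `capped_parallelism(14, Some(16))` stays at 14, while a tenant that sets `target_partitions = 4` gets 4.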

Closes lance-format#7082

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…mat#7090)

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=aiohttp&package-manager=uv&previous-version=3.13.4&new-version=3.14.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ormat#7110)

## Problem

A newly-claiming writer only bumps the manifest `writer_epoch` — it
writes **nothing** into the WAL, and WAL slot keys are position-only (no
epoch in the path). So until the successor actually writes a slot, a
fenced predecessor's next `append` lands in the empty next slot, the
`PUT-IF-NOT-EXISTS` succeeds, and `append` returns `Ok` **without** a
fence check (the `check_fenced` call only fires on the `AlreadyExists`
branch).

That window is a correctness hole: the predecessor false-acks a write
that later dies at the seal-time manifest CAS — and if the successor
already replayed past that position, the entry is orphaned (data loss
for an acked write).

## Fix

On claim (`epoch >= 2`), drop a **data-less sentinel** WAL entry at the
current tip, **before** replay:

- The predecessor's next `append` now collides at that slot and surfaces
the fence via the existing `AlreadyExists -> check_fenced` path.
- Writing the sentinel *before* replay guarantees any predecessor entry
that landed *below* the sentinel is recovered by replay rather than
orphaned.
- A lost slot race (a predecessor/concurrent claimer wins the probed
slot) re-probes one past the winner; that entry then sits below the
sentinel and is still replayed.
- Sentinels carry zero batches (empty-schema Arrow IPC +
`writer_epoch`/marker metadata) and are skipped by replay's existing
empty-batch guard.

Epoch 1 (a fresh shard) has no predecessor, so the sentinel is skipped
there.

This lets writers rely on collision for fencing instead of issuing a
per-put fence-check GET on every append.
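
The race and the sentinel fix can be modeled with an in-memory toy. Everything here (`Shard`, `Writer`, `Entry`, `AppendError`, the `write_sentinel` flag) is hypothetical and only mimics the put-if-not-exists and collision-time fence check described above; it is not the `mem_wal` implementation:

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum AppendError {
    Fenced,
    SlotTaken,
}

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    writer_epoch: u64,
    batches: Vec<String>,
}

/// Toy shard: a manifest epoch plus position-keyed WAL slots.
#[derive(Default)]
struct Shard {
    manifest_epoch: u64,
    slots: BTreeMap<u64, Entry>,
}

struct Writer {
    epoch: u64,
    next_pos: u64,
}

impl Shard {
    /// First slot position past the highest written slot.
    fn tip(&self) -> u64 {
        self.slots.keys().next_back().map_or(0, |p| p + 1)
    }

    /// Claim the shard: bump the epoch and, for epoch >= 2, optionally
    /// drop a data-less sentinel at the current tip.
    fn claim(&mut self, write_sentinel: bool) -> Writer {
        self.manifest_epoch += 1;
        let epoch = self.manifest_epoch;
        let mut next_pos = self.tip();
        if write_sentinel && epoch >= 2 {
            let sentinel = Entry { writer_epoch: epoch, batches: vec![] };
            self.slots.insert(next_pos, sentinel);
            next_pos += 1;
        }
        Writer { epoch, next_pos }
    }

    /// Put-if-not-exists; the fence check runs only on a collision.
    fn append(&mut self, w: &mut Writer, batch: &str) -> Result<u64, AppendError> {
        let pos = w.next_pos;
        if self.slots.contains_key(&pos) {
            return Err(if self.manifest_epoch > w.epoch {
                AppendError::Fenced
            } else {
                AppendError::SlotTaken
            });
        }
        let entry = Entry { writer_epoch: w.epoch, batches: vec![batch.to_string()] };
        self.slots.insert(pos, entry);
        w.next_pos += 1;
        Ok(pos)
    }
}
```

Without the sentinel, a predecessor append after a successor's claim returns `Ok` in this model (the false ack). With the sentinel, the same append collides and surfaces `Fenced`.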

## Test

Adds `test_fence_sentinel_fences_predecessor_without_successor_write`,
which exercises the exact race: a successor claims a higher epoch and
drops a sentinel **without writing any data batch**, and the
predecessor's next append is fenced. Also asserts the sentinel reads
back as zero batches and the successor's own writes land after it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ance-format#7111)

Adds Dependabot configuration to automatically open PRs for patch-level
dependency updates every Wednesday. Uses `lockfile-only` mode for Cargo
and uv so only the lockfiles are touched — no manifest version specs are
modified.

Covers:
- `Cargo.lock` at `/`, `/python`, `/java/lance-jni`
- `python/uv.lock`

Maven is excluded since it has no lockfile concept (version specs live
directly in `pom.xml`).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…at#6983)

Make `InvertedPartition::load` defer the per-partition `DocSet` work
(row_id + num_tokens) until the wand walk actually needs it, instead
of materializing the entire DocSet up front.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…format#7115)

During an FTS index build, merge_existing_segments() can panic with
something like "index out of bounds: the len is 936 but the index is
1077". This happens when there has been an update/delete on the table
that causes some token to be evicted from the posting lists. Fixed by
updating the associated bookkeeping.
lixmgl and others added 13 commits June 22, 2026 12:15
Fixes lance-format#7197.

## Summary

- `object_store.copy()` is called unconditionally on the staging→final
manifest copy path. This routes to S3's `CopyObject` API, which has a ~5
GB hard cap. Manifests above this fail with `EntityTooLarge` —
production case was a ~14 GB manifest with `ProposedSize 14961429442` on
`_versions/...manifest`.
- Add `copy_size_aware`: keeps the cheap server-side `store.copy()` for
sources below the limit, falls back to read+rewrite via a multipart
upload for larger sources. The required `size` argument lets the caller
skip an extra `head()` round-trip.
- The 5 GiB threshold is backend-agnostic, not S3-specific: S3's
`CopyObject` and GCS's single-shot `Objects.copy` both cap at ~5 GiB, so
the constant is named `MAX_SERVER_SIDE_COPY_BYTES`. Stores without such
a cap (e.g. local FS) take the read+rewrite fallback above 5 GiB too;
correctness is preserved, only the rare large copy is slower.
- Also tighten `MAX_UPLOAD_PART_SIZE` from `5 GiB` to `5 GiB - 1` so
`LANCE_INITIAL_UPLOAD_SIZE=5368709120` can't trigger a single PUT of
exactly 5 GiB on shutdown — which S3 also rejects.
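
The size-based decision can be sketched as follows. The constant names and values come from the summary above; `CopyStrategy`, `choose_copy_strategy`, and `clamp_part_size` are hypothetical helpers, and the strict `<` comparison and the clamping of a requested part size are assumptions of this sketch:

```rust
/// 5 GiB: the server-side copy cap named in the summary.
const MAX_SERVER_SIDE_COPY_BYTES: u64 = 5 * 1024 * 1024 * 1024;
/// One byte below 5 GiB, so a single PUT can never be exactly 5 GiB.
const MAX_UPLOAD_PART_SIZE: u64 = 5 * 1024 * 1024 * 1024 - 1;

#[derive(Debug, PartialEq)]
enum CopyStrategy {
    /// Cheap server-side copy (e.g. S3 `CopyObject`).
    ServerSide,
    /// Read the source and rewrite it through a multipart upload.
    ReadRewrite,
}

/// Pick a strategy from the already-known source size, so no extra
/// `head()` round-trip is needed. Assumption: "below the limit" means `<`.
fn choose_copy_strategy(size: u64) -> CopyStrategy {
    if size < MAX_SERVER_SIDE_COPY_BYTES {
        CopyStrategy::ServerSide
    } else {
        CopyStrategy::ReadRewrite
    }
}

/// Assumption: a requested upload part size is clamped to the maximum.
fn clamp_part_size(requested: u64) -> u64 {
    requested.min(MAX_UPLOAD_PART_SIZE)
}
```

The ~14 GB production manifest (`14961429442` bytes) takes the `ReadRewrite` path in this sketch, and a requested part size of `5368709120` is clamped to `5368709119`.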

Same bug class as lance-format#6750 (multipart-aware put for txn file writes),
different code path.

## Test plan

New tests in `rust/lance/src/io/commit/external_manifest.rs` covering
both the >5 GB read+rewrite fallback and the small-file fast path.

Related downstream issue:
lance-format/lance-spark#529

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nce-format#7373)

A DataReplacement rewrites a column's data file positionally against the
fragments it targets. The conflict resolver returned Ok unconditionally
for a concurrent Update, Delete, or Merge, so a DataReplacement
committed at a read version those operations had superseded was applied
silently -- dropping or misaligning the rows the concurrent op moved or
deleted, with no error raised. Merge was additionally asymmetric:
check_merge_txn already treats a concurrent DataReplacement as a
conflict, but not the reverse.

For Update/Delete, conflict when the other transaction's updated/removed
fragment ids overlap our replacement fragment ids (mirrors the existing
Rewrite handling). For Merge, which rewrites the entire fragment list,
conflict unconditionally (mirrors check_merge_txn). All are retryable,
so the committer rebuilds against the new layout.
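
The new rule can be sketched as a small predicate. `ConcurrentOp` and `data_replacement_conflicts` are hypothetical names that model only the three operations discussed here; the real conflict resolver handles more cases:

```rust
use std::collections::HashSet;

/// Hypothetical model of the concurrent operations discussed above.
enum ConcurrentOp {
    Update { fragment_ids: Vec<u64> },
    Delete { fragment_ids: Vec<u64> },
    Merge,
}

/// Returns true when a DataReplacement over `replaced` fragments must be
/// rejected (as a retryable conflict) against the concurrent operation.
fn data_replacement_conflicts(replaced: &HashSet<u64>, other: &ConcurrentOp) -> bool {
    match other {
        // Update/Delete: conflict only when fragment ids overlap.
        ConcurrentOp::Update { fragment_ids } | ConcurrentOp::Delete { fragment_ids } => {
            fragment_ids.iter().any(|id| replaced.contains(id))
        }
        // Merge rewrites the whole fragment list: always a conflict.
        ConcurrentOp::Merge => true,
    }
}
```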

Adds DataReplacement vs Update, Delete, and Merge cases (same and
different fragment) to test_conflicts_data_replacement.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Performance Improvement

### What is the performance issue or bottleneck?

For FTS conjunction searches, once the top-k threshold is established,
the AND path can still fully validate and score aligned candidate
documents even when a cheap upper bound proves they cannot enter the
heap. That pays for full BM25 scoring, phrase checks, and frequency
collection for candidates that are already below the competitive
threshold.

### How does this PR improve performance?

This adds an AND-only score-first candidate prune in `Wand::search`.
After all conjunction postings are aligned, the scorer first computes
the exact contribution of one lead posting, then adds the remaining
postings' current block-max scores as a safe upper bound. If that upper
bound cannot beat the threshold, the candidate is skipped before phrase
validation, full scoring, and term-frequency collection.

The change is intentionally narrow:

- OR and flat-search paths are unchanged.
- Missing-term and fuzzy AND semantics are unchanged.
- The bound uses existing block-max scores, so exact top-k behavior is
preserved for `wand_factor == 1.0`.
- Phrase queries still use the prune only when the BM25 upper bound is
already non-competitive.
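
The score-first check can be sketched as one comparison. `can_skip_candidate` is a hypothetical helper, and treating "cannot beat the threshold" as `<=` is an assumption of this sketch:

```rust
/// Decide whether an aligned AND candidate can be skipped before phrase
/// validation and full scoring.
///
/// `lead_score` is the exact contribution of the lead posting;
/// `follower_block_max` holds the current block-max score of every other
/// posting, which bounds each follower's contribution from above.
fn can_skip_candidate(lead_score: f32, follower_block_max: &[f32], threshold: f32) -> bool {
    let upper_bound = lead_score + follower_block_max.iter().sum::<f32>();
    // Assumption: a candidate must strictly exceed the threshold to
    // enter the heap, so an upper bound equal to it is not competitive.
    upper_bound <= threshold
}
```

Because the bound never underestimates the true score, a skipped candidate could not have entered the top-k heap.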

### Benchmark or measurement results

No end-to-end benchmark was run for this draft. The new regression
coverage includes a counting scorer case that verifies low-scoring AND
candidates avoid full scoring, plus a top-k correctness case that keeps
a later high-scoring candidate.

## Validation

- `cargo fmt --all --check`
- `git diff --check`
- `CARGO_TARGET_DIR=/tmp/lance-target-fts-and-prune-main cargo test -p
lance-index scalar::inverted::wand::tests -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/lance-target-fts-and-prune-clippy cargo clippy
--all --tests --benches -- -D warnings`
## Bug Fix

What is the bug?

FTS AND/conjunction block-max pruning could only ask a posting for its
current block max score. When the lead posting defines a wider `up_to`
window, another posting can have a higher block max later in that same
window, so using only its current block can understate the safe upper
bound for Lucene-style `getMaxScore(upTo)`.

What issues or incorrect behavior does the bug cause?

The understated upper bound can make `and_advance_target` skip a lead
block window even though a later document in that window could still
beat the current top-k threshold. For exact BM25 search, pruning must
use a safe upper bound so possible top-k documents are not dropped.

How does this PR fix the problem?

This adds a query-time `BlockMaxWindow` to compressed posting iterators.
The window lazily maintains a monotonic deque of block max scores over
`[current shallow block, block containing up_to]`. AND/conjunction now
lets the lead posting choose `up_to` and asks each follower for a range
max that safely covers that same `up_to`. Plain postings still fall back
to their existing list-level upper bound. This does not change the index
format or posting-list build path.
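
A monotonic-deque range max of this kind can be sketched as below. This `BlockMaxWindow` is a simplified, hypothetical model over a plain slice of per-block max scores; it assumes `start` and `end` (block indices) never move backwards between calls, and it is not the PR's actual type:

```rust
use std::collections::VecDeque;

/// Lazily maintained sliding-window maximum over block max scores.
struct BlockMaxWindow<'a> {
    block_max: &'a [f32],
    /// Block indices whose scores are strictly decreasing front to back.
    deque: VecDeque<usize>,
    /// Next block index that has not been pushed yet.
    next: usize,
}

impl<'a> BlockMaxWindow<'a> {
    fn new(block_max: &'a [f32]) -> Self {
        Self { block_max, deque: VecDeque::new(), next: 0 }
    }

    /// Max score over blocks `[start, end]` (inclusive block indices).
    /// Assumes `start` and `end` are non-decreasing across calls.
    fn range_max(&mut self, start: usize, end: usize) -> f32 {
        if self.block_max.is_empty() {
            return 0.0;
        }
        let end = end.min(self.block_max.len() - 1);
        // Extend the right edge, dropping dominated blocks from the back.
        while self.next <= end {
            let score = self.block_max[self.next];
            while let Some(&back) = self.deque.back() {
                if self.block_max[back] <= score {
                    self.deque.pop_back();
                } else {
                    break;
                }
            }
            self.deque.push_back(self.next);
            self.next += 1;
        }
        // Drop blocks that fell behind the left edge.
        while let Some(&front) = self.deque.front() {
            if front < start {
                self.deque.pop_front();
            } else {
                break;
            }
        }
        self.deque.front().map_or(0.0, |&i| self.block_max[i])
    }
}
```

Each block index is pushed and popped at most once, so a full pass over the posting list costs amortized constant time per call.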

## Tests

- `cargo fmt --all --check`
- `git diff --check`
- `CARGO_TARGET_DIR=/tmp/lance-target-fts-and-rangemax-main cargo test
-p lance-index scalar::inverted::wand::tests -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/lance-target-fts-and-rangemax-clippy cargo
clippy --all --tests --benches -- -D warnings`

---------

Co-authored-by: Lu Qiu <luqiujob@gmail.com>
## Summary

This PR improves FTS search throughput by avoiding repeated metadata
reads on hot search paths:

- caches immutable corpus-level BM25 stats on the loaded `InvertedIndex`
- caches per-token posting metadata (`max_score`, posting length) in the
existing partition-prefixed Lance cache
- leaves token-set residency behavior unchanged and does not cache
posting list bodies

The main target is global QPS under concurrent full-text search,
especially when the index is stored on object storage.

## S3 Performance

Benchmark shape for both datasets:

- query set: `the`, `data`, `learning`, `world`, `machine learning`,
`artificial intelligence`, `中国`, `人工智能`
- `limit=10`, projected columns: `_rowid`, `_score`
- warmup: query set x3
- each concurrency point runs for 20s
- baseline and patched results returned identical row ids and 6-decimal
scores for the query set

### 1M S3 Dataset

Dataset:
`s3://xuanwo-fts-bench-use1/datasets/mmlb_1m_all_columns_no_image_en_zh_icu_bench_icu-1m-perf-opt-20260619T143109Z.lance`

| concurrency | baseline QPS | patched QPS | QPS delta | baseline p95 | patched p95 |
|---:|---:|---:|---:|---:|---:|
| 1 | 7.73 | 13.43 | +73.7% | 202.45ms | 108.64ms |
| 2 | 15.51 | 24.49 | +57.9% | 210.17ms | 122.68ms |
| 4 | 34.45 | 53.01 | +53.9% | 184.70ms | 125.08ms |
| 8 | 71.74 | 96.25 | +34.2% | 171.57ms | 129.44ms |
| 16 | 120.33 | 199.30 | +65.6% | 226.07ms | 125.03ms |
| 32 | 214.90 | 242.96 | +13.1% | 279.15ms | 283.01ms |

The 32-concurrency point is saturated/noisy; the improvement is stable
at 1-16 concurrency.

### 10M S3 Dataset

Dataset:
`s3://xuanwo-fts-bench-use1/datasets/mmlb_10m_full_content_icu_s3_search_20260623T000000Z.lance`

- 10,000,000 rows, 10 fragments
- 19 S3 objects, 69,994,744,162 bytes total
- FTS index size: 7.76 GiB

| concurrency | baseline QPS | patched QPS | QPS delta | baseline p95 | patched p95 |
|---:|---:|---:|---:|---:|---:|
| 1 | 7.40 | 11.55 | +56.1% | 290.69ms | 152.26ms |
| 2 | 14.45 | 22.90 | +58.5% | 236.91ms | 132.04ms |
| 4 | 35.20 | 44.90 | +27.6% | 175.41ms | 145.16ms |
| 8 | 64.10 | 86.55 | +35.0% | 198.30ms | 142.27ms |
| 16 | 132.55 | 160.40 | +21.0% | 185.01ms | 163.47ms |
| 32 | 211.95 | 235.70 | +11.2% | 283.94ms | 305.04ms |

The 10M S3 result confirms the object-store improvement at larger index
scale. The 32-concurrency point remains saturated/noisy and has a p95
regression despite higher QPS.

## Validation

- `cargo fmt --all`
- `git diff --check`
- `cargo test -p lance-index scalar::inverted::index::tests::`
- `cargo clippy --all --tests --benches -- -D warnings`

Co-authored-by: LuQQiu <luqiujob@gmail.com>
## What

Adds `ShardWriter::put_no_wait`, a variant of `put` that performs the
visible in-memory insert and triggers the durable WAL flush, then
returns the `BatchDurableWatcher` **without** awaiting it. A thin
wrapper restores `put`'s behavior (await the watcher), so existing
callers are unchanged.

`put_memtable` is split into:
- `put_memtable_no_wait` — the in-memory critical section (insert under
`state_lock` + `track_batch_for_wal` + flush triggers) followed by
`trigger_flush`, returning `(WriteResult, Option<BatchDurableWatcher>)`.
- `put_memtable` — calls the above, then `watcher.wait()`.

`BatchDurableWatcher` and `WriteResult` are re-exported from `mem_wal`.

## Why

Lets an external caller hold its own serialization lock across only the
in-memory read-merge-insert critical section and await durability
**after** releasing it, so concurrent durable flushes still coalesce.
The in-memory insert stays guarded by the writer's `state_lock`, so
`BatchStore`'s single-writer invariant holds regardless of the external
lock — `state_lock` is intentionally **not** skipped.

This is the lance-side primitive for sophon's WAL partial-column-update
path (read fresh tier → merge → insert under a per-bucket lock,
durability awaited outside it).
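
The split between the visible insert and the durability wait can be modeled with standard-library channels. `ToyWriter` and `DurableWatcher` are hypothetical stand-ins for `ShardWriter` and `BatchDurableWatcher`; the background thread stands in for the WAL flusher, and none of this reflects the real `mem_wal` signatures:

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for `BatchDurableWatcher`.
struct DurableWatcher(Receiver<()>);

impl DurableWatcher {
    /// Block until the flusher reports the row as durable.
    fn wait(self) -> bool {
        self.0.recv().is_ok()
    }
}

struct ToyWriter {
    memtable: Mutex<Vec<String>>,
    wal: Arc<Mutex<Vec<String>>>,
    flush_tx: Sender<(String, Sender<()>)>,
}

impl ToyWriter {
    fn new() -> Self {
        let (flush_tx, flush_rx) = mpsc::channel::<(String, Sender<()>)>();
        let wal = Arc::new(Mutex::new(Vec::new()));
        let wal_bg = Arc::clone(&wal);
        // Background "flusher": persist each row, then notify its watcher.
        thread::spawn(move || {
            for (row, done) in flush_rx {
                wal_bg.lock().unwrap().push(row);
                let _ = done.send(());
            }
        });
        Self { memtable: Mutex::new(Vec::new()), wal, flush_tx }
    }

    /// Visible insert plus flush trigger; returns without waiting.
    fn put_no_wait(&self, row: &str) -> DurableWatcher {
        self.memtable.lock().unwrap().push(row.to_string());
        let (done_tx, done_rx) = mpsc::channel();
        self.flush_tx
            .send((row.to_string(), done_tx))
            .expect("flusher thread is alive");
        DurableWatcher(done_rx)
    }

    /// Thin wrapper restoring the blocking behavior.
    fn put(&self, row: &str) -> bool {
        self.put_no_wait(row).wait()
    }
}
```

A caller holding its own lock would call `put_no_wait` inside the critical section and `wait()` on the returned watcher after releasing it.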

## Tests

`test_put_no_wait_durable_visible_then_durable` (row visible before
durability, watcher resolves) and
`test_put_no_wait_non_durable_returns_no_watcher`.

---------

Co-authored-by: Lance Release Bot <dev+gha@lance.org>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mat#7414)

## Summary

`ObjectStore::io_parallelism()` returns the `LANCE_IO_THREADS` override
verbatim when that env var is set, so `LANCE_IO_THREADS=0` yields `0`.
Callers feed this value straight into `buffered` / `buffer_unordered`,
and a window of **0** makes those streams **never poll their input** —
so a plain metadata-only `count_rows` (and ~8 other fan-out sites in
`dataset.rs`) would hang instead of returning.

```diff
 pub fn io_parallelism(&self) -> usize {
     std::env::var("LANCE_IO_THREADS")
         .map(|val| val.parse::<usize>().unwrap())
         .unwrap_or(self.io_parallelism)
+        .max(1)
 }
```

Clamping at the source covers every caller in one place. The store's
configured default is already `>= 1`, so this only changes the explicit
`LANCE_IO_THREADS=0` case.

## Context

Follow-up to lance-format#7076, where the same hang was fixed locally in the count
path (per review). This addresses the root cause so the other unguarded
`io_parallelism()` → `buffered`/`buffer_unordered` sites are covered
too.

## Test

`test_io_parallelism_clamped_to_nonzero` asserts `LANCE_IO_THREADS=0`
clamps to `1`, a positive override (`8`) passes through unchanged, and
the default is `>= 1`.
Add `Dataset::build_existing_index_segments_transaction`, which builds the
`Operation::CreateIndex` transaction for existing physical index segments
without committing it. Callers commit the returned transaction via
`CommitBuilder` for a strict stage-then-commit workflow, mirroring
`InsertBuilder::execute_uncommitted`. `commit_existing_index_segments` now
delegates to it.

The method is inherent on `Dataset` rather than a new required method on the
public `DatasetIndexExt` trait, so it is non-breaking for downstream trait
implementors.
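
The stage-then-commit shape can be illustrated with a toy model. `ToyDataset`, `ToyTransaction`, and their methods are hypothetical and deliberately avoid lance's real `Transaction` and `CommitBuilder` signatures; the sketch only shows that staging publishes nothing and that the one-shot path delegates to the staged path:

```rust
/// Hypothetical staged transaction: describes the change, applies nothing.
#[derive(Debug, Clone, PartialEq)]
struct ToyTransaction {
    read_version: u64,
    index_name: String,
    segments: Vec<String>,
}

#[derive(Default)]
struct ToyDataset {
    version: u64,
    indices: Vec<String>,
}

impl ToyDataset {
    /// Stage: build the transaction without touching dataset state.
    fn build_index_transaction(&self, index_name: &str, segments: &[&str]) -> ToyTransaction {
        ToyTransaction {
            read_version: self.version,
            index_name: index_name.to_string(),
            segments: segments.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Commit: publish the index and bump the version.
    fn commit(&mut self, tx: ToyTransaction) -> u64 {
        self.indices.push(tx.index_name);
        self.version += 1;
        self.version
    }

    /// One-shot path, delegating to the staged path.
    fn commit_index_segments(&mut self, index_name: &str, segments: &[&str]) -> u64 {
        let tx = self.build_index_transaction(index_name, segments);
        self.commit(tx)
    }
}
```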

Fixes lance-format#6666
…olumn (lance-format#7412)

The update API rejects nested column references (`set` errors on a `.`
in the column name), so a nested field can only be changed by setting
its whole **top-level struct column** (e.g. `SET s = named_struct('x',
…)`).

`commit_impl` built `fields_for_preserving_frag_bitmap` from the struct
column's own (parent) field id via `field_id(column_name)`, so an index
on a nested **child** field (a different leaf id) was absent from the
set. `register_pure_rewrite_rows_update_frags_in_indices` then wrongly
extended that child-field index over the rewritten fragment, which was
therefore treated as indexed and never re-scanned — so the updated rows
were silently dropped from queries on the index (false negatives).

The fix collects the full field subtree of each updated column, so a
struct-column update marks all descendant field ids as modified. Flat
columns are unaffected (no children).
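
Collecting the subtree can be sketched with a small recursive walk. `Field`, `collect_subtree_ids`, and `modified_field_ids` are hypothetical stand-ins for the schema types and the logic in `commit_impl`:

```rust
/// Hypothetical schema field: an id plus nested children.
struct Field {
    id: i32,
    children: Vec<Field>,
}

/// Push the field's own id and every descendant id (pre-order).
fn collect_subtree_ids(field: &Field, out: &mut Vec<i32>) {
    out.push(field.id);
    for child in &field.children {
        collect_subtree_ids(child, out);
    }
}

/// All field ids to treat as modified for a set of updated columns.
fn modified_field_ids(updated_columns: &[&Field]) -> Vec<i32> {
    let mut ids = Vec::new();
    for column in updated_columns {
        collect_subtree_ids(column, &mut ids);
    }
    ids
}
```

For a struct column `s` (id 1) with children `x` (id 2) and `y` (id 3), updating `s` yields `[1, 2, 3]`, so an index on `s.x` is no longer missed. A flat column yields only its own id.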

This is the SQL `UPDATE` counterpart of the merge_insert fix in
lance-format#7410. Adds a regression test
(`test_update_struct_column_keeps_nested_index`): a BTree index on
`s.x`, an update of the struct column `s`, and an assertion that the
index's effective fragment bitmap is not extended over the rewritten
fragment so the updated value is still found.
…ance-format#7295)

## Summary

- Replaces `rechunk_stream_by_size` + `concat_batches` + `take` (two
full-data copies, peak ~3–4× `batch_size_bytes`) with a single-pass sort
over the UInt32 part-id columns only, producing `(batch_idx, row_idx)`
interleave indices.
- Sorted output is streamed to the data file via `interleave_batches` in
8 Ki-row chunks, so the interleave output adds only a small constant
overhead above the accumulated source data.
- Peak memory drops to **~1× `batch_size_bytes`**, which enables setting
`LANCE_SHUFFLE_BATCH_BYTES` much larger to reduce flush-group count and
improve read-time I/O locality.
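
The index-only sort can be sketched with plain vectors. `interleave_indices` and `output_chunks` are hypothetical helpers; each inner `Vec<u32>` stands in for one accumulated batch's part-id column, and the actual row data is never touched:

```rust
/// Rows per output chunk (8 Ki), as described in the summary.
const CHUNK_ROWS: usize = 8 * 1024;

/// Sort row positions by part id, reading only the part-id columns.
/// Returns `(batch_idx, row_idx)` pairs in output order.
fn interleave_indices(part_ids: &[Vec<u32>]) -> Vec<(usize, usize)> {
    let mut keyed: Vec<(u32, usize, usize)> = part_ids
        .iter()
        .enumerate()
        .flat_map(|(batch_idx, ids)| {
            ids.iter()
                .enumerate()
                .map(move |(row_idx, &part)| (part, batch_idx, row_idx))
        })
        .collect();
    // Tuple ordering: part id first, then batch and row as tie-breakers,
    // so the result is deterministic.
    keyed.sort_unstable();
    keyed.into_iter().map(|(_, b, r)| (b, r)).collect()
}

/// Split the sorted indices into bounded chunks for streaming output.
fn output_chunks(indices: &[(usize, usize)]) -> std::slice::Chunks<'_, (usize, usize)> {
    indices.chunks(CHUNK_ROWS)
}
```

Each chunk would then drive one interleave call over the source batches, so only one chunk of output exists at a time on top of the accumulated input.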

Closes lance-format#7299.


🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@ragnorc ragnorc force-pushed the ragnorc/two-phase-index-segments branch from 53aa593 to 5cfc088 Compare June 23, 2026 16:31
@github-actions


Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

@github-actions github-actions Bot added the enhancement New feature or request label Jun 23, 2026
wkalt and others added 3 commits June 23, 2026 09:46
lance-format#7359)

Under stable row ids an update deletes a row's old copy and rewrites it
to a new fragment under the same row id. optimize_indices kept the old
value->row_id entry, so queries for the old value returned the updated
row and BTree optimize errored ("from_sorted_iter called with non-sorted
input").

- build_stable_row_id_filter now subtracts each fragment's deletion
vector so the old-row allow-list holds only live rows (fixes BTree).
- BitmapIndex::update applies that filter to old postings via
OldIndexDataFilter::retain_old_rows.
- optimize routes FTS through InvertedIndex::merge_segments (which
filters old partitions) instead of the reference-only update path.
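
The allow-list construction can be sketched with ordered sets. `Fragment`, `live_row_ids`, and `retain_old_rows` here are hypothetical simplifications (offsets into a fragment play the role of the deletion vector), not the lance types of similar names:

```rust
use std::collections::BTreeSet;

/// Hypothetical fragment: stable row ids by offset, plus deleted offsets.
struct Fragment {
    row_ids: Vec<u64>,
    deleted_offsets: BTreeSet<usize>,
}

/// Allow-list of row ids that are still live in the given fragments,
/// i.e. each fragment's row ids minus its deletion vector.
fn live_row_ids(fragments: &[Fragment]) -> BTreeSet<u64> {
    fragments
        .iter()
        .flat_map(|f| {
            f.row_ids
                .iter()
                .enumerate()
                .filter(move |(offset, _)| !f.deleted_offsets.contains(offset))
                .map(|(_, id)| *id)
        })
        .collect()
}

/// Keep only old index entries whose row id is still on the allow-list.
fn retain_old_rows(postings: &[(String, u64)], allow: &BTreeSet<u64>) -> Vec<(String, u64)> {
    postings
        .iter()
        .filter(|(_, row_id)| allow.contains(row_id))
        .cloned()
        .collect()
}
```

If row id 11 was updated (deleted at offset 1 in the old fragment and rewritten elsewhere), the old `value -> 11` entry is dropped, so the stale value no longer matches the updated row.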

Adds a regression test covering all three index types.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This clarifies the repository agent instructions for Rust local
workflows.

Local development, debugging, and performance testing should avoid LTO
so those workflows do not accidentally pay release-artifact build costs
or benchmark a build mode that was not explicitly requested. LTO remains
available when release artifact validation explicitly asks for it.
Development

Successfully merging this pull request may close these issues.

Expose build_index_metadata_from_segments (or commit_existing_index_segments) for two-phase vector-index commits outside the lance crate