Skip to content

[core] introducing row-range-keyed DeletionVectors for DataEvolution tables#8344

Open
steFaiz wants to merge 6 commits into
apache:masterfrom
steFaiz:dv_for_de_core_support
Open

[core] introducing row-range-keyed DeletionVectors for DataEvolution tables#8344
steFaiz wants to merge 6 commits into
apache:masterfrom
steFaiz:dv_for_de_core_support

Conversation

@steFaiz

@steFaiz steFaiz commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Purpose

The first part of #8322

This is a very BIG pr, I tried to split it into several sub PRs, but found it very hard.

The PR only focuses on core module, leaving Spark/Flink module untouched, mainly includes:

  1. RowRange DeletionFiles are fully decoupled with data files, they are just to split one big deletion vector into many small files.
  2. Introduce a DeletionFileKey to replace current filename as the unique identifier of each DeletionFile
  3. Introduce a DataEvolutionApplyDvReader to filter out deleted row ids. This class refers to IndexedSplitRecordReader: First enrich readType by _ROW_ID, then filter rows by RowRange DeletionVectors.
  4. Slightly changed SnapshotReaderImpl, assigning overlapping row range DeletionVectors to each DataSplit.

Besides, most changes are about changing many String fields to DeletionFileKey.

Tests

Unit Tests only.

@steFaiz steFaiz marked this pull request as draft June 24, 2026 09:15
@steFaiz steFaiz marked this pull request as ready for review June 25, 2026 03:07
return false;
}

return currentDv.isDeleted(rowId);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reader only evaluates the first row-range DV whose end has not passed the current row id. If two row-range keyed DVs overlap, for example [0, 10] and [5, 15], rows 5-10 are never checked against the second DV, so deletions recorded only there will be returned. I don't see the writer/manifest combiner rejecting overlapping RowIdRangeKey values; they only deduplicate exact keys. Please either enforce non-overlapping row-range DVs when writing/combining metadata, or make this reader evaluate all DVs that may contain the row.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your remind! I've added the check in GlobalCombiner and added a test

? null
: IndexFileMetaSerializer.dvMetasToRowArrayData(dvMetas.values()),
: IndexFileMetaSerializer.metasToRowArrayData(
dvMetas, DeletionFileKey.Type.FILE_NAME),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For row-range keyed DVs this renders the compatibility marker row instead of the real row-id ranges, because metasToRowArrayData(..., FILE_NAME) intentionally fast-fails old readers. That makes table_indexes.dv_ranges misleading for data-evolution tables. Can we expose the row-range DV schema here as well, or render the ranges in a separate column, instead of showing the legacy marker?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very helpful! I've unified FileNameKey and RowRangeKey:

  1. Reusing the same field
  2. For FileNames, display the file name
  3. For RowRanges, display the formated range. e.g. [0, 100]

@steFaiz

steFaiz commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

I thought about the design and I think that making row ranged-DV aligned with normal file ranges is much more simple and easy to manage, compared to fully decoupled mode. Just like filename-based implementation.
The main logic:

  1. At first DV generation, generate DVs aligned with existing normal data file ranges
  2. During compaction, if some data files triggered row-level compaction and are merge into larger data file range, we merge related DVs too.

This is because it's complicated to update DVs if we let DVs can be fully decouple, consider this scenario:
Current DV ranges:

// Current DV:
[0, 100] , [101, 200], [230, 300]

Update a new DV range [50, 250]
This will involves:

  1. among all DVs, binary search to find all overlapping DVs. here we find: [0, 100] , [101, 200] , [230, 300]
  2. check each DV:
    a. For [0, 100], it's only partial overlapping with [50, 250], so we may have to split the new DV into [50, 100], [101, 250]
    b. For [101, 200] and [230, 300], we need to merge it with current splitted new DV: [101, 250], but how to merge them? Merge into a single big range [101, 300]? or split it into some selected size?

If we don't design sophiscated merge/split mechanism, there might be some DVs' size too big or too small.

I think we could:

  1. At read path: capable of reading any decoupled DV ranges (already implemented in this PR)
  2. At write/compaction path: force DV ranges be aligned with existing normal file ranges.

@JingsongLi Could you please review this proposal and share your thoughts? Any feedback would be greatly appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants