
[core] Support reading shared-shredding map #8364

Open
lszskye wants to merge 2 commits into apache:master from lszskye:shared_shredding_reader




lszskye (Contributor) commented Jun 26, 2026

Purpose

Add read support for shared-shredding MAP data in paimon-common.

This change introduces a reader wrapper that rebuilds logical MAP<STRING, T> values from shared-shredding physical ROW values.
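To make the reconstruction concrete, here is a minimal Java sketch of rebuilding one logical MAP<STRING, T> from a physical row. The physical layout is an assumption made only for illustration: a per-row fieldMapping array naming the field id stored in each shared value column, a metadata lookup from field id to map key, and an overflow section keyed by field id. PhysicalRow, rebuild and keyByFieldId are hypothetical names, not the classes added in this PR; the validation branches mirror the checks listed under Changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical physical layout, for illustration only; this is NOT Paimon's real format.
final class SharedShreddingRebuildSketch {

    /**
     * One physical shared-shredding row (hypothetical): sharedValues[i] holds the value of
     * the key whose field id is fieldMapping[i]; overflow holds extra entries keyed by field id.
     */
    record PhysicalRow(Integer[] fieldMapping, Object[] sharedValues, Map<Integer, Object> overflow) {}

    /** Rebuilds the logical MAP<STRING, T> for one row; returns null for a null map row. */
    static Map<String, Object> rebuild(PhysicalRow row, Map<Integer, String> keyByFieldId) {
        if (row == null) {
            return null;
        }
        Integer[] mapping = row.fieldMapping();
        if (mapping == null) {
            throw new IllegalStateException("null fieldMapping");
        }
        // LinkedHashMap keeps entries in physical layout order.
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < mapping.length; i++) {
            Integer fieldId = mapping[i];
            if (fieldId == null) {
                throw new IllegalStateException("null fieldMapping element at index " + i);
            }
            String key = keyByFieldId.get(fieldId);
            if (key == null) {
                throw new IllegalStateException("unknown field id " + fieldId);
            }
            result.put(key, row.sharedValues()[i]); // null values are kept as entries
        }
        if (row.overflow() != null) {
            for (Map.Entry<Integer, Object> e : row.overflow().entrySet()) {
                String key = keyByFieldId.get(e.getKey());
                if (key == null) {
                    throw new IllegalStateException("unknown overflow field id " + e.getKey());
                }
                result.put(key, e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> keyByFieldId = Map.of(1, "color", 2, "size", 3, "tag");
        PhysicalRow row = new PhysicalRow(
                new Integer[] {2, 1}, new Object[] {"L", null}, Map.of(3, "sale"));
        System.out.println(rebuild(row, keyByFieldId)); // prints {size=L, color=null, tag=sale}
    }
}
```

Failing fast on an unknown field id, rather than silently dropping the entry, keeps corrupt physical data from being read as a smaller but plausible-looking map.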

Changes

  • Add MapSharedShreddingReader

    • Wraps a physical FileRecordReader<InternalRow>.
    • Lazily rebuilds shared-shredding MAP fields when InternalRow#getMap(pos) is called.
    • Handles overflow fields.
    • Adds validation for invalid physical data:
      • null fieldMapping
      • null fieldMapping element
      • unknown field id in metadata
      • unknown overflow field id
  • Add shared-shredding utility methods

    • getPhysicalColumnIndices
    • isOverflowField
    • buildSpecificPhysicalStructType
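The lazy conversion described above can be sketched as a row decorator that only pays the conversion cost when getMap(pos) is called on a shared-shredding position, caching the result for repeated calls. SimpleRow, LazySharedShreddingRow and the toLogicalMap function are hypothetical, simplified stand-ins (Paimon's InternalRow returns an InternalMap, not a java.util.Map), and this PR does not state whether the real reader caches converted maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Simplified, hypothetical row types; Paimon's InternalRow/InternalMap APIs differ.
final class LazyMapRowSketch {

    /** Minimal stand-in for a row: positional field access. */
    interface SimpleRow {
        Object getField(int pos);

        @SuppressWarnings("unchecked")
        default Map<String, Object> getMap(int pos) {
            return (Map<String, Object>) getField(pos);
        }
    }

    /** Decorator: converts shared-shredding fields only when they are actually accessed. */
    static final class LazySharedShreddingRow implements SimpleRow {
        private final SimpleRow physical;
        private final Set<Integer> sharedShreddingPositions;
        private final Function<Object, Map<String, Object>> toLogicalMap;
        private final Map<Integer, Map<String, Object>> converted = new HashMap<>();

        LazySharedShreddingRow(SimpleRow physical, Set<Integer> sharedShreddingPositions,
                Function<Object, Map<String, Object>> toLogicalMap) {
            this.physical = physical;
            this.sharedShreddingPositions = sharedShreddingPositions;
            this.toLogicalMap = toLogicalMap;
        }

        @Override
        public Object getField(int pos) {
            return sharedShreddingPositions.contains(pos) ? getMap(pos) : physical.getField(pos);
        }

        @Override
        public Map<String, Object> getMap(int pos) {
            if (!sharedShreddingPositions.contains(pos)) {
                return physical.getMap(pos); // ordinary columns pass through unchanged
            }
            // Convert on first access and cache; computeIfAbsent does not cache a null result.
            return converted.computeIfAbsent(pos, p -> toLogicalMap.apply(physical.getField(p)));
        }
    }

    public static void main(String[] args) {
        int[] conversions = {0};
        SimpleRow physical = pos -> pos == 0 ? 42 : new Object[] {"size", "L"};
        SimpleRow row = new LazySharedShreddingRow(physical, Set.of(1), raw -> {
            conversions[0]++;
            Object[] kv = (Object[]) raw;
            return Map.of((String) kv[0], kv[1]);
        });
        System.out.println(row.getField(0) + " conversions=" + conversions[0]); // 42 conversions=0
        row.getMap(1);
        row.getMap(1);
        System.out.println(row.getMap(1) + " conversions=" + conversions[0]); // {size=L} conversions=1
    }
}
```

Passing the converter in as a function keeps the decorator independent of the physical layout; rows whose shared-shredding columns are never read never pay for reconstruction.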

Limitation

This PR currently only supports reading the whole shared-shredding MAP field.

It does not yet support selecting or projecting specific MAP keys during read. As a result, rebuilt map entries follow the physical metadata layout order rather than the user-requested key order. A TODO is left in the reader for future key-level projection support.
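The ordering limitation can be illustrated with a small sketch: the rebuilt map keeps physical layout order, and one hypothetical way to honor a requested key order is to project the rebuilt map afterwards. projectKeys is an illustrative name only; it is not the design planned for the TODO.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-read projection, for illustration only.
final class KeyOrderSketch {

    /** Keeps only the requested keys, in requested order; keys absent from the map are skipped. */
    static Map<String, Object> projectKeys(Map<String, Object> rebuilt, List<String> requestedKeys) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String key : requestedKeys) {
            if (rebuilt.containsKey(key)) {
                out.put(key, rebuilt.get(key)); // a null value is still a present entry
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> rebuilt = new LinkedHashMap<>(); // physical layout order
        rebuilt.put("size", "L");
        rebuilt.put("color", "red");
        rebuilt.put("tag", null);
        System.out.println(rebuilt); // {size=L, color=red, tag=null}
        System.out.println(projectKeys(rebuilt, List.of("tag", "color"))); // {tag=null, color=red}
    }
}
```

This only filters and reorders entries after the whole map has been rebuilt; genuine key-level projection would presumably also skip reading the physical columns of unrequested keys.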

Tests

Added unit coverage for:

  • Logical MAP reconstruction from shared-shredding physical rows.
  • Null logical MAP rows.
  • Null MAP values retained in rebuilt results.
  • Invalid fieldMapping and unknown field id handling.
  • Overflow field reconstruction and validation.
  • New MapSharedShreddingUtils helper methods.

Verification

```shell
mvn -pl paimon-common -Pfast-build -Dtest=MapSharedShreddingReaderTest,MapSharedShreddingUtilsTest test
git diff --check
```

Review thread on the MapSharedShreddingReader class declaration:

```java
 * original logical schema to upper layers by lazily converting only shared-shredding MAP fields
 * when {@link InternalRow#getMap(int)} is called.
 */
public class MapSharedShreddingReader implements FileRecordReader<InternalRow> {
```

Review comment (Contributor):

The wrapper is never used by the actual data-file read path. I only see references from this class and its unit test: RawFileSplitRead, DataEvolutionSplitRead, and FormatTableRead still pass the format reader directly into DataFileRecordReader, and nothing reads SupportsReaderFieldMetadata to build the field metadata before returning rows. As a result, a table containing a shared-shredding MAP would still expose the physical ROW from the format reader instead of the logical MAP produced by this wrapper, so outside the unit test the PR does not yet provide real read support. Please wire this reader into the real read paths after recovering the field metadata, and add an end-to-end read/write test that reads a shared-shredding map through the table API.
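The wiring requested in the review follows a decorator pattern: after recovering the field metadata, the read path wraps the format reader only when shared-shredding fields are present, so downstream consumers keep using the same reader contract. The sketch below uses a deliberately simplified one-record-at-a-time Reader interface and a hypothetical wrapIfNeeded helper; it is not Paimon's FileRecordReader API and does not show how SupportsReaderFieldMetadata is actually queried.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Simplified, hypothetical reader contract; Paimon's FileRecordReader is batch-oriented and differs.
final class ReadPathWiringSketch {

    /** Minimal stand-in for a file record reader: returns null when exhausted. */
    interface Reader<T> {
        T next();
    }

    /** Decorator that converts every physical record into its logical form. */
    static final class LogicalViewReader<T> implements Reader<T> {
        private final Reader<T> delegate;
        private final UnaryOperator<T> toLogical;

        LogicalViewReader(Reader<T> delegate, UnaryOperator<T> toLogical) {
            this.delegate = delegate;
            this.toLogical = toLogical;
        }

        @Override
        public T next() {
            T physical = delegate.next();
            return physical == null ? null : toLogical.apply(physical);
        }
    }

    /**
     * Wraps the format reader only when the recovered field metadata reports shared-shredding
     * fields; otherwise the format reader is returned untouched.
     */
    static <T> Reader<T> wrapIfNeeded(Reader<T> formatReader, boolean hasSharedShreddingFields,
            UnaryOperator<T> toLogical) {
        return hasSharedShreddingFields ? new LogicalViewReader<>(formatReader, toLogical) : formatReader;
    }

    public static void main(String[] args) {
        Iterator<String> rows = List.of("row-a", "row-b").iterator();
        Reader<String> formatReader = () -> rows.hasNext() ? rows.next() : null;
        Reader<String> reader = wrapIfNeeded(formatReader, true, r -> "logical(" + r + ")");
        for (String r = reader.next(); r != null; r = reader.next()) {
            System.out.println(r); // logical(row-a), then logical(row-b)
        }
    }
}
```

Returning the format reader unchanged when no shared-shredding fields exist keeps the common path free of any extra wrapping cost.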

Reply (Contributor):

Thanks for the review. This is a standalone PR that extracts the wrapper for converting physical columns to logical columns. Once it is merged, I will submit the end-to-end append read/write changes mentioned earlier as part of #8355. We split this out mainly to keep the PR size manageable and make review easier. Also, since the write-side changes have not been merged yet, no unreadable data will be produced in the meantime.
