[core] Support append writes for MAP shared-shredding by lxy-9602 · Pull Request #8355 · apache/paimon

lxy-9602 · 2026-06-25T09:38:23Z

Purpose

This is a subtask of MAP shared-shredding support.

This PR adds the append write path for MAP shared-shredding. When a MAP column is configured with map.storage-layout=shared-shredding, append writers now build a per-file physical write plan, convert logical rows to the shared-shredding physical layout, and write field-level shared-shredding metadata into file metadata before closing the format writer.

Main changes:

Add RowDataFileWritePlan, RowDataFileWritePlanFactory, RowDataTransform, and FileMetadataFinalizer to support per-file physical row conversion and metadata finalization.
Add MapSharedShreddingWritePlanFactory for append shared-shredding writes.
Add MapSharedShreddingCoreUtils to create and restore shared-shredding write context from recent data files.
Wire shared-shredding append writes into BaseAppendFileStoreWrite, AppendOnlyWriter, RowDataRollingFileWriter, RowDataFileWriter, and SingleFileWriter.
Keep file index and sequence tracking based on logical rows while writing physical rows to data files.
Restore adaptive shared-shredding stats from existing file metadata using each data file's actual file format.
Reject unsupported blob/vector append writes for shared-shredding for now.
Reject nested MAP file index when shared-shredding is enabled.
Extend format metadata field-id support for ORC via paimon.id.

This PR currently focuses on append tables. Blob writes, PK tables, and other more complex write paths will be supported in follow-up PRs. The row write plan is kept generic so these scenarios can reuse the same logical-to-physical write flow later.

Tests

mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=AppendOnlyWriterTest,MapSharedShreddingCoreUtilsTest,SchemaValidationTest test
mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=KeyValueFileReadWriteTest test
mvn -pl paimon-format -Pfast-build -DfailIfNoTests=false -Dtest=FormatMetadataUtilsTest,OrcFormatReadWriteTest test
mvn -pl paimon-arrow -am -Pfast-build -DfailIfNoTests=false -Dtest=ArrowSchemaMetadataCompatibilityTest test
mvn -pl paimon-core -DskipTests spotless:check
mvn -pl paimon-format -DskipTests spotless:check
mvn -pl paimon-arrow -DskipTests spotless:check
git diff --cached --check

JingsongLi · 2026-06-25T14:26:23Z

-                writeCols);
+                writeCols,
+                sharedShredding
+                        ? new MapSharedShreddingWritePlanFactory(


This enables writing the shared-shredding physical ROW layout, but I do not see a matching logical read/unshredding path. The normal table scan / format reader still requests the logical MAP schema; for Parquet, clipParquetType(MAP) expects a Parquet map group, so it will not be able to read this physical ROW. Please add an end-to-end test that writes through this path and reads the table back with the logical schema, and wire the reader conversion before enabling the write path.

Thanks for the careful review. I agree this is an important gap. This PR currently focuses on introducing the write-side physical layout, but the normal scan path still reads with the logical MAP schema, so we should not expose this as a usable end-to-end feature before the reader-side unshredding path is in place.

My original thought was to keep this PR relatively small and reviewable by treating it as internal groundwork. To avoid users accidentally enabling an incomplete path, I can add schema validation to reject the shared-shredding option for now, and then enable the option together with the reader conversion, documentation, and end-to-end logical read tests in a follow-up PR.

While, I am also fine with making this PR larger and adding the reader-side conversion plus the end-to-end test here if you think the feature should only land when the full read/write path is included. Which approach would you prefer?

I’ve added e2e coverage for the append-table read/write paths, including basic shared-shredding MAP read/write, multiple maps, projection, null values including overflow, metadata restore, and schema/data evolution cases. Unsupported paths such as rewrite are guarded to fail fast for now; would appreciate a review when you have time.

JingsongLi · 2026-06-25T14:26:25Z

                options.dataEvolutionEnabled(),
-                blobContext);
+                blobContext,
+                sharedShreddingContext);


Passing the shared-shredding context only to AppendOnlyWriter leaves the rewrite paths inconsistent. compactRewrite/clusterRewrite below still create a plain RowDataRollingFileWriter, so compaction can rewrite shared-shredded append files back into the default MAP layout without shared-shredding metadata. Please either route those rewrite writers through the same write plan or explicitly disable/guard compaction for shared-shredding append tables until it is supported.

Thanks for catching this! I added an explicit guard for now: compact rewrite and cluster rewrite will fail fast when MAP shared-shredding is configured.

JingsongLi · 2026-06-25T14:26:27Z

                        columnName,
                        keyType);
+                checkArgument(
+                        options.mapStorageLayout(columnName) != MapStorageLayout.SHARED_SHREDDING,


This validation catches nested map indexes, but shared-shredding can still be configured with formats whose writers do not implement SupportsWriterMetadata. In that case the table is accepted and the write fails only when the file is closing. Please reject unsupported file formats during schema validation, or otherwise fail before any data file write starts.

Agreed. I added schema validation for MAP shared-shredding to reject unsupported file formats early.

JingsongLi reviewed Jun 25, 2026

View reviewed changes

lxy-9602 mentioned this pull request Jun 27, 2026

[core] Support reading shared-shredding map #8364

Open

lxy-9602 added 5 commits June 27, 2026 10:25

[core] Support append writes for MAP shared-shredding

4bcdc1e

fix close

9fbb100

add validate and failed with rewrite

7f554e7

add read path for append shared-shredding and e2e tests

490befe

fix code style

dc94d68

lxy-9602 force-pushed the append-shredding-write branch from 2db0402 to dc94d68 Compare June 27, 2026 02:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[core] Support append writes for MAP shared-shredding#8355

[core] Support append writes for MAP shared-shredding#8355
lxy-9602 wants to merge 5 commits into
apache:masterfrom
lxy-9602:append-shredding-write

lxy-9602 commented Jun 25, 2026 •

edited

Loading

Uh oh!

JingsongLi Jun 25, 2026

Uh oh!

lxy-9602 Jun 26, 2026

Uh oh!

lxy-9602 Jun 27, 2026

Uh oh!

JingsongLi Jun 25, 2026

Uh oh!

lxy-9602 Jun 26, 2026

Uh oh!

JingsongLi Jun 25, 2026

Uh oh!

lxy-9602 Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lxy-9602 commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Tests

Uh oh!

JingsongLi Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lxy-9602 Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

lxy-9602 Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lxy-9602 Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

lxy-9602 Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lxy-9602 commented Jun 25, 2026 •

edited

Loading