[core][flink] Support external sort for manifest sort#8357

Open
discivigour wants to merge 12 commits into apache:master from discivigour:j/manifestExternalSort
@discivigour

Contributor

Purpose

Improve manifest file sort compaction to avoid heap-heavy in-memory sorting for large manifest entries by introducing spillable external sorting. The change also propagates a Paimon IOManager through commit paths so manifest sort can use configured spill directories in Flink, while non-Flink/default contexts can keep IOManager creation local to the sorter.
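
To illustrate the spill-and-merge idea described above, here is a minimal sketch of an external sorter. It is not Paimon's implementation: the class `SpillableSorterSketch`, its `write`/`drainSorted` methods, and the use of newline-free strings in place of manifest entries are all assumptions made for illustration, and a plain directory stands in for whatever spill location an IOManager would provide.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical illustration of spill-and-merge sorting; not Paimon's sorter. */
final class SpillableSorterSketch implements AutoCloseable {

    /** One open run file plus the record currently at its head. */
    private static final class Cursor {
        final BufferedReader reader;
        String head;

        Cursor(BufferedReader reader) {
            this.reader = reader;
        }

        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }
    }

    private final Path spillDir;   // assumption: stands in for a configured spill directory
    private final int maxInMemory; // records buffered before a sorted run is spilled
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> runs = new ArrayList<>();

    SpillableSorterSketch(Path spillDir, int maxInMemory) {
        this.spillDir = spillDir;
        this.maxInMemory = maxInMemory;
    }

    /** Buffers a record (assumed to contain no newline) and spills when the buffer is full. */
    void write(String record) throws IOException {
        buffer.add(record);
        if (buffer.size() >= maxInMemory) {
            spill();
        }
    }

    /** Sorts the buffer and writes it to disk as one sorted run. */
    private void spill() throws IOException {
        Collections.sort(buffer);
        Path run = Files.createTempFile(spillDir, "run-", ".txt");
        Files.write(run, buffer);
        runs.add(run);
        buffer.clear();
    }

    /** K-way merges all runs, holding only one record per run in memory. */
    void drainSorted(Consumer<String> out) throws IOException {
        if (!buffer.isEmpty()) {
            spill(); // simplest option: treat the in-memory tail as one more run
        }
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head));
        try {
            for (Path run : runs) {
                Cursor c = new Cursor(Files.newBufferedReader(run));
                if (c.advance()) {
                    heap.add(c);
                } else {
                    c.reader.close();
                }
            }
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                out.accept(c.head);
                if (c.advance()) {
                    heap.add(c);
                } else {
                    c.reader.close();
                }
            }
        } finally {
            for (Cursor c : heap) {
                c.reader.close(); // only non-empty if the merge failed midway
            }
        }
    }

    /** Deletes the spilled run files. */
    @Override
    public void close() throws IOException {
        for (Path run : runs) {
            Files.deleteIfExists(run);
        }
        runs.clear();
        buffer.clear();
    }
}
```

Heap usage in this sketch is bounded by `maxInMemory` records during writes and by one record per run during the merge, which is the property that makes external sorting attractive for large inputs.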

Tests

  • mvn -pl paimon-core -Pfast-build -Dtest=ManifestFileMetaTest#testManifestSortUsesExternalIOManagerWithoutClosingIt test
  • mvn -pl paimon-flink/paimon-flink-common -am -Pfast-build -DskipTests compile
  • mvn -pl paimon-core,paimon-flink/paimon-flink-common -am -Pfast-build -DfailIfNoTests=false -Dtest=CommitterOperatorTest test
  • mvn -pl paimon-core,paimon-flink/paimon-flink-common -am -Pfast-build -DfailIfNoTests=false -Dtest=StoreMultiCommitterTest test
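
The first test name above suggests that the sorter borrows an externally supplied IOManager without closing it, while the description says default contexts create one locally. A minimal sketch of that ownership rule follows. Every name here (`SpillManagerSketch`, `CountingSpillManager`, `ManifestSortResourceSketch`) is hypothetical and does not reflect Paimon's IOManager API; the only point is that a resource is closed by whoever created it.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a spill-directory manager; not Paimon's IOManager. */
interface SpillManagerSketch extends AutoCloseable {
    @Override
    void close();
}

/** Test double that counts how often it was closed. */
final class CountingSpillManager implements SpillManagerSketch {
    int closeCount = 0;

    @Override
    public void close() {
        closeCount++;
    }
}

/** Holds either a borrowed (external) manager or an owned (locally created) one. */
final class ManifestSortResourceSketch implements AutoCloseable {
    private final SpillManagerSketch manager;
    private final boolean owned;

    ManifestSortResourceSketch(
            SpillManagerSketch external, Supplier<SpillManagerSketch> localFactory) {
        // Assumption: a null external manager means "no caller-provided manager".
        this.owned = external == null;
        this.manager = owned ? localFactory.get() : external;
    }

    SpillManagerSketch manager() {
        return manager;
    }

    @Override
    public void close() {
        if (owned) {
            manager.close(); // close only what this object created
        }
    }
}
```

With this shape, a caller such as a Flink committer could pass in its own manager and keep using it after the sort finishes, while a caller that passes nothing gets a manager whose lifetime ends with the sort.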

@discivigour discivigour changed the title from "[core][flink] Improve manifest sorter external sorting" to "[core][flink] Support external sort for manifest sort" on Jun 25, 2026
meta -> manifestFile.read(meta.fileName(), meta.fileSize());
for (ManifestEntry entry :
        sequentialBatchedExecute(reader, section, manifestReadParallelism)) {
    sorter.write(entry);

Contributor

In the minor-compaction path, DELETE entries are also inserted into the external sorter, but writeSurvivingAddsToManifest later skips every DELETE row and the remaining deletes are sorted/written from deleteEntries. For delete-heavy manifests this duplicates serialization and spill IO. Could we feed only ADD entries into the external sorter and keep DELETE entries only in the map?
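
The suggestion above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Kind`, `Entry`, `Routed`, `route`, `survivingAdds`, and `remainingDeletes` are hypothetical names, a plain list stands in for the external sorter, and the cancellation rule (a DELETE removes the ADD with the same file identifier, regardless of order) is assumed for simplicity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of routing ADD rows to the sorter and DELETE rows to a map. */
final class AddDeleteRoutingSketch {
    enum Kind { ADD, DELETE }

    static final class Entry {
        final Kind kind;
        final String fileId; // assumption: one string identifies the data file

        Entry(Kind kind, String fileId) {
            this.kind = kind;
            this.fileId = fileId;
        }
    }

    static final class Routed {
        // Stands in for sorter.write(entry); only ADD rows would pay spill IO.
        final List<Entry> toSorter = new ArrayList<>();
        // DELETE rows stay in memory, keyed by file identifier.
        final Map<String, Entry> deleteEntries = new LinkedHashMap<>();
    }

    /** Sends each entry to exactly one destination, so no DELETE is serialized twice. */
    static Routed route(Iterable<Entry> entries) {
        Routed routed = new Routed();
        for (Entry e : entries) {
            if (e.kind == Kind.ADD) {
                routed.toSorter.add(e);
            } else {
                routed.deleteEntries.put(e.fileId, e);
            }
        }
        return routed;
    }

    /** Returns sorted ADD identifiers that no DELETE cancels; cancelled DELETEs are consumed. */
    static List<String> survivingAdds(Routed routed) {
        List<Entry> sorted = new ArrayList<>(routed.toSorter);
        sorted.sort(Comparator.comparing((Entry e) -> e.fileId));
        List<String> out = new ArrayList<>();
        for (Entry add : sorted) {
            if (routed.deleteEntries.remove(add.fileId) == null) {
                out.add(add.fileId);
            }
        }
        return out;
    }

    /** Returns the identifiers of DELETE rows that cancelled nothing, in sorted order. */
    static List<String> remainingDeletes(Routed routed) {
        return new ArrayList<>(new TreeSet<>(routed.deleteEntries.keySet()));
    }
}
```

Because DELETE rows never enter the sorter in this arrangement, a delete-heavy input would only serialize and spill its ADD rows, which is the saving the comment asks for.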

Contributor Author

OK, I will change it.

@discivigour discivigour marked this pull request as ready for review June 26, 2026 08:45