Skip to content

[FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables#3494

Open
Jackeyzhe wants to merge 1 commit into
apache:mainfrom
Jackeyzhe:fluss-3483
Open

[FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables#3494
Jackeyzhe wants to merge 1 commit into
apache:mainfrom
Jackeyzhe:fluss-3483

Conversation

@Jackeyzhe

Copy link
Copy Markdown
Contributor

Purpose

Linked issue: close #3483

This PR fixes a race where a stale V2 lake snapshot commit can be accepted after a table is dropped and recreated. In that case, the recreated table may get a /laketable znode pointing to an offsets metadata file generated for the old table id, causing lake tiering to repeatedly fail when reading the stale offsets path.

Brief change log

  • Reject V2 lake snapshot commits when the target table id no longer exists or is queued for deletion.
  • Validate the V2 lake snapshot offsets metadata before writing it to ZK:
    • the offsets file must be readable;
    • the file-level tableId must match the commit table id;
    • every TableBucket in the offsets file must also belong to the commit table id.
  • Validate both tiered offsets and readable offsets metadata paths when they differ.
  • Add regression coverage for:
    • non-existent table id commits;
    • table queued for deletion;
    • recreated table id with stale offsets metadata from the old table id;
    • RPC-level V2 commit after table drop.

Tests

mvn -pl fluss-server -DskipITs -Dcheckstyle.skip -Drat.skip -Dspotless.check.skip=true -Dtest=CoordinatorEventProcessorTest#testCommitLakeTableSnapshotV2RejectsMismatchedOffsetsFileTableId+testCommitLakeTableSnapshotV2RejectsNonExistentTable+testCommitLakeTableSnapshotV2RejectsQueuedForDeletionTable test
mvn -pl fluss-server -DskipITs -Dcheckstyle.skip -Drat.skip -Dspotless.check.skip=true -Dtest=CommitLakeTableSnapshotITCase#testCommitLakeTableSnapshotV2RejectedAfterDropTable test

API and Format

no

Documentation

no

@Jackeyzhe Jackeyzhe changed the title [FLUSS-3483][coordinator] Validate lake snapshot offsets before commit [FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables Jun 24, 2026
@luoyuxia luoyuxia requested a review from Copilot June 24, 2026 02:03

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a coordinator race where stale V2 lake snapshot commits could be accepted after a table drop/recreate, potentially writing an offsets-metadata pointer for an old tableId and permanently breaking lake tiering for the recreated table.

Changes:

  • Add coordinator-side rejection for V2 lake snapshot commits targeting a tableId that no longer exists or is queued for deletion.
  • Add regression tests (unit + integration) covering non-existent/queued-for-deletion tableIds and an RPC-level V2 commit after a drop.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java Adds a guard to reject V2 snapshot commits for dropped / queued-for-deletion tables.
fluss-server/src/test/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessorTest.java Adds unit tests verifying V2 commit rejection for non-existent and queued-for-deletion tables.
fluss-server/src/test/java/org/apache/fluss/server/replica/CommitLakeTableSnapshotITCase.java Adds an IT case verifying V2 snapshot commit is rejected after dropping a table.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 2113 to 2117
throw new FlussRuntimeException(
"Lake snapshot metadata is null for table " + tableId);
}
ensureTableCanAcceptLakeSnapshot(tableId);
lakeTableHelper.registerLakeTableSnapshotV2(

@luoyuxia luoyuxia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jackeyzhe Thanks for the pr. LGTM overall. Left minor comments. PTAL

}

@Test
void testCommitLakeTableSnapshotV2RejectsNonExistentTable() throws Exception {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can remove the ut since testCommitLakeTableSnapshotV2RejectedAfterDropTable alreay cover it. I want to keep this test class short if possible

});
}

private void ensureTableCanAcceptLakeSnapshot(long tableId) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to ensureTableNotDeleted? I can image maybe other event maywell need this method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

commitLakeTableSnapshot writes stale snapshot path after dropTable, leaving the tiering pipeline permanently stuck

3 participants