[FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables#3494
[FLUSS-3483][coordinator] Reject lake snapshot commits for dropped tables#3494Jackeyzhe wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a coordinator race where stale V2 lake snapshot commits could be accepted after a table drop/recreate, potentially writing an offsets-metadata pointer for an old tableId and permanently breaking lake tiering for the recreated table.
Changes:
- Add coordinator-side rejection for V2 lake snapshot commits targeting a tableId that no longer exists or is queued for deletion.
- Add regression tests (unit + integration) covering non-existent/queued-for-deletion tableIds and an RPC-level V2 commit after a drop.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java | Adds a guard to reject V2 snapshot commits for dropped / queued-for-deletion tables. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessorTest.java | Adds unit tests verifying V2 commit rejection for non-existent and queued-for-deletion tables. |
| fluss-server/src/test/java/org/apache/fluss/server/replica/CommitLakeTableSnapshotITCase.java | Adds an IT case verifying V2 snapshot commit is rejected after dropping a table. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| throw new FlussRuntimeException( | ||
| "Lake snapshot metadata is null for table " + tableId); | ||
| } | ||
| ensureTableCanAcceptLakeSnapshot(tableId); | ||
| lakeTableHelper.registerLakeTableSnapshotV2( |
luoyuxia
left a comment
There was a problem hiding this comment.
@Jackeyzhe Thanks for the pr. LGTM overall. Left minor comments. PTAL
| } | ||
|
|
||
| @Test | ||
| void testCommitLakeTableSnapshotV2RejectsNonExistentTable() throws Exception { |
There was a problem hiding this comment.
nit: I think we can remove the ut since testCommitLakeTableSnapshotV2RejectedAfterDropTable alreay cover it. I want to keep this test class short if possible
| }); | ||
| } | ||
|
|
||
| private void ensureTableCanAcceptLakeSnapshot(long tableId) { |
There was a problem hiding this comment.
nit: rename to ensureTableNotDeleted? I can image maybe other event maywell need this method.
Purpose
Linked issue: close #3483
This PR fixes a race where a stale V2 lake snapshot commit can be accepted after a table is dropped and recreated. In that case, the recreated table may get a
/laketableznode pointing to an offsets metadata file generated for the old table id, causing lake tiering to repeatedly fail when reading the stale offsets path.Brief change log
tableIdmust match the commit table id;TableBucketin the offsets file must also belong to the commit table id.Tests
mvn -pl fluss-server -DskipITs -Dcheckstyle.skip -Drat.skip -Dspotless.check.skip=true -Dtest=CoordinatorEventProcessorTest#testCommitLakeTableSnapshotV2RejectsMismatchedOffsetsFileTableId+testCommitLakeTableSnapshotV2RejectsNonExistentTable+testCommitLakeTableSnapshotV2RejectsQueuedForDeletionTable testmvn -pl fluss-server -DskipITs -Dcheckstyle.skip -Drat.skip -Dspotless.check.skip=true -Dtest=CommitLakeTableSnapshotITCase#testCommitLakeTableSnapshotV2RejectedAfterDropTable testAPI and Format
no
Documentation
no