feat(server-ng): commit-time metadata validation with result codes #3553
Open
krishvishal wants to merge 4 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #3553 +/- ##
============================================
- Coverage 74.44% 72.48% -1.97%
Complexity 937 937
============================================
Files 1243 1244 +1
Lines 125987 121816 -4171
Branches 101856 97730 -4126
============================================
- Hits 93795 88293 -5502
- Misses 29180 30250 +1070
- Partials 3012 3273 +261
2a75d00 to da77592
I think is worth addressing as this PR is using the custom decoder in the SDK (hidden behind the |
numinnex requested changes on Jun 24, 2026 and left a comment:
Add support to SDK to parse the new status codes
Why
server-ng previously validated metadata business rules during preflight: primary-only, before replication. Invalid requests were then silently dropped.
This caused two issues:
1. VSR correctness. A rejection must be a deterministic function of the committed log. It should be computed during apply on every replica and recorded so that all replicas agree. Preflight validation uses a primary-local snapshot, and its verdict is never replicated. After a view change, the new primary is not guaranteed to reach the same decision.
2. Infinite retry. The in-process path mapped the preflight Err to Canceled, leaving the home shard silent. As a result, the SDK kept retrying permanently invalid requests forever, such as CreatePartitions on a missing topic.
What Changed
Result taxonomy. Added core/metadata/src/stm/result.rs. There is now one closed #[repr(u32)] enum per operation: Ok = 0, with other discriminants reused from the existing IggyError code space. This keeps SDK error mapping compatible with existing codes.
Apply now returns committed results. StateHandler::apply now returns ApplyReply { code, body } instead of Bytes. Every previous silent Bytes::new() no-op is now represented as a committed result code. MuxStateMachine::update now reserves Err for decode/corruption failures only.
Result is encoded in the reply body. The reply body now starts with a sparse result section, [count][index, result]*, followed by the payload. This is written in place via build_reply_message_with, using one allocation and one payload copy. There is no ReplyHeader layout change.
Eviction instead of silent drop. Structurally invalid requests are now evicted rather than silently dropped: not-client-allowed maps to InvalidRequestOperation, while undecodable/overflow maps to InvalidRequestBody.
Removed preflight partition-count read. Deleted the current_partition_count preflight read. Parent existence is now checked at commit time and returned as CreatePartitionsResult::{Stream, Topic}NotFound, removing the TOCTOU window.
Consensus frame size floor. Every consensus frame’s size must now span its header. This is asserted during consensus_message construction, PrepareOk projection, and in the simulator. This prevents reply-body slicing from underflowing.
Simulator changes. The simulator now uses outcome-first generation: operations target a chosen outcome, including error outcomes; on_reply decodes the committed code from the body; and the decoded code is asserted to be a declared result code. The strict targeted-equals-committed oracle is gated to serial runs only: client_count == 1 && CLIENT_REQUEST_QUEUE_MAX == 1.
Wire / SDK Impact
ReplyHeader is unchanged. This is not a #[repr(C)] layout change. However, the reply body format changes. A result section now precedes the payload, so body decoding changes for success replies as well. There is no released server-ng wire format to break.
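For SDK authors, here is a rough sketch of how a reply body with a leading [count][index, result]* section might be written and read. The PR text does not state field widths or byte order, so this assumes every field is a little-endian u32 and that "sparse" means only some operation indices are listed. The names encode_reply_body, decode_reply_body and DecodedReply are hypothetical and are not the PR's build_reply_message_with.

```rust
/// Hypothetical decoded view of a reply body; not a type from the PR.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodedReply<'a> {
    /// (index, result code) pairs from the sparse result section.
    pub results: Vec<(u32, u32)>,
    /// Everything after the result section.
    pub payload: &'a [u8],
}

/// Read a little-endian u32 at `at`, or None if the buffer is too short.
fn read_u32(buf: &[u8], at: usize) -> Option<u32> {
    let end = at.checked_add(4)?;
    let b = buf.get(at..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Assumed layout: [count: u32][index: u32, result: u32]* then payload.
pub fn encode_reply_body(results: &[(u32, u32)], payload: &[u8]) -> Vec<u8> {
    // One allocation sized for the whole body, then a single payload copy.
    let mut out = Vec::with_capacity(4 + results.len() * 8 + payload.len());
    out.extend_from_slice(&(results.len() as u32).to_le_bytes());
    for &(index, code) in results {
        out.extend_from_slice(&index.to_le_bytes());
        out.extend_from_slice(&code.to_le_bytes());
    }
    out.extend_from_slice(payload);
    out
}

/// Split a body into its result section and payload; None if truncated.
pub fn decode_reply_body(body: &[u8]) -> Option<DecodedReply<'_>> {
    let count = read_u32(body, 0)? as usize;
    // Checked arithmetic so a hostile count cannot overflow the offset math.
    let section_len = count.checked_mul(8)?.checked_add(4)?;
    if body.len() < section_len {
        return None;
    }
    let mut results = Vec::with_capacity(count);
    for i in 0..count {
        let at = 4 + i * 8;
        results.push((read_u32(body, at)?, read_u32(body, at + 4)?));
    }
    Some(DecodedReply { results, payload: &body[section_len..] })
}
```

Under these assumptions a success reply still carries a count field (possibly zero) before the payload, which is why success-path decoding changes too.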
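To illustrate the result taxonomy and why it ends the infinite-retry loop, here is a minimal sketch of one closed per-operation enum and a client-side decision built on it. The variant names follow CreatePartitionsResult::{Stream, Topic}NotFound from the description. The numeric values are placeholders and should be checked against the real IggyError codes. NextStep and next_step are hypothetical and do not reflect the SDK's actual retry logic.

```rust
/// Closed result enum for one operation. Ok is 0; the other values are
/// placeholders standing in for codes reused from the IggyError space.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CreatePartitionsResult {
    Ok = 0,
    StreamNotFound = 1009, // placeholder value
    TopicNotFound = 2010,  // placeholder value
}

impl CreatePartitionsResult {
    const ALL: [Self; 3] = [Self::Ok, Self::StreamNotFound, Self::TopicNotFound];

    /// Wire code for this result.
    pub fn code(self) -> u32 {
        self as u32
    }

    /// Map a wire code back to a declared result, or None if undeclared.
    pub fn from_code(code: u32) -> Option<Self> {
        Self::ALL.iter().copied().find(|r| r.code() == code)
    }
}

/// Hypothetical client-side decision after waiting for a reply.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    /// The operation committed successfully.
    Done,
    /// The operation committed with a rejection; retrying cannot help.
    Fail(CreatePartitionsResult),
    /// No reply arrived, so the request may be retried.
    Retry,
    /// The server sent a code this client does not know.
    UnknownCode(u32),
}

pub fn next_step(reply_code: Option<u32>) -> NextStep {
    let code = match reply_code {
        Some(code) => code,
        None => return NextStep::Retry,
    };
    match CreatePartitionsResult::from_code(code) {
        Some(CreatePartitionsResult::Ok) => NextStep::Done,
        Some(err) => NextStep::Fail(err),
        None => NextStep::UnknownCode(code),
    }
}
```

The point of the sketch is that a committed rejection arrives as a reply with a code, so the client can stop instead of treating silence as a reason to retry.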
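The VSR argument in the Why section (a rejection must be a deterministic function of the committed log) can be shown with a toy state machine. This is only a sketch: ToyState, Op and replay are invented for illustration, Vec<u8> stands in for Bytes, the codes are placeholders, and CreateTopic implicitly creates its stream to keep the example short. It is not the PR's StateHandler.

```rust
use std::collections::BTreeMap;

// Placeholder codes; the real discriminants come from the IggyError code space.
pub const OK: u32 = 0;
pub const STREAM_NOT_FOUND: u32 = 1009;
pub const TOPIC_NOT_FOUND: u32 = 2010;

/// Stand-in for ApplyReply { code, body }; Vec<u8> replaces Bytes here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApplyReply {
    pub code: u32,
    pub body: Vec<u8>,
}

/// Toy replica state: stream id -> topic id -> partition count.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ToyState {
    streams: BTreeMap<u32, BTreeMap<u32, u32>>,
}

#[derive(Debug, Clone, Copy)]
pub enum Op {
    CreateTopic { stream: u32, topic: u32 },
    CreatePartitions { stream: u32, topic: u32, count: u32 },
}

impl ToyState {
    /// Apply one committed operation. The verdict depends only on the
    /// current state and the operation, never on primary-local data.
    pub fn apply(&mut self, op: Op) -> ApplyReply {
        match op {
            Op::CreateTopic { stream, topic } => {
                self.streams.entry(stream).or_default().entry(topic).or_insert(0);
                ApplyReply { code: OK, body: Vec::new() }
            }
            Op::CreatePartitions { stream, topic, count } => {
                // Parent existence is checked here, at commit time.
                let topics = match self.streams.get_mut(&stream) {
                    Some(topics) => topics,
                    None => return ApplyReply { code: STREAM_NOT_FOUND, body: Vec::new() },
                };
                let partitions = match topics.get_mut(&topic) {
                    Some(partitions) => partitions,
                    None => return ApplyReply { code: TOPIC_NOT_FOUND, body: Vec::new() },
                };
                *partitions = partitions.saturating_add(count);
                // Assumed success payload: the new partition count.
                ApplyReply { code: OK, body: partitions.to_le_bytes().to_vec() }
            }
        }
    }
}

/// Replay a committed log on a fresh replica and collect the result codes.
pub fn replay(log: &[Op]) -> Vec<u32> {
    let mut state = ToyState::default();
    log.iter().map(|op| state.apply(*op).code).collect()
}
```

Because every replica replays the same log through the same apply, each one records the same code for the same operation, including after a view change.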
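The consensus frame size floor can be pictured as a bounds check done before any body slicing. In this sketch HEADER_LEN is a placeholder value, and frame_body and FrameError are hypothetical names. The PR asserts the invariant at construction time rather than returning an error, so the Result here is only a way to make the two failure cases visible.

```rust
/// Placeholder header length; the real value comes from the server-ng header type.
pub const HEADER_LEN: usize = 256;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    /// The declared size does not even cover the header.
    SizeBelowHeader { size: usize },
    /// The declared size runs past the end of the buffer.
    SizeExceedsBuffer { size: usize, available: usize },
}

/// Return the body bytes of a frame whose header declares `size` total bytes.
pub fn frame_body(frame: &[u8], size: u32) -> Result<&[u8], FrameError> {
    let size = size as usize;
    // The floor: without it, `size - HEADER_LEN` would underflow.
    if size < HEADER_LEN {
        return Err(FrameError::SizeBelowHeader { size });
    }
    if size > frame.len() {
        return Err(FrameError::SizeExceedsBuffer { size, available: frame.len() });
    }
    Ok(&frame[HEADER_LEN..size])
}
```

A frame whose size equals the header length yields an empty body, which is the smallest valid case.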