[GatewayV2] Fix over-spanning issue when partial partition keys are provided.#49688
[GatewayV2] Fix over-spanning issue when partial partition keys are provided.#49688jeet1995 wants to merge 13 commits into
Conversation
For a partial (prefix) hierarchical (MULTI_HASH) partition key, the thin-client (Gateway V2) store model previously sent only StartEpkHash/EndEpkHash routing headers and no doc-level EPK filter. The proxy therefore resolved the request to the owning physical partition and returned every co-located document (an over-span) instead of only those matching the prefix. ThinClientStoreModel.wrapInHttpRequest now detects a prefix MULTI_HASH key and sets READ_FEED_KEY_TYPE=EffectivePartitionKeyRange, START_EPK and END_EPK on the request headers before RntbdRequest.from(), so RntbdRequestHeaders serializes the prefix EPK sub-range [hash(prefix), hash(prefix) + "FF") as the backend doc-level filter (hex string as UTF-8 bytes), mirroring the Direct/RNTBD FeedRangeEpkImpl path. StartEpkHash/EndEpkHash routing headers are retained. QueryPlan requests and full (non-prefix) keys are unchanged. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Ports ThinClientQueryE2ETest and its ThinClientTestBase helper. The suite self-baselines every query against a Direct (RNTBD) client and the thin-client (:10250 HTTP/2) path, asserting identical results. This directly guards the MULTI_HASH prefix partition-key EPK over-span fix: prefix (single-component) hierarchical-partition-key queries must return only the narrow matching set, matching Direct, rather than over-spanning to all co-located documents. Both files are runtime self-enabling (enableThinClientForTest() + builder-driven HTTP/2 via setEnabled(true)); no module property or TestSuiteBase changes needed. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
The earlier fix (499f928) only scoped the prefix MULTI_HASH read when the request carried a non-null partition key. The parallel prefix-query path intentionally nulls the partial hierarchical partition key and instead carries the narrow prefix effective-partition-key range on the request's feedRange, so that guard never fired there and the thin-client read still over-spanned to the whole physical partition. Mark such requests (prefixPartitionKeyQuery) in ParallelDocumentQueryExecutionContextBase and, in ThinClientStoreModel, reuse the already-computed FeedRangeEpkImpl range as the doc-level EPK filter (START_EPK/END_EPK + StartEpkHash/EndEpkHash) instead of recomputing getEPKRangeForPrefixPartitionKey from a partition key we no longer have. This mirrors the Direct/GatewayV1 decision points and honors DRY/SRP. Also relax assertThinClientEndpointUsed so an all-QueryPlan diagnostics context passes: in thin-client mode QueryPlan calls resolve via the classic gateway and are the only requests permitted on a non-thin-client endpoint; any non-QueryPlan (data) request on a non-thin-client endpoint still fails the assertion. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
…ag on clone, add readAllItems/readMany prefix HPK tests ThinClientStoreModel.wrapInHttpRequest now sets all five EPK-specific RNTBD headers (ReadFeedKeyType, StartEpk/EndEpk, StartEpkHash/EndEpkHash) in a single block directly on the RNTBD request, replacing the split request-header-map + post-construction approach. Wire encoding is unchanged (hex-string UTF-8 bytes for StartEpk/EndEpk, decoded hash bytes for the hash headers). RxDocumentServiceRequest.clone() now copies prefixPartitionKeyQuery so hedged/availability-strategy/retry clones keep the prefix signal and do not over-span. ThinClientQueryE2ETest adds prefix-HPK regression coverage for readAllItems (ReadFeed) and readManyByPartitionKeys, asserting Direct-vs-thin-client parity and per-tenant scoping (no over-span).
…onContextBase Assign isPrefixPartitionKeyQuery directly from isPartialPartitionKeyQuery(...) and guard only the PARTITION_KEY-setting branch with !isPrefixPartitionKeyQuery, removing the if/else. Behavior is unchanged: a partial (prefix) key still leaves partitionKeyInternal null so the prefix FeedRangeEpkImpl survives createDocumentServiceRequestWithFeedRange.
…cutionContextBase Keep main's single 'if (!isPartialPartitionKeyQuery)' structure; only capture the predicate in isPrefixPartitionKeyQuery and pass it to request.setPrefixPartitionKeyQuery. Removes the if/else.
There was a problem hiding this comment.
Pull request overview
Fixes a thin-client (Gateway V2 / :10250 proxy) routing bug where prefix hierarchical (MULTI_HASH) partition keys could over-span to all co-located documents on the physical partition, potentially returning documents from other logical partitions. The change scopes these requests by sending the correct RNTBD EPK-range headers and adds E2E coverage to prevent regression.
Changes:
- Compute the prefix effective-partition-key (EPK) range for MULTI_HASH prefix keys and apply it via RNTBD headers (
ReadFeedKeyType=EffectivePartitionKeyRange,Start/EndEpk,Start/EndEpkHash) inThinClientStoreModel. - Introduce a request flag (
prefixPartitionKeyQuery) to preserve prefix-query intent through the parallel query initialization path when the partition key is intentionally nulled. - Add thin-client E2E tests validating parity with Direct mode for hierarchical prefix queries, plus supporting test infrastructure and changelog entry.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ThinClientStoreModel.java | Detect MULTI_HASH prefix PK requests and set EPK-range RNTBD headers to prevent over-span. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentServiceRequest.java | Add prefixPartitionKeyQuery flag with accessors + clone propagation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/ParallelDocumentQueryExecutionContextBase.java | Set prefixPartitionKeyQuery when a partial hierarchical key is detected and PK header is suppressed. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Document the thin-client prefix HPK over-span fix. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/ThinClientTestBase.java | New shared thin-client test base + endpoint-routing assertions. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/ThinClientQueryE2ETest.java | New Direct-vs-thin-client query E2E suite including hierarchical prefix regression tests. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/TestSuiteBase.java | Update thin-client endpoint assertion behavior (noting QueryPlan can use classic gateway). |
In ThinClientStoreModel.wrapInHttpRequest, check the cached prefix feed-range path (isPrefixPartitionKeyQuery + FeedRangeEpkImpl) first and fall back to recomputing from the partition key only for the PK-bearing prefix case. Behavior is unchanged (the two cases are mutually exclusive).
… assertion TestSuiteBase.assertThinClientEndpointUsed now validates every request instead of early-returning on the first thin-client match, so a mixed scenario (some data requests via the classic gateway) fails; QueryPlan requests remain exempt and null endpoints are handled. ThinClientTestBase.assertThinClientEndpointUsed delegates to the shared TestSuiteBase implementation, removing duplicated logic that could NPE on a null endpoint and reject valid QueryPlan-via-gateway scenarios.
|
@sdkReviewAgent |
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
LGTM except for one possible gap - if the conversion to FeedRangeEpk has happened always before already at least add a comment along those lines - if not, then I think the conversion would have to happen here.
|
✅ Review complete (57:48) Posted 5 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Replace the explicit MULTI_HASH prefix-detection and pre-serialization range computation with a header-presence branch: when the request already carries START_EPK/END_EPK HTTP headers (populated upstream by FeedRangeEpkImpl for prefix hierarchical partition keys and explicit EPK-range reads), set only the additive StartEpkHash/EndEpkHash RNTBD tokens. The ReadFeedKeyType/StartEpk/ EndEpk string tokens are copied verbatim from those HTTP headers by RntbdRequestHeaders#addStartAndEndKeys during RntbdRequest.from(), so they no longer need to be set here. Removes now-unused imports. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Following the simplification of the prefix EPK fix to reuse the existing StartEpk/EndEpk HTTP headers (a56db9a), the RxDocumentServiceRequest prefixPartitionKeyQuery flag is no longer read anywhere. Remove the field, its accessors, the clone copy, and the caller assignment in ParallelDocumentQueryExecutionContextBase. Test changes: - Retrofit CosmosMultiHashTest into the thinclient TestNG group so the MULTI_HASH prefix over-span coverage also runs against GatewayV2 (proxy :10250) over HTTP/2, not just the emulator gateway. - ThinClientQueryE2ETest: correct the prefix-scope Javadoc to describe the actual header-copy behavior and add a deterministic prefix-partition-key readAllItems regression. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
|
/azp run java - cosmos - tests |
|
/azp run java - cosmos - spark |
|
/azp run java - cosmos - kafka |
|
Azure Pipelines successfully started running 1 pipeline(s). |
2 similar comments
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Azure Pipelines successfully started running 1 pipeline(s). |
The prefix-query flag on RxDocumentServiceRequest was removed in 03a7c67 after the fix was simplified to reuse the StartEpk/EndEpk HTTP headers. The only functional edit in this file (the setter call) went away with it, leaving behind unnecessary refactor noise. Restore the file to its upstream/main state so the hotfix touches no query execution code. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
|
/azp run java - cosmos - tests |
|
/azp run java - cosmos - spark |
|
/azp run java - cosmos - kafka |
|
Azure Pipelines successfully started running 1 pipeline(s). |
2 similar comments
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@sdkReviewAgent |
| if (partitionKey != null) { | ||
| String startEpk = request.getHeaders().get(HttpConstants.HttpHeaders.START_EPK); | ||
| String endEpk = request.getHeaders().get(HttpConstants.HttpHeaders.END_EPK); | ||
| if (startEpk != null && endEpk != null) { |
There was a problem hiding this comment.
🟢 Suggestion · Robustness · Also require READ_FEED_KEY_TYPE for this branch
if (startEpk != null && endEpk != null) {The correctness of this branch depends on three RNTBD tokens being emitted together — ReadFeedKeyType=EffectivePartitionKeyRange, StartEpk/EndEpk (string), and the new StartEpkHash/EndEpkHash (binary). The first two are populated inside RntbdRequest.from() by RntbdRequestHeaders#addStartAndEndKeys, which reads the READ_FEED_KEY_TYPE, START_EPK, and END_EPK HTTP headers independently (RntbdRequestHeaders.java:1349-1401). Each is null-guarded on its own.
Today FeedRangeEpkImpl#populateFeedRangeFilteringHeaders always sets all three together in case 3, so the invariant holds. But this branch only guards on START_EPK/END_EPK. If any future code path (or a bug in header plumbing / a request-cloning helper that drops a subset of headers) sets START_EPK/END_EPK without also setting READ_FEED_KEY_TYPE, the RNTBD frame goes out with the two Epk/EpkHash pairs but no ReadFeedKeyType token — and per the PR description itself:
ReadFeedKeyType = EffectivePartitionKeyRange— tells the backend to interpret the EPK tokens as a range filter; without it the pair is ignored and the read falls back to the whole partition.
That is silent over-span — exactly the bug this PR fixes, in a different guise, with the same "returns other tenants' documents" failure mode the tests are designed to catch. Given this fix guards against a data-safety regression, a small defensive check is cheap insurance:
String readFeedKeyType = request.getHeaders().get(HttpConstants.HttpHeaders.READ_FEED_KEY_TYPE);
if (startEpk != null && endEpk != null
&& ReadFeedKeyType.EffectivePartitionKeyRange.toString().equalsIgnoreCase(readFeedKeyType)) {
// ... emit StartEpkHash / EndEpkHash
}The stricter guard also documents the coupling in code, so a future reader touching either side sees the dependency.
| * negative guard. | ||
| */ | ||
| @Test(groups = {"thinclient"}, timeOut = TIMEOUT) | ||
| public void testHierarchicalReadAllItemsPrefixPartitionKey() { |
There was a problem hiding this comment.
🟢 Suggestion · Test Coverage · No test exercises the new branch via explicit user-supplied FeedRange
The fix's own comment on the new branch explicitly says it covers two entry points:
"This covers a partial (prefix) hierarchical partition key query as well as any explicit EPK-range read."
All three regression tests in this file (testHierarchicalPrefixHalfOpenRange, testHierarchicalReadAllItemsPrefixPartitionKey, testHierarchicalReadManyByPartitionKeysPrefixPartitionKey) exercise only the prefix-HPK entry point. The "any explicit EPK-range read" side — the branch also fires for options.setFeedRange(FeedRange.forEpkRange(...)) or a range obtained from container.getFeedRanges() — is not exercised anywhere in the thin-client test surface (grep for forEpkRange / getFeedRanges / setFeedRange in the thinclient group returns zero matches).
That's a real production path: applications commonly iterate container.getFeedRanges() to shard a scan across physical partitions. Every per-range query in that pattern will take the new branch, but with a bounds source (the user-supplied FeedRange) that goes through a completely different upstream code path than the prefix-HPK derivation. A Direct-vs-thin parity test on the prefix path would not catch a regression here.
Suggested test (in this file): shard the existing directCrossPartitionContainer by iterating container.getFeedRanges(), run SELECT * FROM c per range against both Direct and thin client, assert (a) union matches full-fanout results, (b) each per-range result is identical to the Direct baseline for that range, (c) assertThinClientEndpointUsed(...) on each page's diagnostics. This closes the second half of the branch's stated coverage promise for the cost of one new test method.
|
✅ Review complete (59:52) Posted 2 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Fix thin-client over-span on MULTI_HASH prefix partition-key queries
Problem
Thin-client (GatewayV2, proxy
:10250) requests against a hierarchical (MULTI_HASH) container using only a prefix of the key paths (e.g. PK[tenantId, userId], supply onlytenantId) over-spanned to every co-located document on the physical partition, returning other tenants' documents. Full partition keys and QueryPlan requests were unaffected.Fix (
ThinClientStoreModel.java)On the prefix case (
kind == MULTI_HASH && components < paths), compute the prefix EPK range and set these RNTBD headers on the request:ReadFeedKeyType = EffectivePartitionKeyRange— tells the backend to interpret the EPK tokens as a range filter; without it the pair is ignored and the read falls back to the whole partition.StartEpk/EndEpk— hex EPK bounds of the prefix range; the backend's doc-level filter that trims out-of-prefix documents.StartEpkHash/EndEpkHash— binary hash bounds that steer to the correct physical partition.Core HPK tests (
ThinClientQueryE2ETest)Seed = 4 tenants × 6 users on
[/tenantId, /userId]; every test is parity-checked against a Direct-mode baseline.testHierarchicalPrefixHalfOpenRange— the query guard: runsSELECT * FROM c ORDER BY c.idxwith a prefix PK (tenantIdonly) → asserts exactly 6 documents (the tenant's users); with the full key → exactly 1.testHierarchicalReadAllItemsPrefixPartitionKey—readAllItemsprefix → exactly the tenant's 6 documents; a per-document assertion fails if any foreign tenant leaks in.testHierarchicalReadManyByPartitionKeysPrefixPartitionKey—readManyprefix → stays scoped to 6.testHierarchicalReadAllItemsFullPartitionKey— full key → 1 document (no regression).