Skip to content

[GatewayV2] Fix over-spanning issue when partial partition keys are provided.#49688

Open
jeet1995 wants to merge 13 commits into
Azure:mainfrom
jeet1995:fix/thinclient-multihash-prefix-epk-overspan
Open

[GatewayV2] Fix over-spanning issue when partial partition keys are provided.#49688
jeet1995 wants to merge 13 commits into
Azure:mainfrom
jeet1995:fix/thinclient-multihash-prefix-epk-overspan

Conversation

@jeet1995

@jeet1995 jeet1995 commented Jul 1, 2026

Copy link
Copy Markdown
Member

Fix thin-client over-span on MULTI_HASH prefix partition-key queries

Problem

Thin-client (GatewayV2, proxy :10250) requests against a hierarchical (MULTI_HASH) container using only a prefix of the key paths (e.g. PK [tenantId, userId], supply only tenantId) over-spanned to every co-located document on the physical partition, returning other tenants' documents. Full partition keys and QueryPlan requests were unaffected.

Fix (ThinClientStoreModel.java)

On the prefix case (kind == MULTI_HASH && components < paths), compute the prefix EPK range and set these RNTBD headers on the request:

  • ReadFeedKeyType = EffectivePartitionKeyRange — tells the backend to interpret the EPK tokens as a range filter; without it the pair is ignored and the read falls back to the whole partition.
  • StartEpk / EndEpk — hex EPK bounds of the prefix range; the backend's doc-level filter that trims out-of-prefix documents.
  • StartEpkHash / EndEpkHash — binary hash bounds that steer to the correct physical partition.

Core HPK tests (ThinClientQueryE2ETest)

Seed = 4 tenants × 6 users on [/tenantId, /userId]; every test is parity-checked against a Direct-mode baseline.

  • testHierarchicalPrefixHalfOpenRange — the query guard: runs SELECT * FROM c ORDER BY c.idx with a prefix PK (tenantId only) → asserts exactly 6 documents (the tenant's users); with the full key → exactly 1.
  • testHierarchicalReadAllItemsPrefixPartitionKeyreadAllItems prefix → exactly the tenant's 6 documents; a per-document assertion fails if any foreign tenant leaks in.
  • testHierarchicalReadManyByPartitionKeysPrefixPartitionKeyreadMany prefix → stays scoped to 6.
  • testHierarchicalReadAllItemsFullPartitionKey — full key → 1 document (no regression).

jeet1995 and others added 2 commits July 1, 2026 09:48
For a partial (prefix) hierarchical (MULTI_HASH) partition key, the thin-client
(Gateway V2) store model previously sent only StartEpkHash/EndEpkHash routing
headers and no doc-level EPK filter. The proxy therefore resolved the request to
the owning physical partition and returned every co-located document (an
over-span) instead of only those matching the prefix.

ThinClientStoreModel.wrapInHttpRequest now detects a prefix MULTI_HASH key and
sets READ_FEED_KEY_TYPE=EffectivePartitionKeyRange, START_EPK and END_EPK on the
request headers before RntbdRequest.from(), so RntbdRequestHeaders serializes the
prefix EPK sub-range [hash(prefix), hash(prefix) + "FF") as the backend doc-level
filter (hex string as UTF-8 bytes), mirroring the Direct/RNTBD FeedRangeEpkImpl
path. StartEpkHash/EndEpkHash routing headers are retained. QueryPlan requests
and full (non-prefix) keys are unchanged.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Ports ThinClientQueryE2ETest and its ThinClientTestBase helper. The suite
self-baselines every query against a Direct (RNTBD) client and the thin-client
(:10250 HTTP/2) path, asserting identical results. This directly guards the
MULTI_HASH prefix partition-key EPK over-span fix: prefix (single-component)
hierarchical-partition-key queries must return only the narrow matching set,
matching Direct, rather than over-spanning to all co-located documents.

Both files are runtime self-enabling (enableThinClientForTest() + builder-driven
HTTP/2 via setEnabled(true)); no module property or TestSuiteBase changes needed.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the Cosmos label Jul 1, 2026
jeet1995 and others added 5 commits July 1, 2026 14:34
The earlier fix (499f928) only scoped the prefix MULTI_HASH read when the
request carried a non-null partition key. The parallel prefix-query path
intentionally nulls the partial hierarchical partition key and instead carries
the narrow prefix effective-partition-key range on the request's feedRange, so
that guard never fired there and the thin-client read still over-spanned to the
whole physical partition.

Mark such requests (prefixPartitionKeyQuery) in
ParallelDocumentQueryExecutionContextBase and, in ThinClientStoreModel, reuse
the already-computed FeedRangeEpkImpl range as the doc-level EPK filter
(START_EPK/END_EPK + StartEpkHash/EndEpkHash) instead of recomputing
getEPKRangeForPrefixPartitionKey from a partition key we no longer have. This
mirrors the Direct/GatewayV1 decision points and honors DRY/SRP.

Also relax assertThinClientEndpointUsed so an all-QueryPlan diagnostics context
passes: in thin-client mode QueryPlan calls resolve via the classic gateway and
are the only requests permitted on a non-thin-client endpoint; any non-QueryPlan
(data) request on a non-thin-client endpoint still fails the assertion.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
…ag on clone, add readAllItems/readMany prefix HPK tests

ThinClientStoreModel.wrapInHttpRequest now sets all five EPK-specific RNTBD headers (ReadFeedKeyType, StartEpk/EndEpk, StartEpkHash/EndEpkHash) in a single block directly on the RNTBD request, replacing the split request-header-map + post-construction approach. Wire encoding is unchanged (hex-string UTF-8 bytes for StartEpk/EndEpk, decoded hash bytes for the hash headers).

RxDocumentServiceRequest.clone() now copies prefixPartitionKeyQuery so hedged/availability-strategy/retry clones keep the prefix signal and do not over-span.

ThinClientQueryE2ETest adds prefix-HPK regression coverage for readAllItems (ReadFeed) and readManyByPartitionKeys, asserting Direct-vs-thin-client parity and per-tenant scoping (no over-span).
…onContextBase

Assign isPrefixPartitionKeyQuery directly from isPartialPartitionKeyQuery(...) and guard only the PARTITION_KEY-setting branch with !isPrefixPartitionKeyQuery, removing the if/else. Behavior is unchanged: a partial (prefix) key still leaves partitionKeyInternal null so the prefix FeedRangeEpkImpl survives createDocumentServiceRequestWithFeedRange.
…cutionContextBase

Keep main's single 'if (!isPartialPartitionKeyQuery)' structure; only capture the predicate in isPrefixPartitionKeyQuery and pass it to request.setPrefixPartitionKeyQuery. Removes the if/else.
@jeet1995 jeet1995 marked this pull request as ready for review July 1, 2026 21:49
@jeet1995 jeet1995 requested review from a team and kirankumarkolli as code owners July 1, 2026 21:49
Copilot AI review requested due to automatic review settings July 1, 2026 21:49
@jeet1995 jeet1995 requested a review from a team as a code owner July 1, 2026 21:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a thin-client (Gateway V2 / :10250 proxy) routing bug where prefix hierarchical (MULTI_HASH) partition keys could over-span to all co-located documents on the physical partition, potentially returning documents from other logical partitions. The change scopes these requests by sending the correct RNTBD EPK-range headers and adds E2E coverage to prevent regression.

Changes:

  • Compute the prefix effective-partition-key (EPK) range for MULTI_HASH prefix keys and apply it via RNTBD headers (ReadFeedKeyType=EffectivePartitionKeyRange, Start/EndEpk, Start/EndEpkHash) in ThinClientStoreModel.
  • Introduce a request flag (prefixPartitionKeyQuery) to preserve prefix-query intent through the parallel query initialization path when the partition key is intentionally nulled.
  • Add thin-client E2E tests validating parity with Direct mode for hierarchical prefix queries, plus supporting test infrastructure and changelog entry.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ThinClientStoreModel.java Detect MULTI_HASH prefix PK requests and set EPK-range RNTBD headers to prevent over-span.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentServiceRequest.java Add prefixPartitionKeyQuery flag with accessors + clone propagation.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/ParallelDocumentQueryExecutionContextBase.java Set prefixPartitionKeyQuery when a partial hierarchical key is detected and PK header is suppressed.
sdk/cosmos/azure-cosmos/CHANGELOG.md Document the thin-client prefix HPK over-span fix.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/ThinClientTestBase.java New shared thin-client test base + endpoint-routing assertions.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/ThinClientQueryE2ETest.java New Direct-vs-thin-client query E2E suite including hierarchical prefix regression tests.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/TestSuiteBase.java Update thin-client endpoint assertion behavior (noting QueryPlan can use classic gateway).

In ThinClientStoreModel.wrapInHttpRequest, check the cached prefix feed-range path (isPrefixPartitionKeyQuery + FeedRangeEpkImpl) first and fall back to recomputing from the partition key only for the PK-bearing prefix case. Behavior is unchanged (the two cases are mutually exclusive).
@jeet1995 jeet1995 changed the title [Cosmos] Fix thin-client MULTI_HASH prefix partition key EPK over-span [GatewayV2] Fix over-spanning issue when partial partition keys are provided. Jul 1, 2026
… assertion

TestSuiteBase.assertThinClientEndpointUsed now validates every request instead of early-returning on the first thin-client match, so a mixed scenario (some data requests via the classic gateway) fails; QueryPlan requests remain exempt and null endpoints are handled. ThinClientTestBase.assertThinClientEndpointUsed delegates to the shared TestSuiteBase implementation, removing duplicated logic that could NPE on a null endpoint and reject valid QueryPlan-via-gateway scenarios.
@jeet1995

jeet1995 commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

@sdkReviewAgent

@tvaron3 tvaron3 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@FabianMeiswinkel FabianMeiswinkel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for one possible gap - if the conversion to FeedRangeEpk has happened always before already at least add a comment along those lines - if not, then I think the conversion would have to happen here.

@xinlian12

Copy link
Copy Markdown
Member

Review complete (57:48)

Posted 5 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

jeet1995 and others added 3 commits July 2, 2026 17:21
Replace the explicit MULTI_HASH prefix-detection and pre-serialization range
computation with a header-presence branch: when the request already carries
START_EPK/END_EPK HTTP headers (populated upstream by FeedRangeEpkImpl for
prefix hierarchical partition keys and explicit EPK-range reads), set only the
additive StartEpkHash/EndEpkHash RNTBD tokens. The ReadFeedKeyType/StartEpk/
EndEpk string tokens are copied verbatim from those HTTP headers by
RntbdRequestHeaders#addStartAndEndKeys during RntbdRequest.from(), so they no
longer need to be set here. Removes now-unused imports.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Following the simplification of the prefix EPK fix to reuse the existing
StartEpk/EndEpk HTTP headers (a56db9a), the RxDocumentServiceRequest
prefixPartitionKeyQuery flag is no longer read anywhere. Remove the field,
its accessors, the clone copy, and the caller assignment in
ParallelDocumentQueryExecutionContextBase.

Test changes:
- Retrofit CosmosMultiHashTest into the thinclient TestNG group so the
  MULTI_HASH prefix over-span coverage also runs against GatewayV2 (proxy
  :10250) over HTTP/2, not just the emulator gateway.
- ThinClientQueryE2ETest: correct the prefix-scope Javadoc to describe the
  actual header-copy behavior and add a deterministic prefix-partition-key
  readAllItems regression.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@jeet1995

jeet1995 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@jeet1995

jeet1995 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - spark

@jeet1995

jeet1995 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

2 similar comments
@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

The prefix-query flag on RxDocumentServiceRequest was removed in
03a7c67 after the fix was simplified to reuse the StartEpk/EndEpk
HTTP headers. The only functional edit in this file (the setter call)
went away with it, leaving behind unnecessary refactor noise. Restore
the file to its upstream/main state so the hotfix touches no query
execution code.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@jeet1995

jeet1995 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@jeet1995

jeet1995 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - spark

@jeet1995

jeet1995 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

2 similar comments
@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995

jeet1995 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

@sdkReviewAgent

if (partitionKey != null) {
String startEpk = request.getHeaders().get(HttpConstants.HttpHeaders.START_EPK);
String endEpk = request.getHeaders().get(HttpConstants.HttpHeaders.END_EPK);
if (startEpk != null && endEpk != null) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion · Robustness · Also require READ_FEED_KEY_TYPE for this branch

if (startEpk != null && endEpk != null) {

The correctness of this branch depends on three RNTBD tokens being emitted together — ReadFeedKeyType=EffectivePartitionKeyRange, StartEpk/EndEpk (string), and the new StartEpkHash/EndEpkHash (binary). The first two are populated inside RntbdRequest.from() by RntbdRequestHeaders#addStartAndEndKeys, which reads the READ_FEED_KEY_TYPE, START_EPK, and END_EPK HTTP headers independently (RntbdRequestHeaders.java:1349-1401). Each is null-guarded on its own.

Today FeedRangeEpkImpl#populateFeedRangeFilteringHeaders always sets all three together in case 3, so the invariant holds. But this branch only guards on START_EPK/END_EPK. If any future code path (or a bug in header plumbing / a request-cloning helper that drops a subset of headers) sets START_EPK/END_EPK without also setting READ_FEED_KEY_TYPE, the RNTBD frame goes out with the two Epk/EpkHash pairs but no ReadFeedKeyType token — and per the PR description itself:

ReadFeedKeyType = EffectivePartitionKeyRange — tells the backend to interpret the EPK tokens as a range filter; without it the pair is ignored and the read falls back to the whole partition.

That is silent over-span — exactly the bug this PR fixes, in a different guise, with the same "returns other tenants' documents" failure mode the tests are designed to catch. Given this fix guards against a data-safety regression, a small defensive check is cheap insurance:

String readFeedKeyType = request.getHeaders().get(HttpConstants.HttpHeaders.READ_FEED_KEY_TYPE);
if (startEpk != null && endEpk != null
    && ReadFeedKeyType.EffectivePartitionKeyRange.toString().equalsIgnoreCase(readFeedKeyType)) {
    // ... emit StartEpkHash / EndEpkHash
}

The stricter guard also documents the coupling in code, so a future reader touching either side sees the dependency.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

* negative guard.
*/
@Test(groups = {"thinclient"}, timeOut = TIMEOUT)
public void testHierarchicalReadAllItemsPrefixPartitionKey() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion · Test Coverage · No test exercises the new branch via explicit user-supplied FeedRange

The fix's own comment on the new branch explicitly says it covers two entry points:

"This covers a partial (prefix) hierarchical partition key query as well as any explicit EPK-range read."

All three regression tests in this file (testHierarchicalPrefixHalfOpenRange, testHierarchicalReadAllItemsPrefixPartitionKey, testHierarchicalReadManyByPartitionKeysPrefixPartitionKey) exercise only the prefix-HPK entry point. The "any explicit EPK-range read" side — the branch also fires for options.setFeedRange(FeedRange.forEpkRange(...)) or a range obtained from container.getFeedRanges() — is not exercised anywhere in the thin-client test surface (grep for forEpkRange / getFeedRanges / setFeedRange in the thinclient group returns zero matches).

That's a real production path: applications commonly iterate container.getFeedRanges() to shard a scan across physical partitions. Every per-range query in that pattern will take the new branch, but with a bounds source (the user-supplied FeedRange) that goes through a completely different upstream code path than the prefix-HPK derivation. A Direct-vs-thin parity test on the prefix path would not catch a regression here.

Suggested test (in this file): shard the existing directCrossPartitionContainer by iterating container.getFeedRanges(), run SELECT * FROM c per range against both Direct and thin client, assert (a) union matches full-fanout results, (b) each per-range result is identical to the Direct baseline for that range, (c) assertThinClientEndpointUsed(...) on each page's diagnostics. This closes the second half of the branch's stated coverage promise for the cost of one new test method.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12

Copy link
Copy Markdown
Member

Review complete (59:52)

Posted 2 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants