Ignore 412s for Cosmos Spark ItemPatch with filter predicate #49700
Open
tvaron3 wants to merge 4 commits into
Adds an opt-in config `spark.cosmos.write.patch.filterPredicateIgnorePreconditionFailures` (default `false`) that treats a 412 Precondition Failed as a successful no-op skip when using the `ItemPatch`/`ItemPatchIfExists` write strategy together with a conditional `spark.cosmos.write.patch.filter`, in both bulk and point write paths. This mirrors the existing graceful-skip behavior of `ItemOverwriteIfNotModified`/`ItemDeleteIfNotModified`.

Fixes Azure#49594

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR adds an opt-in write configuration for the Cosmos DB Spark connector to treat 412 Precondition Failed responses as successful no-op skips when using ItemPatch / ItemPatchIfExists with a conditional patch filter, aligning behavior with existing “skip” semantics for other conditional write strategies.
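As a rough illustration of how the option might be set alongside the existing patch settings, the sketch below builds a write-options map. The endpoint, key, database, and container values are placeholders, `counter` is a hypothetical column name, and the exact value syntax for `spark.cosmos.write.patch.columnConfigs` and `spark.cosmos.write.patch.filter` is an assumption that should be checked against the connector's configuration reference.

```scala
object PatchWriteConfigExample {
  // Placeholder connection settings: replace with real values.
  val baseConfig: Map[String, String] = Map(
    "spark.cosmos.accountEndpoint" -> "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey" -> "<key>",
    "spark.cosmos.database" -> "<database>",
    "spark.cosmos.container" -> "<container>"
  )

  // Builds write options for a conditional patch guarded by a batch id.
  def patchConfig(batchId: Long, ignorePreconditionFailures: Boolean): Map[String, String] =
    baseConfig ++ Map(
      "spark.cosmos.write.strategy" -> "ItemPatch",
      // Assumed column config syntax; `counter` is a hypothetical column.
      "spark.cosmos.write.patch.columnConfigs" -> "[col(counter).op(increment)]",
      // Assumed filter syntax, modeled on the idempotency predicate in the PR description.
      "spark.cosmos.write.patch.filter" ->
        s"FROM c WHERE NOT IS_DEFINED(c.last_batch_id) OR c.last_batch_id < $batchId",
      // The new opt-in flag from this PR; it defaults to false when omitted.
      "spark.cosmos.write.patch.filterPredicateIgnorePreconditionFailures" ->
        ignorePreconditionFailures.toString
    )
}
```

A map like this would typically be passed to a DataFrame write, for example `df.write.format("cosmos.oltp").options(PatchWriteConfigExample.patchConfig(42L, ignorePreconditionFailures = true)).mode("Append").save()`.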
Changes:
- Introduces `spark.cosmos.write.patch.filterPredicateIgnorePreconditionFailures` (default `false`) and wires it through write config parsing into `CosmosPatchConfigs`.
- Updates bulk and point write paths to ignore `412` only when (a) patch strategy is used, (b) a filter predicate is configured, and (c) the new flag is enabled.
- Adds unit/integration tests for config parsing and the skip-on-412 behavior, plus documentation and changelog updates across Spark artifacts.
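The three-part guard described above can be sketched as a single predicate. This is a minimal illustration, not the connector's code: `WriteStrategy`, `PatchConfigs`, and `shouldIgnorePreconditionFailure` are hypothetical stand-ins for the real types and for the logic in `BulkWriter.shouldIgnore` and `PointWriter.patchWithRetry`.

```scala
object PatchPreconditionGate {
  // Hypothetical stand-ins for the connector's write strategy enumeration.
  sealed trait WriteStrategy
  case object ItemPatch extends WriteStrategy
  case object ItemPatchIfExists extends WriteStrategy
  case object ItemOverwrite extends WriteStrategy

  // Hypothetical stand-in for the patch configuration carried by the writer.
  final case class PatchConfigs(
    filter: Option[String],
    filterPredicateIgnorePreconditionFailures: Boolean = false
  )

  val PreconditionFailed = 412

  // True only when all three conditions hold for a 412 response:
  // (a) a patch strategy, (b) a non-empty filter predicate, (c) the flag enabled.
  def shouldIgnorePreconditionFailure(
    statusCode: Int,
    strategy: WriteStrategy,
    patchConfigs: Option[PatchConfigs]
  ): Boolean = {
    val isPatch = strategy == ItemPatch || strategy == ItemPatchIfExists
    statusCode == PreconditionFailed && isPatch && patchConfigs.exists { c =>
      c.filterPredicateIgnorePreconditionFailures && c.filter.exists(_.trim.nonEmpty)
    }
  }
}
```

Reading the configuration through `Option.exists` keeps the check safe when no patch configuration or no filter is present, so the default remains fail-fast.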
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/CosmosConfig.scala | Adds the new config key, registers it, extends CosmosPatchConfigs, and parses the flag for patch write strategies. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/BulkWriter.scala | Extends shouldIgnore to skip 412 for ItemPatch/ItemPatchIfExists when the new flag is enabled and a filter predicate is present. |
| sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/PointWriter.scala | Adds a targeted 412 skip branch in patchWithRetry under the same guard conditions as bulk. |
| sdk/cosmos/azure-cosmos-spark_3/src/test/scala/com/azure/cosmos/spark/CosmosConfigSpec.scala | Adds a unit test to validate parsing/defaulting of the new patch config flag. |
| sdk/cosmos/azure-cosmos-spark_3/src/test/scala/com/azure/cosmos/spark/BulkWriterITest.scala | Adds an integration test covering skip-on-412 behavior for bulk patch with an always-false filter when flag is enabled. |
| sdk/cosmos/azure-cosmos-spark_3/src/test/scala/com/azure/cosmos/spark/PointWriterITest.scala | Adds an integration test covering skip-on-412 behavior for point patch with an always-false filter when flag is enabled. |
| sdk/cosmos/azure-cosmos-spark_3/src/test/scala/com/azure/cosmos/spark/utils/CosmosPatchTestHelper.scala | Extends patch writer helpers to accept and pass through the new flag. |
| sdk/cosmos/azure-cosmos-spark_3/docs/configuration-reference.md | Documents the new configuration option and its applicability constraints. |
| sdk/cosmos/azure-cosmos-spark_3-3_2-12/CHANGELOG.md | Adds a “Features Added” changelog entry describing the new config flag. |
| sdk/cosmos/azure-cosmos-spark_3-4_2-12/CHANGELOG.md | Adds a “Features Added” changelog entry describing the new config flag. |
| sdk/cosmos/azure-cosmos-spark_3-5_2-12/CHANGELOG.md | Adds a “Features Added” changelog entry describing the new config flag. |
| sdk/cosmos/azure-cosmos-spark_3-5_2-13/CHANGELOG.md | Adds a “Features Added” changelog entry describing the new config flag. |
| sdk/cosmos/azure-cosmos-spark_4-0_2-13/CHANGELOG.md | Adds a “Features Added” changelog entry describing the new config flag. |
| sdk/cosmos/azure-cosmos-spark_4-1_2-13/CHANGELOG.md | Adds a “Features Added” changelog entry describing the new config flag. |
Summary
Implements the feature requested in #49594. Adds an opt-in config option `spark.cosmos.write.patch.filterPredicateIgnorePreconditionFailures` (boolean, default `false`) for the Cosmos DB Spark connector.

When enabled with the `ItemPatch`/`ItemPatchIfExists` write strategy together with a conditional patch filter (`spark.cosmos.write.patch.filter`), an HTTP `412 Precondition Failed`, which the Cosmos service returns for documents excluded by the filter predicate, is treated as a successful no-op skip instead of failing the whole (bulk or point) write. This mirrors the existing graceful-skip behavior of `ItemOverwriteIfNotModified`/`ItemDeleteIfNotModified`. Default `false` preserves the current fail-fast behavior.

Customer scenario
Kafka -> Spark -> Cosmos ingestion using server-side `increment` patch operations guarded by an idempotency filter (e.g. `NOT IS_DEFINED(last_batch_id) OR last_batch_id < <batchId>`). On replays, the filter legitimately excludes already-applied documents, producing 412s that today fail the entire batch. This flag lets those 412s be skipped so idempotent replays succeed.

Changes
- `CosmosConfig.scala`: new config name constant, registration in the known-config list, `CosmosConfigEntry[Boolean]` (default `false`), new `filterPredicateIgnorePreconditionFailures` field on `CosmosPatchConfigs`, and parsing in `parseWriteConfig` for the `ItemPatch`/`ItemPatchIfExists` branch.
- `BulkWriter.scala`: `shouldIgnore` now skips a 412 for `ItemPatch`/`ItemPatchIfExists` when the flag is enabled and a filter predicate is configured (read None-safely). Existing `ItemPatchIfExists` not-found skip is preserved.
- `PointWriter.scala`: `patchWithRetry` adds a catch case that skips a 412 (logs skip, tracks a 0-count op, returns) under the same gate. 412 is not classified transient, so ordering is safe.
- `CosmosConfigSpec` parse test; `BulkWriterITest` + `PointWriterITest` skip-on-412 scenarios; new `filterPredicateIgnorePreconditionFailures` param on `CosmosPatchTestHelper`.
- `configuration-reference.md` row; "Features Added" entry in all 6 released Spark modules that build from the shared source tree.

Verification
- `CosmosConfigSpec` passes (56/56, including the new parse test).

Fixes #49594
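
The replay scenario from the description can be modeled in memory to show why skipping a 412 keeps a replay idempotent. This is a simplified sketch with no Cosmos or Spark dependency: `Doc`, `StatusException`, `conditionalIncrement`, and `applyBatch` are hypothetical names, and the filter is modeled as "`last_batch_id` is undefined or lower than the incoming batch id".

```scala
object IdempotentReplaySketch {
  final case class Doc(id: String, counter: Long, lastBatchId: Option[Long])

  // Hypothetical exception standing in for a service error that carries an HTTP status code.
  final class StatusException(val statusCode: Int) extends RuntimeException(s"HTTP $statusCode")

  // Models the server-side conditional increment: the patch is applied only when
  // last_batch_id is undefined or lower than the incoming batch id; otherwise a 412 is raised.
  def conditionalIncrement(doc: Doc, delta: Long, batchId: Long): Doc =
    if (doc.lastBatchId.forall(_ < batchId))
      doc.copy(counter = doc.counter + delta, lastBatchId = Some(batchId))
    else
      throw new StatusException(412)

  // Applies one batch. With ignore412 = true, a 412 leaves the document unchanged;
  // with ignore412 = false, the exception propagates and the whole batch fails.
  def applyBatch(docs: List[Doc], delta: Long, batchId: Long, ignore412: Boolean): List[Doc] =
    docs.map { doc =>
      try conditionalIncrement(doc, delta, batchId)
      catch {
        case e: StatusException if e.statusCode == 412 && ignore412 => doc
      }
    }
}
```

In this model, replaying the same batch id with `ignore412 = true` returns the documents unchanged, while `ignore412 = false` reproduces the fail-fast default.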