[FEATURE REQ] Allow ignoring 412s in Spark connector when using ItemPatch #49594

Description

@vaindil

Is your feature request related to a problem? Please describe.
We ingest data into Cosmos using Spark. A Spark job reads from a Kafka topic, does a small amount of aggregation and deduping, then patches documents in Cosmos using the ItemPatch write strategy. Each Kafka message carries values to increment Cosmos fields by, not new totals to set.

For idempotence in case Spark replays a batch, we stamp each updated document with the current batch ID (last_batch_id) and use a patch filter (NOT IS_DEFINED(last_batch_id) OR last_batch_id < <batchId>) so a replay skips documents that were already updated by this batch or a later one.
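A minimal sketch of how the write options for such a batch might be assembled in Python. The option names (`spark.cosmos.write.strategy`, `spark.cosmos.write.patch.defaultOperationType`, `spark.cosmos.write.patch.columnConfigs`, `spark.cosmos.write.patch.filter`) follow the connector's configuration reference as best I recall it and should be checked against the connector version in use. The helper `build_patch_options`, the column `view_count`, the `c.` alias, and the exact filter syntax are assumptions for illustration only.

```python
def build_patch_options(batch_id: int) -> dict:
    """Return Spark write options for an increment patch guarded by batch ID."""
    if batch_id < 0:
        raise ValueError("batch_id must be non-negative")

    # Only patch documents that were not yet stamped by this batch or a later one.
    # Assumption: the filter is written as a "FROM c WHERE ..." predicate.
    patch_filter = (
        "FROM c WHERE NOT IS_DEFINED(c.last_batch_id) "
        f"OR c.last_batch_id < {batch_id}"
    )

    return {
        "spark.cosmos.write.strategy": "ItemPatch",
        "spark.cosmos.write.patch.defaultOperationType": "Increment",
        # Hypothetical columns: increment the counter, overwrite the batch stamp.
        "spark.cosmos.write.patch.columnConfigs":
            "[col(view_count).op(increment), col(last_batch_id).op(set)]",
        "spark.cosmos.write.patch.filter": patch_filter,
    }


options = build_patch_options(42)
print(options["spark.cosmos.write.patch.filter"])
```

Inside a `foreachBatch` handler, these options would presumably be merged with the account and container settings and passed to `df.write.format("cosmos.oltp")`, with the handler's batch ID supplying `batch_id`.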

When the filter does not match, Cosmos returns a 412 (Precondition Failed). That is fine: it just means this increment has already been applied to that document. The problem is that the connector treats the 412 as a fatal error and fails the entire bulk write rather than skipping the one document.
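The difference between the two behaviors can be illustrated with a small in-memory model. This is not the connector's code: `FakeContainer`, `PreconditionFailed`, and `write_batch` are hypothetical stand-ins, and the real bulk write does not process documents one by one in order.

```python
from dataclasses import dataclass, field


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response from the service."""


@dataclass
class FakeContainer:
    docs: dict = field(default_factory=dict)

    def patch_increment(self, doc_id: str, amount: int, batch_id: int) -> None:
        doc = self.docs[doc_id]
        # Patch filter: NOT IS_DEFINED(last_batch_id) OR last_batch_id < batch_id
        last = doc.get("last_batch_id")
        if last is not None and last >= batch_id:
            raise PreconditionFailed(doc_id)
        doc["count"] += amount
        doc["last_batch_id"] = batch_id


def write_batch(container, increments, batch_id, ignore_412):
    """Apply all increments; return the IDs skipped because of a 412."""
    skipped = []
    for doc_id, amount in increments.items():
        try:
            container.patch_increment(doc_id, amount, batch_id)
        except PreconditionFailed:
            if not ignore_412:
                raise  # models the behavior described above: the write fails
            skipped.append(doc_id)
    return skipped


container = FakeContainer({"a": {"count": 10}, "b": {"count": 5}})
container.patch_increment("a", 3, batch_id=7)  # batch 7 was partially applied
skipped = write_batch(container, {"a": 3, "b": 2}, batch_id=7, ignore_412=True)
print(skipped, container.docs)
```

With `ignore_412=True`, the replay skips document `a` (already stamped with batch 7) and still applies the pending increment to `b`. With `ignore_412=False`, the same replay raises on `a` and `b` is never updated.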

Describe the solution you'd like
A config option like spark.cosmos.write.patch.ignorePreconditionFailure that lets us silently ignore 412s when using ItemPatch would completely solve this. A new write mode would work too; I'm not familiar with the internals here. The connector already ignores 412s for ItemOverwriteIfNotModified and ItemDeleteIfNotModified, so there's some precedent.
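From the caller's side, opting in might look like the sketch below. The option `spark.cosmos.write.patch.ignorePreconditionFailure` is only the name proposed in this request and does not exist in the connector today; the helper `with_ignored_412s` is likewise hypothetical.

```python
# Proposed option name from this request; NOT an existing connector setting.
PROPOSED_OPTION = "spark.cosmos.write.patch.ignorePreconditionFailure"


def with_ignored_412s(write_options: dict) -> dict:
    """Return a copy of the write options with the proposed flag enabled."""
    if write_options.get("spark.cosmos.write.strategy") != "ItemPatch":
        raise ValueError("the proposed option would only apply to ItemPatch")
    # Leave the caller's dict untouched so the base options stay reusable.
    return {**write_options, PROPOSED_OPTION: "true"}


base = {"spark.cosmos.write.strategy": "ItemPatch"}
print(with_ignored_412s(base))
```

Scoping the flag to ItemPatch mirrors the request: other write strategies would keep their current 412 handling.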

(I'm new to this data streaming area, so I apologize if something here is inaccurate or worded poorly!)

Describe alternatives you've considered
Recomputing each counter's total every time a message is added to the Kafka topic is too costly, which is why we only increment. None of the other write modes support incrementing. ItemBulkUpdate is the closest, but it explicitly supports only SET operations in its local "patch" step.

Additional context
N/A

Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Description Added
  • Expected solution specified

Labels: Client, Cosmos, Service Attention, customer-reported, question