Is your feature request related to a problem? Please describe.
We ingest data into Cosmos using Spark. A Spark job reads from a Kafka topic, does a small amount of aggregation and deduping, then patches documents in Cosmos using the ItemPatch write strategy. Each Kafka message carries values to increment Cosmos fields by, not new totals to set.
For idempotence in case Spark replays a batch, we stamp each updated document with the current batch ID (last_batch_id) and use a patch filter (NOT IS_DEFINED(last_batch_id) OR last_batch_id < <batchId>) so a replay skips documents that were already updated by this batch or a later one.
That filter makes Cosmos return a 412 (Precondition Failed) for any document it excludes, which is fine: it just means the increment has already been applied to that document. The problem is that the connector treats the 412 as a fatal error and fails the entire bulk write rather than skipping the one document.
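For reference, here is a minimal sketch of how the write options for this setup might be assembled per batch. The option keys are the ones I believe the connector's configuration reference uses for ItemPatch; the `c` alias in the filter, the `[col(...).op(set)]` column config, and the helper name `patch_write_options` are my assumptions for illustration, so please check them against the connector docs.

```python
def patch_write_options(batch_id: int) -> dict:
    """Build Cosmos Spark connector write options for an idempotent increment patch.

    Hypothetical helper: option keys and value formats are assumed from the
    connector's configuration reference and should be verified there.
    """
    return {
        "spark.cosmos.write.strategy": "ItemPatch",
        "spark.cosmos.write.bulk.enabled": "true",
        # Increment the existing counters instead of overwriting them.
        "spark.cosmos.write.patch.defaultOperationType": "Increment",
        # The batch stamp is written with Set rather than Increment (assumed syntax).
        "spark.cosmos.write.patch.columnConfigs": "[col(last_batch_id).op(set)]",
        # Patch only documents not yet touched by this batch or a later one.
        "spark.cosmos.write.patch.filter": (
            "FROM c WHERE NOT IS_DEFINED(c.last_batch_id) "
            f"OR c.last_batch_id < {batch_id}"
        ),
    }
```

Inside a `foreachBatch` handler these would be passed along with the usual account and container options, roughly `df.write.format("cosmos.oltp").options(**patch_write_options(batch_id)).mode("APPEND").save()`.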
Describe the solution you'd like
A config option like spark.cosmos.write.patch.ignorePreconditionFailure that lets us silently ignore 412s when using ItemPatch would completely solve this. A new write mode would work too; I'm not familiar with the internals here. The connector already ignores 412s for ItemOverwriteIfNotModified and ItemDeleteIfNotModified, so there's some precedent.
(I'm new to this data streaming area, so I apologize if something here is inaccurate or worded poorly!)
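To make the requested semantics concrete, here is a small in-memory simulation in plain Python. It is not connector code: `patch_increment`, `bulk_patch`, the `ignore_precondition_failure` flag, and the `clicks` field are all hypothetical names used only to illustrate the behavior I'm asking for.

```python
OK = 200
PRECONDITION_FAILED = 412


def patch_increment(doc: dict, increments: dict, batch_id: int) -> int:
    """Apply increments to doc if the patch filter passes; return an HTTP-like status."""
    last = doc.get("last_batch_id")
    # Filter: NOT IS_DEFINED(last_batch_id) OR last_batch_id < batch_id
    if last is not None and last >= batch_id:
        return PRECONDITION_FAILED
    for name, delta in increments.items():
        doc[name] = doc.get(name, 0) + delta
    doc["last_batch_id"] = batch_id
    return OK


def bulk_patch(store: dict, batch: dict, batch_id: int,
               ignore_precondition_failure: bool) -> list:
    """Patch every document in batch; return the ids skipped because of a 412."""
    skipped = []
    for doc_id, increments in batch.items():
        status = patch_increment(store[doc_id], increments, batch_id)
        if status == PRECONDITION_FAILED:
            if not ignore_precondition_failure:
                # Current behavior: one 412 fails the whole bulk write.
                raise RuntimeError(f"412 for document {doc_id}")
            # Requested behavior: skip this document and keep going.
            skipped.append(doc_id)
    return skipped
```

With the flag on, replaying the same batch changes nothing and simply reports every document as skipped, which is the idempotence we are after. With it off, the first already-stamped document aborts the write.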
Describe alternatives you've considered
Recomputing each counter's total every time a message is added to the Kafka topic is too costly, which is why we only increment. None of the other write modes support incrementing. ItemBulkUpdate is the closest, but it explicitly supports only SET operations in the local "patch" step.
Additional context
N/A
Information Checklist
Kindly make sure that you have added all the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.