Skip to content

resumable: clarify handling of conflicts and retry policies#3319

Open
danielresnick wants to merge 7 commits intohttpwg:mainfrom
danielresnick:patch-1
Open

resumable: clarify handling of conflicts and retry policies#3319
danielresnick wants to merge 7 commits intohttpwg:mainfrom
danielresnick:patch-1

Conversation

@danielresnick
Copy link

See #3275 for discussion.

Copy link
Contributor

@guoye-zhang guoye-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR, the text looks good. Left a few minor comments.

- For a `5xx (Server Error)` status code, the client MAY automatically attempt upload resumption by retrieving the current offset ({{offset-retrieving}}).

If no final response was received at all due to connectivity issues, the client MAY automatically attempt upload resumption by retrieving the current offset ({{offset-retrieving}}).
- For a `409 (Conflict)` status code, the client SHOULD attempt to resume the upload by using the offset from the `Upload-Offset` response header field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep this flow the same as 5xx: recommending doing an offset retrieval before resuming. Otherwise this creates a new code path, and the text in "Upload Append" about picking the offset is also inconsistent with the recommendation here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, how does this look? It feels wrong to completely omit the option to leverage the Upload-Offset from the response since it's called out as a MUST for the server to return it just below

3. **Optimistic Concurrency Control**: By maintaining the upload's state in a strongly consistent datastore, the server can atomically check if the `Upload-Offset` in a request matches the resource's current state. If they do not match, the server can reject the request with a `409 (Conflict)` status code.

Since implementing this approach is not always technically possible or feasible, other measures can be considered as well. A simpler approach is that the server only processes a new request to retrieve the offset ({{offset-retrieving}}), append representation data ({{upload-appending}}), or cancellation ({{upload-cancellation}}) once all previous requests have been processed. This effectively implements exclusive access to the upload resource through an access lock. However, since network interruptions can occur in ways that cause the request to hang from the server's perspective, it might take the server significant time to realize the interruption and time out the request. During this period, the client will be unable to access the resource and resume the upload, causing friction for the end users. Therefore, the recommended approach is to terminate previous requests to enable quick resumption of uploads.
Servers SHOULD choose a strategy that best fits their architecture while fulfilling the requirements of this section. Regardless of the chosen strategy, clients MUST be prepared to handle a `409 (Conflict)` response as a recoverable error, as described in {{upload-appending}}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All retry behaviors are MAY/SHOULD, do we need a MUST here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed to be consistent (SHOULD)

@guoye-zhang guoye-zhang requested a review from Acconut October 29, 2025 16:19
- Addressed feedback around inconsistencies (HAVE vs MUST).

- Removed example 409 conflict recovery.
danielresnick added a commit to danielresnick/http-extensions that referenced this pull request Oct 30, 2025

2. **Pessimistic Locking**: The server processes requests for a given upload resource sequentially, effectively creating an exclusive lock on that resource. A new request is only processed after the previous one completes. This can be simpler to implement but may lead to delays if a request hangs from the server's perspective.

3. **Optimistic Concurrency Control**: By maintaining the upload's state in a strongly consistent datastore, the server can atomically check if the `Upload-Offset` in a request matches the resource's current state. If they do not match, the server can reject the request with a `409 (Conflict)` status code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is enough to guarantee issue-free uploads. What if the server receives two requests for the same (correct) offset? They both match, but which content is appended to the upload resource? Did you mean to say that the content should be appended in one transaction together with the offset check? But that would make this the same as the pessimistic locking, I think.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Acconut here. To me approach 2 and 3 seem very similar in how they would work in practice. The server would receive two requests with the same offset (why though, really?) and one is slightly faster than the other i.e. it will get the exclusive lock and the thus the second request would fail. I think the wording of point 2 hints that the requests are queued and processed after each other which I guess is not the case (the offset would most likely never match for the second request)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a go at rewording this to be clearer as it was both too vague and overly specific in different areas - is this any better? The key distinction between 2 & 3 is that servers don't need to acquire an exclusive lock for the resource which means they're able to achieve higher availability during network partitions

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes much better, thanks!


2. **Pessimistic Locking**: The server processes requests for a given upload resource sequentially, effectively creating an exclusive lock on that resource. A new request is only processed after the previous one completes. This can be simpler to implement but may lead to delays if a request hangs from the server's perspective.

3. **Optimistic Concurrency Control**: The server detects concurrent modifications by atomically checking the resource's state before applying changes. If a conflict is detected, the server rejects the request with a 409 (Conflict) status code, ensuring that only one of multiple parallel requests can succeed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how this would be achieved in practice on the server-side. Does the server read the entire request content first and then tries to apply it in one atomic operation that checks the offset and appends the content? The offset would then be validated after receiving the request content, not when the headers are sent. Not a problem, but a different angle to the task.

Small suggestion:

Suggested change
3. **Optimistic Concurrency Control**: The server detects concurrent modifications by atomically checking the resource's state before applying changes. If a conflict is detected, the server rejects the request with a 409 (Conflict) status code, ensuring that only one of multiple parallel requests can succeed.
3. **Optimistic Concurrency Control**: The server handles concurrent modifications by checking the resource's state and applying changes together in one atomic operation. If a conflict is detected, the server rejects the request with a 409 (Conflict) status code, ensuring that only one of multiple parallel requests can succeed. Upload cancellation is implemented through a similar atomic operation.

3. **Optimistic Concurrency Control**: The server detects concurrent modifications by atomically checking the resource's state before applying changes. If a conflict is detected, the server rejects the request with a 409 (Conflict) status code, ensuring that only one of multiple parallel requests can succeed.

Since implementing this approach is not always technically possible or feasible, other measures can be considered as well. A simpler approach is that the server only processes a new request to retrieve the offset ({{offset-retrieving}}), append representation data ({{upload-appending}}), or cancellation ({{upload-cancellation}}) once all previous requests have been processed. This effectively implements exclusive access to the upload resource through an access lock. However, since network interruptions can occur in ways that cause the request to hang from the server's perspective, it might take the server significant time to realize the interruption and time out the request. During this period, the client will be unable to access the resource and resume the upload, causing friction for the end users. Therefore, the recommended approach is to terminate previous requests to enable quick resumption of uploads.
Servers SHOULD choose a strategy that best fits their architecture while fulfilling the requirements of this section. Regardless of the chosen strategy, clients SHOULD be prepared to handle a `409 (Conflict)` response as a recoverable error, as described in {{upload-appending}}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this paragraph isn't necessary as it restates normative language that appeared previously in the section without adding new information.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point, removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try to push changes to this PR? I'm not seeing any yet :)

The RECOMMENDED approach is as follows: If an upload resource receives a new request to retrieve the offset ({{offset-retrieving}}), append representation data ({{upload-appending}}), or cancel the upload ({{upload-cancellation}}) while a previous request for creating the upload ({{upload-creation}}) or appending representation data ({{upload-appending}}) is still ongoing, the resource SHOULD prevent race conditions, data loss, and corruption by terminating the previous request before processing the new request. Due to network delay and reordering, the resource might still be receiving representation data from an ongoing transfer for the same upload resource, which in the client's perspective has failed. Since the client is not allowed to perform multiple transfers in parallel, the upload resource can assume that the previous attempt has already failed. Therefore, the server MAY abruptly terminate the previous HTTP connection or stream.
To meet these requirements, a server can use various strategies. Three common approaches are:

1. **Preemptive Cancellation**: If the server receives a new request for an upload resource while a previous request for the same resource is in-flight, it abruptly terminates the previous HTTP connection or stream before processing the new request.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep more context from the original wording that explains why this works:

Since the client is not allowed to perform multiple transfers in parallel, the upload resource can assume that the previous attempt has already failed. Therefore, the server MAY abruptly terminate the previous HTTP connection or stream.


1. **Preemptive Cancellation**: If the server receives a new request for an upload resource while a previous request for the same resource is in-flight, it abruptly terminates the previous HTTP connection or stream before processing the new request.

2. **Pessimistic Locking**: The server processes requests for a given upload resource sequentially, effectively creating an exclusive lock on that resource. A new request is only processed after the previous one completes. This can be simpler to implement but may lead to delays if a request hangs from the server's perspective.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, the text would benefit from emphasizing the downsides of this (simple) approach:

However, since network interruptions can occur in ways that cause the request to hang from the server's perspective, it might take the server significant time to realize the interruption and time out the request. During this period, the client will be unable to access the resource and resume the upload, causing friction for the end users.

- For a `5xx (Server Error)` status code, the client MAY automatically attempt upload resumption by retrieving the current offset ({{offset-retrieving}}).

If no final response was received at all due to connectivity issues, the client MAY automatically attempt upload resumption by retrieving the current offset ({{offset-retrieving}}).
- For a `409 (Conflict)` status code, the client SHOULD attempt to resume the upload by either retrieving the current offset ({{offset-retrieving}}) or using the value from the response's `Upload-Offset` header field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my concern here: #3275 (comment)

But I think that's an underlying design concern rather than a concern with your specific change towards that design, so feel free to resolve this as off-topic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's sufficient to tie the retries to the presence of Upload-Complete: ?0 in this case. It would indicate that the 409 comes from the resumable upload layer.

Suggested change
- For a `409 (Conflict)` status code, the client SHOULD attempt to resume the upload by either retrieving the current offset ({{offset-retrieving}}) or using the value from the response's `Upload-Offset` header field.
- For a `409 (Conflict)` status code with the `Upload-Complete: ?0` header field, the client SHOULD attempt to resume the upload by either retrieving the current offset ({{offset-retrieving}}) or using the value from the response's `Upload-Offset` header field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants