Fix premature COMMIT in stream-rows-to-copy under on error stop#1748
Merged
Conversation
When a data error occurs inside stream-rows-to-copy, the outer handler-case handler calls ROLLBACK. In Common Lisp, handler-case handlers run *after* the unwind-protect cleanup forms have finished, so the original code sequenced: 1. COMMIT (inside unwind-protect cleanup) 2. ROLLBACK (inside handler-case handler — now a no-op) This meant any rows already streamed to PostgreSQL were committed even when the load failed, and the ROLLBACK was silently ignored with a 'there is no transaction in progress' warning. Under on error stop with concurrency > 1 the subsequent inconsistent connection state could also cause writer threads to hang (GitHub issue #1622, comment by fluca1978 on 2026-06-22). Fix: move the COMMIT after the unwind-protect form. It is now only reached when the row loop and close-db-writer both complete without signalling. When either signals, unwind-protect closes the COPY writer (cl-postgres with-syncing reads until ReadyForQuery, leaving the connection in SQL mode), and then the handler-case handler correctly finds a clean connection on which ROLLBACK works. Also guard the incf of bytes/seconds with (when row-bytes ...) so that nil returned by stream-row for a filtered row does not raise a TYPE-ERROR that was itself masking the COMMIT sequencing bug. Add regression test: test/sqlite-on-error-stop.load uses a SQLite database (test/sqlite/type-mismatch.db) that has a TEXT value in an INTEGER column. pgloader must exit cleanly without hanging. Fixes: #1622
The WITH on error stop / on error resume next options were already parsed by the grammar and surfaced as :on-error-stop in with-options by ast.clj, but run-command in core.clj never read them. Only the --on-error-stop CLI flag set copy/*on-error-stop*. Fix: extract :on-error-stop from with-options and add it to the binding block alongside batch-rows, batch-size, etc. Use (some? v) instead of (or v ...) since the option is boolean: a false value from a future explicit on-error-resume path must not be ignored. Add parser tests verifying that: - WITH on error stop → :on-error-stop true in with-options - WITH on error resume next → :on-error-stop absent from with-options
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1622 (specifically the hang scenario reported in the 2026-06-22 comment by
fluca1978).Root cause
In
stream-rows-to-copy(used whenon error stopis set), the original code placedCOMMITinside theunwind-protectcleanup alongsideclose-db-writer:In Common Lisp,
handler-casehandlers run after the dynamic extent of the protected form — which means afterunwind-protectcleanup forms have completed. So when a data error occurred:handler-casehandler (no-op / warning: no transaction in progress)This left rows partially committed on error and put the connection into an inconsistent state. Under
on error stopwithconcurrency > 1, the inconsistent connection state could cause the remaining writer threads to hang indefinitely (PG sessions stuck inClientReadonCOPY … FROM STDIN).Fix
Move
(pomo:execute "COMMIT")to after theunwind-protectso it only executes on the success path (loop andclose-db-writerboth complete without signalling):When either the loop or
close-db-writersignals, thehandler-casehandler now finds a clean SQL-mode connection (cl-postgres'swith-syncinginclose-db-writerreads untilReadyForQuerybefore re-signalling) on whichROLLBACKworks correctly.A secondary fix guards
(incf bytes row-bytes)with(when row-bytes …)so that anilreturned bystream-rowfor a filtered row does not raise aTYPE-ERRORthat was itself masking theCOMMITsequencing issue.Test
Added
test/sqlite-on-error-stop.loadwhich migratestest/sqlite/type-mismatch.db— a SQLite database whoseproductstable has aTEXTvalue ('lots-of-it') stored in anINTEGERcolumn. pgloader must exit cleanly without hanging. The test database is regenerated bytest/sqlite/create-type-mismatch.py.