fix(sqlite): report pkey and column context on UTF-8 decoding error #1747

Merged: dimitri merged 2 commits into main from fix/sqlite-decoding-error-pkey-context on Jun 27, 2026

dimitri (Owner) commented Jun 27, 2026

Problem

When a SQLite TEXT column contains bytes that are not valid UTF-8, pgloader emitted a terse error and aborted the entire table load:

ERROR Illegal :UTF-8 character starting at position 34.

The message gave no table name, no row identifier, and no column name, which made it very difficult for users to find the offending record (see #1250).

Fix

The fix catches babel-encodings:character-decoding-error per column (inside the inner column-reading loop) rather than in the outer table-level handler. When the error fires, the message now includes:

  • table name
  • encoding name and byte position (matching MySQL/MSSQL style)
  • primary-key value of the offending row — when the PK column was already read before the failing column, which is the common case (id in col 0, text in col 1)
  • column name

The bad column is substituted with NULL so the row is still inserted and the rest of the table continues loading — the same behaviour the MySQL and MSSQL sources provide via their use-nil restarts.
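
The per-column behaviour described above can be sketched in Python. This is only an illustration of the control flow, not pgloader's Common Lisp code: the function name `decode_row`, its parameters, and the exact message wording are assumptions modelled on the sample output in this PR. It shows the try/except sitting around a single column's decode, the pkey being reported only when its column was read before the failing one, and the bad value being replaced with `None` (NULL) so the row survives.

```python
from typing import List, Optional, Sequence, Tuple


def decode_row(
    table: str,
    columns: Sequence[str],
    raw_row: Sequence[object],
    pkey_index: Optional[int] = 0,
) -> Tuple[List[object], List[str]]:
    """Decode one row column by column; return (values, error_messages).

    Hypothetical helper: names and message format are illustrative only.
    """
    values: List[object] = []
    errors: List[str] = []
    for index, (name, raw) in enumerate(zip(columns, raw_row)):
        if not isinstance(raw, bytes):
            values.append(raw)  # integers, None, etc. pass through unchanged
            continue
        try:
            values.append(raw.decode("utf-8"))
        except UnicodeDecodeError as exc:
            # The pkey is only known if its column was read before this one.
            pkey = None
            if pkey_index is not None and pkey_index < index:
                pkey = values[pkey_index]
            where = f"byte position {exc.start}"
            if pkey is not None:
                where += f", pkey {pkey}"
            errors.append(
                f"While decoding text from SQLite table {table}: "
                f"Illegal UTF-8 character at {where}, column {name}."
            )
            values.append(None)  # substitute NULL and keep loading the row
    return values, errors
```

Because the handler wraps one column rather than the whole table, a failure costs a single value: the caller still receives a complete row and can move on to the next one.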

Why handler-case instead of handler-bind + restart?

Babel defines no restarts for character-decoding-error (unlike qmynd/mssql which expose a use-nil restart). handler-case at the column-read site gives the same outcome (substitute nil, continue) without requiring a restart.

Sample error output (after fix)

ERROR While decoding text from SQLite table files:
Illegal UTF-8 character at byte position 4, pkey 2, column filename.

Test artifacts

  • test/sqlite/bad-utf8.db — SQLite database with one row containing a Windows-1252 en dash (0x96) stored in a TEXT column; rows 1 and 3 are valid UTF-8
  • test/sqlite/create-bad-utf8.py — script to regenerate that database
  • test/sqlite-bad-utf8.load — pgloader load file for manual verification
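
For readers who want to reproduce such a database, here is a minimal sketch of how invalid UTF-8 can end up in a TEXT column using Python's standard `sqlite3` module. It is not the contents of test/sqlite/create-bad-utf8.py: the schema, the row values, and the function names are assumptions, chosen so that the table name, column name, pkey, and byte position line up with the sample error output. It relies on SQLite's `CAST(blob AS TEXT)`, which reinterprets the bytes as text without validating them.

```python
import os
import sqlite3
import tempfile


def create_bad_utf8_db(path: str) -> None:
    """Create a small database whose row 2 holds a lone 0x96 byte in a TEXT column.

    Assumed schema and data, for illustration only.
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT)")
        conn.execute("INSERT INTO files VALUES (1, ?)", ("plain.txt",))
        # Bound bytes arrive as a BLOB; the CAST stores them with TEXT type
        # without checking that they are valid UTF-8.
        conn.execute(
            "INSERT INTO files VALUES (2, CAST(? AS TEXT))",
            (b"file\x96name.txt",),
        )
        conn.execute("INSERT INTO files VALUES (3, ?)", ("caf\u00e9.txt",))
        conn.commit()
    finally:
        conn.close()


def read_raw_rows(path: str):
    """Return (id, filename, typeof(filename)) with text left as raw bytes."""
    conn = sqlite3.connect(path)
    conn.text_factory = bytes  # skip Python's own UTF-8 decoding
    try:
        query = "SELECT id, filename, typeof(filename) FROM files ORDER BY id"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def demo():
    """Build the database in a temporary directory and read it back."""
    path = os.path.join(tempfile.mkdtemp(), "bad-utf8.db")
    create_bad_utf8_db(path)
    return read_raw_rows(path)
```

Reading with `text_factory = bytes` lets you inspect the stored bytes directly; in this sketch the 0x96 byte sits at offset 4 of row 2's `filename` value.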

The SQLite LOAD DATABASE command cannot be wired into the standard --regress framework (which expects a single ?tablename in the URI), so this is provided as a manual test artifact rather than a REGRESS entry.

Closes #1250

dimitri added 2 commits June 27, 2026 16:26
…1250)

When a SQLite TEXT column contains bytes that are not valid UTF-8,
pgloader now catches the babel decoding error per-column instead of
aborting the entire table load.  The error message includes:

  - table name
  - the encoding name and the byte position of the invalid sequence (matching MySQL style)
  - primary-key value of the offending row (when the PK column was
    already read before the failing column)
  - column name

The bad column is substituted with NULL so the row is still inserted
and loading continues with the next row — the same behaviour the MySQL
and MSSQL sources have via their qmynd/mssql use-nil restarts.

Since babel defines no restarts for character-decoding-error (unlike
qmynd which exposes use-nil), the fix uses handler-case at the
column-read site rather than handler-bind + invoke-restart.

Added:
  test/sqlite/bad-utf8.db         — pre-built SQLite DB with one row
                                     containing a Windows-1252 en dash
                                     (0x96) stored as BLOB in a TEXT column
  test/sqlite/create-bad-utf8.py  — script to regenerate that database
  test/sqlite-bad-utf8.load       — pgloader load file for manual testing

Closes #1250
…error

The dead-code 'with row-num = 0' / 'do (incf row-num)' pair violated
CL loop grammar: a do-clause cannot precede a for-variable clause.
SBCL 2.1.11 (used in the debian-build CI job) rejects this ordering
at compile time, causing all CI jobs to fail.

Remove both lines entirely; row-num was never referenced anywhere.
dimitri merged commit 887125e into main Jun 27, 2026
37 checks passed
dimitri deleted the fix/sqlite-decoding-error-pkey-context branch June 27, 2026 17:18
Linked issue closed by this pull request: ERROR Illegal :UTF-8 character needs more reporting, please