Detect column name collisions after PostgreSQL's 63-char identifier truncation by dimitri · Pull Request #1749 · dimitri/pgloader

dimitri · 2026-06-28T00:24:15Z

Problem (fixes #353)

PostgreSQL limits identifier names to NAMEDATALEN-1 = 63 bytes. When two columns in the same table share the same first 63 characters, both names truncate to the same identifier and would map to the same pg_attribute entry. Depending on timing this causes:

CREATE TABLE to fail with column "x" of relation "t" specified more than once, or
COPY to silently load data into the wrong column (if PostgreSQL truncates both names consistently and picks the first match).
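
To make the truncation concrete, here is a minimal Python sketch of how two distinct source column names collapse into one identifier. The column names and constant names are made up for illustration, and the sketch assumes ASCII names so that 63 characters equal 63 bytes.

```python
# Hypothetical illustration; names are invented, not taken from pgloader.
NAMEDATALEN = 64                 # PostgreSQL's default build-time constant
MAX_IDENTIFIER = NAMEDATALEN - 1  # 63 bytes usable for an identifier

# Two different source column names that share their first 63 characters.
shared_prefix = "c" * MAX_IDENTIFIER
col_a = shared_prefix + "_x"
col_b = shared_prefix + "_y"

# Keeping only the first 63 characters makes the names indistinguishable.
truncated_a = col_a[:MAX_IDENTIFIER]
truncated_b = col_b[:MAX_IDENTIFIER]

names_differ_in_source = col_a != col_b
names_collide_in_postgres = truncated_a == truncated_b
```

Both flags end up true: the names are distinct in the source database and identical once truncated.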

pgloader never detected this case — it only forwarded the long names to PostgreSQL and let it fail opaquely. The original reporter (paracord55, 2016) could not reproduce the error; it has been confirmed again in 2026 with MSSQL → PostgreSQL migration.

The collision is per-table (PostgreSQL's uniqueness constraint for attname is (attrelid, attname)), not global — the same truncated name in two different tables is fine.

Fix

After the catalog has been fully transformed (cast rules applied, apply-identifier-case run, alter-table / alter-schema applied), but before any DDL or COPY is attempted:

Compute the effective PostgreSQL name for every column: strip outer double-quotes then take the first min(63, len) characters.
Group columns by effective name within each table.
Any group with more than one member is a collision.
Accumulate every instance across all tables, log them all at ERROR level (naming schema, table, the effective identifier, and the conflicting column names), then abort with a single fatal error that invites the user to rename the affected columns in the source database before migrating.
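
The steps above can be sketched in Python as follows. This is not the code from the PR (which is Common Lisp and Clojure): the catalog shape (a dict mapping `(schema, table)` to a list of column names), the `IdentifierCollisionError` class, and the function names are assumptions made for the example, and it follows the description in truncating by characters.

```python
from collections import defaultdict

PG_MAX_IDENTIFIER_LENGTH = 63  # NAMEDATALEN - 1


class IdentifierCollisionError(Exception):
    """Hypothetical fatal error raised after every collision is logged."""


def pg_effective_name(name):
    # Strip one pair of outer double quotes, then keep at most 63 characters.
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        name = name[1:-1]
    return name[:PG_MAX_IDENTIFIER_LENGTH]


def find_collisions(catalog):
    # catalog: {(schema, table): [column names]}  (assumed shape)
    collisions = []
    for (schema, table), columns in catalog.items():
        # Grouping is done per table, so equal truncated names in two
        # different tables never end up in the same group.
        groups = defaultdict(list)
        for column in columns:
            groups[pg_effective_name(column)].append(column)
        for effective, members in groups.items():
            if len(members) > 1:
                collisions.append((schema, table, effective, members))
    return collisions


def check_identifier_collisions(catalog, log=print):
    # Log every collision first, then abort once with a single error.
    collisions = find_collisions(catalog)
    for schema, table, effective, members in collisions:
        quoted = ", ".join(f'"{m}"' for m in members)
        log(f'ERROR {schema}.{table}: {quoted} truncate to "{effective}"')
    if collisions:
        raise IdentifierCollisionError(
            f"{len(collisions)} column name collision(s) found in source catalog"
        )
```

Separating `find_collisions` from the logging and raising step keeps the detection logic easy to unit-test, which mirrors the intent of reporting every instance before aborting.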

Changes

v3 (Common Lisp)

src/utils/catalog.lisp — pg-effective-name, check-catalog-identifier-collisions
src/package.lisp — export the new symbols
src/load/migrate-database.lisp — call the check in process-catalog right after (cast catalog)

v4 (Clojure)

clojure/src/pgloader/ddl/common.clj — pg-max-identifier-length (now public) + check-identifier-collisions
clojure/src/pgloader/core.clj — call the check after the full catalog transformation pipeline, before truncate / DDL
clojure/test/pgloader/ddl_test.clj — 7 new assertions (empty catalog, short names, exactly-63 names, no-collision with divergent prefixes, 2-column collision, 3-column collision, two-table collisions, no-cross-table-collision invariant)

Tests

test/sqlite/create-collision.py + test/sqlite/collision.db — SQLite database with a products table whose two long column names share the same first 63 characters
test/sqlite-collision.load — regression load file (in REGRESS)
test/mysql-collision.load — manual fixture for MySQL (not in REGRESS; requires a running MySQL server)
test/Makefile — sqlite-collision.load added to REGRESS with a special rule (LOAD DATABASE is incompatible with --regress)
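
For readers who want to reproduce the SQLite fixture by hand, a generator along these lines should work. It is a sketch only: the actual contents of test/sqlite/create-collision.py are not shown here, and the column names, the `create_collision_db` helper, and the sample row are all hypothetical. It relies on SQLite accepting column names longer than 63 characters.

```python
import sqlite3

# Hypothetical column names: a 63-character shared prefix plus one
# distinguishing character, so they differ only at position 64.
SHARED_PREFIX = "col_" + "a" * 59
COLUMN_X = SHARED_PREFIX + "x"
COLUMN_Y = SHARED_PREFIX + "y"


def create_collision_db(path):
    """Create a products table whose two long columns collide at 63 chars."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, "
        f'"{COLUMN_X}" TEXT, '
        f'"{COLUMN_Y}" TEXT)'
    )
    conn.execute(
        f'INSERT INTO products ("{COLUMN_X}", "{COLUMN_Y}") VALUES (?, ?)',
        ("x-value", "y-value"),
    )
    conn.commit()
    return conn
```

Calling `create_collision_db("collision.db")` writes the database to disk; passing `":memory:"` is handy for a quick check without leaving a file behind.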

MSSQL uses the same process-catalog / copy-database code path as MySQL and SQLite, so the fix covers MSSQL without a dedicated fixture (no SQL Server in local CI).

Example error output

LOG  Migrating from sqlite:///path/to/collision.db
LOG  Migrating into pgsql://localhost/pgloader
ERROR public.products: column name collision — "col_very_long..._aaax", "col_very_long..._aaay" all truncate to "col_very_long..._aaa"
FATAL Failed to process catalogs: 1 column name collision found in source catalog.

PostgreSQL limits identifier names to 63 bytes (NAMEDATALEN-1). The tables
listed above contain multiple columns whose names become identical after
truncation. This would cause CREATE TABLE to fail or COPY to load data into
the wrong column.

Please rename the affected columns in the source database before migrating.

…runcation (#353)

PostgreSQL limits identifier names to NAMEDATALEN-1 = 63 bytes. When two columns in the same table share the same first 63 characters they would both truncate to the same pg_attribute entry, causing CREATE TABLE to fail with a duplicate-column error or COPY to silently load data into the wrong column.

Changes
-------

v3 (Common Lisp):
- src/utils/catalog.lisp: add pg-effective-name and check-catalog-identifier-collisions. After cast() has applied apply-identifier-case to every column name, iterate over all tables and collect every (table, truncated-name) pair that maps to more than one column.
- src/package.lisp: export the new symbols.
- src/load/migrate-database.lisp: call check-catalog-identifier-collisions inside process-catalog, right after (cast catalog). All collisions are accumulated and logged at ERROR level before a single fatal condition is signalled, so users can fix every instance in one pass.

v4 (Clojure):
- clojure/src/pgloader/ddl/common.clj: make pg-max-identifier-length public and add check-identifier-collisions that mirrors the v3 logic.
- clojure/src/pgloader/core.clj: call check-identifier-collisions after the full catalog transformation pipeline (apply-identifier-case, alter-schema, alter-table, …) and before any DDL or COPY is attempted.
- clojure/test/pgloader/ddl_test.clj: seven new assertions covering: empty catalog, short names (no collision), exactly-63-char names, names that differ before 63 chars, two-column collision, three-column collision, cross-table collisions, and the no-cross-table-collision invariant.

Tests:
- test/sqlite/create-collision.py: generate a SQLite database with a products table that has two columns sharing the same first 63 characters.
- test/sqlite/collision.db: pre-built binary (regenerate with the script).
- test/sqlite-collision.load: LOAD DATABASE load file for the regression test. pgloader is expected to exit non-zero with a clear error message.
- test/mysql-collision.load: manual test fixture for MySQL (not in REGRESS; requires a running MySQL server).
- test/Makefile: add sqlite-collision.load to REGRESS and a special rule (LOAD DATABASE is incompatible with --regress).

The error message names every colliding (table, effective-name, columns) triple and instructs users to rename the affected columns in the source database before migrating.

Closes #353.

dimitri merged commit 9c66579 into main Jun 28, 2026
37 checks passed

dimitri deleted the fix/identifier-collision-detection branch June 28, 2026 11:37
