Detect column name collisions after PostgreSQL's 63-char identifier truncation #1749

Merged: dimitri merged 1 commit into main from fix/identifier-collision-detection on Jun 28, 2026

dimitri (Owner) commented Jun 28, 2026

Problem (fixes #353)

PostgreSQL limits identifier names to NAMEDATALEN-1 = 63 bytes and truncates anything longer. When two columns in the same table share the same first 63 characters, both names truncate to the same attname in pg_attribute. Depending on timing, this causes:

  • CREATE TABLE to fail with column "x" specified more than once, or
  • COPY to silently load data into the wrong column (if PostgreSQL truncates both names consistently and picks the first match).
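
To make the truncation concrete, here is a minimal Python sketch of how two distinct names can become identical. It assumes a UTF-8 server encoding and approximates PostgreSQL's behaviour of clipping at 63 bytes without splitting a multibyte character; `truncate_identifier` and the sample names are hypothetical and are not part of pgloader.

```python
PG_MAX_IDENTIFIER_BYTES = 63  # NAMEDATALEN - 1 in a default PostgreSQL build


def truncate_identifier(name: str, limit: int = PG_MAX_IDENTIFIER_BYTES) -> str:
    """Keep at most `limit` bytes of the UTF-8 encoding of `name`,
    without leaving a partial multibyte character at the end."""
    encoded = name.encode("utf-8")
    if len(encoded) <= limit:
        return name
    # errors="ignore" drops a trailing incomplete character, if any.
    return encoded[:limit].decode("utf-8", errors="ignore")


prefix = "c" * 63
first, second = prefix + "x", prefix + "y"  # hypothetical 64-character names

# Distinct in the source database, identical once truncated.
assert first != second
assert truncate_identifier(first) == truncate_identifier(second) == prefix
```

Note that the limit is counted in bytes, so names containing non-ASCII characters can collide after fewer than 63 characters.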

pgloader never detected this case: it only forwarded the long names to PostgreSQL and let it fail opaquely. The original reporter (paracord55, 2016) could not reproduce the error; it has been confirmed again in 2026 with an MSSQL → PostgreSQL migration.

The collision is per-table (PostgreSQL's uniqueness constraint for attname is (attrelid, attname)), not global — the same truncated name in two different tables is fine.


Fix

After the catalog has been fully transformed (cast rules applied, apply-identifier-case run, alter-table / alter-schema applied), but before any DDL or COPY is attempted:

  1. Compute the effective PostgreSQL name for every column: strip outer double-quotes then take the first min(63, len) characters.
  2. Group columns by effective name within each table.
  3. Any group with more than one member is a collision.
  4. Accumulate every instance across all tables, log them all at ERROR level (naming schema, table, the effective identifier, and the conflicting column names), then abort with a single fatal error that invites the user to rename the affected columns in the source database before migrating.
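
The first three steps can be sketched in Python as follows. This is an illustration of the grouping logic only, not the Common Lisp or Clojure code from this PR; the catalog shape (a dict keyed by `(schema, table)`) and the names `effective_name` and `find_collisions` are assumptions made for the example.

```python
from collections import defaultdict

PG_MAX_IDENTIFIER_LENGTH = 63


def effective_name(column_name: str) -> str:
    """Strip one pair of outer double quotes, then keep the first 63 characters."""
    if len(column_name) >= 2 and column_name.startswith('"') and column_name.endswith('"'):
        column_name = column_name[1:-1]
    return column_name[:PG_MAX_IDENTIFIER_LENGTH]


def find_collisions(catalog):
    """catalog maps (schema, table) -> list of column names (hypothetical shape).

    Returns a list of (schema, table, effective_name, [columns]) tuples,
    one per group of columns that share an effective name within a table.
    """
    collisions = []
    for (schema, table), columns in catalog.items():
        groups = defaultdict(list)
        for column in columns:
            groups[effective_name(column)].append(column)
        for name, members in groups.items():
            if len(members) > 1:
                collisions.append((schema, table, name, members))
    return collisions


prefix = "c" * 63
catalog = {
    ("public", "products"): [prefix + "x", prefix + "y", "id"],
    ("public", "orders"): [prefix + "x"],  # same truncated name, different table
}
collisions = find_collisions(catalog)
assert collisions == [("public", "products", prefix, [prefix + "x", prefix + "y"])]
```

Grouping is done per table, so the `orders` column above is not reported even though it truncates to the same name as the `products` columns.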

Changes

v3 (Common Lisp)

  • src/utils/catalog.lisp: pg-effective-name, check-catalog-identifier-collisions
  • src/package.lisp: export the new symbols
  • src/load/migrate-database.lisp: call the check in process-catalog right after (cast catalog)

v4 (Clojure)

  • clojure/src/pgloader/ddl/common.clj: pg-max-identifier-length (now public) + check-identifier-collisions
  • clojure/src/pgloader/core.clj: call the check after the full catalog transformation pipeline, before truncate / DDL
  • clojure/test/pgloader/ddl_test.clj: 7 new assertions (empty catalog, short names, exactly-63 names, no-collision with divergent prefixes, 2-column collision, 3-column collision, two-table collisions, no-cross-table-collision invariant)

Tests

  • test/sqlite/create-collision.py + test/sqlite/collision.db: SQLite database with a products table whose two long column names share the same first 63 characters
  • test/sqlite-collision.load: regression load file (in REGRESS)
  • test/mysql-collision.load: manual fixture for MySQL (not in REGRESS; requires a running MySQL server)
  • test/Makefile: sqlite-collision.load added to REGRESS with a special rule (LOAD DATABASE is incompatible with --regress)
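
For readers who want to build a similar fixture by hand, a generator script along these lines would do it. This is a sketch using Python's standard sqlite3 module, not the contents of create-collision.py; the column names, the `id` column, and the sample row are assumptions.

```python
import sqlite3

PREFIX = "col_" + "a" * 59              # exactly 63 characters
COLUMNS = [PREFIX + "x", PREFIX + "y"]  # the names differ only at character 64


def create_collision_db(path: str) -> None:
    """Create a SQLite database with a products table whose two long
    column names share their first 63 characters."""
    conn = sqlite3.connect(path)
    try:
        column_defs = ", ".join(f'"{name}" TEXT' for name in COLUMNS)
        conn.execute("DROP TABLE IF EXISTS products")
        conn.execute(f"CREATE TABLE products (id INTEGER PRIMARY KEY, {column_defs})")
        conn.execute("INSERT INTO products VALUES (1, 'first', 'second')")
        conn.commit()
    finally:
        conn.close()
```

Calling `create_collision_db("collision.db")` writes the database file. SQLite accepts both 64-character names, so the collision only appears once the names reach PostgreSQL.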

MSSQL uses the same process-catalog / copy-database code path as MySQL and SQLite, so the fix covers MSSQL without a dedicated fixture (no SQL Server in local CI).


Example error output

LOG  Migrating from sqlite:///path/to/collision.db
LOG  Migrating into pgsql://localhost/pgloader
ERROR public.products: column name collision — "col_very_long..._aaax", "col_very_long..._aaay" all truncate to "col_very_long..._aaa"
FATAL Failed to process catalogs: 1 column name collision found in source catalog.

PostgreSQL limits identifier names to 63 bytes (NAMEDATALEN-1). The tables
listed above contain multiple columns whose names become identical after
truncation. This would cause CREATE TABLE to fail or COPY to load data into
the wrong column.

Please rename the affected columns in the source database before migrating.
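
The "log everything, then abort once" behaviour can be sketched in Python as below. The function name, the input tuple shape, and the plural form of the summary line are assumptions for illustration, and the wording only approximates the example output above.

```python
def format_collision_report(collisions):
    """collisions: list of (schema, table, effective_name, [columns]) tuples.

    Returns (error_lines, fatal_line): one ERROR line per collision and a
    single FATAL summary, so every instance is visible before the abort.
    """
    error_lines = []
    for schema, table, name, columns in collisions:
        quoted = ", ".join(f'"{column}"' for column in columns)
        error_lines.append(
            f'ERROR {schema}.{table}: column name collision: '
            f'{quoted} all truncate to "{name}"'
        )
    count = len(collisions)
    noun = "collision" if count == 1 else "collisions"  # plural form is assumed
    fatal_line = (
        f"FATAL Failed to process catalogs: {count} column name {noun} "
        "found in source catalog."
    )
    return error_lines, fatal_line
```

A caller would print every line in `error_lines` first and only then raise or exit with `fatal_line`, which lets users fix all affected columns in one pass.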

…runcation (#353)

PostgreSQL limits identifier names to NAMEDATALEN-1 = 63 bytes.  When two
columns in the same table share the same first 63 characters they would both
truncate to the same pg_attribute entry, causing CREATE TABLE to fail with a
duplicate-column error or COPY to silently load data into the wrong column.

Changes
-------

v3 (Common Lisp):
- src/utils/catalog.lisp: add pg-effective-name and
  check-catalog-identifier-collisions.  After cast() has applied
  apply-identifier-case to every column name, iterate over all tables and
  collect every (table, truncated-name) pair that maps to more than one
  column.
- src/package.lisp: export the new symbols.
- src/load/migrate-database.lisp: call check-catalog-identifier-collisions
  inside process-catalog, right after (cast catalog).  All collisions are
  accumulated and logged at ERROR level before a single fatal condition is
  signalled, so users can fix every instance in one pass.

v4 (Clojure):
- clojure/src/pgloader/ddl/common.clj: make pg-max-identifier-length public
  and add check-identifier-collisions that mirrors the v3 logic.
- clojure/src/pgloader/core.clj: call check-identifier-collisions after the
  full catalog transformation pipeline (apply-identifier-case, alter-schema,
  alter-table, …) and before any DDL or COPY is attempted.
- clojure/test/pgloader/ddl_test.clj: seven new assertions covering: empty
  catalog, short names (no collision), exactly-63-char names, names that
  differ before 63 chars, two-column collision, three-column collision,
  cross-table collisions, and the no-cross-table-collision invariant.

Tests:
- test/sqlite/create-collision.py: generate a SQLite database with a
  products table that has two columns sharing the same first 63 characters.
- test/sqlite/collision.db: pre-built binary (regenerate with the script).
- test/sqlite-collision.load: LOAD DATABASE load file for the regression
  test.  pgloader is expected to exit non-zero with a clear error message.
- test/mysql-collision.load: manual test fixture for MySQL (not in REGRESS;
  requires a running MySQL server).
- test/Makefile: add sqlite-collision.load to REGRESS and a special rule
  (LOAD DATABASE is incompatible with --regress).

The error message names every colliding (table, effective-name, columns)
triple and instructs users to rename the affected columns in the source
database before migrating.

Closes #353.
dimitri merged commit 9c66579 into main Jun 28, 2026
37 checks passed
dimitri deleted the fix/identifier-collision-detection branch June 28, 2026 11:37
Development

Successfully merging this pull request may close these issues.

Truncated Column-Names
