Detect column name collisions after PostgreSQL's 63-char identifier truncation #1749
Merged
…runcation (#353)

PostgreSQL limits identifier names to NAMEDATALEN-1 = 63 bytes. When two columns in the same table share the same first 63 characters they would both truncate to the same pg_attribute entry, causing CREATE TABLE to fail with a duplicate-column error or COPY to silently load data into the wrong column.

Changes
-------

v3 (Common Lisp):
- src/utils/catalog.lisp: add pg-effective-name and check-catalog-identifier-collisions. After cast() has applied apply-identifier-case to every column name, iterate over all tables and collect every (table, truncated-name) pair that maps to more than one column.
- src/package.lisp: export the new symbols.
- src/load/migrate-database.lisp: call check-catalog-identifier-collisions inside process-catalog, right after (cast catalog). All collisions are accumulated and logged at ERROR level before a single fatal condition is signalled, so users can fix every instance in one pass.

v4 (Clojure):
- clojure/src/pgloader/ddl/common.clj: make pg-max-identifier-length public and add check-identifier-collisions that mirrors the v3 logic.
- clojure/src/pgloader/core.clj: call check-identifier-collisions after the full catalog transformation pipeline (apply-identifier-case, alter-schema, alter-table, …) and before any DDL or COPY is attempted.
- clojure/test/pgloader/ddl_test.clj: seven new assertions covering: empty catalog, short names (no collision), exactly-63-char names, names that differ before 63 chars, two-column collision, three-column collision, cross-table collisions, and the no-cross-table-collision invariant.

Tests:
- test/sqlite/create-collision.py: generate a SQLite database with a products table that has two columns sharing the same first 63 characters.
- test/sqlite/collision.db: pre-built binary (regenerate with the script).
- test/sqlite-collision.load: LOAD DATABASE load file for the regression test. pgloader is expected to exit non-zero with a clear error message.
- test/mysql-collision.load: manual test fixture for MySQL (not in REGRESS; requires a running MySQL server).
- test/Makefile: add sqlite-collision.load to REGRESS and a special rule (LOAD DATABASE is incompatible with --regress).

The error message names every colliding (table, effective-name, columns) triple and instructs users to rename the affected columns in the source database before migrating.

Closes #353.
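
The check described in this commit message can be illustrated with a minimal Python sketch. It is not the Common Lisp or Clojure code from this PR: the function names, the `{table: [columns]}` catalog shape, and the column names in the usage part are all hypothetical, and it assumes identifier case folding has already been applied. It truncates each name to 63 bytes of UTF-8 (dropping any multibyte character the cut would split) and groups columns per table by the truncated name.

```python
import sqlite3
from collections import defaultdict

# NAMEDATALEN - 1 in a default PostgreSQL build.
PG_MAX_IDENTIFIER_BYTES = 63


def pg_effective_name(name: str) -> str:
    """Approximate the name PostgreSQL keeps after truncation.

    The limit is in bytes, so cut the UTF-8 encoding and silently drop
    a trailing multibyte character if the cut splits it.
    """
    raw = name.encode("utf-8")[:PG_MAX_IDENTIFIER_BYTES]
    return raw.decode("utf-8", errors="ignore")


def find_identifier_collisions(catalog):
    """Return {(table, effective_name): [columns]} for colliding columns.

    `catalog` is a hypothetical {table_name: [column_name, ...]} mapping.
    The grouping key includes the table, so identical truncated names in
    different tables are never reported.
    """
    groups = defaultdict(list)
    for table, columns in catalog.items():
        for column in columns:
            groups[(table, pg_effective_name(column))].append(column)
    return {key: cols for key, cols in groups.items() if len(cols) > 1}


# Usage: an in-memory SQLite table shaped like the regression fixture,
# with two (made-up) column names sharing their first 63 characters.
prefix = "c" * 63
conn = sqlite3.connect(":memory:")
conn.execute(
    f'CREATE TABLE products (id INTEGER, "{prefix}_net" TEXT, "{prefix}_gross" TEXT)'
)
# PRAGMA table_info yields one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(products)")]
collisions = find_identifier_collisions({"products": columns})

for (table, effective), cols in collisions.items():
    print(f"table {table}: columns {cols} all truncate to {effective!r}")
```

Collecting every collision before reporting matches the stated goal of letting users fix all affected columns in one pass, rather than stopping at the first duplicate.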
Problem (fixes #353)
PostgreSQL limits identifier names to NAMEDATALEN-1 = 63 bytes. When two columns in the same table share the same first 63 characters they would both map to the same pg_attribute entry. Depending on timing this causes:
- CREATE TABLE to fail with column "x" of relation "t" specified more than once, or
- COPY to silently load data into the wrong column (if PostgreSQL truncates both names consistently and picks the first match).

pgloader never detected this case: it only forwarded the long names to PostgreSQL and let it fail opaquely. The original reporter (paracord55, 2016) could not reproduce the error; it has been confirmed again in 2026 with MSSQL → PostgreSQL migration.

The collision is per-table (PostgreSQL's uniqueness constraint for attname is (attrelid, attname)), not global: the same truncated name in two different tables is fine.

Fix
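After the catalog has been fully transformed (cast rules applied, apply-identifier-case run, alter-table / alter-schema applied), but before any DDL or COPY is attempted, pgloader checks every table for column names that collide after truncation, logs all collisions at ERROR level, and then signals a single fatal condition.

Changes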
v3 (Common Lisp)
- src/utils/catalog.lisp: pg-effective-name, check-catalog-identifier-collisions
- src/package.lisp: export the new symbols
- src/load/migrate-database.lisp: call the check in process-catalog right after (cast catalog)

v4 (Clojure)
- clojure/src/pgloader/ddl/common.clj: pg-max-identifier-length (now public) + check-identifier-collisions
- clojure/src/pgloader/core.clj: call the check after the full catalog transformation pipeline, before truncate / DDL
- clojure/test/pgloader/ddl_test.clj: 7 new assertions (empty catalog, short names, exactly-63 names, no-collision with divergent prefixes, 2-column collision, 3-column collision, two-table collisions, no-cross-table-collision invariant)

Tests
- test/sqlite/create-collision.py + test/sqlite/collision.db: SQLite database with a products table whose two long column names share the same first 63 characters
- test/sqlite-collision.load: regression load file (in REGRESS)
- test/mysql-collision.load: manual fixture for MySQL (not in REGRESS; requires a running MySQL server)
- test/Makefile: sqlite-collision.load added to REGRESS with a special rule (LOAD DATABASE is incompatible with --regress)

MSSQL uses the same process-catalog / copy-database code path as MySQL and SQLite, so the fix covers MSSQL without a dedicated fixture (no SQL Server in local CI).

Example error output