feat: safe type expansion, NVARCHAR/NCHAR catalog fix, seed empty cell fix#702
Conversation
|
Reviewing now, big change with lots of potential for issues so may take a bit longer. |
|
When safe type expansion is enabled, some column changes that don't actually fit the existing values. For example, it will widen an int to numeric(10,5) or numeric(10,2) to numeric(12,5), but in those instances we can overflow the integer portion. We can't just check the overall precision and scale, we need to check that the integer section is wide enough. |
Benjamin-Knight
left a comment
There was a problem hiding this comment.
Think there is an issue with the expansion of integers around precision and scale but otherwise just that one shadowed function I don't understand the basis of.
…l fix - Add dbt_sqlserver_enable_safe_type_expansion flag for safe column type widening (varchar->nvarchar, integer family promotions, numeric precision/scale upgrades) - Add column_type_expansion_max_rows config (default 1,000,000 rows) - Add prefer_single_alter_column config for single ALTER COLUMN statement - Add string_type_instance() to preserve NVARCHAR/NCHAR type family - Fix catalog generation (user_type_id) so NVARCHAR/NCHAR no longer appear as SYSNAME - Fix is_numeric() to exclude money/smallmoney (now is_fixed_numeric()) - Fix seed table ingestion of empty numeric cells - Add tinyint/bit to is_integer() type list
Adds handling for SQL Server MAX string types during `expand_target_column_types()`. MAX columns (varchar(max)/nvarchar(max)) are now correctly emitted when expanding from bounded strings, and are protected from being inadvertently shrunk to bounded sizes. Also removes the unused `alter_column_type` adapter method.
Introduce `is_decimal_type()` to distinguish true arbitrary-precision decimal types from fixed-scale money types. Update type expansion logic to use the new method for precision/scale calculations while keeping `is_numeric()` inclusive of money types for backward compatibility.
Also clean up redundant test mocks and fix type hints for the column expansion method.
5874717 to
4871a6b
Compare
|
@Benjamin-Knight I’ve addressed the review comments and also added the max expansion and tests. Your approval would allow us to auto merge this. |
|
Heads-up from #716: I dropped my
|
|
@axellpadilla There are 2 open comments, otherwise this looks good to me. |
|
@joshmarkovic fixed 1, and yes, 2 is intentional as per oficial docs, bit is an integer, thanks |
Summary
This PR addresses several long-standing issues with SQL Server native type handling during column expansion, catalog generation, and seed ingestion, considered similar problems, reproduced and handled on one run.
Closes: #701, #637, #425, #446
Supersedes: #606, thanks @Cogito
What Changed
1. SQL Server Native String Type Recognition (
sqlserver_column.py)is_string()now includesnvarcharandncharin addition tovarcharandcharstring_type_instance()— new instance method that preserves the original type family:nvarchar(n)emitsnvarchar(n)(notvarchar(n))nchar(n)emitsnchar(n)(notchar(n))varchar(n)/char(n)for non-Unicode typesdata_typeproperty now usesstring_type_instance()instead ofstring_type()is_numeric()— includes is_fixed_numeric and is_decimal_numericis_fixed_numeric()— new method formoney/smallmoneyis_decimal_numeric()— new method fordecimal/numericis_integer()now includestinyintandbitcan_expand_to()— stricter: only allows same-family string size increases (e.g.,varchar(10)→varchar(25))can_expand_safe()— new method for flag-gated safe expansions:max— string can now be expanded to maxvarchar(n)nvarchar(m)wherem >= nchar(n)nchar(m)wherem >= nbit→tinyint→smallint→int→bigintintnumeric(p,s)wherep >= 10numeric(p,s)numeric(p2,s2)wherep2 >= pands2 >= ssmallmoneymoneymoneynumeric(p,s)wherep >= 192. Safe Type Expansion Feature Flag (
sqlserver_adapter.py)New
dbt_sqlserver_enable_safe_type_expansionbehaviour flag (default:false):When enabled, the adapter's
expand_column_types()override performs:varchar(10)→varchar(25))column_type_expansion_max_rowsis not exceeded:varchar/char→nvarchar/ncharnumericwith sufficient precisionnumeric/decimalprecision/scale upgradessmallmoney→money→numeric)expand_target_column_types()— new public API that forwards themax_rowsparameter, called from incremental and snapshot materializations.alter_column_type()— new method that dispatches to thesqlserver__alter_column_typemacro, replacing the base adapter's implementation.3. Row-Count Guardrail (
column_type_expansion_max_rows)New per-model config (default:
1,000,000):{{ config(materialized='incremental', unique_key='id', column_type_expansion_max_rows=500000) }}-1to disable the check0to always skip safe expansion (only same-family string resizes proceed)4. Single ALTER COLUMN Mode (
prefer_single_alter_column)New per-model config (default:
false):{{ config(materialized='incremental', unique_key='id', prefer_single_alter_column=true) }}When
true, thesqlserver__alter_column_typemacro uses a singleALTER COLUMNstatement instead of the safer add+update+drop+rename pattern. This is faster for small/medium tables and instant for safe type expansions, but may fail for types that cannot be implicitly converted.5. Catalog Fix (
catalog.sql)Changed
sys.typesjoin fromsystem_type_idtouser_type_idin both catalog queries. This preventsNVARCHAR/NCHARcolumns from appearing asSYSNAMEindbt docs generateoutput. Fixes #637.6. Seed Empty Cell Fix (
helpers.sql)Changed seed CSV ingestion to inline
NULLliterals instead of binding empty cells as SQL parameters. Previously, an empty cell in anumeric(18,0)column would be bound as an empty string parameter, causing arithmetic overflow error 8115. Now empty cells emitnulldirectly in the VALUES clause. Fixes #425.7. Adapter Configs (
sqlserver_configs.py)Added two new optional config fields:
prefer_single_alter_column: Optional[bool] = Falsecolumn_type_expansion_max_rows: Optional[int] = None8. Unit Tests
test_sqlserver_column.py— Tests foris_string(),string_type_instance(),data_type,is_fixed_numeric(),is_numeric(),string_size()across all string/numeric type familiestest_can_expand_to.py— Parameterized tests forcan_expand_to()andcan_expand_safe()covering same-family resizes, cross-family promotions, integer family promotions, numeric precision/scale upgrades, fixed-money promotions, and prevented shrinking conversionstest_expand_column_types.py— Tests for the adapter'sexpand_column_types()method: row-count skip, max-rows=0 blocking, warning emission,max_rowsforwarding throughexpand_target_column_types()9. Functional Tests
test_column_type_expansion.py— Full integration tests for:on_schema_change='append_new_columns'on_schema_change='sync_all_columns'varchar→nvarcharblocked without flagvarchar→nvarcharworks with flagtest_catalog.py— Tests thatdbt docs generatedoes not returnSYSNAMEforNVARCHAR/NCHARcolumns (fixes Dbt Docs SYSNAME instead of NVARCHAR datatype #637)test_mssql_seed.py— Tests seeding anumeric(18,0)column with empty rows succeeds (fixes "Error 8115 Arithmetic overflow error converting nvarchar to data type numeric" when seeding a numeric column with some empty rows and specifying column_types #425)Note Changes / Migration Notes
is_number()(covers all numeric types including money)is_fixed_numeric()for money types specificallyis_decimal_numeric()only fornumeric/decimaltypesRelated PRs & History