Column Filtering for Change Retention #2110

Open
leejustin wants to merge 10 commits into sequinstream:main from leejustin:jlee/02-03-feat_column_filtering
leejustin commented Feb 4, 2026

Table Column Filtering

Adds column filtering capabilities to change retention pipelines, allowing users to exclude sensitive columns (like user data fields or large metadata columns) or include only specific columns they need. This provides fine-grained control over which column data is captured and stored in change retention tables.

Data schema changes

  • Migration: Adds include_column_attnums and exclude_column_attnums array fields to the source_tables JSONB column in wal_pipelines. Existing pipelines are migrated with NULL values for these fields.
  • SourceTable schema: Extends the SourceTable embedded schema with two new optional array fields storing column attribute numbers. The changeset validates that these fields are mutually exclusive—only one can be set at a time.
  • Storage: Column selection is stored as arrays of PostgreSQL column attribute numbers (attnum), which are resolved from column names during YAML parsing and UI configuration.
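The mutual-exclusivity rule described above can be sketched as follows. This is a minimal Python illustration, not the PR's Elixir changeset: the `SourceTable` dataclass and `validate_column_selection` function are hypothetical stand-ins, and treating `None` as "unset" is an assumption based on the migration leaving existing pipelines with NULL values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceTable:
    """Hypothetical stand-in for the embedded SourceTable schema."""
    oid: int
    # None means "not set" (assumed to mirror the NULL values from the migration).
    include_column_attnums: Optional[list[int]] = None
    exclude_column_attnums: Optional[list[int]] = None


def validate_column_selection(table: SourceTable) -> list[str]:
    """Return validation errors; an empty list means the table is valid."""
    errors = []
    if (
        table.include_column_attnums is not None
        and table.exclude_column_attnums is not None
    ):
        errors.append(
            "include_column_attnums and exclude_column_attnums are mutually exclusive"
        )
    return errors
```

A table with neither field set, or with exactly one set, passes; a table with both set yields one error.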

UI changes

  • ColumnSelectionForm component: New Svelte component that provides an interactive interface for column selection with three modes:
    • All columns: No filtering (default)
    • Exclude mode: Select columns to exclude from sync
    • Include mode: Select columns to include in sync
  • Form integration: The component is integrated into the WAL pipeline form (Form.svelte), appearing after table selection. It displays available columns with checkboxes, automatically handles primary key constraints (PKs cannot be excluded and are always included), and shows selected columns as removable tags.
  • Display: The pipeline show page displays configured column selection settings, showing excluded or included column names resolved from attribute numbers.

Setting Column Filtering at Change Retention Creation

[Image: Screenshot 2026-02-03 at 4.22.46 PM]

Note that the primary key checkbox is disabled: PK columns cannot be excluded and are always included.

Viewing Change Retention Details

[Image: Screenshot 2026-02-03 at 4.22.30 PM]

The details page shows the selected columns when filtering is enabled.

Editing Change Retention

[Image: Screenshot 2026-02-03 at 4.22.19 PM]

The user can edit the configuration to change the filtering. A note on this field states that changes do not backfill historical data for the affected columns in already-retained rows; that backfill is out of scope for this PR.

Implementation details

  • Filtering logic: New ColumnSelection module handles filtering at multiple points:
    • During WAL event creation: Filters fields in record and changes payloads
    • During backfills: Filters column attribute numbers when reading table data
    • In message transformation: Filters both old and new fields for update operations
  • Integration points: Column filtering is applied in MessageHandler.wal_event/2, Consumers.message_record/2, and Consumers.message_changes/2, ensuring filtered columns are excluded from all downstream consumers and sinks.
  • YAML support: The YAML loader parses exclude_columns and include_columns (column name lists) and converts them to attribute numbers, with validation to ensure primary keys are never excluded and that both options aren't specified simultaneously.
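The filtering step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ColumnSelection module itself: `filter_fields`, `attnum_by_name`, and `pk_attnums` are hypothetical names, and the payload is assumed to be a mapping from column name to value.

```python
from typing import Any, Optional


def filter_fields(
    fields: dict[str, Any],
    attnum_by_name: dict[str, int],
    pk_attnums: set[int],
    include_attnums: Optional[list[int]] = None,
    exclude_attnums: Optional[list[int]] = None,
) -> dict[str, Any]:
    """Filter a payload (record or changes) by column attribute number."""
    if include_attnums is not None:
        # Include mode: keep listed columns plus primary keys.
        # Columns with no known attnum are dropped (an assumption of this sketch).
        keep = set(include_attnums) | pk_attnums
        return {n: v for n, v in fields.items() if attnum_by_name.get(n) in keep}
    if exclude_attnums is not None:
        # Exclude mode: drop listed columns, but never a primary key.
        drop = set(exclude_attnums) - pk_attnums
        return {n: v for n, v in fields.items() if attnum_by_name.get(n) not in drop}
    # No selection configured: pass everything through unchanged.
    return dict(fields)


# For an update, the same filter is applied to both the new and old fields.
attnums = {"id": 1, "email": 2, "ssn": 3}
record = {"id": 7, "email": "new@example.com", "ssn": "000-00-0000"}
changes = {"email": "old@example.com", "ssn": "111-11-1111"}

filtered_record = filter_fields(record, attnums, {1}, exclude_attnums=[3])
filtered_changes = filter_fields(changes, attnums, {1}, exclude_attnums=[3])
```

Here `filtered_record` keeps `id` and `email`, and `filtered_changes` keeps only `email`, so the excluded `ssn` column does not reach either payload.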

Configuration

Configure column selection via:

  • Web console: Use the column selection form in the WAL pipeline configuration UI
  • YAML: Specify exclude_columns or include_columns arrays in change_retentions source table configuration

Primary key columns are automatically protected—they cannot be excluded and are always included, even when using include mode.
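As a rough illustration of how column names from YAML might be resolved to attribute numbers with the validations described above, here is a minimal Python sketch. The `Column` tuple and `resolve_column_selection` function are hypothetical; rejecting unknown column names and storing include lists without adding primary keys (leaving PK protection to the filtering step) are assumptions of this sketch, not confirmed behavior of the YAML loader.

```python
from typing import NamedTuple, Optional


class Column(NamedTuple):
    """Hypothetical description of a table column."""
    name: str
    attnum: int
    is_pk: bool


def resolve_column_selection(
    columns: list[Column],
    include_columns: Optional[list[str]] = None,
    exclude_columns: Optional[list[str]] = None,
) -> dict[str, Optional[list[int]]]:
    """Convert column-name lists into attnum lists, validating as we go."""
    if include_columns is not None and exclude_columns is not None:
        raise ValueError("specify either include_columns or exclude_columns, not both")

    by_name = {c.name: c for c in columns}

    def to_attnums(names: list[str]) -> list[int]:
        unknown = [n for n in names if n not in by_name]
        if unknown:
            # Assumption: unknown names are rejected rather than ignored.
            raise ValueError(f"unknown columns: {unknown}")
        return [by_name[n].attnum for n in names]

    result: dict[str, Optional[list[int]]] = {
        "include_column_attnums": None,
        "exclude_column_attnums": None,
    }
    if exclude_columns is not None:
        pks = [n for n in exclude_columns if n in by_name and by_name[n].is_pk]
        if pks:
            raise ValueError(f"primary key columns cannot be excluded: {pks}")
        result["exclude_column_attnums"] = to_attnums(exclude_columns)
    elif include_columns is not None:
        result["include_column_attnums"] = to_attnums(include_columns)
    return result
```

Given columns `id` (attnum 1, primary key), `email` (2), and `ssn` (3), excluding `ssn` resolves to `exclude_column_attnums` of `[3]`, while excluding `id` or passing both lists raises a `ValueError`.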

Tests

Added a few tests and ran mix test:

Finished in 46.9 seconds (17.0s async, 29.8s sync)
4 doctests, 1342 tests, 0 failures, 14 skipped

Add support for excluding or including specific columns in WAL pipeline
CDC events. This allows users to filter out sensitive columns (like
passwords, SSNs) or include only specific columns they need.

- Add include_column_attnums and exclude_column_attnums fields to
  SourceTable schema with mutual exclusivity validation
- Create ColumnSelection module with filtering logic
- Integrate column filtering into message_record and message_changes
- Add database migration to update existing WAL pipelines
- Add comprehensive tests for column filtering functionality
- Add ColumnSelectionForm Svelte component for UI column selection
- Support include/exclude column filtering in WAL pipeline configuration
- Update backend to handle column selection (includeColumnAttnums/excludeColumnAttnums)
- Add column selection support in YAML loader and transforms
- Update WAL pipeline form and show pages to display column selection
- Add comprehensive tests for column selection functionality
- Update documentation for change retention with column filtering details

This allows users to selectively include or exclude specific columns
when setting up change retention pipelines, providing fine-grained
control over which column data is replicated.
dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and enhancement (New feature or request) labels Feb 4, 2026