Skip to content

feat: Raise on duplicated aliases#291

Open
gab23r wants to merge 1 commit intoQuantco:mainfrom
gab23r:raise-duplicated-alias
Open

feat: Raise on duplicated aliases#291
gab23r wants to merge 1 commit intoQuantco:mainfrom
gab23r:raise-duplicated-alias

Conversation

@gab23r
Copy link
Contributor

@gab23r gab23r commented Mar 3, 2026

Raises an ImplementationError when a schema contains columns with duplicate aliases.

Motivation

I was using LLM to generate dataframely schemas, it gives me duplicated alias like in:

class MySchema(dy.Schema):
    a = dy.Int64(alias="a")
    b = dy.Int64(alias="a")

It could be nice from dataframely it icould raise in this case.

Changes

  • Detect duplicate aliases within a single schema
  • Detect duplicate aliases when inheriting from a parent schema
  • Add tests for both cases

Copilot AI review requested due to automatic review settings March 3, 2026 15:46
@gab23r gab23r changed the title Raise on duplicated aliases feat: Raise on duplicated aliases Mar 3, 2026
@github-actions github-actions bot added the enhancement New feature or request label Mar 3, 2026
@codecov
Copy link

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b3edd6a) to head (a421961).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #291   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           54        54           
  Lines         3121      3125    +4     
=========================================
+ Hits          3121      3125    +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds early validation to schema definition to prevent ambiguous schemas where multiple Columns resolve to the same effective column name (via alias), raising ImplementationError during class creation and adding regression tests.

Changes:

  • Raise ImplementationError when two columns in the same schema resolve to the same name (alias or attribute name).
  • Raise ImplementationError when a child schema introduces a column whose resolved name clashes with an inherited one.
  • Add tests covering both same-schema and inherited duplicate-alias scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
dataframely/_base_schema.py Adds duplicate-name checks during metadata collection/merging to raise ImplementationError early.
tests/columns/test_alias.py Adds tests asserting duplicate aliases are rejected for same-schema and inherited definitions.

Comment on lines +87 to +90
if duplicated_column_names := self.columns.keys() & other.columns.keys():
raise ImplementationError(
f"Columns {duplicated_column_names} are duplicated."
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata.update() now raises on any overlapping column keys between self and other. Because _get_metadata_recursively() traverses __bases__ recursively (not the MRO), diamond / multiple-inheritance patterns can visit the same ancestor more than once (e.g. A->B, A->C, D(B, C)) and would now raise even though the duplicated columns come from the same original definition. Consider de-duplicating bases during metadata collection (e.g. walk a linearized MRO once) or, at minimum, allow overlaps where the column objects are identical and only raise when the overlapping key maps to different Column instances.

Suggested change
if duplicated_column_names := self.columns.keys() & other.columns.keys():
raise ImplementationError(
f"Columns {duplicated_column_names} are duplicated."
)
"""Merge another Metadata instance into this one.
Overlapping keys are allowed if and only if they refer to the *same*
underlying object. This accommodates multiple-inheritance / diamond
patterns where the same base schema is visited more than once.
"""
# Detect conflicting column definitions: same name, different Column instance
duplicated_column_names = self.columns.keys() & other.columns.keys()
conflicting_columns = {
name
for name in duplicated_column_names
if self.columns[name] is not other.columns[name]
}
if conflicting_columns:
raise ImplementationError(
f"Columns {conflicting_columns} are duplicated with conflicting definitions."
)
# Detect conflicting rule factories: same name, different object
duplicated_rule_names = self.rules.keys() & other.rules.keys()
conflicting_rules = {
name
for name in duplicated_rule_names
if self.rules[name] is not other.rules[name]
}
if conflicting_rules:
raise ImplementationError(
f"Rules {conflicting_rules} are duplicated with conflicting definitions."
)

Copilot uses AI. Check for mistakes.
Comment on lines +88 to +89
raise ImplementationError(
f"Columns {duplicated_column_names} are duplicated."
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception message uses the raw set representation of duplicated_column_names (derived from key-set intersection). For multiple duplicates this will be non-deterministically ordered, which can make debugging and tests brittle. Consider formatting a stable, sorted list of duplicated names (and optionally quoting each name) instead of interpolating the set directly.

Suggested change
raise ImplementationError(
f"Columns {duplicated_column_names} are duplicated."
formatted_names = ", ".join(
sorted(repr(name) for name in duplicated_column_names)
)
raise ImplementationError(
f"Columns {formatted_names} are duplicated."

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants