Feature request: Option to rename columns from alias to attribute name after validation #292

@gab23r

Description

I'm working with Excel files that have messy column names (spaces, special characters, etc.) that are incompatible with pl.col.column_name syntax. I use alias to map these to clean Python identifiers, but after validation the DataFrame retains the alias as the column name.

It would be great if dataframely could rename the columns to their attribute names on request.

import dataframely as dy
import polars as pl

class MySchema(dy.Schema):
    price = dy.Int64(alias="price ($)")
    production_rank = dy.Int64(alias="Production rank")

df = pl.DataFrame({"price ($)": [100], "Production rank": [1]})

# Current behavior:
validated = MySchema.validate(df)
validated.schema
# Schema([('price ($)', Int64), ('Production rank', Int64)])

# Desired behavior:
validated = MySchema.validate(df, use_attribute_names=True) # example of API: TBD
validated.schema
# Schema([('price', Int64), ('production_rank', Int64)])

# Current workaround:
validated = MySchema.validate(df).rename({
    col.alias: name 
    for name, col in vars(MySchema).items() 
    if isinstance(col, dy.Column) and col.alias
})

If this is controlled by a function argument such as use_attribute_names, the same argument may need to be added to other functions such as create_empty, create_empty_if_none ...
That might not be ideal; maybe it could be an attribute of the Schema class instead (if that is possible?).
Or maybe a global configuration option in dataframely.Config.
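
For reference, the workaround could be wrapped in a small helper that builds the alias-to-attribute mapping once per schema. This is only a sketch: `FakeColumn` and `alias_to_attribute_names` are hypothetical names that are not part of dataframely, and `FakeColumn` stands in for `dy.Column` so the snippet runs without any dependencies. Unlike a plain `vars(MySchema)` lookup, it walks the class hierarchy, so columns declared on a parent class are picked up as well.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeColumn:
    """Hypothetical stand-in for dy.Column; only the alias matters here."""

    alias: Optional[str] = None


def alias_to_attribute_names(schema: type, column_type: type = FakeColumn) -> Dict[str, str]:
    """Return a mapping {alias: attribute_name} for all aliased columns.

    Hypothetical helper, not a dataframely API.
    """
    columns = {}
    # Walk the MRO from base to subclass so that subclass definitions win.
    for klass in reversed(schema.__mro__):
        for name, value in vars(klass).items():
            if isinstance(value, column_type):
                columns[name] = value
    # Columns without an alias already use their attribute name.
    return {col.alias: name for name, col in columns.items() if col.alias}


class BaseSchema:
    price = FakeColumn(alias="price ($)")


class MySchema(BaseSchema):
    production_rank = FakeColumn(alias="Production rank")
    quantity = FakeColumn()  # no alias, so nothing to rename


rename_map = alias_to_attribute_names(MySchema)
print(rename_map)
```

With the real library, the idea would be to pass `dy.Column` as `column_type` and hand the resulting mapping to `validated.rename(...)`, assuming column objects stay accessible as class attributes the way the workaround implies.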
