Open
Description
I'm working with Excel files that have messy column names (spaces, special characters, etc.) that are incompatible with pl.col.column_name syntax. I use alias to map these to clean Python identifiers, but after validation the DataFrame retains the alias as the column name.
It would be great if dataframely could rename the columns to their attribute names on request.
import dataframely as dy
import polars as pl

class MySchema(dy.Schema):
    price = dy.Int64(alias="price ($)")
    production_rank = dy.Int64(alias="Production rank")

df = pl.DataFrame({"price ($)": [100], "Production rank": [1]})

# Current behavior:
validated = MySchema.validate(df)
validated.schema
# Schema([('price ($)', Int64), ('Production rank', Int64)])

# Desired behavior:
validated = MySchema.validate(df, use_attribute_names=True)  # example of API: TBD
validated.schema
# Schema([('price', Int64), ('production_rank', Int64)])

# Current workaround: rename each aliased column back to its attribute name
validated = MySchema.validate(df).rename({
    col.alias: name
    for name, col in vars(MySchema).items()
    if isinstance(col, dy.Column) and col.alias
})

If this is managed via a function argument like use_attribute_names, we may need to add this argument to other functions such as create_empty, create_empty_if_none ...
This might not be ideal; maybe it could be an attribute of the Schema class instead (if possible?).
Or maybe a global configuration in dataframely.Config.
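To make the Schema-attribute option more concrete, here is a rough sketch of how an opt-in flag on the schema class could drive the rename. It deliberately uses plain-Python stand-ins rather than dataframely itself: `Column`, `Schema`, `rename_to_attribute_names` and `rename_mapping` are all hypothetical names for illustration, not existing dataframely API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Stand-in for a column definition (NOT dataframely's real class)."""

    dtype: str
    alias: Optional[str] = None


class Schema:
    """Stand-in base class (NOT dataframely's real Schema)."""

    # Hypothetical opt-in flag, set once per schema instead of once per call.
    rename_to_attribute_names: bool = False

    @classmethod
    def rename_mapping(cls) -> dict:
        """Map each alias to its attribute name; empty if the flag is off."""
        if not cls.rename_to_attribute_names:
            return {}
        # vars(cls) only sees columns declared directly on this class,
        # the same limitation as the workaround above.
        return {
            col.alias: name
            for name, col in vars(cls).items()
            if isinstance(col, Column) and col.alias
        }


class MySchema(Schema):
    rename_to_attribute_names = True
    price = Column("Int64", alias="price ($)")
    production_rank = Column("Int64", alias="Production rank")


mapping = MySchema.rename_mapping()
print(mapping)
```

With something like this, validate, create_empty and the other entry points could each apply the same mapping (for example via polars' `DataFrame.rename`) without growing a new keyword argument.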