feat: enable dynamic filter pushdown across serialization boundary #348

Closed
sesteves wants to merge 1 commit into datafusion-contrib:main from sesteves:fix/partition-isolator-filter-pushdown

@sesteves

Summary

  • PartitionIsolatorExec now participates in filter pushdown: implements gather_filters_for_pushdown and handle_child_pushdown_result to forward parent filters to its child unchanged, matching the pattern used by other transparent passthrough nodes (CoalesceBatchesExec, RepartitionExec).

  • Re-run FilterPushdown(Post) after plan deserialization on workers: when the coordinator serializes an execution plan, DynamicFilterPhysicalExpr instances are snapshotted into static expressions, breaking the shared Arc that enables runtime file pruning. After deserialization in do_get, a post-optimization FilterPushdown pass re-establishes the connection between operators like SortExec(TopK) / HashJoinExec and the downstream DataSourceExec.

  • Added tests: unit test for PartitionIsolatorExec filter forwarding, and an integration-style test that verifies TopK dynamic filters survive the serialize → deserialize → optimize round-trip.
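The forwarding behaviour in the first bullet can be pictured with a small, std-only model. This is only a sketch: `Plan`, `push_down`, and `pushed_filters` are hypothetical names invented for illustration, not DataFusion's `ExecutionPlan` API, and filters are plain strings rather than physical expressions.

```rust
/// Toy plan tree (hypothetical, not DataFusion types).
#[derive(Debug)]
enum Plan {
    /// Leaf that can absorb filters, standing in for a data source.
    Scan { pushed: Vec<String> },
    /// Transparent node: forwards parent filters to its child unchanged.
    Passthrough(Box<Plan>),
    /// Node that does not take part in pushdown at all.
    Opaque(Box<Plan>),
}

/// Pushes `filters` toward the leaves and returns the ones that could not
/// be pushed any further.
fn push_down(plan: &mut Plan, filters: Vec<String>) -> Vec<String> {
    match plan {
        Plan::Scan { pushed } => {
            pushed.extend(filters);
            Vec::new()
        }
        // Forward everything unchanged and report the child's result as-is.
        Plan::Passthrough(child) => push_down(child, filters),
        // Nothing passes through, so every filter stays with the parent.
        Plan::Opaque(_) => filters,
    }
}

/// Returns the filters that reached the scan at the bottom of the tree.
fn pushed_filters(plan: &Plan) -> &[String] {
    match plan {
        Plan::Scan { pushed } => pushed.as_slice(),
        Plan::Passthrough(child) | Plan::Opaque(child) => pushed_filters(child),
    }
}
```

In this model, a node that behaves like `Opaque` silently stops every filter above it, which is the situation the first bullet describes for `PartitionIsolatorExec` before the change.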

Two changes that work together to enable DataFusion's dynamic filter
pushdown (TopK, HashJoin) in distributed execution:

1. PartitionIsolatorExec now forwards parent filters to its child,
   matching the pattern used by CoalesceBatchesExec, RepartitionExec,
   and other transparent passthrough nodes.

2. After plan deserialization on the worker, re-run FilterPushdown(Post)
   to reconnect fresh DynamicFilterPhysicalExpr Arcs from operators
   like SortExec(TopK) to the downstream DataSourceExec. Serialization
   snapshots dynamic filters into static expressions, breaking the
   shared Arc. This pass restores it, enabling runtime file pruning.
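Why a snapshot breaks pruning can be shown with a minimal std-only sketch. Everything here is an assumption made for illustration: `DynamicFilter`, `ScanFilter`, `round_trip`, and `reconnect` are hypothetical stand-ins, not `DynamicFilterPhysicalExpr` or the actual serialization code, and the filter is reduced to a single upper bound.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for a dynamic filter: an upper bound that a producer (for
/// example a TopK operator) tightens at runtime. `None` means no bound yet.
#[derive(Debug, Default)]
struct DynamicFilter {
    max_allowed: RwLock<Option<i64>>,
}

impl DynamicFilter {
    fn update(&self, bound: i64) {
        *self.max_allowed.write().unwrap() = Some(bound);
    }

    fn current(&self) -> Option<i64> {
        *self.max_allowed.read().unwrap()
    }
}

/// What the scan holds: a live shared handle, or a frozen copy of the bound.
#[derive(Debug, Clone)]
enum ScanFilter {
    Live(Arc<DynamicFilter>),
    Snapshot(Option<i64>),
}

impl ScanFilter {
    /// Returns true if a row with `value` must still be read.
    fn keeps(&self, value: i64) -> bool {
        let bound = match self {
            ScanFilter::Live(filter) => filter.current(),
            ScanFilter::Snapshot(bound) => *bound,
        };
        bound.map_or(true, |b| value <= b)
    }
}

/// Hypothetical serialize + deserialize step: only the value observed at
/// serialization time survives, the shared handle does not.
fn round_trip(filter: &ScanFilter) -> ScanFilter {
    match filter {
        ScanFilter::Live(f) => ScanFilter::Snapshot(f.current()),
        ScanFilter::Snapshot(bound) => ScanFilter::Snapshot(*bound),
    }
}

/// Hypothetical analogue of re-running pushdown on the worker: the producer
/// creates a fresh filter and hands a clone of the same Arc to the scan.
fn reconnect() -> (Arc<DynamicFilter>, ScanFilter) {
    let producer = Arc::new(DynamicFilter::default());
    let scan = ScanFilter::Live(Arc::clone(&producer));
    (producer, scan)
}
```

In this model a filter that went through `round_trip` never sees later calls to `update`, so it keeps every row, while the pair returned by `reconnect` shares one `Arc` and prunes as soon as the producer tightens the bound.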
@gabotechs
Collaborator

Thanks for taking the time to contribute!

There is probably a better place for handling this than applying an optimization pass during execution. Doing this here seems more like a workaround for something that is better handled in DataFusion upstream.

There are already efforts to accomplish that, which seem like a better fit for solving this problem:

@sesteves closed this on Feb 24, 2026