Skip to content

[CALCITE-7604] Add rule to pull up GROUP BY above JOIN#5029

Open
zabetak wants to merge 1 commit into
apache:mainfrom
zabetak:CALCITE-7604-02
Open

[CALCITE-7604] Add rule to pull up GROUP BY above JOIN#5029
zabetak wants to merge 1 commit into
apache:mainfrom
zabetak:CALCITE-7604-02

Conversation

@zabetak

@zabetak zabetak commented Jun 19, 2026

Copy link
Copy Markdown
Member

Jira Link

CALCITE-7604

Changes Proposed

  1. Add JoinAggregateTransposeRule to pull up GROUP BY above JOIN
  2. Add test cases for the rule showcasing the effects on the most common cases

@sonarqubecloud

Copy link
Copy Markdown

* Report CS 95-09, Dept. of Computer Science, University of Waterloo, Canada, 1995.</li>
* <li>Weipeng P. Yan, and Per-Ake Larson. "Eager Aggregation and Lazy Aggregation." Proceedings
* of the 21th International Conference on Very Large Data Bases. 1995.</li>
* </ul>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short summary of what the rule does will help readers who are too lazy to find the papers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add the link to the Javadoc if one is available?

final RelNode right = join.getRight();

final JoinInfo joinInfo = join.analyzeCondition();
// The right side must be unique on its join keys (no row duplication)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this must be symmetric for inner joins, but perhaps you rely on some join commutativity rule to find the symmetric optimization?

LogicalProject(DEPTNO=[$2], TOTAL_SAL=[$3], NAME=[$1])
LogicalProject(DEPTNO=[$2], NAME=[$3], DEPTNO0=[$0], TOTAL_SAL=[$1])
LogicalProject(DEPTNO=[$0], TOTAL_SAL=[$3], DEPTNO0=[$1], NAME=[$2])
LogicalAggregate(group=[{7, 9, 10}], TOTAL_SAL=[SUM($5)])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quidem tests would increase our confidence.
Moreover, I don't see any test with 2 aggregates, the rule seems to allow it

* Report CS 95-09, Dept. of Computer Science, University of Waterloo, Canada, 1995.</li>
* <li>Weipeng P. Yan, and Per-Ake Larson. "Eager Aggregation and Lazy Aggregation." Proceedings
* of the 21th International Conference on Very Large Data Bases. 1995.</li>
* </ul>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add the link to the Javadoc if one is available?

final JoinInfo joinInfo = join.analyzeCondition();
// The right side must be unique on its join keys (no row duplication)
final RelMetadataQuery mq = call.getMetadataQuery();
if (!Boolean.TRUE.equals(mq.areColumnsUnique(right, joinInfo.rightSet()))) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to move this logic into boolean matches(RelOptRuleCall call)?

final int oldJoinWidth = join.getRowType().getFieldCount();
final int newJoinWidth = rawFieldCount + rightFields;

final Map<Integer, Integer> conditionMap = new HashMap<>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be a bit better? It eliminates the need for a HashMap.

final Mappings.TargetMapping condMapping =
    Mappings.create(MappingType.PARTIAL_FUNCTION, oldJoinWidth, newJoinWidth);
for (int i = 0; i < groupList.size(); i++) {
  condMapping.set(i, groupList.get(i));
}
for (int j = 0; j < rightFields; j++) {
  condMapping.set(leftFields + j, rawFieldCount + j);
}
final RexNode newCondition = RexUtil.apply(condMapping, join.getCondition());

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants