Skip to content

feat: name incompatible aggregate functions in mixed-execution fallback reason#4750

Open
andygrove wants to merge 2 commits into
apache:mainfrom
andygrove:improve-mixed-agg-fallback-message
Open

feat: name incompatible aggregate functions in mixed-execution fallback reason#4750
andygrove wants to merge 2 commits into
apache:mainfrom
andygrove:improve-mixed-agg-fallback-message

Conversation

@andygrove

@andygrove andygrove commented Jun 29, 2026

Copy link
Copy Markdown
Member

Which issue does this PR close?

N/A

Rationale for this change

When a Spark Final HashAggregate cannot be converted to Comet because the partial aggregate ran natively (or would have) and the intermediate buffer formats are incompatible between Spark and Comet, the fallback reason shown in EXPLAIN output and the tpcds plan-stability golden files was:

HashAggregate [COMET: Spark Final aggregate without Comet Partial requires compatible intermediate buffer formats]

This did not tell the reader which aggregate function caused the incompatibility, making it harder to understand why an aggregate fell back to Spark.

What changes are included in this PR?

  • Added QueryPlanSerde.aggsNotSupportingMixedExecution(aggExprs), which returns the aggregate functions whose supportsMixedPartialFinal is false (or that have no Comet serde). Both it and allAggsSupportMixedExecution now share a single supportsMixedExecution predicate, so the boolean check keeps its forall short-circuit.
  • CometBaseAggregate.doConvert now appends the offending function names to the fallback reason. The names are deduplicated and sorted alphabetically so the message is deterministic regardless of expression order, and the list is only computed when the aggregate is in Spark final mode. For example:
HashAggregate [COMET: Spark Final aggregate without Comet Partial requires compatible intermediate buffer formats, but the following aggregate function(s) have incompatible buffers: avg, count, sum]
  • Regenerated the affected tpcds plan-stability golden files across all Spark versions.

How are these changes tested?

Covered by the existing CometTPCDSV1_4_PlanStabilitySuite / CometTPCDSV2_7_PlanStabilitySuite plan-stability tests, whose golden files embed this fallback reason and were regenerated and verified to pass.

…ck reason

When a Spark Final aggregate cannot run because the partial aggregate ran
natively (or would have) and the intermediate buffer formats are incompatible
between Spark and Comet, the fallback reason now names the specific aggregate
function(s) responsible instead of only stating that compatible buffers are
required.
…helpers

Sort the offending aggregate function names alphabetically so the fallback
reason is deterministic regardless of expression order. Extract a shared
supportsMixedExecution predicate so allAggsSupportMixedExecution keeps its
forall short-circuit and aggsNotSupportingMixedExecution reuses the same
lookup, and only compute the incompatible list when in Spark final mode.
@andygrove andygrove marked this pull request as ready for review June 29, 2026 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant