Skip to content

fix: Support scalar inputs for datetime expressions when constant folding is disabled#4769

Open
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:fix-3336-scalar-datetime
Open

fix: Support scalar inputs for datetime expressions when constant folding is disabled#4769
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:fix-3336-scalar-datetime

Conversation

@andygrove

Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #3336

This continues the work in #3448 by @0lai0, rebased onto current main with the merge conflicts in unix_timestamp.rs resolved (the file gained a TimestampNTZ branch and div_floor handling on main since that PR was opened). The original commit authorship is preserved.

Rationale for this change

When Spark's ConstantFolding rule is disabled, all-literal datetime expressions such as hour(timestamp('2024-01-15 12:30:45')) reach Comet's native engine as ColumnarValue::Scalar instead of being folded away in the JVM. The native engine had no scalar match arm for these functions and returned "<expr>(scalar) should be fold in Spark JVM side.", failing the query.

What changes are included in this PR?

  • extract_date_part.rs: add a ColumnarValue::Scalar match arm to the extract_date_part! macro (covers hour, minute, second). The scalar is broadcast to a single-element array, evaluated, and converted back to a scalar via ScalarValue::try_from_array.
  • unix_timestamp.rs: refactor the array evaluation into eval_array and add a scalar arm that reuses it. The merge keeps main's Timestamp(Microsecond, None) (TimestampNTZ) branch and div_floor behavior.
  • Remove the ignore(https://github.com/apache/datafusion-comet/issues/3336) annotations from the existing hour, minute, second, and unix_timestamp SQL file tests so the literal cases are exercised.

How are these changes tested?

The existing SQL file tests for hour, minute, second, and unix_timestamp run with ConstantFolding disabled and compare Comet's output against Spark. Their literal-argument cases were previously ignored for this issue and are now enabled. All 104 tests in the datetime SQL file suite pass locally, along with the native datetime_funcs unit tests.

…ding is disabled

When Spark's ConstantFolding rule is disabled, all-literal datetime
expressions (hour, minute, second, unix_timestamp) reach Comet's native
engine as ColumnarValue::Scalar instead of being folded in the JVM. The
native engine previously errored with "<expr>(scalar) should be fold in
Spark JVM side."

Add a Scalar match arm to extract_date_part! and unix_timestamp that
broadcasts the scalar to a single-element array, evaluates it, and
converts the result back to a scalar. Refactor the unix_timestamp array
logic into eval_array so both the array and scalar paths share it. Remove
the ignore annotations from the existing hour/minute/second/unix_timestamp
SQL file tests so they cover the fix against Spark.

Closes apache#3336
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Native engine panics on all-scalar inputs for hour, minute, second, unix_timestamp

2 participants