
feat(parquet): implement file commit protocol for native Parquet writes #4746

Draft

peterxcli wants to merge 1 commit into apache:main from peterxcli:feat/spark-file-commit


@peterxcli (Member)

Native Parquet writes now target the task temp file returned by Spark's FileCommitProtocol, and each task is then committed or aborted through Spark's standard commit lifecycle.
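The task lifecycle described above can be sketched in Rust, the language of Comet's native side. This is a minimal, hypothetical model that assumes a local filesystem where a rename within one filesystem is atomic; TaskCommitSketch, new_task_temp_file, commit_task, abort_task and run_task are illustrative names, not Spark's FileCommitProtocol API and not the code in this PR. It shows the core idea: the writer only ever touches the temp file, and the file becomes visible in the output directory only on commit.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical, simplified model of a task-level commit protocol.
/// Not Spark's FileCommitProtocol: it only mimics the
/// "write to a temp file, then commit (publish) or abort (discard)" shape.
pub struct TaskCommitSketch {
    staging_dir: PathBuf, // where in-progress task output lives
    output_dir: PathBuf,  // where committed files become visible
    task_id: u32,
}

impl TaskCommitSketch {
    pub fn new(staging_dir: &Path, output_dir: &Path, task_id: u32) -> io::Result<Self> {
        fs::create_dir_all(staging_dir)?;
        fs::create_dir_all(output_dir)?;
        Ok(Self {
            staging_dir: staging_dir.to_path_buf(),
            output_dir: output_dir.to_path_buf(),
            task_id,
        })
    }

    /// Analogue of asking the protocol for a task temp file: the writer
    /// must write here, never directly into `output_dir`.
    /// The "part-NNNNN" naming is illustrative only.
    pub fn new_task_temp_file(&self, ext: &str) -> PathBuf {
        self.staging_dir.join(format!("part-{:05}{}", self.task_id, ext))
    }

    /// Commit: move the temp file into the output directory.
    /// `fs::rename` is atomic only within a single filesystem.
    pub fn commit_task(&self, temp: &Path) -> io::Result<PathBuf> {
        let file_name = temp.file_name().ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "temp path has no file name")
        })?;
        let committed = self.output_dir.join(file_name);
        fs::rename(temp, &committed)?;
        Ok(committed)
    }

    /// Abort: delete the temp file so no partial output is published.
    /// Aborting twice, or before anything was written, is not an error.
    pub fn abort_task(&self, temp: &Path) -> io::Result<()> {
        match fs::remove_file(temp) {
            Ok(()) => Ok(()),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
            Err(e) => Err(e),
        }
    }
}

/// Writes `bytes` (a stand-in for encoded Parquet data) to the temp file
/// and commits; if the write fails, the task is aborted instead.
pub fn run_task(protocol: &TaskCommitSketch, bytes: &[u8]) -> io::Result<PathBuf> {
    let temp = protocol.new_task_temp_file(".parquet");
    let write_result = fs::File::create(&temp).and_then(|mut f| f.write_all(bytes));
    match write_result {
        Ok(()) => protocol.commit_task(&temp),
        Err(e) => {
            protocol.abort_task(&temp)?;
            Err(e)
        }
    }
}
```

Keeping file naming and publication on the commit-protocol side is what lets failed or duplicate task attempts be discarded without leaving partial files in the output. The real protocol also has job-level setup, commit, and abort steps that this sketch omits.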

Pass the task output path through the native Parquet writer proto, wire commit protocol setup in CometNativeWriteExec, and assert committed output naming/cleanup in CometParquetWriterSuite.
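A similarly hedged sketch of the plumbing in this paragraph: the JVM side places the temp path obtained from the commit protocol into the writer message, and the native side writes to exactly that path instead of choosing its own file name. ParquetWriterSpec and task_output_path are hypothetical stand-ins for the real proto message and field, and output_is_clean only approximates the kind of naming/cleanup assertion a suite such as CometParquetWriterSuite might make. It assumes a staging directory like `_temporary` and an optional `_SUCCESS` marker, as Hadoop's FileOutputCommitter typically produces.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the native Parquet writer proto message.
/// The field name is illustrative, not the actual Comet proto schema.
#[derive(Debug, Clone)]
pub struct ParquetWriterSpec {
    /// Full path of the task temp file chosen by the JVM-side commit protocol.
    pub task_output_path: String,
}

/// Native side: write exactly to the path it was given, so that the
/// JVM-side task commit can find the file and move it into place.
pub fn native_write(spec: &ParquetWriterSpec, bytes: &[u8]) -> io::Result<()> {
    let path = Path::new(&spec.task_output_path);
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?; // the temp directory may not exist yet
    }
    fs::write(path, bytes)
}

/// Test-style check for "committed output naming/cleanup": after job
/// commit, every regular file is a `part-*.parquet` file (or the
/// `_SUCCESS` marker) and the staging directory is gone.
pub fn output_is_clean(output_dir: &Path, staging_dir_name: &str) -> io::Result<bool> {
    for entry in fs::read_dir(output_dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if name == staging_dir_name {
            return Ok(false); // leftover staging directory
        }
        let is_part = name.starts_with("part-") && name.ends_with(".parquet");
        if entry.file_type()?.is_file() && !is_part && name != "_SUCCESS" {
            return Ok(false); // unexpected file name in committed output
        }
    }
    Ok(true)
}
```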

Which issue does this PR close?

Closes #2827.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@comphead (Contributor)

Thanks @peterxcli, please check the Spark SQL writer tests workflow on your fork: https://github.com/apache/datafusion-comet/actions/workflows/spark_sql_writer_tests.yml

@peterxcli (Member, Author)



Development

Successfully merging this pull request may close these issues.

Implement file commit protocol for native Parquet writes

2 participants