Kafka Connect: Fix UUID conversion for Parquet writes #17079

Open
thswlsqls wants to merge 1 commit into apache:main from thswlsqls:fix/kafka-connect-uuid-parquet-conversion

Closes #17076

Summary

  • RecordConverter.convertUUID() returned byte[] for UUID columns when the target file format is Parquet, but the Parquet UUID writer (ParquetValueWriters.uuids()) expects a java.util.UUID and converts to bytes itself, so writes threw ClassCastException: class [B cannot be cast to class java.util.UUID.
  • Removes the byte[] branch so convertUUID() always returns UUID, matching ORC (GenericOrcWriters.uuids()) and Avro, which already accept UUID directly.
  • The byte[] conversion matched the writer contract before PR #11904 ("Parquet: Add readers and writers for the internal object model") changed the ParquetValueWriters UUID writer to accept UUID directly. kafka-connect was not updated to follow, so this change restores the correct contract.
  • Note: open PR #16654 ("Kafka Connect: Precompute UUID-as-bytes flag in RecordConverter") touches the same method but explicitly preserves the current byte[] behavior, so it does not fix this bug; whichever of the two merges first, the other will need a rebase.
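
The contract mismatch described in the summary can be illustrated with a minimal, self-contained sketch. The class and method names below (`UuidContractSketch`, `writeUuid`, `failsWithBytes`) are hypothetical stand-ins, not Iceberg code; the only assumption taken from the summary is that the writer casts its input to java.util.UUID and produces the bytes itself.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical stand-in for a writer that, like the Parquet UUID writer
// described in the summary, takes a java.util.UUID and builds the bytes itself.
final class UuidContractSketch {

  // Accepts Object to mimic a generically typed write path; the cast is
  // where a byte[] input fails.
  static byte[] writeUuid(Object value) {
    UUID uuid = (UUID) value; // throws ClassCastException when given byte[]
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  // Returns true when handing the writer pre-converted bytes fails the cast.
  static boolean failsWithBytes(UUID uuid) {
    byte[] preConverted = writeUuid(uuid); // shape of the old Parquet-branch output
    try {
      writeUuid(preConverted);
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    UUID uuid = UUID.randomUUID();
    System.out.println("UUID input writes " + writeUuid(uuid).length + " bytes");
    System.out.println("byte[] input fails the cast: " + failsWithBytes(uuid));
  }
}
```

In this sketch, pre-converting in the caller is redundant at best and a type error at worst, which is why returning UUID from the converter lines up with a writer of this shape.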

Testing done

  • Updated TestRecordConverter#testUUIDConversionWithParquet to assert the field equals the original UUID, replacing the UUIDUtil.convert(UUID_VAL) byte[] expectation.
  • ./gradlew :iceberg-kafka-connect:iceberg-kafka-connect:check passes — TestRecordConverter 59/59, full module 122/122, 0 failures.
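
The shape of the updated expectation can be sketched in plain Java. This is not the real TestRecordConverter or RecordConverter; `UuidExpectationSketch` and `convertUuid` are hypothetical, and the accepted input types (String and UUID) are an assumption made for illustration only.

```java
import java.util.UUID;

final class UuidExpectationSketch {

  // Hypothetical converter after the change: always yields a UUID,
  // with no file-format-specific byte[] branch.
  static Object convertUuid(Object value) {
    if (value instanceof UUID) {
      return value;
    }
    if (value instanceof String) {
      return UUID.fromString((String) value);
    }
    throw new IllegalArgumentException(
        "Cannot convert to UUID: " + value.getClass().getName());
  }

  public static void main(String[] args) {
    UUID expected = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    Object converted = convertUuid(expected.toString());

    // New expectation: the field equals the original UUID rather than a byte[].
    if (!expected.equals(converted)) {
      throw new AssertionError("converted value should equal the original UUID");
    }
    System.out.println("converted type: " + converted.getClass().getName());
  }
}
```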

RecordConverter.convertUUID() converted UUID values to byte[] when the
target file format is Parquet. The Parquet UUID writer (ParquetValueWriters.uuids())
expects a java.util.UUID and converts to bytes internally, so writing a
UUID column with the default file format threw ClassCastException: class
[B cannot be cast to class java.util.UUID.

The byte[] branch matched the writer contract before apache#11904 changed
ParquetValueWriters' UUID writer to accept UUID directly; kafka-connect
was not updated to follow. This removes the byte[] conversion so
convertUUID always returns a UUID, matching ORC/Avro and the current
Parquet writer contract.

Generated-by: Claude Code

Development

Successfully merging this pull request may close these issues.

Kafka Connect: UUID columns fail to write with Parquet (ClassCastException)
