Kafka Connect: Fix UUID conversion for Parquet writes #17079
Open
thswlsqls wants to merge 1 commit into
RecordConverter.convertUUID() converted UUID values to byte[] when the target file format is Parquet. The Parquet UUID writer (ParquetValueWriters.uuids()) expects a java.util.UUID and converts to bytes internally, so writing a UUID column with the default file format (Parquet) threw ClassCastException: class [B cannot be cast to class java.util.UUID. The byte[] branch matched the writer contract before apache#11904 changed ParquetValueWriters' UUID writer to accept UUID directly; kafka-connect was not updated to follow. This removes the byte[] conversion so convertUUID always returns a UUID, matching ORC/Avro and the current Parquet writer contract.

Generated-by: Claude Code
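To make the contract mismatch concrete, here is a minimal, self-contained sketch. It does not use Iceberg's classes: writeUuid, convertUuidOld, convertUuidNew, and toBytes are hypothetical stand-ins for the Parquet UUID writer and the old and fixed versions of RecordConverter.convertUUID(), the "parquet" string check is a simplification of how the real converter detects the file format, and the 16-byte big-endian layout is an assumption about how the writer encodes the value.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidConversionSketch {

  // Hypothetical helper: 16-byte encoding, most significant bits first.
  static byte[] toBytes(UUID uuid) {
    ByteBuffer buffer = ByteBuffer.allocate(16); // ByteBuffer defaults to big-endian
    buffer.putLong(uuid.getMostSignificantBits());
    buffer.putLong(uuid.getLeastSignificantBits());
    return buffer.array();
  }

  // Hypothetical stand-in for a UUID column writer with the contract described
  // above: it expects a java.util.UUID and does the byte conversion itself.
  static byte[] writeUuid(Object value) {
    UUID uuid = (UUID) value; // a byte[] here throws ClassCastException
    return toBytes(uuid);
  }

  // Simplified sketch of the old conversion: pre-encodes to byte[] for Parquet.
  static Object convertUuidOld(UUID uuid, String fileFormat) {
    return "parquet".equals(fileFormat) ? toBytes(uuid) : uuid;
  }

  // Simplified sketch of the fixed conversion: always returns the UUID itself.
  static Object convertUuidNew(UUID uuid) {
    return uuid;
  }

  public static void main(String[] args) {
    UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
    try {
      writeUuid(convertUuidOld(uuid, "parquet"));
      System.out.println("old: write succeeded");
    } catch (ClassCastException e) {
      System.out.println("old: ClassCastException");
    }
    // prints "new: wrote 16 bytes"
    System.out.println("new: wrote " + writeUuid(convertUuidNew(uuid)).length + " bytes");
  }
}
```

Running main prints "old: ClassCastException" followed by "new: wrote 16 bytes". The old path hands the writer a byte[] it cannot cast, while the fixed path leaves the encoding to the writer, as the ORC and Avro paths already do.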
Closes #17076
Summary
- RecordConverter.convertUUID() returned byte[] for UUID columns when the target file format is Parquet, but the Parquet UUID writer (ParquetValueWriters.uuids()) expects a java.util.UUID and converts to bytes itself, so writes threw ClassCastException: class [B cannot be cast to class java.util.UUID.
- Remove the byte[] branch so convertUUID() always returns UUID, matching ORC (GenericOrcWriters.uuids()) and Avro, which already accept UUID directly.
- The byte[] conversion matched the writer contract before PR "Parquet: Add readers and writers for the internal object model" #11904 changed ParquetValueWriters' UUID writer to accept UUID directly; kafka-connect was not updated to follow, and this change restores the correct contract.
- … byte[] behavior, so it does not fix this bug; whichever of the two merges first, the other will need a rebase.

Testing done
- Updated TestRecordConverter#testUUIDConversionWithParquet to assert that the field equals the original UUID, replacing the UUIDUtil.convert(UUID_VAL) byte[] expectation.
- ./gradlew :iceberg-kafka-connect:iceberg-kafka-connect:check passes: TestRecordConverter 59/59, full module 122/122, 0 failures.