HIVE-29695: ClassCastException in MapJoin when Parquet String is mapped to Date #6578

Open
Aggarwal-Raghav wants to merge 1 commit into apache:master from Aggarwal-Raghav:HIVE-29695

@Aggarwal-Raghav

What changes were proposed in this pull request?

Adds the missing DATE case to ESTRING_CONVERTER in ETypeConverter.java so that a Parquet string value is parsed and returned as a DateWritableV2.
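To illustrate the kind of conversion the new DATE case has to perform, here is a minimal, self-contained sketch. It does not use the Hive or Parquet classes: `StringToDateSketch` and `parseEpochDay` are hypothetical names, the ISO `yyyy-MM-dd` input format is an assumption, and returning null for unparseable text is an assumption about the desired behavior, not a statement of what the patch does.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class StringToDateSketch {

    // Hypothetical helper: decodes the UTF-8 bytes of a string value and
    // parses them as an ISO yyyy-MM-dd date.
    // Returns the number of days since 1970-01-01, or null if the text is not a valid date.
    static Integer parseEpochDay(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8).trim();
        try {
            return Math.toIntExact(LocalDate.parse(text).toEpochDay());
        } catch (DateTimeParseException | ArithmeticException e) {
            // Assumption: invalid input becomes NULL instead of failing the query.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] valid = "1970-01-02".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = "not-a-date".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseEpochDay(valid));
        System.out.println(parseEpochDay(invalid));
    }
}
```

In the real converter the resulting day count would be wrapped in a date writable rather than returned as an Integer; the sketch only shows the parse step that turns string bytes into a date value.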

Why are the changes needed?

To fix the following exception, thrown in MapJoin when a Parquet string column is mapped to a Date column:

java.lang.ClassCastException: class org.apache.hadoop.io.Text cannot be cast to class org.apache.hadoop.hive.serde2.io.DateWritableV2 (org.apache.hadoop.io.Text and org.apache.hadoop.hive.serde2.io.DateWritableV2 are in unnamed module of loader 'app')

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Added a JUnit test and a q file. The q file can be run with:

mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=parquet_string_to_date_mapjoin.q -Drat.skip -Pitests -pl itests/qtest

@Aggarwal-Raghav

@deniskuzZ, I need some advice here. If I enable vectorization in the q file from this PR, it fails with the error below. I observed the original stack trace in hive-server2.err in production on Hive 4.0.1 (I am not sure whether the user had disabled vectorization in the Beeline session, but that is my best guess). I think work similar to HIVE-29649 would be needed if we want to support vectorized String-to-Date conversion?

Caused by: java.lang.UnsupportedOperationException
        at org.apache.parquet.column.values.ValuesReader.readLong(ValuesReader.java:185)
        at org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readLong(ParquetDataColumnReaderFactory.java:237)
        at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readDate(VectorizedPrimitiveColumnReader.java:419)
        at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatchHelper(VectorizedPrimitiveColumnReader.java:113)
        at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:88)
        at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:429)
        at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:358)
        at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:98)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:374)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:82)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:118)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:58)
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:208)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:75)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:417)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
        ... 15 more
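The trace suggests that the vectorized date path (readDate) calls readLong on a reader that sits on top of a string (binary) column, and that the base reader rejects that call. The sketch below reproduces this pattern with stand-in classes only; `BaseValuesReader`, `BinaryValuesReader`, and `readDateAsLong` are hypothetical names and are not the Parquet or Hive implementations.

```java
import java.nio.charset.StandardCharsets;

public class ReaderMismatchSketch {

    // Stand-in for a base values reader: every typed read is unsupported
    // unless a subclass for that physical type overrides it.
    static abstract class BaseValuesReader {
        long readLong() { throw new UnsupportedOperationException(); }
        byte[] readBytes() { throw new UnsupportedOperationException(); }
    }

    // Stand-in for a reader over a binary (string) column: only readBytes works.
    static class BinaryValuesReader extends BaseValuesReader {
        private final byte[] value;
        BinaryValuesReader(byte[] value) { this.value = value; }
        @Override byte[] readBytes() { return value; }
    }

    // Hypothetical date read path that assumes the column is stored as an integer type.
    static long readDateAsLong(BaseValuesReader reader) {
        return reader.readLong();
    }

    // Returns true when the date path fails on a string-backed column.
    static boolean failsOnStringColumn() {
        byte[] raw = "2024-01-15".getBytes(StandardCharsets.UTF_8);
        try {
            readDateAsLong(new BinaryValuesReader(raw));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("date path fails on string column: " + failsOnStringColumn());
    }
}
```

Under this reading, a vectorized fix would need a read path that takes the bytes and parses them into a date instead of calling readLong, which is a different change from the non-vectorized converter case added in this PR.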
