feat(import): add support for multiple hbase snapshot imports#4600
Conversation
f5324bd to
f8b5932
Compare
13f65dc to
a3a6c7f
Compare
d511a61 to
5ec8dc1
Compare
84d905f to
4299c64
Compare
18a4509 to
fa0e9ec
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new tool, HBaseSnapshotRestoreTool, and updates the existing ImportJobFromHbaseSnapshot to support loading multiple HBase snapshots into Bigtable. It includes several infrastructure improvements, such as adding necessary dependencies, introducing a RegionConfigCoder for efficient serialization, and enhancing the ReadRegions transform with dynamic splitting and sharding capabilities. My review identified several areas for improvement, including fixing an incorrect tracker claim in ReadSnapshotRegion, improving configuration handling in ImportJobFromHbaseSnapshot, ensuring consistent brace usage, and addressing potential null pointer exceptions and minor code style issues.
|
|
||
| List<Mutation> mutations = new ArrayList<>(); | ||
|
|
||
| boolean logAndSkipIncompatibleRowMutations = |
There was a problem hiding this comment.
I don't think the logic is correc,t I think checking the flag should be inside of convertAndValidateThresholds? And also, why pass in an empty list? I think we can just do List mutations = convertAndValidateThresholds(rowKey, element.getValue()..., snapshotName)
mutianf
left a comment
There was a problem hiding this comment.
There's also a EndToEnd IT failing, can you take a look?
com.google.cloud.bigtable.beam.hbasesnapshots.EndToEndIT.testHBaseSnapshotImportWithSharding -- Time elapsed: 434.2 s <<< FAILURE!
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:120)
at org.junit.Assert.assertEquals(Assert.java:146)
at com.google.cloud.bigtable.beam.hbasesnapshots.EndToEndIT.testHBaseSnapshotImportWithSharding(EndToEndIT.java:508)
|
/gemini review |
|
/gemini review |
|
/gemini review |
|
/gemini review |
…stabilization, and resilience updates - Supports importing multiple snapshot copies concurrently from GCS or local filesystem. - Adds parallel sharding utilizing Splittable DoFns and custom RegionConfig mapping. - Improves stabilization by resolving various NullPointerExceptions and Mockito inline limitations on JDK 21+. - Introduces comprehensive unit testing for all steps of the snapshot import pipeline.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
b/429250716
This is the first PR that incorporates changes from https://github.com/jhambleton/java-bigtable-hbase/commits/dataflow-v2-v2.15.6 and some fixes to make it pass the tests.
SnapshotUtilsTest.testGetHbaseConfiguration was failing because the static configuration field SnapshotUtils.hbaseConfiguration cached state between test cases, leaking stale data into subsequent tests.
SnapshotUtilsTest.testAppendCurrentTimestamp was throwing a NumberFormatException because the return value contained a UUID suffix (timestamp-UUID), but the test attempted to parse the entire string directly as a Long.
Integration tests failed on Java 8 and 11 in Kokoro because of unshaded transitive dependency conflicts (com.google.protobuf.LiteralByteString NoClassDefFoundError).
Several useful unit tests were commented out in ImportJobFromHbaseSnapshotTest because mockito-core lacked the ability to mock static methods.
Switched from mockito-core to mockito-inline in the pom.xml to allow static mocking.
Uncommented the code and restored the original formatting to prevent any lint errors, enabling JUnit to verify correct configuration parsing.