feat(import): add support for multiple hbase snapshot imports#4600

Merged
tianlei2 merged 1 commit into
googleapis:mainfrom
tianlei2:dataflow-import
Jun 17, 2026
Conversation

@tianlei2

@tianlei2 tianlei2 commented Apr 28, 2026

Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

b/429250716

This is the first PR that incorporates changes from https://github.com/jhambleton/java-bigtable-hbase/commits/dataflow-v2-v2.15.6 and some fixes to make it pass the tests.

  • Fixed Test Isolation Issues
    SnapshotUtilsTest.testGetHbaseConfiguration was failing because the static configuration field SnapshotUtils.hbaseConfiguration cached state between test cases, leaking stale data into subsequent tests.
    • Solution: Added a @Before setup method that resets the static field to null via reflection before every test run.
  • Fixed Timestamp Formatting Tests
    SnapshotUtilsTest.testAppendCurrentTimestamp was throwing a NumberFormatException because the return value contained a UUID suffix (timestamp-UUID), but the test attempted to parse the entire string directly as a Long.
    • Solution: Updated the test to split the string using the "-" character to extract and correctly parse just the timestamp prefix.
  • Resolved Classpath and SPI Conflicts (dnsjava)
    Integration tests failed on Java 8 and 11 in Kokoro because of unshaded transitive dependency conflicts (com.google.protobuf.LiteralByteString NoClassDefFoundError).
    • Solution: Reverted back to the shaded hbase-shaded-mapreduce dependency, ensuring proper compatibility across all Java versions.
  • Uncommented and Fixed Tests in ImportJobFromHbaseSnapshotTest
    Several useful unit tests were commented out in ImportJobFromHbaseSnapshotTest because mockito-core lacked the ability to mock static methods.
    • Solution:
      Switched from mockito-core to mockito-inline in the pom.xml to allow static mocking.
      Uncommented the code and restored the original formatting to prevent any lint errors, enabling JUnit to verify correct configuration parsing.
  • ComputeAndValidateHashFromBigtableDoFnTest.java was accidentally deleted; this PR adds it back.
  • Cleaned up unused comments.
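
For readers who want to see the first two fixes in isolation, here is a minimal sketch using only the JDK. `CachedConfigHolder`, `cachedConfig`, `resetStaticField`, `timestampWithUuidSuffix`, and `parseTimestampPrefix` are hypothetical names made up for illustration; they are not the actual `SnapshotUtils` code. In a JUnit 4 test, the reset call would presumably sit in a `@Before` method.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class SnapshotTestFixesSketch {

  /** Hypothetical stand-in for a utility class that caches configuration in a static field. */
  static final class CachedConfigHolder {
    private static Map<String, String> cachedConfig;

    static Map<String, String> getConfig(String projectId) {
      // The first caller wins; later callers see the cached value, which is the leak.
      if (cachedConfig == null) {
        cachedConfig = new HashMap<>();
        cachedConfig.put("project", projectId);
      }
      return cachedConfig;
    }
  }

  /** Sets a static (non-final) field to null via reflection so each test starts clean. */
  static void resetStaticField(Class<?> owner, String fieldName) {
    try {
      Field field = owner.getDeclaredField(fieldName);
      field.setAccessible(true);
      field.set(null, null); // a null target object addresses the static field
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not reset " + fieldName, e);
    }
  }

  /** Assumed shape of the value under test: epoch millis, a dash, then a UUID. */
  static String timestampWithUuidSuffix() {
    return System.currentTimeMillis() + "-" + UUID.randomUUID();
  }

  /** Parses only the part before the first dash; the UUID itself also contains dashes. */
  static long parseTimestampPrefix(String value) {
    return Long.parseLong(value.split("-", 2)[0]);
  }

  public static void main(String[] args) {
    CachedConfigHolder.getConfig("first");
    resetStaticField(CachedConfigHolder.class, "cachedConfig");
    System.out.println(CachedConfigHolder.getConfig("second").get("project"));
    System.out.println(parseTimestampPrefix(timestampWithUuidSuffix()));
  }
}
```

Without the reset, the second `getConfig` call would still return the configuration built for `"first"`, which is the kind of cross-test leakage described above.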

@tianlei2 tianlei2 requested a review from a team as a code owner April 28, 2026 18:41
@product-auto-label product-auto-label Bot added size: xl Pull request size is extra large. api: bigtable Issues related to the googleapis/java-bigtable-hbase API. labels Apr 28, 2026
@tianlei2 tianlei2 force-pushed the dataflow-import branch 3 times, most recently from f5324bd to f8b5932 Compare April 28, 2026 19:22
@tianlei2 tianlei2 changed the title Dataflow import feat(import): add support for multiple hbase snapshot imports Apr 28, 2026
@tianlei2 tianlei2 force-pushed the dataflow-import branch 3 times, most recently from 13f65dc to a3a6c7f Compare April 28, 2026 20:17
@tianlei2 tianlei2 marked this pull request as draft April 28, 2026 20:42
@tianlei2 tianlei2 marked this pull request as ready for review April 28, 2026 22:41
@tianlei2 tianlei2 marked this pull request as draft April 28, 2026 22:42
@tianlei2 tianlei2 added the kokoro:run Add this label to force Kokoro to re-run the tests. label Apr 29, 2026
@tianlei2 tianlei2 self-assigned this Apr 29, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Apr 29, 2026
@tianlei2 tianlei2 force-pushed the dataflow-import branch 5 times, most recently from d511a61 to 5ec8dc1 Compare April 29, 2026 19:57
@tianlei2 tianlei2 requested a review from vermas2012 April 29, 2026 20:27
@googleapis googleapis deleted a comment from google-cla Bot Apr 29, 2026
@tianlei2 tianlei2 marked this pull request as ready for review April 29, 2026 20:33
@tianlei2 tianlei2 force-pushed the dataflow-import branch 4 times, most recently from 84d905f to 4299c64 Compare May 6, 2026 16:20
@tianlei2 tianlei2 force-pushed the dataflow-import branch 7 times, most recently from 18a4509 to fa0e9ec Compare May 16, 2026 01:48
@tianlei2 tianlei2 added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed kokoro:run Add this label to force Kokoro to re-run the tests. labels May 16, 2026
@mutianf

mutianf commented May 19, 2026

Contributor

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new tool, HBaseSnapshotRestoreTool, and updates the existing ImportJobFromHbaseSnapshot to support loading multiple HBase snapshots into Bigtable. It includes several infrastructure improvements, such as adding necessary dependencies, introducing a RegionConfigCoder for efficient serialization, and enhancing the ReadRegions transform with dynamic splitting and sharding capabilities. My review identified several areas for improvement, including fixing an incorrect tracker claim in ReadSnapshotRegion, improving configuration handling in ImportJobFromHbaseSnapshot, ensuring consistent brace usage, and addressing potential null pointer exceptions and minor code style issues.

Comment thread bigtable-dataflow-parent/bigtable-beam-import/pom.xml

List<Mutation> mutations = new ArrayList<>();

boolean logAndSkipIncompatibleRowMutations =

Contributor

I don't think the logic is correct. Shouldn't the flag check be inside convertAndValidateThresholds? Also, why pass in an empty list? I think we can just do List<Mutation> mutations = convertAndValidateThresholds(rowKey, element.getValue()..., snapshotName)
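
A rough sketch of the shape this suggestion points at, under loud assumptions: mutations are modelled as plain strings instead of HBase `Mutation` objects, `MAX_VALUE_LENGTH` is an invented threshold, and the parameter list (including passing the flag in) is a guess rather than the PR's real signature. The point is only that the method builds and returns the list itself and decides internally whether to skip or fail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ConvertAndValidateSketch {

  /** Invented limit standing in for whatever thresholds the real code checks. */
  static final int MAX_VALUE_LENGTH = 8;

  /**
   * Returns the mutations for one row. The skip flag is consulted here, so the
   * caller neither pre-allocates a list nor branches on the flag.
   */
  static List<String> convertAndValidateThresholds(
      String rowKey,
      List<String> cellValues,
      boolean logAndSkipIncompatibleRowMutations,
      String snapshotName) {
    List<String> mutations = new ArrayList<>();
    for (String value : cellValues) {
      if (value.length() > MAX_VALUE_LENGTH) {
        String message =
            "Row " + rowKey + " in snapshot " + snapshotName + " exceeds the value limit";
        if (logAndSkipIncompatibleRowMutations) {
          System.err.println("Skipping: " + message);
          return Collections.emptyList(); // skip the whole row
        }
        throw new IllegalArgumentException(message);
      }
      mutations.add(rowKey + "=" + value);
    }
    return mutations;
  }
}
```

The call site then reduces to a single assignment, along the lines of `List<String> mutations = convertAndValidateThresholds(rowKey, values, flag, snapshotName);`.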

@mutianf mutianf left a comment

Contributor

There's also an EndToEnd IT failing. Can you take a look?

com.google.cloud.bigtable.beam.hbasesnapshots.EndToEndIT.testHBaseSnapshotImportWithSharding -- Time elapsed: 434.2 s <<< FAILURE!
java.lang.AssertionError: expected: but was:
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:120)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at com.google.cloud.bigtable.beam.hbasesnapshots.EndToEndIT.testHBaseSnapshotImportWithSharding(EndToEndIT.java:508)

@tianlei2

Contributor Author

/gemini review

gemini-code-assist[bot]

This comment was marked as resolved.

@tianlei2

Contributor Author

/gemini review

gemini-code-assist[bot]

This comment was marked as resolved.

@mutianf mutianf left a comment

Contributor

Some final nits.

@mutianf

mutianf commented Jun 2, 2026

Contributor

/gemini review

gemini-code-assist[bot]

This comment was marked as resolved.

@tianlei2

tianlei2 commented Jun 3, 2026

Contributor Author

/gemini review

gemini-code-assist[bot]

This comment was marked as resolved.

Comment thread bigtable-dataflow-parent/bigtable-beam-import/pom.xml
…stabilization, and resilience updates

- Supports importing multiple snapshot copies concurrently from GCS or local filesystem.
- Adds parallel sharding utilizing Splittable DoFns and custom RegionConfig mapping.
- Improves stabilization by resolving various NullPointerExceptions and Mockito inline limitations on JDK 21+.
- Introduces comprehensive unit testing for all steps of the snapshot import pipeline.

Labels

api: bigtable Issues related to the googleapis/java-bigtable-hbase API. size: xl Pull request size is extra large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants