[MINOR] test use deterministic alphabetical test run order in surefire #2490
Closed
Baunsgaard wants to merge 3 commits into
Set surefire runOrder to alphabetical so test classes execute in the same sequence on every machine. The previous default (filesystem order) varied between CI runners and local checkouts, which made the component-c fork hang appear at an unpredictable class boundary and prevented local reproduction. Deterministic ordering makes the hang reproduce at a stable point so the responsible class can be identified.
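The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of what such a setting typically looks like in a `pom.xml`. It assumes the project configures `maven-surefire-plugin` directly in its build section; the surrounding plugin block and its placement are illustrative, not copied from the PR.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Run test classes in alphabetical order instead of the
         default filesystem order, so every machine uses the same sequence. -->
    <runOrder>alphabetical</runOrder>
  </configuration>
</plugin>
```

For a one-off local run, the same parameter can usually be passed on the command line as `-Dsurefire.runOrder=alphabetical`, provided the POM does not hard-code a different value.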
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2490 +/- ##
============================================
- Coverage 71.45% 71.42% -0.03%
+ Complexity 48855 48834 -21
============================================
Files 1572 1572
Lines 189117 189117
Branches 37106 37106
============================================
- Hits 135126 135072 -54
- Misses 43546 43585 +39
- Partials 10445 10460 +15
Add a watchdog to the docker test entrypoint that detects a surefire fork whose cumulative CPU time stops advancing while the process stays alive (the signature of a fork that finished its tests but never exits). On detection it dumps the fork's Java stacks (jstack -l and forced jstack -F -l), mixed native frames (jstack -m), and per-thread kernel wait channels from /proc to the job log and to target/thread-dumps. This surfaces the JVM-shutdown stall behind the component-c job hang, which is not a live non-daemon Java thread (ruled out locally) and is therefore only observable in CI at the moment of the stall, before the fork is force-killed.
The watchdog crashed healthy test forks and turned nearly every Java test group red. It matched any process whose command line contained "surefirebooter", which includes the /bin/sh wrapper that launches the fork and waits in do_wait with zero CPU forever, so the zero-CPU-progress detector fired on essentially every fork after 12s. The subsequent jstack attach to a live fork disrupted surefire's master/fork stream protocol, killing the JVM (exit 131 / SIGQUIT) and producing "forked VM terminated without properly saying goodbye". The forced and mixed-mode jstack variants are also no-ops on JDK 17. Attaching to a busy fork is inherently unsafe for diagnostics, so drop the watchdog entirely and restore the previous entrypoint. The deterministic alphabetical run order added separately is unaffected.
Not needed, since it is not reproducing the error.
Set surefire runOrder to alphabetical so test classes execute in the same sequence on every machine.
This makes local reproduction of tests easier.