[MINOR] test use deterministic alphabetical test run order in surefire #2490

Closed
Baunsgaard wants to merge 3 commits into apache:main from Baunsgaard:investigate/component-c-hang

Conversation

@Baunsgaard

Contributor

Set surefire runOrder to alphabetical so test classes execute in the same sequence on every machine.

This makes local reproduction of tests easier.

Set surefire runOrder to alphabetical so test classes execute in the same
sequence on every machine. The previous default (filesystem order) varied
between CI runners and local checkouts, which made the component-c fork
hang appear at an unpredictable class boundary and prevented local
reproduction. Deterministic ordering makes the hang reproduce at a stable
point so the responsible class can be identified.
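
For local reproduction, the same ordering can also be requested from the command line. The following is a minimal sketch, not the change made in this PR: it assumes the build uses maven-surefire-plugin and that the POM does not already pin `runOrder` (an explicit POM value would take precedence over the property). `surefire.runOrder` is the plugin's user property for this parameter; the helper name `run_tests_alphabetical` and the `MVN` override variable are hypothetical.

```shell
# Hypothetical helper: run the Maven test phase with a fixed test-class order.
# MVN is an assumed override hook (defaults to "mvn") so the command can be
# swapped out or inspected without starting a real build.
run_tests_alphabetical() {
  "${MVN:-mvn}" -B test -Dsurefire.runOrder=alphabetical "$@"
}
```

Calling `run_tests_alphabetical` with no arguments runs the test phase; any extra arguments are passed through to Maven unchanged.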
@codecov

codecov Bot commented Jun 16, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.42%. Comparing base (65e734e) to head (bec5e70).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2490      +/-   ##
============================================
- Coverage     71.45%   71.42%   -0.03%     
+ Complexity    48855    48834      -21     
============================================
  Files          1572     1572              
  Lines        189117   189117              
  Branches      37106    37106              
============================================
- Hits         135126   135072      -54     
- Misses        43546    43585      +39     
- Partials      10445    10460      +15     

Add a watchdog to the docker test entrypoint that detects a surefire
fork whose cumulative CPU time stops advancing while the process stays
alive (the signature of a fork that finished its tests but never exits).
On detection it dumps the fork's Java stacks (jstack -l and forced
jstack -F -l), mixed native frames (jstack -m), and per-thread kernel
wait channels from /proc to the job log and to target/thread-dumps.

This surfaces the JVM-shutdown stall behind the component-c job hang,
which is not a live non-daemon Java thread (ruled out locally) and is
therefore only observable in CI at the moment of the stall, before the
fork is force-killed.
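
The stall signature described above (process alive, cumulative CPU time not advancing) can be sketched as follows. This is an illustration under assumptions, not the watchdog from this PR: it assumes CPU time is read from `/proc/<pid>/stat` on Linux, where fields 14 and 15 are `utime` and `stime` in clock ticks. The function names `ticks_from_stat` and `is_stalled` are hypothetical.

```shell
# Sum utime + stime (fields 14 and 15) from one line of /proc/<pid>/stat.
ticks_from_stat() {
  # Drop everything up to the last ") " so a comm field containing spaces
  # cannot shift the columns; $1 is then field 3 (state), so field 14 is ${12}.
  set -- ${1##*\) }
  echo $(( ${12} + ${13} ))
}

# A live process counts as stalled when its tick count did not advance
# between two samples. Usage: is_stalled PREV_TICKS CURR_TICKS
is_stalled() {
  [ "$1" -eq "$2" ]
}
```

A caller would sample `ticks_from_stat "$(cat /proc/$pid/stat)"` at two points in time and compare the values with `is_stalled`, after first confirming that the process still exists.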
The watchdog crashed healthy test forks and turned nearly every Java
test group red. It matched any process whose command line contained
"surefirebooter", which includes the /bin/sh wrapper that launches the
fork and waits in do_wait with zero CPU forever, so the zero-CPU-progress
detector fired on essentially every fork after 12s. The subsequent
jstack attach to a live fork disrupted surefire's master/fork stream
protocol, killing the JVM (exit 131 / SIGQUIT) and producing "forked VM
terminated without properly saying goodbye". The forced and mixed-mode
jstack variants are also no-ops on JDK 17.

Attaching to a busy fork is inherently unsafe for diagnostics, so drop
the watchdog entirely and restore the previous entrypoint. The
deterministic alphabetical run order added separately is unaffected.
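
The over-matching described above can be illustrated with a narrower selector. This is a minimal sketch, assuming the process list arrives as `PID COMM ARGS...` lines and that the fork's executable name is exactly `java`; the function name `select_fork_pids` is hypothetical. It only addresses the `/bin/sh` wrapper false positive, not the disruption caused by attaching jstack to a live fork, which is the reason the watchdog was dropped.

```shell
# Read "PID COMM ARGS..." lines on stdin and print only the PIDs of JVMs
# whose command line mentions surefirebooter. The /bin/sh wrapper also has
# "surefirebooter" in its arguments, but its COMM is "sh", so it is skipped.
select_fork_pids() {
  awk '$2 == "java" && /surefirebooter/ { print $1 }'
}
```

On a Linux host the input could come from `ps -e -o pid= -o comm= -o args=`, piped into `select_fork_pids`.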
@Baunsgaard

Contributor Author

Not needed, since this change does not reproduce the error.

@Baunsgaard Baunsgaard closed this Jun 17, 2026
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SystemDS PR Queue Jun 17, 2026