fix(threads): rayon multithread verification + cgroup-aware thread cap (#263) #265
Merged
Design for issue #263: force-parallel override, cap_threads CFS-quota fix + RAYON_NUM_THREADS overwrite, defensive allow_threads around rayon FFI, and a spawn-worker stress reproducer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ation plan Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…263) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cap_threads() overwrites RAYON_NUM_THREADS with GVL's resolved count, so a bare RAYON_NUM_THREADS=8 in the worker was clobbered by the host's detected cpu count. Set GVL_NUM_THREADS=8 so each worker deterministically runs 8 rayon threads (N_WORKERS*8 oversubscription) regardless of host cores. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
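The difference between the old `setdefault` behaviour and the new overwrite can be illustrated with a small sketch. This is not gvl's actual code: `cap_threads_sketch` and its `detected` parameter are hypothetical stand-ins, and the only behaviour assumed is what the commit message states, namely that `GVL_NUM_THREADS` becomes the resolved count and is always written to `RAYON_NUM_THREADS`.

```python
from typing import MutableMapping


def cap_threads_sketch(detected: int, env: MutableMapping[str, str]) -> int:
    """Hypothetical stand-in for cap_threads(); illustrative only.

    `detected` plays the role of the host/cgroup-derived thread count.
    """
    override = env.get("GVL_NUM_THREADS")
    # An explicit GVL_NUM_THREADS wins over the detected count.
    resolved = int(override) if override else detected
    # Overwrite unconditionally. The old behaviour was equivalent to
    # env.setdefault("RAYON_NUM_THREADS", ...), which let an inherited
    # value survive.
    env["RAYON_NUM_THREADS"] = str(resolved)
    return resolved


# A bare RAYON_NUM_THREADS=8 is clobbered by the detected count (4 here).
worker_env = {"RAYON_NUM_THREADS": "8"}
cap_threads_sketch(4, worker_env)
print(worker_env["RAYON_NUM_THREADS"])  # 4

# Setting GVL_NUM_THREADS=8 makes the worker run 8 threads deterministically.
worker_env = {"RAYON_NUM_THREADS": "2", "GVL_NUM_THREADS": "8"}
cap_threads_sketch(4, worker_env)
print(worker_env["RAYON_NUM_THREADS"])  # 8
```

Passing a plain dict keeps the sketch side-effect free; `os.environ` satisfies the same mapping interface.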
Summary
Makes gvl's rayon-parallel paths runnable and verified under real multithreaded load, fixes the `cap_threads()` oversubscription bugs behind #263, and releases the GIL around the rayon FFI so parallel workers no longer serialize on it or oversubscribe/park.

Closes #263.
What changed
Thread-count resolver (`python/genvarloader/_threads.py`)
- `GVL_FORCE_PARALLEL` env knob bypasses the size gate so the multithreaded paths run on small inputs (tests, repro harnesses).
- The CFS quota is read from `cpu.max` (falling back to v1 `cpu.cfs_quota_us`/`cpu.cfs_period_us`). A quota is invisible to `sched_getaffinity`, so a 15.3-core container previously reported the full host core count and oversubscribed. Detection now takes `min(affinity, quota)`.
- `cap_threads()` now overwrites `RAYON_NUM_THREADS` with GVL's resolved cap (was `setdefault`).

FFI (`src/ffi/mod.rs`)
- The rayon FFI is wrapped in `py.detach(...)` (PyO3 0.28's rename of `allow_threads`). Every `PyReadonly*`/`PyReadwrite*` guard is resolved to an `ndarray` view before the closure; only `Ungil` views + POD are captured; every `into_pyarray` runs after. Byte-identical serial==parallel==golden parity is preserved.

Tests
- `tests/integration/test_rayon_forced_parallel.py`: forced-parallel `dataset[:, :]` is byte-identical to the serial path end-to-end (compares both ragged `.data` and `.offsets`).
- `tests/integration/test_rayon_stress.py` (slow): repeated spawn-worker waves, oversubscribed rayon, with a per-launch timeout as the deadlock detector.
- `tests/parity/*`: marked the `pytest.skip` `except` branches `NoReturn` (trailing `raise`) to clear pre-existing pyrefly `unbound-name` errors, unblocking the project-wide pre-commit pyrefly hook.

`RAYON_NUM_THREADS` is now overwritten
`cap_threads()` previously respected an externally-set `RAYON_NUM_THREADS`; it now clobbers it with GVL's cgroup-derived cap. This is the point of the fix: an inherited value (e.g. from a base image) must not defeat the cap and cause oversubscription (#263).

Escape hatch: users who want an explicit worker count should set `GVL_NUM_THREADS` (which becomes the resolved cap and is written to `RAYON_NUM_THREADS`). Setting a bare `RAYON_NUM_THREADS` alone no longer wins.

Root cause / Task 8
The spawn-worker stress reproducer completed all launches cleanly (no deadlock) once the oversubscription fixes + GIL release were in place, so the contingent root-cause task was not needed. The #263 hang was driven by per-worker oversubscription, addressed by the cgroup-aware cap + unconditional `RAYON_NUM_THREADS` overwrite + GIL release.

Verification
- `cargo-test`: 110 + 4 passed; `cargo build --release`: clean.
- `test_rayon_equivalence.py` (byte-identical parity): 5 passed.
- `pytest tests -q` (slow excluded): 945 passed, 54 skipped, 4 xfailed.
- `pyrefly`: 0 errors; `ruff check` / `ruff format --check`: clean. Pre-commit hooks pass without `--no-verify`.

No public API / `__all__` / `SKILL.md` change (env-only knobs), so no skill update needed.

Note: this branch also carries the initiative's spec and plan docs (`docs/superpowers/{specs,plans}/2026-06-30-rayon-multithread-verification*`), which were committed to local `main` ahead of origin.

🤖 Generated with Claude Code
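
For reviewers unfamiliar with CFS quotas, the `min(affinity, quota)` detection described under the thread-count resolver can be sketched roughly as follows. This is a minimal illustration, not the code in `_threads.py`: the function names, the `/sys/fs/cgroup` mount layout, and rounding a fractional quota down with `floor` are all assumptions made here.

```python
import math
import os
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max: '<quota> <period>' or 'max <period>'."""
    parts = text.split()
    if len(parts) != 2 or parts[0] == "max":
        return None  # no quota configured
    quota, period = int(parts[0]), int(parts[1])
    return quota / period if quota > 0 and period > 0 else None


def parse_cfs_v1(quota_text: str, period_text: str) -> Optional[float]:
    """Parse cgroup v1 cpu.cfs_quota_us / cpu.cfs_period_us (-1 = unlimited)."""
    quota, period = int(quota_text), int(period_text)
    return quota / period if quota > 0 and period > 0 else None


def read_cpu_quota(root: Path = Path("/sys/fs/cgroup")) -> Optional[float]:
    """Return the CPU quota in cores, or None. Paths are assumed locations."""
    v2 = root / "cpu.max"
    try:
        if v2.is_file():
            return parse_cpu_max(v2.read_text())
        v1 = root / "cpu"
        return parse_cfs_v1(
            (v1 / "cpu.cfs_quota_us").read_text(),
            (v1 / "cpu.cfs_period_us").read_text(),
        )
    except (OSError, ValueError):
        return None


def resolve_threads(affinity: int, quota: Optional[float]) -> int:
    """min(affinity, quota), never below 1. floor() is an assumed rounding."""
    if quota is None:
        return max(1, affinity)
    return max(1, min(affinity, math.floor(quota)))


if hasattr(os, "sched_getaffinity"):  # Linux only
    affinity = len(os.sched_getaffinity(0))
    print(resolve_threads(affinity, read_cpu_quota()))
```

With these assumptions, a container whose `cpu.max` reads `1530000 100000` on a 64-core host resolves to 15 threads instead of 64.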