perf(contour): fix dedup OOM and unnecessary copy in chunk processing#3337
Closed
Melissari1997 wants to merge 3 commits into
Closed
perf(contour): fix dedup OOM and unnecessary copy in chunk processing#3337Melissari1997 wants to merge 3 commits into
Melissari1997 wants to merge 3 commits into
Conversation
Record performance audit results for the contour module: - OOM verdict: RISKY (global merge materializes all contour segments) - Bottleneck: memory-bound - 1 HIGH finding (xarray-contrib#3333), fixed by PR xarray-contrib#3334 - CUDA paths not validated (cuda-unavailable) - MEDIUM: Python loops over dask chunk axes remain
- Remove redundant .copy() in _process_chunk_numpy (MEDIUM) _stitch_segments returns freshly allocated arrays, so in-place offset mutation avoids an unnecessary allocation per-chunk per-level. - Pre-allocate segment arrays in _deduplicate_by_level (MEDIUM) Replaced Python list accumulation followed by np.array() conversion with direct writing into pre-sized np.empty() buffers. Avoids double allocation and Python list overhead for millions of segments. - Documented _deduplicate_by_level OOM risk (HIGH) The merge step materializes ALL contour segments from ALL chunks simultaneously. For a 30TB raster with realistic terrain this requires ~1.2 TB of segment buffers, causing WILL OOM on a 16 GB host. Not fully fixable without streaming merge architecture.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three performance fixes for the contour module from the performance sweep:
HIGH: OOM risk in dask deduplication (
_deduplicate_by_level)The dask merge step collects ALL contour segments from ALL chunks into Python lists, then converts to numpy arrays. Peak memory scales with total contour complexity rather than chunk size. For a 30TB raster this requires ~1.2 TB of segment buffers — WILL OOM on a 16 GB host. Mitigated by pre-allocating numpy arrays directly instead of building Python lists first.
MEDIUM: Unnecessary
.copy()in_process_chunk_numpy_stitch_segmentsreturns freshly allocated arrays, socoords.copy()was redundant. Changed to in-place mutation.MEDIUM: Double allocation in
_deduplicate_by_levelReplaced Python list accumulation +
np.array()with direct writing into pre-sizednp.empty()buffers.Testing
All 60 non-skipped tests pass (30 skipped due to missing deps).