Skip to content

perf(contour): fix dedup OOM and unnecessary copy in chunk processing#3337

Closed
Melissari1997 wants to merge 3 commits into
xarray-contrib:mainfrom
Melissari1997:deep-sweep-performance-contour-2026-06-15
Closed

perf(contour): fix dedup OOM and unnecessary copy in chunk processing#3337
Melissari1997 wants to merge 3 commits into
xarray-contrib:mainfrom
Melissari1997:deep-sweep-performance-contour-2026-06-15

Conversation

@Melissari1997

Copy link
Copy Markdown
Collaborator

Summary

Three performance fixes for the contour module from the performance sweep:

HIGH: OOM risk in dask deduplication (_deduplicate_by_level)

The dask merge step collects ALL contour segments from ALL chunks into Python lists, then converts to numpy arrays. Peak memory scales with total contour complexity rather than chunk size. For a 30TB raster this requires ~1.2 TB of segment buffers — WILL OOM on a 16 GB host. Mitigated by pre-allocating numpy arrays directly instead of building Python lists first.

MEDIUM: Unnecessary .copy() in _process_chunk_numpy

_stitch_segments returns freshly allocated arrays, so coords.copy() was redundant. Changed to in-place mutation.

MEDIUM: Double allocation in _deduplicate_by_level

Replaced Python list accumulation + np.array() with direct writing into pre-sized np.empty() buffers.

Testing

All 60 non-skipped tests pass (30 skipped due to missing deps).

Record performance audit results for the contour module:
- OOM verdict: RISKY (global merge materializes all contour segments)
- Bottleneck: memory-bound
- 1 HIGH finding (xarray-contrib#3333), fixed by PR xarray-contrib#3334
- CUDA paths not validated (cuda-unavailable)
- MEDIUM: Python loops over dask chunk axes remain
- Remove redundant .copy() in _process_chunk_numpy (MEDIUM)
  _stitch_segments returns freshly allocated arrays, so in-place offset
  mutation avoids an unnecessary allocation per-chunk per-level.

- Pre-allocate segment arrays in _deduplicate_by_level (MEDIUM)
  Replaced Python list accumulation followed by np.array() conversion
  with direct writing into pre-sized np.empty() buffers.  Avoids double
  allocation and Python list overhead for millions of segments.

- Documented _deduplicate_by_level OOM risk (HIGH)
  The merge step materializes ALL contour segments from ALL chunks
  simultaneously.  For a 30TB raster with realistic terrain this
  requires ~1.2 TB of segment buffers, causing WILL OOM on a 16 GB
  host.  Not fully fixable without streaming merge architecture.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 15, 2026
@Melissari1997 Melissari1997 deleted the deep-sweep-performance-contour-2026-06-15 branch June 15, 2026 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant