Skip to content

Pass memory::reservation into HOST converter allocations#152

Open
Jedi18 wants to merge 3 commits into
NVIDIA:mainfrom
Jedi18:fix_reservation_converter
Open

Pass memory::reservation into HOST converter allocations#152
Jedi18 wants to merge 3 commits into
NVIDIA:mainfrom
Jedi18:fix_reservation_converter

Conversation

@Jedi18

@Jedi18 Jedi18 commented Jun 26, 2026

Copy link
Copy Markdown

Summary

The converter API had no memory::reservation, so converters were double counting the capacity in the __allocated_bytes, causing artificial rmm::out_of_memory under tight capacity.

This PR threads an optional memory::reservation* from the conversion entry points down to the HOST allocator so the conversion draws down the existing reservation instead of committing fresh capacity.

Root cause

The HOST converters called allocate_multiple_blocks(size) without a reservation. That path adds the full allocation size to the pool's _allocated_bytes counter — the same counter reserve() already charged. So reserve-then-convert charged N twice. The fix is to pass the reservation: allocate_multiple_blocks(size, reservation*) charges nothing extra when the allocation fits the reservation. The overload already existed; converters just never got a reservation to hand it.

Changes

Converter API — added a memory::reservation* parameter to representation_converter_fn, convert<T>(), the runtime convert(...) overload, convert_impl, and data_batch's convert_to / clone_to methods. The parameter defaults to nullptr on every public entry point, so existing non-reserving callers are source-compatible and unchanged in behavior. (Because std::function types can't carry default arguments, every registered converter's signature gains the param.)

HOST converters (behavior change) — the 5 HOST-target converters (convert_gpu_to_host, convert_host_to_host, convert_gpu_to_host_fast, convert_host_fast_to_host_fast, convert_disk_to_host_data) now pass the reservation into allocate_multiple_blocks(size, reservation).

GPU/DISK converters (no behavior change) — the 6 GPU/DISK-target converters take the param [[maybe_unused]]; they allocate via get_default_allocator(), not allocate_multiple_blocks.

Tests — added a regression test ("HOST converter draws from caller reservation (no double-count)") that reserves N, converts GPU→HOST with the reservation, and asserts the conversion adds ≤ one block and total committed stays well below 2N.

Addresses sirius-db/sirius#566

@copy-pr-bot

copy-pr-bot Bot commented Jun 26, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mbrobbel

Copy link
Copy Markdown
Member

/ok to test 741892d

@Jedi18

Jedi18 commented Jun 26, 2026

Copy link
Copy Markdown
Author

CI seems to be failing on the setup pixi step?

@mbrobbel

Copy link
Copy Markdown
Member

CI seems to be failing on the setup pixi step?

#153

@Jedi18 Jedi18 force-pushed the fix_reservation_converter branch from 741892d to 4c35ce8 Compare June 29, 2026 15:49
@mbrobbel

Copy link
Copy Markdown
Member

/ok to test 4c35ce8

Comment thread include/cucascade/data/data_batch.hpp Outdated
void convert_to(representation_converter_registry& registry,
const memory::memory_space* target_memory_space,
rmm::cuda_stream_view stream);
rmm::cuda_stream_view stream,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this APIs are problematic imp. you pass a memory_resource from memory_space, and you can pass one from reservation. I'd add multiple methods.

clone_to(stream, reservation)
clone_to(stream, memory_space)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added the two overloads

…oads

Addresses the PR NVIDIA#152 review on data_batch: convert_to/clone_to and
convert<T>() took both a memory_space* and an optional reservation*, which is
redundant (a reservation already carries its own memory_space) and let callers
pass a mismatched (space, reservation) pair that only threw at runtime inside
allocate_multiple_blocks.

Replace the combined "(space, stream, reservation=nullptr)" shape with two
overloads, each with a single unambiguous source of the target location +
allocator:
- convert<T>(source, reservation&, stream), convert_to(registry, reservation&,
  stream), clone_to(registry, id, reservation&, stream): derive the target space
  from reservation.get_memory_space() and draw the allocation from the
  reservation (unrepresentable to pass a mismatched pair).
- convert<T>(source, memory_space*, stream) and the memory_space forms of
  convert_to/clone_to: the no-reservation path (unchanged behavior).

The internal representation_converter_fn signature is unchanged (source, space,
stream, reservation*); both overloads funnel into convert_impl with &reservation
or nullptr. The runtime type_index convert(...) overload stays reservation-free
(used by the bandwidth profiler). A shared install_converted_representation
helper holds the GPU-sync tail common to both convert_to overloads. Regression
test updated to the reservation overload.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Jedi18 Jedi18 requested a review from aminaramoon July 1, 2026 08:14
@mbrobbel

mbrobbel commented Jul 1, 2026

Copy link
Copy Markdown
Member

/ok to test f3dae32

@Jedi18

Jedi18 commented Jul 1, 2026

Copy link
Copy Markdown
Author

#155 for followup work

* fresh capacity, preventing double-counting against the target pool. May be nullptr.
* @return A new idata_representation of the registered target type (always unique_ptr)
*
* @note Because std::function cannot carry default arguments, every registered converter must

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps using optional in that case ? which carries the intent more clearly that this can be optional

@Jedi18 Jedi18 Jul 3, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude pointed out that if we wrap it into an optional this would lead to the issue of double nullability - an empty optional and a null pointer both mean "none". I think keeping it as a pointer would be better since nullptr already represents it well enough in this case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants