Pass memory::reservation into HOST converter allocations#152
Conversation
|
/ok to test 741892d |
|
CI seems to be failing on the setup pixi step? |
|
741892d to
4c35ce8
Compare
|
/ok to test 4c35ce8 |
| void convert_to(representation_converter_registry& registry, | ||
| const memory::memory_space* target_memory_space, | ||
| rmm::cuda_stream_view stream); | ||
| rmm::cuda_stream_view stream, |
There was a problem hiding this comment.
this APIs are problematic imp. you pass a memory_resource from memory_space, and you can pass one from reservation. I'd add multiple methods.
clone_to(stream, reservation)
clone_to(stream, memory_space)
…oads Addresses the PR NVIDIA#152 review on data_batch: convert_to/clone_to and convert<T>() took both a memory_space* and an optional reservation*, which is redundant (a reservation already carries its own memory_space) and let callers pass a mismatched (space, reservation) pair that only threw at runtime inside allocate_multiple_blocks. Replace the combined "(space, stream, reservation=nullptr)" shape with two overloads, each with a single unambiguous source of the target location + allocator: - convert<T>(source, reservation&, stream), convert_to(registry, reservation&, stream), clone_to(registry, id, reservation&, stream): derive the target space from reservation.get_memory_space() and draw the allocation from the reservation (unrepresentable to pass a mismatched pair). - convert<T>(source, memory_space*, stream) and the memory_space forms of convert_to/clone_to: the no-reservation path (unchanged behavior). The internal representation_converter_fn signature is unchanged (source, space, stream, reservation*); both overloads funnel into convert_impl with &reservation or nullptr. The runtime type_index convert(...) overload stays reservation-free (used by the bandwidth profiler). A shared install_converted_representation helper holds the GPU-sync tail common to both convert_to overloads. Regression test updated to the reservation overload. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
/ok to test f3dae32 |
|
#155 for followup work |
| * fresh capacity, preventing double-counting against the target pool. May be nullptr. | ||
| * @return A new idata_representation of the registered target type (always unique_ptr) | ||
| * | ||
| * @note Because std::function cannot carry default arguments, every registered converter must |
There was a problem hiding this comment.
perhaps using optional in that case ? which carries the intent more clearly that this can be optional
There was a problem hiding this comment.
Claude pointed out that if we wrap it into an optional this would lead to the issue of double nullability - an empty optional and a null pointer both mean "none". I think keeping it as a pointer would be better since nullptr already represents it well enough in this case?
Summary
The converter API had no
memory::reservation, so converters were double counting the capacity in the__allocated_bytes, causing artificialrmm::out_of_memoryunder tight capacity.This PR threads an optional
memory::reservation*from the conversion entry points down to the HOST allocator so the conversion draws down the existing reservation instead of committing fresh capacity.Root cause
The HOST converters called allocate_multiple_blocks(size) without a reservation. That path adds the full allocation size to the pool's
_allocated_bytescounter — the same counter reserve() already charged. So reserve-then-convert charged N twice. The fix is to pass the reservation:allocate_multiple_blocks(size, reservation*)charges nothing extra when the allocation fits the reservation. The overload already existed; converters just never got a reservation to hand it.Changes
Converter API — added a memory::reservation* parameter to
representation_converter_fn,convert<T>(), the runtime convert(...) overload,convert_impl, and data_batch'sconvert_to/clone_tomethods. The parameter defaults to nullptr on every public entry point, so existing non-reserving callers are source-compatible and unchanged in behavior. (Because std::function types can't carry default arguments, every registered converter's signature gains the param.)HOST converters (behavior change) — the 5 HOST-target converters (
convert_gpu_to_host, convert_host_to_host, convert_gpu_to_host_fast, convert_host_fast_to_host_fast, convert_disk_to_host_data) now pass the reservation intoallocate_multiple_blocks(size, reservation).GPU/DISK converters (no behavior change) — the 6 GPU/DISK-target converters take the param [[maybe_unused]]; they allocate via get_default_allocator(), not allocate_multiple_blocks.
Tests — added a regression test ("HOST converter draws from caller reservation (no double-count)") that reserves N, converts GPU→HOST with the reservation, and asserts the conversion adds ≤ one block and total committed stays well below 2N.
Addresses sirius-db/sirius#566