docs(getting_started): standardize on CUDA thread/kernel terminology #397
Open
samlaf wants to merge 1 commit into
The getting started guide used "invocation" / "multiple invocations of a
kernel" to describe parallel execution. That vocabulary comes from the
GLSL/SPIR-V/Vulkan compute world — natural for the rust-gpu org, whose
flagship project targets SPIR-V, where "invocation" is the official term
for a single execution instance of a shader (exactly what CUDA calls a
thread). The doc was even internally consistent about it.
The problem is that "invocation" means something different in CUDA-native
usage: a "kernel invocation" is the `<<<>>>` launch — one invocation per
launch, which then spawns many threads. So a reader coming from CUDA C++
parses "multiple invocations running in parallel" as multiple grid
launches on streams, a separate concept. Since this is rust-cuda, that
collision is a real cost.
Standardize on CUDA's own model — kernel = the `__global__` function,
thread = the unit of parallel execution, launch = the `<<<>>>` call —
and drop "invocation" entirely. This is unambiguous in a CUDA context and
easier to follow for beginners.
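The three terms can be illustrated with a minimal sketch. This is a plain-Rust CPU model, not the rust-cuda or cust API: `LaunchConfig`, `launch`, and `add_one_kernel` are hypothetical names chosen for illustration, and the threads run sequentially in a loop where a GPU would run them in parallel.

```rust
// Plain-Rust CPU model of CUDA terminology; NOT the rust-cuda or cust API.
// All names here (LaunchConfig, launch, add_one_kernel) are hypothetical.

/// Launch configuration: how many blocks, and how many threads per block.
#[derive(Clone, Copy)]
pub struct LaunchConfig {
    pub grid_dim: usize,  // number of blocks in the grid
    pub block_dim: usize, // number of threads in each block
}

/// The "kernel": one function, written from the point of view of one thread.
/// In CUDA C++ this role is played by the `__global__` function.
pub fn add_one_kernel(block_idx: usize, thread_idx: usize, block_dim: usize, data: &mut [u32]) {
    // Global thread index, as in blockIdx.x * blockDim.x + threadIdx.x.
    let i = block_idx * block_dim + thread_idx;
    // Guard: the grid may contain more threads than there are elements.
    if i < data.len() {
        data[i] += 1;
    }
}

/// The "launch": a single call that runs the kernel once per thread.
/// A real GPU runs these threads in parallel; this model uses nested loops.
/// Returns the number of threads the launch created.
pub fn launch(cfg: LaunchConfig, data: &mut [u32]) -> usize {
    let mut threads = 0;
    for block_idx in 0..cfg.grid_dim {
        for thread_idx in 0..cfg.block_dim {
            add_one_kernel(block_idx, thread_idx, cfg.block_dim, data);
            threads += 1;
        }
    }
    threads
}
```

With `grid_dim: 3` and `block_dim: 4`, one launch creates 12 threads of the one kernel. For a 10-element slice, the last two threads fail the bounds check and do nothing.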
Also fixes the outright error "mutable state shared by multiple kernels
executing in parallel" (there is one kernel; threads execute it), and two
incidental typos in the Blocks bullet ("that it execute" → "that execute
together", "blocks index avaiable" → "block's index available").
Human Note
I found the getting_started guide a bit confusing as a GPU learner, since it uses nomenclature that doesn't match CUDA's docs. I wonder whether this was intended to keep these docs' wording close to the broader rust-gpu effort, but given that this is rust-cuda, I figured that staying closer to CUDA's usage of the words would make more sense.
LLM generated: the description above.