[ExecuTorch][WebGPU] Dynamic resize hook for slice_copy (dynamic start/end) by JulianCloudNTH · Pull Request #20581 · pytorch/executorch

JulianCloudNTH · 2026-06-28T16:22:42Z

Stack from ghstack (oldest at bottom):

Make slice_copy support a dynamic gather range so the RoPE-freqs slice [input_pos : input_pos + S] works under one dynamic graph.

Problem: the static slice handler read start via a scalar reader that throws on a SymInt and ignored end (output length baked AOT). The RoPE-freqs slice uses a SymInt input_pos for start and a live S for the range, so the static op could neither build nor resize for it.

Solution: read start/end as possibly-dynamic SymInts and add a resize hook that recomputes the gather offset and live output length each step.

Before: start is a static scalar (SymInt throws); end ignored; output length fixed at the serialized max.
After: start/end read via a SymInt-aware reader; a hook recomputes out[dim] = (end - start + step - 1) / step, rewrites out_meta/in_meta/params UBOs + the dispatch count, and sets the output cur_dims.

Implementation:

Hook registered on the start/end value-ids when they are SymInts and on the input tensor always (inert until resized, so a static slice is byte-identical).
Output/input TensorMeta rebuilt from live dims; dim/step stay static.
Keep the uniforms alive via own_uniform_buffer so the hook can rewrite them.
Mirrors Vulkan resize_slice_copy_node.

Constraints: fp32-only; dim/step static; numerics + layout unchanged; inert on a static graph.

NOTE (stacking): this diff sits on top of the in-review slice_copy op (D108793168); rebase onto it once that op lands on master.

Co-authored-with: Claude Code.
@exported-using-ghexport

Differential Revision: D109906092

[ghstack-poisoned]

pytorch-bot · 2026-06-28T16:22:47Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20581

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 99a8cbe with merge base db5f1ba ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-28T16:23:25Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JulianCloudNTH · 2026-06-29T20:21:41Z

@claude review and check for any areas or opportunities for modularization

claude · 2026-06-29T20:22:29Z

Claude finished @JulianCloudNTH's task in 2m 32s —— View job

Code Review: Dynamic resize hook for `slice_copy`

Read PR diff and full Slice.cpp
Check WebGPUGraph resize API (read_symint, add_resize_hook, add_tensor_resize_hook, own_uniform_buffer, propagate_resize)
Compare against Vulkan resize_slice_copy_node
Identify correctness issues
Identify modularization opportunities

Overall this is clean and matches the established WebGPU dynamic-shape pattern well (SymInt hooks → tensor hooks, idempotent recompute, own_uniform_buffer to keep UBOs writable). The hook is correctly inert on a static graph. A few items below — one I'd consider a real (latent) correctness bug, the rest are quality/modularization.

1. Output non-slice dims should be derived from live input dims (divergence from Vulkan)

Slice.cpp:224-226

std::vector<int64_t> od = g.cur_dims(out_id);   // <- seeds from output's *prior* dims
od[dim] = len;
g.set_cur_dims(out_id, od);

The Vulkan reference this claims to mirror derives the output shape from the input sizes each resize:

std::vector<int64_t> new_out_sizes = in_sizes;        // backends/vulkan/.../Slice.cpp:63-64
new_out_sizes.at(dim) = (end - start + step - 1) / step;

For slice_copy, every non-dim axis of the output is by definition equal to the input's. By seeding od from the output's prior cur_dims and only rewriting od[dim], this hook leaves the other axes stale. The in_meta it writes does track in_dims, so if any non-slice axis of the input is also dynamic, out_meta and in_meta disagree on those axes — a silent shape mismatch.

For the RoPE-freqs case only dim is dynamic, so it happens to work today, but it's fragile and diverges from the reference. Suggest mirroring Vulkan:

std::vector<int64_t> od = in_dims;   // already have `const auto& in_dims = g.cur_dims(in_id);`
od[dim] = len;

Fix this →

2. Redundant hook executions (minor; idempotent so not a bug)

The same recompute closure is registered up to three times — on start_id, end_id, and in_id. When more than one trigger is dirty in the same step (e.g. input_pos SymInt + an input resize), recompute runs 2–3× per propagate_resize. It's fully idempotent (recomputes from live state), so output is correct, just wasted wgpuQueueWriteBuffer calls. Worth a one-line comment noting the intentional redundancy, or skip if the duplicate writes are negligible.

3. Modularization opportunities (the explicit ask)

a. Extract shared slice geometry — highest value. The build path (Slice.cpp:100-116) and the hook (Slice.cpp:217-242) independently compute start/clamp and fill SliceParams (dim/start/step). A small helper would remove the duplication and make the build path and resize path provably consistent:

struct SliceGeom { std::vector<int64_t> out_dims; SliceParams params; };
SliceGeom compute_slice_geom(const std::vector<int64_t>& in_dims,
                             int64_t dim, int64_t start, int64_t end, int64_t step);

Both sites call it; the hook additionally writes the UBOs/dispatch. This also naturally fixes finding #1, since the helper would build out_dims from in_dims.

b. read_index / is_symint / norm_clamp are reusable across transfer ops. slice, and soon select/narrow/index_select, all need "read a possibly-dynamic index" + "normalize+clamp into [0, size]". Vulkan already centralizes this as normalize_idx. Consider hoisting these three free functions into a shared WebGPU helper header (e.g. alongside TensorMeta.h or a new transfer helper) rather than re-declaring them per op. Even within this file, read_index and read_scalar (Slice.cpp:34-61) share the Int → INT64_MAX→default branch — read_scalar could be expressed as the strict variant of read_index.

c. Bind-group / pipeline boilerplate (out of scope here). Slice.cpp:122-200 is ~80 lines of BGL + pipeline + bind-group setup that is near-identical to other WebGPU ops. Not for this PR, but a make_compute_pipeline(...) / make_bind_group(...) helper would shrink every op file substantially. Flagging as a follow-up direction.

Minor

Slice.cpp:76 comment still reads end unread (out shape is AOT) — now stale, end_id is read in the hook. Update or drop.
len == 0 path → om.numel == 0 → compute_1d_workgroup_count(..., 0, ...). Confirm a 0-element dispatch is a safe no-op (Vulkan tolerates it); a quick assertion/comment would document the intent.

No blocking issues beyond #1, which I'd recommend addressing before this lands given it directly contradicts the "Mirrors Vulkan resize_slice_copy_node" claim in the description.
• Base: origin/gh/JulianCloudNTH/73/base

[ghstack-poisoned]

Update

caedbba

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 28, 2026 16:22 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 28, 2026

Update

20ebba2

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 29, 2026 22:10 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jun 29, 2026

Update

6d0d72a

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 30, 2026 02:46 — with GitHub Actions Inactive

Update

99a8cbe

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 30, 2026 21:10 — with GitHub Actions Inactive

JulianCloudNTH mentioned this pull request Jun 30, 2026

[ExecuTorch][WebGPU] 2D-fold mul + permute dispatch (lift 65535 1D cap) #20651

Open

JulianCloudNTH mentioned this pull request Jun 30, 2026

[ExecuTorch][WebGPU] Use requiredFeatures instance API on native + emscripten Dawn #20652

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ExecuTorch][WebGPU] Dynamic resize hook for slice_copy (dynamic start/end)#20581

[ExecuTorch][WebGPU] Dynamic resize hook for slice_copy (dynamic start/end)#20581
JulianCloudNTH wants to merge 4 commits into
gh/JulianCloudNTH/73/basefrom
gh/JulianCloudNTH/73/head

JulianCloudNTH commented Jun 28, 2026 •

edited

Loading

Uh oh!

pytorch-bot Bot commented Jun 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

JulianCloudNTH commented Jun 29, 2026

Uh oh!

claude Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

JulianCloudNTH commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20581

✅ No Failures

Uh oh!

github-actions Bot commented Jun 28, 2026

This PR needs a release notes: label

Uh oh!

JulianCloudNTH commented Jun 29, 2026

Uh oh!

claude Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Dynamic resize hook for slice_copy

1. Output non-slice dims should be derived from live input dims (divergence from Vulkan)

2. Redundant hook executions (minor; idempotent so not a bug)

3. Modularization opportunities (the explicit ask)

Minor

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JulianCloudNTH commented Jun 28, 2026 •

edited

Loading

pytorch-bot Bot commented Jun 28, 2026 •

edited

Loading

This PR needs a `release notes:` label

claude Bot commented Jun 29, 2026 •

edited

Loading

Code Review: Dynamic resize hook for `slice_copy`