backends/mlx: runtime MoE expert-sort for decode (issue #20554) by AxelNoun · Pull Request #20685 · pytorch/executorch

AxelNoun · 2026-07-02T03:40:57Z

Summary

Replace the compile-time sort_experts: bool flag in SwitchMLP with a runtime decision inside two new custom ops (moe_gather_inputs, moe_scatter_outputs). A single exported .pte now handles both prefill (sorted, coalesced gather_mm) and decode (unsorted, no argsort overhead) without separate exports.
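The runtime decision described above can be illustrated with a minimal pure-Python sketch. It is not the PR's implementation: the function names `gather_inputs_sketch` and `scatter_outputs_sketch` are hypothetical, and it assumes that `sort_cutoff` is compared against the total number of (token, expert-slot) pairs, which the PR text does not spell out.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def gather_inputs_sketch(
    x: Sequence[Row],
    indices: Sequence[Sequence[int]],
    sort_cutoff: int,
) -> Tuple[List[Row], List[int], Optional[List[int]]]:
    """Return (rows, expert_ids, inv_order); inv_order is None when unsorted."""
    top_k = len(indices[0])
    # Flatten to one expert id per (token, slot) pair.
    flat = [e for row in indices for e in row]
    if len(flat) < sort_cutoff:
        # ASSUMPTION: small (decode-sized) inputs skip the argsort entirely.
        rows = [list(x[i // top_k]) for i in range(len(flat))]
        return rows, flat, None
    # Larger (prefill-sized) inputs: stable argsort by expert id so that
    # all rows routed to the same expert become contiguous.
    order = sorted(range(len(flat)), key=flat.__getitem__)
    inv_order = [0] * len(order)
    for pos, src in enumerate(order):
        inv_order[src] = pos
    # Floor division maps a flattened slot index back to its token row.
    rows = [list(x[src // top_k]) for src in order]
    return rows, [flat[src] for src in order], inv_order


def scatter_outputs_sketch(
    y: Sequence[Row], inv_order: Optional[List[int]]
) -> List[Row]:
    """Undo the expert sort; a no-op when the inputs were never sorted."""
    if inv_order is None:
        return [list(r) for r in y]
    return [list(y[pos]) for pos in inv_order]
```

Both paths should yield the same per-slot outputs in the original order, so a single call site can serve prefill and decode; only the work done in between differs.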

Key changes:

schema.fbs: sorted_indices: bool → IntOrVid (required) on GatherMmNode/GatherQmmNode; required fields moved before optionals
MLXInterpreter.h: resolve_int(n.sorted_indices, st) != 0 (cf. kth)
custom_ops.py: moe_gather_inputs, moe_scatter_outputs; gather_mm/gather_qmm sorted_indices: bool → Optional[Tensor]
ops.py: new MoE handlers + updated gather handlers for IntOrVid
switch.py: sort_cutoff replaces compile-time sort branch
test_ops.py: MoE + GatherMm/GatherQmm tests with sorted_indices=Tensor configs
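To make the bool → IntOrVid change concrete, here is a small Python model of how a field that holds either an inline integer or a reference to a runtime value might be resolved. This is only a sketch: the real types are generated from schema.fbs and the real check lives in MLXInterpreter.h, so the field names `literal` and `vid`, the dict-backed state, and the helper `is_sorted` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IntOrVid:
    """Hypothetical model: an inline int literal OR the id of a runtime value."""

    literal: Optional[int] = None
    vid: Optional[int] = None


def resolve_int(field: IntOrVid, state: Dict[int, int]) -> int:
    """Return the integer the field denotes, reading runtime state if needed."""
    if field.vid is not None:
        # Value known only at run time: look it up by its value id.
        return state[field.vid]
    if field.literal is None:
        raise ValueError("IntOrVid has neither a literal nor a value id")
    # Value fixed at export time.
    return field.literal


def is_sorted(field: IntOrVid, state: Dict[int, int]) -> bool:
    """Mirror the `resolve_int(...) != 0` check from the change list."""
    return resolve_int(field, state) != 0
```

With a plain bool the sorted/unsorted choice is frozen into the exported program; with a value reference, the same node can see 0 on a decode step and a nonzero value on a prefill step.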

MLXLoader.{h,cpp} and FlatBuffer bindings are regenerated automatically by generate.py + flatc during the CMake build on Mac CI — they are not included in this commit, per repo convention.

Test plan

Windows: python backends/mlx/test/validate_moe_20554.py (all passed)
CI: test-mlx job on macos-14-xlarge (run_all_tests — covers gather_mm, gather_qmm, moe_gather_inputs, moe_scatter_outputs)

Fixes #20554

PR authored with Claude.

pytorch-bot · 2026-07-02T03:41:00Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20685

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 86ee91c with merge base 0f3303f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla · 2026-07-02T03:41:02Z

Hi @AxelNoun!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

linux-foundation-easycla · 2026-07-02T03:41:03Z

✅ login: cursoragent / name: Cursor (86ee91c)
❌ login: @AxelNoun / name: Axel.Cffrd.Dnty. The commit (86ee91c) is not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

github-actions · 2026-07-02T03:41:41Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Replace the compile-time sort_experts: bool flag in SwitchMLP with a runtime decision made inside two new custom ops (moe_gather_inputs, moe_scatter_outputs). A single exported .pte now handles both prefill (sorted, coalesced gather_mm) and decode (unsorted, no argsort overhead) without requiring separate exports.

Changes:
- schema.fbs: sorted_indices: bool -> IntOrVid (required) on GatherMmNode/GatherQmmNode; required fields moved before optionals
- MLXInterpreter.h: resolve_int(n.sorted_indices, st) != 0 (cf. kth)
- custom_ops.py: moe_gather_inputs, moe_scatter_outputs + register_fake; gather_mm/gather_qmm sorted_indices: bool -> Optional[Tensor]
- op_helpers.py: emit_floordiv helper (alongside emit_ceil_div)
- ops.py: _moe_gather_inputs_handler, _moe_scatter_outputs_handler; updated _gather_mm/_gather_qmm handlers for IntOrVid
- switch.py: SwitchMLP gains sort_cutoff; forward replaces if/else block with the two new ops; SwitchLinear sorted_indices: bool -> Optional[Tensor]
- mlx_source_transformations.py + export.py: sort_experts -> sort_cutoff
- test_ops.py: MoeGatherInputsTest, MoeScatterOutputsTest with expected_node_counts; GatherMmTest/GatherQmmTest extended for sorted_indices=Tensor configs

Test plan:
- Windows: python backends/mlx/test/validate_moe_20554.py (all passed)
- CI: test-mlx job on macos-14-xlarge (run_all_tests)

Fixes pytorch#20554

PR authored with Claude.

Co-authored-by: Cursor <cursoragent@cursor.com>

AxelNoun force-pushed the moe-runtime-sort-20554 branch from d709ef4 to 86ee91c · July 2, 2026 03:44

meta-cla Bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) · Jul 2, 2026

AxelNoun wants to merge 1 commit into pytorch:main from AxelNoun:moe-runtime-sort-20554
