backends/mlx: runtime MoE expert-sort for decode (issue #20554) #20685

Open
AxelNoun wants to merge 1 commit into pytorch:main from AxelNoun:moe-runtime-sort-20554

AxelNoun commented Jul 2, 2026

Summary

Replace the compile-time sort_experts: bool flag in SwitchMLP with a runtime decision inside two new custom ops (moe_gather_inputs, moe_scatter_outputs). A single exported .pte now handles both prefill (sorted, coalesced gather_mm) and decode (unsorted, no argsort overhead) without separate exports.
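To make the runtime decision concrete, here is a minimal pure-Python sketch of the gather/sort and scatter/unsort pattern. It is not the PR's implementation: `gather_inputs`, `scatter_outputs`, `run_moe`, and the toy expert computation are hypothetical stand-ins, and the rule "sort when the number of flattened (token, expert) slots is at least `sort_cutoff`" is an assumption about how the cutoff is applied.

```python
from typing import List, Optional, Tuple

Row = List[float]


def gather_inputs(
    x: List[Row], indices: List[List[int]], sort_cutoff: int
) -> Tuple[List[Row], List[int], Optional[List[int]]]:
    """Hypothetical stand-in for moe_gather_inputs.

    x holds one row per token; indices holds the top-k expert ids per token.
    Returns (gathered rows, flat expert ids, inverse order or None).
    """
    k = len(indices[0])
    flat = [e for row in indices for e in row]
    if len(flat) < sort_cutoff:
        # Decode-sized input (assumed rule): skip the argsort entirely.
        return [x[i // k] for i in range(len(flat))], flat, None
    # Prefill-sized input: stable argsort by expert id so that each
    # expert's rows end up contiguous.
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    inv_order = [0] * len(order)
    for pos, src in enumerate(order):
        inv_order[src] = pos
    # Floor division maps a flat (token, slot) position back to its token.
    rows = [x[src // k] for src in order]
    return rows, [flat[src] for src in order], inv_order


def scatter_outputs(
    y: List[float], inv_order: Optional[List[int]], k: int
) -> List[List[float]]:
    """Hypothetical stand-in for moe_scatter_outputs: undo the sort, regroup by token."""
    if inv_order is not None:
        y = [y[pos] for pos in inv_order]
    return [y[t * k:(t + 1) * k] for t in range(len(y) // k)]


def run_moe(x: List[Row], indices: List[List[int]], sort_cutoff: int) -> List[List[float]]:
    k = len(indices[0])
    rows, experts, inv_order = gather_inputs(x, indices, sort_cutoff)
    # Toy "expert": scale the input and add the expert id so routing is visible.
    y = [row[0] * 10 + e for row, e in zip(rows, experts)]
    return scatter_outputs(y, inv_order, k)


x = [[1.0], [2.0], [3.0]]
indices = [[2, 0], [1, 0], [0, 2]]
# Both paths must produce identical per-token outputs.
assert run_moe(x, indices, sort_cutoff=4) == run_moe(x, indices, sort_cutoff=100)
```

The point of the sketch is that the sorted and unsorted paths differ only in row order inside the expert computation, so one exported graph can serve both as long as the choice is made from a runtime value.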

Key changes:

  • schema.fbs: sorted_indices: bool -> IntOrVid (required) on GatherMmNode/GatherQmmNode; required fields moved before optionals
  • MLXInterpreter.h: resolve_int(n.sorted_indices, st) != 0 (cf. kth)
  • custom_ops.py: moe_gather_inputs, moe_scatter_outputs; gather_mm/gather_qmm sorted_indices: bool -> Optional[Tensor]
  • ops.py: new MoE handlers + updated gather handlers for IntOrVid
  • switch.py: sort_cutoff replaces compile-time sort branch
  • test_ops.py: MoE + GatherMm/GatherQmm tests with sorted_indices=Tensor configs
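
The `resolve_int(n.sorted_indices, st) != 0` check listed above can be pictured with a small Python model. This assumes IntOrVid carries either a literal integer or the id of a value looked up in interpreter state at run time; the class layout, field names, and dict-based state below are hypothetical and do not reflect the generated FlatBuffer bindings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IntOrVid:
    """Hypothetical model: exactly one of `literal` or `vid` is set."""
    literal: Optional[int] = None
    vid: Optional[int] = None


def resolve_int(v: IntOrVid, state: Dict[int, int]) -> int:
    """Return the literal, or look the value id up in runtime state."""
    if v.vid is not None:
        return state[v.vid]
    if v.literal is None:
        raise ValueError("IntOrVid has neither a literal nor a value id")
    return v.literal


def indices_are_sorted(v: IntOrVid, state: Dict[int, int]) -> bool:
    # Mirrors the `!= 0` comparison: any non-zero value means "sorted".
    return resolve_int(v, state) != 0


# A constant baked in at export time.
assert indices_are_sorted(IntOrVid(literal=1), {})
# A flag produced at run time, e.g. by an upstream op.
assert not indices_are_sorted(IntOrVid(vid=7), {7: 0})
```

Under this reading, a graph exported with a literal behaves like the old bool field, while a value id defers the choice to whatever the runtime state holds when the node executes.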

MLXLoader.{h,cpp} and FlatBuffer bindings are regenerated automatically by generate.py + flatc during the CMake build on Mac CI — they are not included in this commit, per repo convention.

Test plan

  • Windows: python backends/mlx/test/validate_moe_20554.py (all passed)
  • CI: test-mlx job on macos-14-xlarge (run_all_tests — covers gather_mm, gather_qmm, moe_gather_inputs, moe_scatter_outputs)

Fixes #20554

PR authored with Claude.

pytorch-bot Bot commented Jul 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20685

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 86ee91c with merge base 0f3303f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot commented Jul 2, 2026

Hi @AxelNoun!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

linux-foundation-easycla Bot commented Jul 2, 2026

CLA Not Signed

github-actions Bot commented Jul 2, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Replace the compile-time sort_experts: bool flag in SwitchMLP with a
runtime decision made inside two new custom ops (moe_gather_inputs,
moe_scatter_outputs). A single exported .pte now handles both prefill
(sorted, coalesced gather_mm) and decode (unsorted, no argsort overhead)
without requiring separate exports.

Changes:
- schema.fbs: sorted_indices: bool -> IntOrVid (required) on
  GatherMmNode/GatherQmmNode; required fields moved before optionals
- MLXInterpreter.h: resolve_int(n.sorted_indices, st) != 0 (cf. kth)
- custom_ops.py: moe_gather_inputs, moe_scatter_outputs + register_fake;
  gather_mm/gather_qmm sorted_indices: bool -> Optional[Tensor]
- op_helpers.py: emit_floordiv helper (alongside emit_ceil_div)
- ops.py: _moe_gather_inputs_handler, _moe_scatter_outputs_handler;
  updated _gather_mm/_gather_qmm handlers for IntOrVid
- switch.py: SwitchMLP gains sort_cutoff; forward replaces if/else block
  with the two new ops; SwitchLinear sorted_indices: bool -> Optional[Tensor]
- mlx_source_transformations.py + export.py: sort_experts -> sort_cutoff
- test_ops.py: MoeGatherInputsTest, MoeScatterOutputsTest with
  expected_node_counts; GatherMmTest/GatherQmmTest extended for
  sorted_indices=Tensor configs

Test plan:
- Windows: python backends/mlx/test/validate_moe_20554.py (all passed)
- CI: test-mlx job on macos-14-xlarge (run_all_tests)

Fixes pytorch#20554

PR authored with Claude.

Co-authored-by: Cursor <cursoragent@cursor.com>

AxelNoun force-pushed the moe-runtime-sort-20554 branch from d709ef4 to 86ee91c on July 2, 2026 03:44

meta-cla Bot added the CLA Signed label Jul 2, 2026

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Development

Successfully merging this pull request may close these issues.

Good First Issue: Runtime MoE expert-sort for decode (MLX backend, Qwen 3.5 MoE)

1 participant