
Add experimental attention quantization flags. #4321

Open
copybara-service[bot] wants to merge 1 commit into main from test_940686402
@copybara-service

Contributor

Add experimental attention quantization flags.

This adds two experimental flags to the splash attention config: one to
quantize Q and one to quantize K. Attention quantization is currently a research
project, so both flags should be turned off by default. In the future we will
support more quantization options for the backward pass, RoPE vs. no-RoPE, more
dtypes, etc.
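
As a rough illustration of the intent described above, here is a minimal sketch in plain Python. It is not the code in this PR: the names `AttentionQuantConfig`, `quantize_q`, `quantize_k`, `fake_quant_int8`, and `attention_logits` are hypothetical, and the symmetric per-row int8 scheme is an assumption chosen only for the example. It shows two independent flags that default to off and, when enabled, quantize Q or K before the logits are computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AttentionQuantConfig:
    """Hypothetical config; field names are illustrative, not taken from the PR."""
    quantize_q: bool = False  # experimental, off by default
    quantize_k: bool = False  # experimental, off by default


def fake_quant_int8(row: List[float]) -> List[float]:
    """Symmetric per-row int8 quantize-then-dequantize (assumed scheme)."""
    max_abs = max((abs(x) for x in row), default=0.0)
    if max_abs == 0.0:
        return list(row)  # nothing to scale
    scale = max_abs / 127.0
    # Round to the nearest integer level, clamp to [-127, 127], then rescale.
    return [max(-127, min(127, round(x / scale))) * scale for x in row]


def attention_logits(q: List[List[float]],
                     k: List[List[float]],
                     cfg: AttentionQuantConfig) -> List[List[float]]:
    """Compute q @ k^T, optionally quantizing each operand first."""
    if cfg.quantize_q:
        q = [fake_quant_int8(r) for r in q]
    if cfg.quantize_k:
        k = [fake_quant_int8(r) for r in k]
    return [[sum(a * b for a, b in zip(qr, kr)) for kr in k] for qr in q]
```

With the default config the logits are computed unchanged; enabling either flag only perturbs them by the quantization error of that operand, which is why keeping both off by default leaves existing behavior intact.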

@codecov

codecov Bot commented Jul 1, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/maxtext/layers/attention_mla.py | 0.00% | 2 Missing and 2 partials ⚠️ |

PiperOrigin-RevId: 940686402