add Qwen3 VL tests, update logit checker, and golden logit generator by subawocit · Pull Request #4330 · AI-Hypercomputer/maxtext

subawocit · 2026-07-01T23:21:11Z

Description

This PR adds logit checking and end-to-end integration validation support for Qwen3-VL models. It extends both the golden logit generation utility and the forward-pass logit checker to handle multimodal components for Qwen3-VL models. An end-to-end shell script was also added to compare MaxText logits with Hugging Face golden logits.

`generate_hf_golden_logits.py`

Added support for loading the Qwen3VLForConditionalGeneration model class for "qwen3-vl" model identifiers.
Added logic to save pixel values for Qwen3-VL models.
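
As a rough illustration of the two changes above, the sketch below picks a Hugging Face class by model identifier and attaches pixel values to a golden record. The helper names (`select_hf_model_class_name`, `load_hf_model`, `build_golden_record`) and the record field names are hypothetical and are not taken from `generate_hf_golden_logits.py`. It also assumes a `transformers` release that ships `Qwen3VLForConditionalGeneration`.

```python
import numpy as np


def select_hf_model_class_name(model_id: str) -> str:
  """Return the name of the transformers class to use for a model identifier."""
  # Assumption: a case-insensitive substring match on "qwen3-vl" is sufficient.
  if "qwen3-vl" in model_id.lower():
    return "Qwen3VLForConditionalGeneration"
  return "AutoModelForCausalLM"


def load_hf_model(model_id: str):
  """Load the selected class; the import is deferred so selection stays testable."""
  import transformers  # needs a release that includes Qwen3-VL support

  model_cls = getattr(transformers, select_hf_model_class_name(model_id))
  return model_cls.from_pretrained(model_id)


def build_golden_record(prompt, input_ids, logits, pixel_values=None):
  """Build one JSON-serializable golden record (field names are hypothetical)."""
  record = {
      "prompt": prompt,
      "tokens": np.asarray(input_ids).tolist(),
      "logits": np.asarray(logits).tolist(),
  }
  # Only multimodal runs carry pixel values; text-only records omit the field.
  if pixel_values is not None:
    record["pixel_values"] = np.asarray(pixel_values).tolist()
  return record
```

Keeping the class selection separate from the actual load makes the branch easy to unit test without downloading a checkpoint.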

`forward_pass_logit_checker.py`

Added example usages following the discussion in PR 4310.
Added reshape logic for Qwen3 pixel values. Reshapes visual patches from flat layout back to the 4D shape (channels, time * temporal_patch_size, height * patch_size, width * patch_size).
Added 3D position ID generation (M-RoPE index) via processor_qwen3_omni.get_rope_index when use_mrope is enabled for Qwen3 models.
Added masking for vision placeholder tokens (vision_start, vision_end, image_pad, video_pad) during KL divergence calculation. Since the model is not trained to predict the next token from raw vision placeholders, the logits at those positions can be effectively random and vary numerically. Masking them prevents visual placeholders from skewing the max KL divergence score and focuses the comparison on actual language-generation logits.
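
The pixel-value reshape described above can be sketched as a flatten/unflatten round trip. This is a minimal sketch, not the code in `forward_pass_logit_checker.py`. It assumes the flat rows are ordered in merge-size blocks, with each row laid out as (channels, temporal_patch_size, patch_size, patch_size), which is how the Hugging Face Qwen2-VL-style image processor appears to flatten patches. The function names and the `merge_size` parameter are hypothetical.

```python
import numpy as np


def flatten_patches(frames, patch_size, temporal_patch_size, merge_size):
  """Flatten (C, T, H, W) pixels into (num_patches, C * tps * ps * ps) rows."""
  c, t, h, w = frames.shape
  gt, gh, gw = t // temporal_patch_size, h // patch_size, w // patch_size
  x = frames.reshape(
      c, gt, temporal_patch_size,
      gh // merge_size, merge_size, patch_size,
      gw // merge_size, merge_size, patch_size,
  )
  # Reorder to (gt, gh/m, gw/m, m, m, c, tps, ps, ps): block-major patch order.
  x = x.transpose(1, 3, 6, 4, 7, 0, 2, 5, 8)
  return x.reshape(gt * gh * gw, c * temporal_patch_size * patch_size**2)


def unflatten_patches(flat, grid_thw, channels, patch_size, temporal_patch_size, merge_size):
  """Invert flatten_patches: recover (C, T * tps, H * ps, W * ps) pixels."""
  gt, gh, gw = grid_thw
  x = flat.reshape(
      gt, gh // merge_size, gw // merge_size, merge_size, merge_size,
      channels, temporal_patch_size, patch_size, patch_size,
  )
  # Inverse permutation of the transpose used in flatten_patches.
  x = x.transpose(5, 0, 6, 1, 3, 7, 2, 4, 8)
  return x.reshape(channels, gt * temporal_patch_size, gh * patch_size, gw * patch_size)


# Toy round trip with a hypothetical grid of 1 x 4 x 6 patches.
frames = np.arange(3 * 2 * 16 * 24, dtype=np.float32).reshape(3, 2, 16, 24)
flat = flatten_patches(frames, patch_size=4, temporal_patch_size=2, merge_size=2)
restored = unflatten_patches(flat, (1, 4, 6), 3, 4, 2, 2)
```

The row width of 1536 in the logged `pixel_values.shape = (1360, 1536)` is consistent with 3 channels, a temporal patch size of 2, and 16 x 16 patches (3 * 2 * 16 * 16 = 1536), although those values are inferred here rather than stated in the PR.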

`test_qwen3_vl_2b_to_hf_e2e.sh`

Added tests/end_to_end/tpu/qwen3/vl_2b/test_qwen3_vl_2b_to_hf_e2e.sh following test_gemma4_to_hf.sh:
1. Converts the original Qwen3-VL-2B-Instruct Hugging Face checkpoint to MaxText format.
2. Converts the Qwen3-VL-2B-Instruct MaxText checkpoint back to Hugging Face format.
3. Generates golden logits using the original HF Qwen3-VL-2B-Instruct model for comparison.
4. Runs the forward pass logit checker to verify whether the MaxText model's outputs match the HF golden logits within the specified KL divergence threshold (--max_kl_div=0.1).

Tests

Define the variables HF_TOKEN, MODEL_BUCKET, LOCAL_PATH, USE_MULTIMODAL, and USE_SCAN_LAYERS in tests/end_to_end/tpu/qwen3/vl_2b/test_qwen3_vl_2b_to_hf_e2e.sh, then run:

bash tests/end_to_end/tpu/qwen3/vl_2b/test_qwen3_vl_2b_to_hf_e2e.sh

Test Results

Text-only input (USE_MULTIMODAL=false)

INFO:absl:[process=0] [sync] Finished load in 8.83 seconds @ gs://yuchenhou-maxtext-logs/checkpoints/qwen3-vl-2b/unscanned/2026-07-01-18-23/0/items
INFO:absl:
--- Prompt: I love to ---
INFO:absl:
--- MaxText model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 8180       | eat                  | 22.5420    |
| 1349       | read                 | 22.3564    |
| 4296       | cook                 | 22.0374    |
| 728        | go                   | 21.8470    |
| 1486       | play                 | 21.8353    |
| 3270       | write                | 21.8291    |
| 5821       | travel               | 21.7823    |
| 3736       | watch                | 21.6665    |
| 13186      | explore              | 21.5567    |
| 1281       | make                 | 21.5405    |

INFO:absl:
--- HF model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 8180       | eat                  | 22.5487    |
| 1349       | read                 | 22.3641    |
| 4296       | cook                 | 22.0673    |
| 1486       | play                 | 21.8461    |
| 728        | go                   | 21.8436    |
| 3270       | write                | 21.8413    |
| 5821       | travel               | 21.8039    |
| 3736       | watch                | 21.6521    |
| 13186      | explore              | 21.5848    |
| 1281       | make                 | 21.5531    |

INFO:absl:
--- Similarity Metrics of Top Tokens ---
INFO:absl:| Metric                         | Value                |
|--------------------------------|----------------------|
| overlap_count                  | 10/10                |
| jaccard_similarity             | 1.0                  |
| rank_agreement_percentage      | 80.0                 |

INFO:absl:
Average KL divergence per token (D_KL(P_golden || Q_model)): 1.0004e-04
INFO:absl:Per-token KL Divergences: 
['2.0459e-04', '4.0602e-05', '5.4927e-05']
INFO:absl:
Max KL divergence for a single token in the set: 2.0459e-04
INFO:absl:
--- Prompt: Today is a ---
INFO:absl:
--- MaxText model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 3281       | special              | 20.7137    |
| 1661       | good                 | 20.4657    |
| 7270       | Sunday               | 20.2309    |
| 13257      | holiday              | 20.2303    |
| 1602       | very                 | 20.1504    |
| 6602       | Friday               | 20.1455    |
| 1899       | day                  | 20.1153    |
| 7728       | Saturday             | 20.0202    |
| 39698      | sunny                | 19.8981    |
| 7014       | Monday               | 19.8644    |

INFO:absl:
--- HF model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 3281       | special              | 20.6899    |
| 1661       | good                 | 20.4342    |
| 13257      | holiday              | 20.2157    |
| 7270       | Sunday               | 20.2105    |
| 6602       | Friday               | 20.1267    |
| 1602       | very                 | 20.1233    |
| 1899       | day                  | 20.0998    |
| 7728       | Saturday             | 19.9995    |
| 39698      | sunny                | 19.8598    |
| 7014       | Monday               | 19.8466    |

INFO:absl:
--- Similarity Metrics of Top Tokens ---
INFO:absl:| Metric                         | Value                |
|--------------------------------|----------------------|
| overlap_count                  | 10/10                |
| jaccard_similarity             | 1.0                  |
| rank_agreement_percentage      | 60.0                 |

INFO:absl:
Average KL divergence per token (D_KL(P_golden || Q_model)): 7.7883e-04
INFO:absl:Per-token KL Divergences: 
['2.1909e-03', '9.6892e-05', '4.8693e-05']
INFO:absl:
Max KL divergence for a single token in the set: 2.1909e-03
INFO:absl:
--- Prompt: What is the ---
INFO:absl:
--- MaxText model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 829        | name                 | 25.1269    |
| 3476       | role                 | 25.0997    |
| 1887       | main                 | 24.9784    |
| 7428       | purpose              | 24.9392    |
| 25361      | significance         | 24.6545    |
| 6028       | primary              | 24.2031    |
| 2265       | title                | 24.2027    |
| 6672       | difference           | 23.6845    |
| 5025       | relationship         | 23.6137    |
| 14806      | formula              | 23.4353    |

INFO:absl:
--- HF model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 829        | name                 | 25.1467    |
| 3476       | role                 | 25.1241    |
| 1887       | main                 | 24.9939    |
| 7428       | purpose              | 24.9528    |
| 25361      | significance         | 24.6584    |
| 2265       | title                | 24.2286    |
| 6028       | primary              | 24.2195    |
| 6672       | difference           | 23.7052    |
| 5025       | relationship         | 23.6169    |
| 14806      | formula              | 23.4388    |

INFO:absl:
--- Similarity Metrics of Top Tokens ---
INFO:absl:| Metric                         | Value                |
|--------------------------------|----------------------|
| overlap_count                  | 10/10                |
| jaccard_similarity             | 1.0                  |
| rank_agreement_percentage      | 80.0                 |

INFO:absl:
Average KL divergence per token (D_KL(P_golden || Q_model)): 1.2265e-03
INFO:absl:Per-token KL Divergences: 
['3.6423e-03', '5.0167e-06', '3.2072e-05']
INFO:absl:
Max KL divergence for a single token in the set: 3.6423e-03
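
The similarity metrics in the logs above are consistent with the definitions sketched below: set overlap, Jaccard similarity over the two top-k sets, and the percentage of rank positions where both models report the same token. This is an inferred reading of the logged values, not the checker's actual implementation, and the function name `top_token_similarity` is hypothetical.

```python
def top_token_similarity(model_ids, golden_ids):
  """Compare two ranked top-k token ID lists of equal length."""
  model_set, golden_set = set(model_ids), set(golden_ids)
  overlap = len(model_set & golden_set)
  union = len(model_set | golden_set)
  # Count rank positions where both lists hold the same token ID.
  matches = sum(m == g for m, g in zip(model_ids, golden_ids))
  k = len(golden_ids)
  return {
      "overlap_count": f"{overlap}/{k}",
      "jaccard_similarity": overlap / union if union else 1.0,
      "rank_agreement_percentage": 100.0 * matches / k if k else 0.0,
  }


# Top-10 token IDs for the prompt "I love to", copied from the logs above.
maxtext_top10 = [8180, 1349, 4296, 728, 1486, 3270, 5821, 3736, 13186, 1281]
hf_top10 = [8180, 1349, 4296, 1486, 728, 3270, 5821, 3736, 13186, 1281]
metrics = top_token_similarity(maxtext_top10, hf_top10)
```

Under this reading, the 80% rank agreement for "I love to" comes from a single swap between "go" and "play", whose scores differ by about 0.01 in both models.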

Text and image input (USE_MULTIMODAL=true)

--- Comparing forward pass for golden data index: 0 ---
INFO:absl:config.global_batch_size_to_train_on=4
INFO:absl:pixel_values.shape = (1360, 1536)
INFO:absl: prompt="<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>Describe this image<|im_end|>
<|im_start|>assistant
" raw ids=[151644    872    198 151652 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151653  74785    419   2168 151645    198
 151644  77091    198], logits.shape = (353, 151936)
INFO:absl:maxtext forward pass
INFO:absl:
[logits: token 2]
INFO:absl:golden_logits_slice[2]=array([12.931317, 13.484477, 16.856901, ...,  1.990514,  1.990514,
        1.990514], dtype=float32)
INFO:absl:train_logits_slice[2]=array([12.94264 , 13.495019, 16.869377, ...,  1.988577,  1.988577,
        1.988577], dtype=float32)
INFO:absl:
[numerical difference]
Max absolute difference: 2.2326e+01 at index (Array(101, dtype=int32), Array(99488, dtype=int32))
  (Train: -8.6359e+00, Golden: 1.3690e+01)
Max relative difference: 7.1453e+05 at index (Array(102, dtype=int32), Array(147597, dtype=int32))
  (Train: -1.0222e+00, Golden: -1.4305e-06)
INFO:absl:
[probability: token 1]
INFO:absl:golden_probabilities[1]=Array([2.2498674e-09, 1.4269050e-11, 8.2976987e-07, ..., 1.1374250e-12,
       1.1374250e-12, 1.1374250e-12], dtype=float32)
INFO:absl:model_probabilities[1]=Array([2.2730862e-09, 1.4259688e-11, 8.3804821e-07, ..., 1.1365343e-12,
       1.1365343e-12, 1.1365343e-12], dtype=float32)
INFO:absl:
[KL divergence]
KL divergence = [1.0378430e-06 6.7985662e-10 6.8534757e-07 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 1.9606786e-04 8.2948734e-04 1.1934385e-06 6.0532159e-14 2.3219182e-10
 3.2614307e-06 1.5081502e-08 3.3456541e-07], max KL divergence = 0.0008294873405247927 at index 346, the corresponding token id is 419
INFO:absl:Checking KL Divergence between train distribution and golden distribution against threshold 0.1.
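
The 342 zero entries in the KL divergence array above line up with the vision_start token, the 340 image_pad tokens, and the vision_end token in the logged prompt. A minimal NumPy sketch of that kind of masking follows. It is an illustration under assumptions rather than the checker's code: the function names are hypothetical, and the placeholder IDs are read off the logged prompt (151652, 151655, 151653) instead of being taken from a tokenizer config.

```python
import numpy as np

# Assumption: IDs inferred from the logged prompt, not from the tokenizer.
VISION_PLACEHOLDER_IDS = (151652, 151655, 151653)  # vision_start, image_pad, vision_end


def log_softmax(logits):
  """Numerically stable log-softmax over the vocabulary axis."""
  shifted = logits - logits.max(axis=-1, keepdims=True)
  return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def masked_kl_divergence(golden_logits, model_logits, token_ids, placeholder_ids):
  """Per-position D_KL(P_golden || Q_model), zeroed at placeholder positions."""
  log_p = log_softmax(np.asarray(golden_logits, dtype=np.float64))
  log_q = log_softmax(np.asarray(model_logits, dtype=np.float64))
  kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
  mask = np.isin(np.asarray(token_ids), list(placeholder_ids))
  return np.where(mask, 0.0, kl)


def passes_threshold(kl_per_token, max_kl_div):
  """True when the worst unmasked position stays under the threshold."""
  return bool(kl_per_token.max() < max_kl_div)
```

Zeroing the masked positions, rather than dropping them, keeps the array aligned with the token IDs, so the reported index of the maximum still points at the right token.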

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-07-01T23:25:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


add Qwen3 VL tests, update logit checker, and golden logit generator

7c52501

subawocit added 5 commits July 1, 2026 23:57

add Qwen3 VL tests, update logit checker, and golden logit generator

310e248

Fix pyink formatting

6be9015

Fix trailing whitespace and pyink list formatting

372def3

Fix final pyink indentation and comma

dbdf879

Fix pyink indentation

22cb2b0


subawocit wants to merge 6 commits into main from update-logit-checker
