add Qwen3 VL tests, update logit checker, and golden logit generator#4330

Draft
subawocit wants to merge 6 commits into main from update-logit-checker
@subawocit

Collaborator

Description

This PR adds logit checking and end-to-end integration validation support for Qwen3-VL models. It extends both the golden logit generation utility and the forward-pass logit checker to handle multimodal components for Qwen3-VL models. An end-to-end shell script was also added to compare MaxText logits with Hugging Face golden logits.

generate_hf_golden_logits.py

  • Added support for loading the Qwen3VLForConditionalGeneration model class for "qwen3-vl" model identifiers.
  • Added logic to save pixel values for Qwen3-VL models.

forward_pass_logit_checker.py

  • Added example usages following the discussion from PR 4310.
  • Added reshape logic for Qwen3 pixel values. Reshapes visual patches from flat layout back to the 4D shape (channels, time * temporal_patch_size, height * patch_size, width * patch_size).
  • Added 3D position ID generation (M-RoPE index) via processor_qwen3_omni.get_rope_index when use_mrope is enabled for Qwen3 models.
  • Added masking for vision placeholder tokens (vision_start, vision_end, image_pad, video_pad) during KL divergence calculation. Because the model is not trained to predict the next token at raw vision placeholder positions, the logits there can be essentially arbitrary and numerically unstable. Masking them keeps placeholder positions from skewing the max KL divergence score and focuses the comparison on the actual language-generation logits.
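
The placeholder masking described above can be illustrated with a minimal, standard-library-only sketch. It is not the code in forward_pass_logit_checker.py: the function names, the toy logits, and the choice to zero out masked positions are assumptions for illustration. The three token IDs are the vision_start, vision_end, and image_pad IDs visible in the multimodal log further down; video_pad is omitted because its ID does not appear in that log.

```python
import math

# IDs copied from the multimodal log in this PR (vision_start, vision_end, image_pad).
# The set itself is illustrative; the real checker may build it differently.
VISION_PLACEHOLDER_IDS = {151652, 151653, 151655}


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def masked_kl_per_token(golden_logits, model_logits, token_ids,
                        masked_ids=VISION_PLACEHOLDER_IDS):
    """Return D_KL(P_golden || Q_model) per position, with 0.0 at masked positions."""
    kls = []
    for golden_row, model_row, token_id in zip(golden_logits, model_logits, token_ids):
        if token_id in masked_ids:
            kls.append(0.0)  # placeholder position: excluded from the comparison
            continue
        log_p = log_softmax(golden_row)
        log_q = log_softmax(model_row)
        kls.append(sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q)))
    return kls


# Toy input: the middle position is an image_pad token whose logits disagree wildly.
token_ids = [872, 151655, 419]
golden = [[2.0, 1.0, 0.0], [5.0, -5.0, 0.0], [0.0, 1.0, 2.0]]
model = [[2.0, 1.0, 0.1], [-5.0, 5.0, 0.0], [0.0, 1.0, 2.0]]
kls = masked_kl_per_token(golden, model, token_ids)
max_kl = max(kls)
```

In this toy case the large disagreement at the image_pad position does not contribute to `max_kl`, so only the text positions are compared against the threshold.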

test_qwen3_vl_2b_to_hf_e2e.sh

  • Added tests/end_to_end/tpu/qwen3/vl_2b/test_qwen3_vl_2b_to_hf_e2e.sh, modeled on test_gemma4_to_hf.sh. The script:
    1. Converts the original Qwen3-VL-2B-Instruct Hugging Face checkpoint to MaxText format.
    2. Converts the Qwen3-VL-2B-Instruct MaxText checkpoint to Hugging Face format.
    3. Generates golden logits using the original HF Qwen3-VL-2B-Instruct model for comparison.
    4. Runs the forward pass logit checker to verify that the MaxText model's outputs match the HF golden logits within the specified KL divergence threshold (--max_kl_div=0.1).

Tests

Define variables: HF_TOKEN, MODEL_BUCKET, LOCAL_PATH, USE_MULTIMODAL, USE_SCAN_LAYERS in tests/end_to_end/tpu/qwen3/vl_2b/test_qwen3_vl_2b_to_hf_e2e.sh

bash tests/end_to_end/tpu/qwen3/vl_2b/test_qwen3_vl_2b_to_hf_e2e.sh

Test Results

Text-only input (USE_MULTIMODAL=false)
INFO:absl:[process=0] [sync] Finished load in 8.83 seconds @ gs://yuchenhou-maxtext-logs/checkpoints/qwen3-vl-2b/unscanned/2026-07-01-18-23/0/items
INFO:absl:
--- Prompt: I love to ---
INFO:absl:
--- MaxText model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 8180       | eat                  | 22.5420    |
| 1349       | read                 | 22.3564    |
| 4296       | cook                 | 22.0374    |
| 728        | go                   | 21.8470    |
| 1486       | play                 | 21.8353    |
| 3270       | write                | 21.8291    |
| 5821       | travel               | 21.7823    |
| 3736       | watch                | 21.6665    |
| 13186      | explore              | 21.5567    |
| 1281       | make                 | 21.5405    |

INFO:absl:
--- HF model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 8180       | eat                  | 22.5487    |
| 1349       | read                 | 22.3641    |
| 4296       | cook                 | 22.0673    |
| 1486       | play                 | 21.8461    |
| 728        | go                   | 21.8436    |
| 3270       | write                | 21.8413    |
| 5821       | travel               | 21.8039    |
| 3736       | watch                | 21.6521    |
| 13186      | explore              | 21.5848    |
| 1281       | make                 | 21.5531    |

INFO:absl:
--- Similarity Metrics of Top Tokens ---
INFO:absl:| Metric                         | Value                |
|--------------------------------|----------------------|
| overlap_count                  | 10/10                |
| jaccard_similarity             | 1.0                  |
| rank_agreement_percentage      | 80.0                 |

INFO:absl:
Average KL divergence per token (D_KL(P_golden || Q_model)): 1.0004e-04
INFO:absl:Per-token KL Divergences: 
['2.0459e-04', '4.0602e-05', '5.4927e-05']
INFO:absl:
Max KL divergence for a single token in the set: 2.0459e-04
INFO:absl:
--- Prompt: Today is a ---
INFO:absl:
--- MaxText model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 3281       | special              | 20.7137    |
| 1661       | good                 | 20.4657    |
| 7270       | Sunday               | 20.2309    |
| 13257      | holiday              | 20.2303    |
| 1602       | very                 | 20.1504    |
| 6602       | Friday               | 20.1455    |
| 1899       | day                  | 20.1153    |
| 7728       | Saturday             | 20.0202    |
| 39698      | sunny                | 19.8981    |
| 7014       | Monday               | 19.8644    |

INFO:absl:
--- HF model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 3281       | special              | 20.6899    |
| 1661       | good                 | 20.4342    |
| 13257      | holiday              | 20.2157    |
| 7270       | Sunday               | 20.2105    |
| 6602       | Friday               | 20.1267    |
| 1602       | very                 | 20.1233    |
| 1899       | day                  | 20.0998    |
| 7728       | Saturday             | 19.9995    |
| 39698      | sunny                | 19.8598    |
| 7014       | Monday               | 19.8466    |

INFO:absl:
--- Similarity Metrics of Top Tokens ---
INFO:absl:| Metric                         | Value                |
|--------------------------------|----------------------|
| overlap_count                  | 10/10                |
| jaccard_similarity             | 1.0                  |
| rank_agreement_percentage      | 60.0                 |

INFO:absl:
Average KL divergence per token (D_KL(P_golden || Q_model)): 7.7883e-04
INFO:absl:Per-token KL Divergences: 
['2.1909e-03', '9.6892e-05', '4.8693e-05']
INFO:absl:
Max KL divergence for a single token in the set: 2.1909e-03
INFO:absl:
--- Prompt: What is the ---
INFO:absl:
--- MaxText model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 829        | name                 | 25.1269    |
| 3476       | role                 | 25.0997    |
| 1887       | main                 | 24.9784    |
| 7428       | purpose              | 24.9392    |
| 25361      | significance         | 24.6545    |
| 6028       | primary              | 24.2031    |
| 2265       | title                | 24.2027    |
| 6672       | difference           | 23.6845    |
| 5025       | relationship         | 23.6137    |
| 14806      | formula              | 23.4353    |

INFO:absl:
--- HF model top 10 tokens ---
INFO:absl:| Token ID   | Token                | Score      |
|------------|----------------------|------------|
| 829        | name                 | 25.1467    |
| 3476       | role                 | 25.1241    |
| 1887       | main                 | 24.9939    |
| 7428       | purpose              | 24.9528    |
| 25361      | significance         | 24.6584    |
| 2265       | title                | 24.2286    |
| 6028       | primary              | 24.2195    |
| 6672       | difference           | 23.7052    |
| 5025       | relationship         | 23.6169    |
| 14806      | formula              | 23.4388    |

INFO:absl:
--- Similarity Metrics of Top Tokens ---
INFO:absl:| Metric                         | Value                |
|--------------------------------|----------------------|
| overlap_count                  | 10/10                |
| jaccard_similarity             | 1.0                  |
| rank_agreement_percentage      | 80.0                 |

INFO:absl:
Average KL divergence per token (D_KL(P_golden || Q_model)): 1.2265e-03
INFO:absl:Per-token KL Divergences: 
['3.6423e-03', '5.0167e-06', '3.2072e-05']
INFO:absl:
Max KL divergence for a single token in the set: 3.6423e-03
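
For readers checking the text-only numbers above, here is a small sketch of how the similarity metrics and the average KL divergence could be computed. The definitions are assumptions: they were chosen because they reproduce the logged values for the "I love to" prompt (rank agreement as the share of positions where both top-10 lists hold the same token ID), and the checker's actual implementation may differ. The function name `top_k_similarity` is hypothetical.

```python
def top_k_similarity(model_ids, golden_ids):
    """Compare two ranked top-k token ID lists of equal length."""
    model_set, golden_set = set(model_ids), set(golden_ids)
    overlap = model_set & golden_set
    union = model_set | golden_set
    # Count positions where both rankings place the same token.
    same_rank = sum(1 for a, b in zip(model_ids, golden_ids) if a == b)
    return {
        "overlap_count": f"{len(overlap)}/{len(golden_ids)}",
        "jaccard_similarity": len(overlap) / len(union),
        "rank_agreement_percentage": 100.0 * same_rank / len(golden_ids),
    }


# Top-10 token IDs copied from the "I love to" tables above.
maxtext_top10 = [8180, 1349, 4296, 728, 1486, 3270, 5821, 3736, 13186, 1281]
hf_top10 = [8180, 1349, 4296, 1486, 728, 3270, 5821, 3736, 13186, 1281]
metrics = top_k_similarity(maxtext_top10, hf_top10)

# Per-token KL divergences copied from the same prompt's log output.
per_token_kl = [2.0459e-04, 4.0602e-05, 5.4927e-05]
average_kl = sum(per_token_kl) / len(per_token_kl)
max_kl = max(per_token_kl)
```

The only rank disagreement for this prompt is the swap of "go" and "play" at positions 4 and 5, which is why 8 of 10 positions agree.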
Text and image input (USE_MULTIMODAL=true)
--- Comparing forward pass for golden data index: 0 ---
INFO:absl:config.global_batch_size_to_train_on=4
INFO:absl:pixel_values.shape = (1360, 1536)
INFO:absl: prompt="<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>Describe this image<|im_end|>
<|im_start|>assistant
" raw ids=[151644    872    198 151652 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151655 151655 151655 151655 151655 151655
 151655 151655 151655 151655 151653  74785    419   2168 151645    198
 151644  77091    198], logits.shape = (353, 151936)
INFO:absl:maxtext forward pass
INFO:absl:
[logits: token 2]
INFO:absl:golden_logits_slice[2]=array([12.931317, 13.484477, 16.856901, ...,  1.990514,  1.990514,
        1.990514], dtype=float32)
INFO:absl:train_logits_slice[2]=array([12.94264 , 13.495019, 16.869377, ...,  1.988577,  1.988577,
        1.988577], dtype=float32)
INFO:absl:
[numerical difference]
Max absolute difference: 2.2326e+01 at index (Array(101, dtype=int32), Array(99488, dtype=int32))
  (Train: -8.6359e+00, Golden: 1.3690e+01)
Max relative difference: 7.1453e+05 at index (Array(102, dtype=int32), Array(147597, dtype=int32))
  (Train: -1.0222e+00, Golden: -1.4305e-06)
INFO:absl:
[probability: token 1]
INFO:absl:golden_probabilities[1]=Array([2.2498674e-09, 1.4269050e-11, 8.2976987e-07, ..., 1.1374250e-12,
       1.1374250e-12, 1.1374250e-12], dtype=float32)
INFO:absl:model_probabilities[1]=Array([2.2730862e-09, 1.4259688e-11, 8.3804821e-07, ..., 1.1365343e-12,
       1.1365343e-12, 1.1365343e-12], dtype=float32)
INFO:absl:
[KL divergence]
KL divergence = [1.0378430e-06 6.7985662e-10 6.8534757e-07 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 1.9606786e-04 8.2948734e-04 1.1934385e-06 6.0532159e-14 2.3219182e-10
 3.2614307e-06 1.5081502e-08 3.3456541e-07], max KL divergence = 0.0008294873405247927 at index 346, the corresponding token id is 419
INFO:absl:Checking KL Divergence between train distribution and golden distribution against threshold 0.1.
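
The shapes in the multimodal log can be cross-checked with a little arithmetic. The vision settings below (3 channels, temporal_patch_size 2, patch_size 16, spatial merge size 2) are assumptions not stated in this PR; they are used here only because they are consistent with the logged pixel_values.shape and sequence length.

```python
# Values read from the log above.
num_patches, patch_dim = 1360, 1536   # pixel_values.shape
logged_seq_len = 353                  # first dimension of logits.shape

# Assumed vision settings (hypothetical here; not stated in the PR text).
channels = 3
temporal_patch_size = 2
patch_size = 16
spatial_merge_size = 2

# Each flat patch row holds channels * temporal_patch_size * patch_size^2 values.
flat_patch_dim = channels * temporal_patch_size * patch_size * patch_size

# Each image_pad token stands for a spatial_merge_size x spatial_merge_size group of patches.
image_pad_tokens = num_patches // (spatial_merge_size ** 2)

# Prompt layout in the log: 3 tokens, vision_start, the image_pad run,
# vision_end, then 8 trailing text tokens.
seq_len = 3 + 1 + image_pad_tokens + 1 + 8
```

Under these assumptions the 340 image_pad tokens plus vision_start and vision_end account for the 342 zero entries in the KL divergence array (indices 3 through 344), and the reported maximum at index 346 falls on token id 419 in the text after the image.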

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov Bot commented Jul 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
