Fix/verification #1441

Open
huseyincavusbi wants to merge 6 commits into TransformerLensOrg:dev from huseyincavusbi:fix/verification

@huseyincavusbi

Contributor

Description

Fixes three verification bugs found during Gemma4 adapter testing:

  • Component benchmark false failures: Models using DelegatedAttentionBlockBridge (Gemma4) reported 81-96 component failures per model, dragging P1 from 100% to 50%. These are benchmark infrastructure failures, not model errors: the component comparison can't call delegated attention/rotary/PLE modules in isolation because they require model-specific kwargs the benchmark doesn't provide. Added skip logic in component_outputs.py that detects DelegatedAttentionBlockBridge (via missing hook_q_input in hook aliases) and skips untestable components (attn, rotary_emb, per_layer projections). The gold-standard forward_pass_logits test was already passing, so this just stops the benchmark from reporting false failures.

  • Phase 2 empty header: main_benchmark.py printed a "PHASE 2:" header with zero tests on every run, regardless of whether Phase 2 was selected. Moved the header inside the should_run_phase(2) guard.

  • Misleading "encoder-decoder" warning: The HF model loading log printed "for encoder-decoder model" for multimodal architectures using AutoModelForImageTextToText. Removed the incorrect label.
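
For reviewers who want the shape of the skip logic without opening the diff, here is a minimal sketch of the idea described in the first bullet. It is not the code in component_outputs.py: `FakeBlock`, `UNTESTABLE_FRAGMENTS`, `is_delegated_block`, and `should_skip` are hypothetical names, and the assumption that hook aliases are exposed as a dict-like `hook_aliases` attribute is mine.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlock:
    """Hypothetical stand-in for a block bridge (not the TransformerLens class)."""
    hook_aliases: dict = field(default_factory=dict)


# Assumed name fragments for components that cannot be called in isolation.
UNTESTABLE_FRAGMENTS = ("attn", "rotary_emb", "per_layer")


def is_delegated_block(block) -> bool:
    # Indirect detection: a delegated block exposes no hook_q_input alias.
    return "hook_q_input" not in getattr(block, "hook_aliases", {})


def should_skip(component_name: str, block) -> bool:
    # Only delegated blocks have components skipped; everything else is tested.
    if not is_delegated_block(block):
        return False
    return any(fragment in component_name for fragment in UNTESTABLE_FRAGMENTS)


standard = FakeBlock(hook_aliases={"hook_q_input": "attn.q.hook_in"})
delegated = FakeBlock()

names = ["blocks.0.attn", "blocks.0.mlp", "rotary_emb", "per_layer_projection"]
tested = [name for name in names if not should_skip(name, delegated)]
```

With the delegated stand-in, only `blocks.0.mlp` remains in `tested`; with `standard`, nothing is skipped.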

Verification (tiny-random/gemma-4-e):

  • Dev branch: 15/64 component failures
  • Fix branch: 0/31 component failures (all untestable components skipped)
  • Unit tests: 167/167 passed

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4

Collaborator

Looks good! One small suggestion: _is_delegated_block() detects delegation by checking that hook_q_input is missing from the block's hook aliases, which is a bit indirect. The benchmark already skips attention via an explicit flag (maintain_native_attention / requires_position_embeddings). Could you set maintain_native_attention=True on DelegatedAttentionBlockBridge instead? That would reuse the existing skip for that case rather than adding the hook-alias check.
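
A rough sketch of the flag-based alternative, to make the suggestion concrete. The classes below are toy stand-ins rather than the real bridge classes, and where the flag actually lives (on the block, the attention bridge, or a config) is an assumption here; only the name maintain_native_attention comes from the existing benchmark skip.

```python
class ToyBlock:
    """Toy stand-in for a regular block bridge (hypothetical)."""
    # Default: attention can be compared in isolation.
    maintain_native_attention = False


class ToyDelegatedBlock(ToyBlock):
    """Toy stand-in for DelegatedAttentionBlockBridge (hypothetical)."""
    # Declared explicitly, so nothing has to be inferred from hook aliases.
    maintain_native_attention = True


def skip_attention(block) -> bool:
    # Hypothetical helper mirroring a flag-based skip check.
    return bool(getattr(block, "maintain_native_attention", False))


skipped = [type(b).__name__ for b in (ToyBlock(), ToyDelegatedBlock()) if skip_attention(b)]
```

The upside of this shape is that the block declares its own constraint, so a later change to the hook aliases cannot silently flip the benchmark's behavior.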
