test(oracle): add cross-attention, AdaLN, timestep-embed to gradcheck+oracle #164

Merged: dndungu merged 1 commit into main from feat/oracle-attn-adaln-timestep-t127 on Jun 17, 2026

@dndungu (Contributor) commented Jun 17, 2026

Extends the ADR-091 gradcheck + PyTorch-oracle harness with three more E127/T127.1.0a diffusion-DiT op classes — each composed from existing engine ops with an analytic backward verified against finite-difference on CPU:

  • CrossAttention: single-head scaled dot-product attention (Q, K, V; no params). torch: scaled_dot_product_attention.
  • AdaLN: out = x*(1 + c@Ws) + c@Wsh modulation core (two projection params).
  • TimestepEmbed: concat(sin(t@freqs), cos(t@freqs)) sinusoidal embedding (freqs leaf).
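
As a reading aid, the three forward formulas above can be sketched in plain Python. This is a minimal illustration, not the engine code from this PR: the function names, the list-of-rows matrix representation, and the shapes (x is N×D, c is N×C, Ws and Wsh are C×D, t is N×1, freqs is 1×F) are all assumptions made for the example.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]
    scores = matmul(q, kt)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def adaln(x, c, ws, wsh):
    """AdaLN modulation core: x * (1 + c @ Ws) + c @ Wsh (hypothetical shapes)."""
    scale = matmul(c, ws)
    shift = matmul(c, wsh)
    return [[xv * (1.0 + s) + b for xv, s, b in zip(xr, sr, br)]
            for xr, sr, br in zip(x, scale, shift)]

def timestep_embed(t, freqs):
    """concat(sin(t @ freqs), cos(t @ freqs)) along the feature axis."""
    angles = matmul(t, freqs)
    return [[math.sin(a) for a in row] + [math.cos(a) for a in row] for row in angles]
```

For example, `adaln([[2.0, 3.0]], [[1.0]], [[0.5, 0.0]], [[1.0, -1.0]])` scales the first feature by 1.5 and shifts both features, giving `[[4.0, 2.0]]`.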

Verified

  • TestRegistry/{CrossAttention,AdaLN,TimestepEmbed} gradcheck pass; full gradcheck + oracle registry↔torchmap lockstep green; go vet/build clean.
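
The gradcheck idea (an analytic backward compared against finite differences) can be sketched for the AdaLN formula as follows. This is a hedged, standalone Python illustration, not the harness in this PR: the helper names, the test values, the scalar loss sum(g * out), and the central-difference step are assumptions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def adaln_loss(x, c, ws, wsh, g):
    """Scalar loss sum(g * out) for out = x * (1 + c @ Ws) + c @ Wsh."""
    scale, shift = matmul(c, ws), matmul(c, wsh)
    return sum(gv * (xv * (1.0 + s) + b)
               for gr, xr, sr, br in zip(g, x, scale, shift)
               for gv, xv, s, b in zip(gr, xr, sr, br))

def adaln_backward(x, c, ws, g):
    """Analytic gradients of adaln_loss with respect to x, Ws and Wsh."""
    scale = matmul(c, ws)
    dx = [[gv * (1.0 + s) for gv, s in zip(gr, sr)] for gr, sr in zip(g, scale)]
    gx = [[gv * xv for gv, xv in zip(gr, xr)] for gr, xr in zip(g, x)]
    ct = transpose(c)
    return dx, matmul(ct, gx), matmul(ct, g)  # dx, dWs = c^T (g*x), dWsh = c^T g

def numeric_grad(f, m, eps=1e-6):
    """Central finite-difference gradient of scalar f() w.r.t. matrix m (perturbed in place)."""
    grad = [[0.0] * len(row) for row in m]
    for i, row in enumerate(m):
        for j, orig in enumerate(row):
            row[j] = orig + eps
            plus = f()
            row[j] = orig - eps
            minus = f()
            row[j] = orig  # restore the entry before moving on
            grad[i][j] = (plus - minus) / (2.0 * eps)
    return grad

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def check_adaln():
    """Return the worst analytic-vs-numeric gradient mismatch on small fixed inputs."""
    x = [[0.5, -1.0], [2.0, 0.25]]
    c = [[1.0, 0.5, -0.5], [0.0, 2.0, 1.0]]
    ws = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
    wsh = [[0.7, 0.0], [-0.1, 0.2], [0.3, -0.4]]
    g = [[1.0, -2.0], [0.5, 3.0]]

    def f():
        return adaln_loss(x, c, ws, wsh, g)

    dx, dws, dwsh = adaln_backward(x, c, ws, g)
    return max(max_abs_diff(dx, numeric_grad(f, x)),
               max_abs_diff(dws, numeric_grad(f, ws)),
               max_abs_diff(dwsh, numeric_grad(f, wsh)))
```

Calling `check_adaln()` returns the largest absolute gap between the analytic and finite-difference gradients; a check of this kind passes when that gap stays below a chosen tolerance (for instance 1e-6 in double precision).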

Coverage

With GroupNorm (already merged, #159), 4 of the 6 T127.1.0a op classes are now covered. The remaining two — Conv3D, ConvTranspose — are forward-only per ADR-092 (inference-only VAE, forward-parity not gradcheck) and do not fit this backward-checking harness; they need a separate forward-parity path (tracked follow-up, not this PR).

Companion to #159. Refs zerfoo E127.

…+oracle

Extends the ADR-091 gradcheck + PyTorch-oracle harness with three more E127
diffusion-DiT op classes (T127.1.0a), each composed from existing engine ops
with an analytic backward verified against finite-difference on CPU:

- CrossAttention: single-head scaled dot-product attention (Q,K,V; no params).
  torch: scaled_dot_product_attention.
- AdaLN: out = x*(1+c@Ws) + c@Wsh modulation core (two projection params).
- TimestepEmbed: concat(sin(t@freqs), cos(t@freqs)) sinusoidal embedding.

Verified: TestRegistry/{CrossAttention,AdaLN,TimestepEmbed} gradcheck pass;
full gradcheck + oracle registry<->torchmap lockstep green; go vet clean.

With GroupNorm (already merged), 4 of the 6 T127.1.0a op classes are now
covered. The remaining two (Conv3D, ConvTranspose) are FORWARD-ONLY per ADR-092
and do not fit a backward-checking gradcheck harness; they need a separate
forward-parity path (tracked follow-up).
@dndungu dndungu merged commit cc6948a into main Jun 17, 2026
1 check failed
@dndungu dndungu deleted the feat/oracle-attn-adaln-timestep-t127 branch June 17, 2026 07:20