Skip to content

feat(blockchain): pre-build proposer block one interval early#445

Open
MegaRedHand wants to merge 21 commits into
mainfrom
feat/proposer-prebuild-aggregation
Open

feat(blockchain): pre-build proposer block one interval early#445
MegaRedHand wants to merge 21 commits into
mainfrom
feat/proposer-prebuild-aggregation

Conversation

@MegaRedHand

Copy link
Copy Markdown
Collaborator

What

Proposers do the heavy leanVM block-building work (attestation compaction + the Type-1 → Type-2 merge, ~1–2s) synchronously at interval 0 today, squeezed into the slot's first interval. This builds the next slot's block one interval early — at the previous slot's interval 4, synchronously on the blockchain actor — and publishes it at interval 0 after a revalidation. On any mismatch or failure it falls back to the existing synchronous build, so there is no regression.

How

  • Interval 4 (of slot S-1): if one of our validators proposes slot S, prebuild_block builds + signs + merges the full block against the current (read-only) head and stashes it as a PreparedBlock.
  • Interval 0 (of slot S): propose_block publishes the prepared block iff it is still valid (prebuilt_block_is_usable: same slot/proposer, parent still the canonical head, and the built justified slot not behind the store's). Otherwise it rebuilds synchronously.
  • Shared assemble_signed_block / process_and_publish_block helpers back both the pre-build and the fallback path.
  • Signing stays on the actor throughout: XMSS keys are stateful (sign_block_root takes &mut self), so the build cannot be moved off-thread without risking OTS-index reuse.

Finalization fix (included)

Capturing the pre-build head via get_proposal_head — which is not read-only, it ticks the store clock — one interval early advanced the clock prematurely and stalled finalization. Fixed by capturing the head read-only via store.head().

Validation (local devnet: 4 ethlambda nodes, 1 aggregator)

  • Finalization healthy (finalized lag ~3 slots).
  • Near-100% pre-build hit rate; zero stale discards, zero skipped ticks, zero errors.

Tradeoff

Building synchronously blocks the actor for the build duration at interval 4. This is acceptable: between interval 4 and the next slot the actor has no other consensus-critical duty, and the devnet run showed zero skipped ticks.

Tests

make fmt + cargo clippy -p ethlambda-blockchain --all-targets -- -D warnings clean; lib tests pass, including the new block_builder::prebuild_tests covering the usability predicate.

MegaRedHand added a commit that referenced this pull request Jun 22, 2026
Split out of #445 (proposer pre-build).

`gate_proposer` was a one-line wrapper over `duties_allowed()` with a
single call site. This inlines it as `scheduled_proposer.filter(|_|
self.sync_status.duties_allowed())` and removes the method, so the sync
gate reads the same way everywhere.

**Behavior-preserving** — `proposer_validator_id` is computed
identically; every downstream use (the skip-while-syncing log, the
`store::on_tick` proposal flag, the proposal call) is untouched.

The three `gate_proposer`-only unit tests collapse onto the
`duties_allowed()` assertions they already made.

Stacked under the on_tick-regrouping PR; #445 will drop these hunks once
both land.

Co-authored-by: Pablo Deymonnaz <pdeymon@fi.uba.ar>
on_tick ran its per-interval blocks out of order (4-snapshot, tick, 2, 0,
1). Regroup them into ascending interval order behind `==== interval N ====`
markers so the slot timeline reads top to bottom and matches the duty
schedule.

Pure reorder, behavior unchanged. The interval-4 new_payloads snapshot is
the one block that stays ahead of store::on_tick: the interval-4 tick
promotes new_payloads out, so it cannot move into a post-tick group. A
comment now pins that constraint.
…ick flag

Compute scheduled_proposer just before store::on_tick and pass the raw
is_proposer (= scheduled_proposer.is_some()) to it; gate the actual
proposal on duties_allowed() at the call site instead. While syncing and
scheduled to propose, on_tick now accepts attestations early at interval 0
(it did not before); the proposal itself is still skipped.
@MegaRedHand MegaRedHand force-pushed the feat/proposer-prebuild-aggregation branch from d4a251e to 4955800 Compare June 22, 2026 16:34
@MegaRedHand MegaRedHand changed the base branch from main to refactor/group-on-tick-by-interval June 22, 2026 16:34
At interval 4, if one of our validators proposes the next slot, build and
sign its block synchronously on the actor so the heavy leanVM aggregation
(~1-2s) is done before interval 0. The proposer then only has to publish.

Shares one block-build core (produce_block_on_head) across the prebuild and
normal proposal paths; a publish-time usability gate falls back to a fresh
build if the prebuilt block is stale.
@MegaRedHand MegaRedHand force-pushed the feat/proposer-prebuild-aggregation branch from 4955800 to 2c6eb1c Compare June 22, 2026 16:42
MegaRedHand added a commit that referenced this pull request Jun 22, 2026
Split out of #445 (proposer pre-build). **Stacked on #449**
(`gate_proposer` removal) — review/merge that first; this PR's base
retargets to `main` once #449 lands.

`on_tick` ran its per-interval blocks out of order (interval-4 snapshot,
then tick, then 2, 0, 1). This regroups them into ascending interval
order behind `==== interval N ====` markers so the slot timeline reads
top-to-bottom and lines up with the duty schedule.

**Pure reorder, behavior unchanged.**

### One block intentionally does *not* move
The interval-4 `new_payloads` snapshot stays **ahead** of
`store::on_tick`. At interval 4 the tick runs `accept_new_attestations →
promote_new_aggregated_payloads()`, which drains `new_payloads`;
snapshotting after the tick would capture an already-drained set and
silently break the post-block coverage report's "timely" cohort. A
comment now pins that ordering constraint.

> Note: commit `7474c15` on #445 currently moves this snapshot *into*
the post-tick interval-4 group, which hits exactly that regression
(observability-only, so devnet doesn't surface it). #445 should adopt
this PR's placement when it rebases.
Base automatically changed from refactor/group-on-tick-by-interval to main June 22, 2026 16:50
@MegaRedHand MegaRedHand marked this pull request as ready for review June 22, 2026 16:51
@greptile-apps

greptile-apps Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR reworks block proposal timing: instead of building and signing the next slot's block at interval 0 of that slot, the actor now does it at interval 4 of the previous slot, then waits out the remainder of the interval before publishing at the slot boundary. A companion fix removes the get_proposal_head call (which ticked the store clock prematurely) and replaces it with a read-only store.head() lookup, resolving a finalization stall.

  • The old propose_block / produce_block_with_signatures monolith is split into three focused helpers: build_signed_block, assemble_signed_block, and process_and_publish_block, with propose_block now async to accommodate the slot-boundary sleep.
  • store::produce_block_on_head replaces the removed produce_block_with_signatures, taking an explicit head_root instead of advancing the store clock internally.
  • A new build_overran_publish_window predicate (with unit tests) determines whether the build finished before or after the target slot opened, driving the publish-or-wait decision.

Confidence Score: 5/5

Safe to merge; the core build-at-interval-4 logic is sound, the finalization fix is correct, and devnet validation showed zero stale discards and zero skipped ticks.

The refactor is well-structured with clear comments explaining every invariant. The single-threaded actor guarantees cited in the code are accurate for the current architecture, making the rebuild guard genuinely inert today. The only findings are a key-advance ordering nit and a missing must_use annotation.

crates/blockchain/src/lib.rs — the key-advance ordering and the ignored process_and_publish_block return value are both in this file, though neither blocks the change.

Important Files Changed

Filename Overview
crates/blockchain/src/lib.rs Core change: propose_block moved to async, runs at interval 4 of the previous slot, and internally calls store::on_tick to advance the store to the next slot's interval 0 before building; small key-advance ordering change and ignored process_and_publish_block return value are worth a look.
crates/blockchain/src/store.rs Removes get_proposal_head (which ticked the store clock prematurely) and renames produce_block_with_signatures to produce_block_on_head; caller now owns the on_tick call and provides head_root explicitly.
crates/blockchain/src/block_builder.rs Adds build_overran_publish_window helper and unit tests; straightforward arithmetic predicate, well-covered by tests.
CLAUDE.md Documentation update reflecting the new slot-interval schedule; accurately describes the merged build+publish at interval 4 and the no-op interval 0.
docs/infographics/ethlambda_architecture.html Trivial rename of produce_block_with_signatures to produce_block_on_head in the architecture diagram.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant HT as handle_tick
    participant PB as propose_block
    participant ST as store
    participant KM as key_manager
    participant P2P as gossip

    Note over HT: Slot S, Interval 4
    HT->>ST: "on_tick(interval-4 timestamp, has_proposal=false)"
    Note over ST: Promotes attestations
    HT->>HT: get_our_proposer(S+1)
    HT->>PB: propose_block(S+1, validator_id)
    PB->>ST: "on_tick(slot_S+1_start_ms, has_proposal=true)"
    Note over ST: Advances to slot S+1 interval 0
    PB->>ST: store.head() parent_root
    PB->>KM: build_signed_block(S+1, validator_id, parent_root)
    Note over KM: advance_key(S+1) + sign inline
    PB->>PB: build_overran_publish_window()?
    alt build finished early
        PB->>PB: tokio::time::sleep(remaining_ms)
    end
    PB->>ST: process_block(signed_block)
    PB->>P2P: publish_block(signed_block)
    PB-->>HT: return
    HT->>KM: advance_keys_to(S+1) -- no-op, already advanced
    Note over HT: Slot S+1, Interval 0
    HT->>ST: on_tick(slot_S+1_start_ms, is_proposer)
    Note over ST: Idempotency guard -- no-op
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant HT as handle_tick
    participant PB as propose_block
    participant ST as store
    participant KM as key_manager
    participant P2P as gossip

    Note over HT: Slot S, Interval 4
    HT->>ST: "on_tick(interval-4 timestamp, has_proposal=false)"
    Note over ST: Promotes attestations
    HT->>HT: get_our_proposer(S+1)
    HT->>PB: propose_block(S+1, validator_id)
    PB->>ST: "on_tick(slot_S+1_start_ms, has_proposal=true)"
    Note over ST: Advances to slot S+1 interval 0
    PB->>ST: store.head() parent_root
    PB->>KM: build_signed_block(S+1, validator_id, parent_root)
    Note over KM: advance_key(S+1) + sign inline
    PB->>PB: build_overran_publish_window()?
    alt build finished early
        PB->>PB: tokio::time::sleep(remaining_ms)
    end
    PB->>ST: process_block(signed_block)
    PB->>P2P: publish_block(signed_block)
    PB-->>HT: return
    HT->>KM: advance_keys_to(S+1) -- no-op, already advanced
    Note over HT: Slot S+1, Interval 0
    HT->>ST: on_tick(slot_S+1_start_ms, is_proposer)
    Note over ST: Idempotency guard -- no-op
Loading

Reviews (2): Last reviewed commit: "refactor: inline function" | Re-trigger Greptile

Comment thread crates/blockchain/src/lib.rs Outdated
Comment thread crates/blockchain/src/lib.rs Outdated
Comment thread crates/blockchain/src/lib.rs
#450 (the on_tick interval regrouping) landed in main via squash, so both
sides added the same interval markers/blocks relative to the pre-#450
merge-base. Git kept both with no textual conflict, duplicating the
interval 2/3/4 sections (start_aggregation_session would have run twice).
Removed the duplicate set, keeping a single interval 0-4 sequence with the
prebuild trigger in interval 4.
@MegaRedHand MegaRedHand force-pushed the feat/proposer-prebuild-aggregation branch from c5812bb to d940222 Compare June 22, 2026 17:15
@MegaRedHand MegaRedHand marked this pull request as draft June 22, 2026 21:20
…t start

The proposer pre-built its block at the previous slot's interval 4 and
stashed it for the interval-0 tick to publish. But the build takes longer
than one 0.8s interval, so it overruns the slot boundary; handle_tick's
overrun catch-up (#413) then re-ticks at the current interval and skips
the missed interval-0. The stashed block was never published and block
production halted (head frozen, every proposer blocked by its own prebuild).

Consolidate the proposal into a single path that runs entirely at
interval 4: build on the current head, align publication to the slot
boundary (sleep out the remainder if the build finished early, publish
immediately if it overran), revalidate the head and rebuild if it moved,
then publish. This never depends on the interval-0 tick firing.

Remove the now-dead dual-path machinery: the prepared_block stash,
PreparedBlock/prebuilt_block_is_usable, the interval-0 propose_block, and
the orphaned produce_block_with_signatures/get_proposal_head clock-ticking
build path.
Review follow-up to the interval-4 proposal consolidation; no behavior change:
- Restore the interval-0 section marker in on_tick with a note that the
  block is built and published at interval 4 of the previous slot.
- Re-comment the head-revalidation guard in propose_block: under the
  single-threaded actor it always passes today (no message interleaves
  mid-build, even across the align-sleep), and is kept as a safety net for
  if the build is ever moved off the actor thread.
- Fix stale references to the removed prebuild_block / synchronous proposal
  path / produce_block_with_signatures in doc comments, the CLAUDE.md
  tick-duties table, and the architecture infographic.
With block proposal moved to interval 4 of the previous slot, the
interval-0 accept_new_attestations fired after the block had already been
built and published: too late to be included in it, and it diverged the
proposer's fork-choice view from the one its block closed over. Accepting
should happen immediately before the build, which the unconditional
interval-4 accept already does (it runs at the top of the same tick, just
before propose_block).

Comment out the interval-0 accept and its gate. has_proposal is now unused
but kept in the signature (discarded via `let _`) so re-enabling needs no
call-site change.
Follow-up to disabling the proposer's interval-0 accept_new_attestations:
attestation promotion now happens only at interval 4, so the tick-duties
table and the attestation-pipeline note are updated to match.
The block is published at interval 0 (the slot boundary). The build+publish
code path is merged into the previous slot's interval 4 and aligned to
publish at the boundary, but conceptually publication belongs to interval 0.
Also reflect that the proposer's interval-0 accept_new_attestations was
dropped, so promotion now happens only at interval 4.
Move the proposer's interval-0 deviation out of store::on_tick and into the
actor. Commenting out store::on_tick's interval-0 attestation accept (f34e81c)
changed a function the conformance tests drive directly; revert store::on_tick
to its spec-faithful form.

Instead, propose_block (which runs at the previous slot's interval 4) now
advances the store to the target slot's interval 0 via store::on_tick(.., true)
— accepting attestations exactly as the real interval-0 tick would — before
building, so the block closes over the same interval-0 state a non-prebuilding
proposer would see. Publication is still aligned to the slot boundary, and the
real interval-0 tick is skipped by the idempotency guard since the store clock
is already advanced.
@MegaRedHand MegaRedHand marked this pull request as ready for review June 24, 2026 20:53
@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

  1. High — Prebuilding now blocks the blockchain actor for the rest of interval 4 and even across the boundary sleep, so the code no longer builds on the true slot-boundary view it claims to use. handle_tick awaits the prebuild at lib.rs, propose_block explicitly relies on the actor processing no other messages during build/sleep at lib.rs, and gossip/block ingestion is on the same actor at lib.rs. As a result, late blocks or attestations arriving after interval 4 starts cannot affect the manual store::on_tick(..., slot_start_ms, true) at lib.rs. That can make the proposer build on a stale fork-choice head and miss attestations that the old interval-0 path would have seen.

  2. Medium — The failure path advances store.time() into the next slot before the wall-clock boundary and then returns immediately. store::on_tick(..., slot_start_ms, true) runs first at lib.rs, but a build failure exits at lib.rs. Future-slot attestation validation uses store.time() at store.rs, so if prebuild fails early, next-slot attestations can be accepted up to one interval early. That is a consensus-time validation bug on the proposer-failure path.

Aside from those two points, the helper split (build_signed_block / assemble_signed_block / process_and_publish_block) is cleaner and the XMSS aggregation changes look mechanically sound.

I couldn’t run cargo test here because cargo tried to sync the Rust toolchain into read-only /home/runner/.rustup.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Here is my review of PR #445.


Review: feat(blockchain): pre-build proposer block one interval early

This PR shifts the expensive leanVM block-building work from interval 0 (where it blocked the first attestation window) to interval 4 of the previous slot. The core machinery — propose_block, build_signed_block, process_and_publish_block, and the build_overran_publish_window alignment helper — is sound, and the finalization-clock-advance bug fix (replacing get_proposal_head with a plain store.head()) is correct and well-motivated. The devnet validation is reassuring.

Four findings survived verification, ranked by severity:


crates/blockchain/src/lib.rs:685 — Unconditional signed_block.clone() wasted on every build failure

process_and_publish_block clones the block up front and passes the clone to process_block:

if let Err(err) = self.process_block(signed_block.clone()) {
    ...
    return false;
}
// original signed_block used for publish_block here

If process_block returns Err, the function returns early and the retained original signed_block is dropped without ever reaching publish_block. SignedBlock holds a MultiMessageAggregate backed by a ByteList512KiB — the clone can be several KB of XMSS proof material. Swap the roles: pass ownership to process_block and clone for the publish path (on the success path only), or restructure so the clone is made after confirming local import succeeded.


crates/blockchain/src/lib.rs:328–330 — Silent proposal skip when node is syncing

The old interval-0 code had an explicit else branch logging "Skipping block proposal while syncing". The new interval-4 code folds the sync guard into a .filter() with no fallback:

let next_proposer = self
    .get_our_proposer(next_slot)
    .filter(|_| self.sync_status.duties_allowed());

When a validator is elected proposer but duties_allowed() returns false, the slot is silently skipped. The attestation path at interval 1 (lines 293–297) uses an explicit else if branch with info!(%slot, "Skipping attestations while syncing") as the established pattern. An operator diagnosing missing proposals during a sync stall gets no signal. Add a log (gated on !self.key_manager.validator_ids().is_empty(), matching the attestation pattern) when a proposer is elected but suppressed by sync.


crates/blockchain/src/lib.rs:439–441unix_now_ms() sampled twice in the sleep path

if !build_overran_publish_window(unix_now_ms(), genesis_time_ms, slot) {
    let wait_ms = slot_start_ms.saturating_sub(unix_now_ms());
    tokio::time::sleep(Duration::from_millis(wait_ms)).await;
}

unix_now_ms() wraps SystemTime::UNIX_EPOCH.elapsed() — a non-monotonic wall clock. If the system clock is stepped backward by NTP between the two calls (possible on a validator host under heavy leanVM load), the second sample yields a smaller timestamp than the first, causing saturating_sub to compute a larger-than-correct wait_ms and delaying publication past the slot boundary. Capture once:

let now_ms = unix_now_ms();
if !build_overran_publish_window(now_ms, genesis_time_ms, slot) {
    let wait_ms = slot_start_ms.saturating_sub(now_ms);
    tokio::time::sleep(Duration::from_millis(wait_ms)).await;
}

crates/blockchain/src/lib.rs:683,456,459published_msg: &'static str is dead variation

Both call sites pass the identical literal:

self.process_and_publish_block(slot, validator_id, signed_block, "Published block");
// ...
self.process_and_publish_block(slot, validator_id, rebuilt, "Published block");

The parameter exists to vary the log message but never does. Remove it and hardcode "Published block" inside the function. If the rebuild path ever needs a distinct message, add an info! at the call site before the call — tunneling it as a parameter adds signature noise for no benefit.


Automated review by Claude (Anthropic) · sonnet · custom prompt

build_signed_block and assemble_signed_block were each single-call-site
helpers extracted during development; folding them back into propose_block
keeps the build/sign/merge body inline where it lived before this branch,
so the diff reads as the prebuild + slot-alignment change rather than a
structural split. Drops the dead justified-slot return value, scopes the
block-building timing guard tightly via an explicit drop before the
alignment wait, and corrects the now-stale process_and_publish_block doc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant