feat(blockchain): pre-build proposer block one interval early#445
feat(blockchain): pre-build proposer block one interval early#445MegaRedHand wants to merge 21 commits into
Conversation
Split out of #445 (proposer pre-build). `gate_proposer` was a one-line wrapper over `duties_allowed()` with a single call site. This inlines it as `scheduled_proposer.filter(|_| self.sync_status.duties_allowed())` and removes the method, so the sync gate reads the same way everywhere. **Behavior-preserving** — `proposer_validator_id` is computed identically; every downstream use (the skip-while-syncing log, the `store::on_tick` proposal flag, the proposal call) is untouched. The three `gate_proposer`-only unit tests collapse onto the `duties_allowed()` assertions they already made. Stacked under the on_tick-regrouping PR; #445 will drop these hunks once both land. Co-authored-by: Pablo Deymonnaz <pdeymon@fi.uba.ar>
on_tick ran its per-interval blocks out of order (4-snapshot, tick, 2, 0, 1). Regroup them into ascending interval order behind `==== interval N ====` markers so the slot timeline reads top to bottom and matches the duty schedule. Pure reorder, behavior unchanged. The interval-4 new_payloads snapshot is the one block that stays ahead of store::on_tick: the interval-4 tick promotes new_payloads out, so it cannot move into a post-tick group. A comment now pins that constraint.
…ick flag Compute scheduled_proposer just before store::on_tick and pass the raw is_proposer (= scheduled_proposer.is_some()) to it; gate the actual proposal on duties_allowed() at the call site instead. While syncing and scheduled to propose, on_tick now accepts attestations early at interval 0 (it did not before); the proposal itself is still skipped.
d4a251e to
4955800
Compare
At interval 4, if one of our validators proposes the next slot, build and sign its block synchronously on the actor so the heavy leanVM aggregation (~1-2s) is done before interval 0. The proposer then only has to publish. Shares one block-build core (produce_block_on_head) across the prebuild and normal proposal paths; a publish-time usability gate falls back to a fresh build if the prebuilt block is stale.
4955800 to
2c6eb1c
Compare
Split out of #445 (proposer pre-build). **Stacked on #449** (`gate_proposer` removal) — review/merge that first; this PR's base retargets to `main` once #449 lands. `on_tick` ran its per-interval blocks out of order (interval-4 snapshot, then tick, then 2, 0, 1). This regroups them into ascending interval order behind `==== interval N ====` markers so the slot timeline reads top-to-bottom and lines up with the duty schedule. **Pure reorder, behavior unchanged.** ### One block intentionally does *not* move The interval-4 `new_payloads` snapshot stays **ahead** of `store::on_tick`. At interval 4 the tick runs `accept_new_attestations → promote_new_aggregated_payloads()`, which drains `new_payloads`; snapshotting after the tick would capture an already-drained set and silently break the post-block coverage report's "timely" cohort. A comment now pins that ordering constraint. > Note: commit `7474c15` on #445 currently moves this snapshot *into* the post-tick interval-4 group, which hits exactly that regression (observability-only, so devnet doesn't surface it). #445 should adopt this PR's placement when it rebases.
Greptile SummaryThis PR reworks block proposal timing: instead of building and signing the next slot's block at interval 0 of that slot, the actor now does it at interval 4 of the previous slot, then waits out the remainder of the interval before publishing at the slot boundary. A companion fix removes the
Confidence Score: 5/5Safe to merge; the core build-at-interval-4 logic is sound, the finalization fix is correct, and devnet validation showed zero stale discards and zero skipped ticks. The refactor is well-structured with clear comments explaining every invariant. The single-threaded actor guarantees cited in the code are accurate for the current architecture, making the rebuild guard genuinely inert today. The only findings are a key-advance ordering nit and a missing must_use annotation. crates/blockchain/src/lib.rs — the key-advance ordering and the ignored process_and_publish_block return value are both in this file, though neither blocks the change.
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/lib.rs | Core change: propose_block moved to async, runs at interval 4 of the previous slot, and internally calls store::on_tick to advance the store to the next slot's interval 0 before building; small key-advance ordering change and ignored process_and_publish_block return value are worth a look. |
| crates/blockchain/src/store.rs | Removes get_proposal_head (which ticked the store clock prematurely) and renames produce_block_with_signatures to produce_block_on_head; caller now owns the on_tick call and provides head_root explicitly. |
| crates/blockchain/src/block_builder.rs | Adds build_overran_publish_window helper and unit tests; straightforward arithmetic predicate, well-covered by tests. |
| CLAUDE.md | Documentation update reflecting the new slot-interval schedule; accurately describes the merged build+publish at interval 4 and the no-op interval 0. |
| docs/infographics/ethlambda_architecture.html | Trivial rename of produce_block_with_signatures to produce_block_on_head in the architecture diagram. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant HT as handle_tick
participant PB as propose_block
participant ST as store
participant KM as key_manager
participant P2P as gossip
Note over HT: Slot S, Interval 4
HT->>ST: "on_tick(interval-4 timestamp, has_proposal=false)"
Note over ST: Promotes attestations
HT->>HT: get_our_proposer(S+1)
HT->>PB: propose_block(S+1, validator_id)
PB->>ST: "on_tick(slot_S+1_start_ms, has_proposal=true)"
Note over ST: Advances to slot S+1 interval 0
PB->>ST: store.head() parent_root
PB->>KM: build_signed_block(S+1, validator_id, parent_root)
Note over KM: advance_key(S+1) + sign inline
PB->>PB: build_overran_publish_window()?
alt build finished early
PB->>PB: tokio::time::sleep(remaining_ms)
end
PB->>ST: process_block(signed_block)
PB->>P2P: publish_block(signed_block)
PB-->>HT: return
HT->>KM: advance_keys_to(S+1) -- no-op, already advanced
Note over HT: Slot S+1, Interval 0
HT->>ST: on_tick(slot_S+1_start_ms, is_proposer)
Note over ST: Idempotency guard -- no-op
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant HT as handle_tick
participant PB as propose_block
participant ST as store
participant KM as key_manager
participant P2P as gossip
Note over HT: Slot S, Interval 4
HT->>ST: "on_tick(interval-4 timestamp, has_proposal=false)"
Note over ST: Promotes attestations
HT->>HT: get_our_proposer(S+1)
HT->>PB: propose_block(S+1, validator_id)
PB->>ST: "on_tick(slot_S+1_start_ms, has_proposal=true)"
Note over ST: Advances to slot S+1 interval 0
PB->>ST: store.head() parent_root
PB->>KM: build_signed_block(S+1, validator_id, parent_root)
Note over KM: advance_key(S+1) + sign inline
PB->>PB: build_overran_publish_window()?
alt build finished early
PB->>PB: tokio::time::sleep(remaining_ms)
end
PB->>ST: process_block(signed_block)
PB->>P2P: publish_block(signed_block)
PB-->>HT: return
HT->>KM: advance_keys_to(S+1) -- no-op, already advanced
Note over HT: Slot S+1, Interval 0
HT->>ST: on_tick(slot_S+1_start_ms, is_proposer)
Note over ST: Idempotency guard -- no-op
Reviews (2): Last reviewed commit: "refactor: inline function" | Re-trigger Greptile
#450 (the on_tick interval regrouping) landed in main via squash, so both sides added the same interval markers/blocks relative to the pre-#450 merge-base. Git kept both with no textual conflict, duplicating the interval 2/3/4 sections (start_aggregation_session would have run twice). Removed the duplicate set, keeping a single interval 0-4 sequence with the prebuild trigger in interval 4.
c5812bb to
d940222
Compare
…t start The proposer pre-built its block at the previous slot's interval 4 and stashed it for the interval-0 tick to publish. But the build takes longer than one 0.8s interval, so it overruns the slot boundary; handle_tick's overrun catch-up (#413) then re-ticks at the current interval and skips the missed interval-0. The stashed block was never published and block production halted (head frozen, every proposer blocked by its own prebuild). Consolidate the proposal into a single path that runs entirely at interval 4: build on the current head, align publication to the slot boundary (sleep out the remainder if the build finished early, publish immediately if it overran), revalidate the head and rebuild if it moved, then publish. This never depends on the interval-0 tick firing. Remove the now-dead dual-path machinery: the prepared_block stash, PreparedBlock/prebuilt_block_is_usable, the interval-0 propose_block, and the orphaned produce_block_with_signatures/get_proposal_head clock-ticking build path.
Review follow-up to the interval-4 proposal consolidation; no behavior change: - Restore the interval-0 section marker in on_tick with a note that the block is built and published at interval 4 of the previous slot. - Re-comment the head-revalidation guard in propose_block: under the single-threaded actor it always passes today (no message interleaves mid-build, even across the align-sleep), and is kept as a safety net for if the build is ever moved off the actor thread. - Fix stale references to the removed prebuild_block / synchronous proposal path / produce_block_with_signatures in doc comments, the CLAUDE.md tick-duties table, and the architecture infographic.
With block proposal moved to interval 4 of the previous slot, the interval-0 accept_new_attestations fired after the block had already been built and published: too late to be included in it, and it diverged the proposer's fork-choice view from the one its block closed over. Accepting should happen immediately before the build, which the unconditional interval-4 accept already does (it runs at the top of the same tick, just before propose_block). Comment out the interval-0 accept and its gate. has_proposal is now unused but kept in the signature (discarded via `let _`) so re-enabling needs no call-site change.
Follow-up to disabling the proposer's interval-0 accept_new_attestations: attestation promotion now happens only at interval 4, so the tick-duties table and the attestation-pipeline note are updated to match.
The block is published at interval 0 (the slot boundary). The build+publish code path is merged into the previous slot's interval 4 and aligned to publish at the boundary, but conceptually publication belongs to interval 0. Also reflect that the proposer's interval-0 accept_new_attestations was dropped, so promotion now happens only at interval 4.
Move the proposer's interval-0 deviation out of store::on_tick and into the actor. Commenting out store::on_tick's interval-0 attestation accept (f34e81c) changed a function the conformance tests drive directly; revert store::on_tick to its spec-faithful form. Instead, propose_block (which runs at the previous slot's interval 4) now advances the store to the target slot's interval 0 via store::on_tick(.., true) — accepting attestations exactly as the real interval-0 tick would — before building, so the block closes over the same interval-0 state a non-prebuilding proposer would see. Publication is still aligned to the slot boundary, and the real interval-0 tick is skipped by the idempotency guard since the store clock is already advanced.
🤖 Codex Code Review
Aside from those two points, the helper split ( I couldn’t run Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewHere is my review of PR #445. Review: feat(blockchain): pre-build proposer block one interval earlyThis PR shifts the expensive leanVM block-building work from interval 0 (where it blocked the first attestation window) to interval 4 of the previous slot. The core machinery — Four findings survived verification, ranked by severity:
|
build_signed_block and assemble_signed_block were each single-call-site helpers extracted during development; folding them back into propose_block keeps the build/sign/merge body inline where it lived before this branch, so the diff reads as the prebuild + slot-alignment change rather than a structural split. Drops the dead justified-slot return value, scopes the block-building timing guard tightly via an explicit drop before the alignment wait, and corrects the now-stale process_and_publish_block doc.
What
Proposers do the heavy leanVM block-building work (attestation compaction + the Type-1 → Type-2 merge, ~1–2s) synchronously at interval 0 today, squeezed into the slot's first interval. This builds the next slot's block one interval early — at the previous slot's interval 4, synchronously on the blockchain actor — and publishes it at interval 0 after a revalidation. On any mismatch or failure it falls back to the existing synchronous build, so there is no regression.
How
prebuild_blockbuilds + signs + merges the full block against the current (read-only) head and stashes it as aPreparedBlock.propose_blockpublishes the prepared block iff it is still valid (prebuilt_block_is_usable: same slot/proposer, parent still the canonical head, and the built justified slot not behind the store's). Otherwise it rebuilds synchronously.assemble_signed_block/process_and_publish_blockhelpers back both the pre-build and the fallback path.sign_block_roottakes&mut self), so the build cannot be moved off-thread without risking OTS-index reuse.Finalization fix (included)
Capturing the pre-build head via
get_proposal_head— which is not read-only, it ticks the store clock — one interval early advanced the clock prematurely and stalled finalization. Fixed by capturing the head read-only viastore.head().Validation (local devnet: 4 ethlambda nodes, 1 aggregator)
Tradeoff
Building synchronously blocks the actor for the build duration at interval 4. This is acceptable: between interval 4 and the next slot the actor has no other consensus-critical duty, and the devnet run showed zero skipped ticks.
Tests
make fmt+cargo clippy -p ethlambda-blockchain --all-targets -- -D warningsclean; lib tests pass, including the newblock_builder::prebuild_testscovering the usability predicate.