Skip to content

feat(storage): prune only old finalized block signatures, keep blocks forever#453

Merged
MegaRedHand merged 2 commits into
mainfrom
feat/prune-block-signatures
Jun 23, 2026
Merged

feat(storage): prune only old finalized block signatures, keep blocks forever#453
MegaRedHand merged 2 commits into
mainfrom
feat/prune-block-signatures

Conversation

@MegaRedHand

@MegaRedHand MegaRedHand commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Summary

Relaxes block pruning: instead of deleting old blocks wholesale, keep block headers and bodies forever and prune only the signatures of old finalized blocks. This preserves the full block history (for debugging, re-org safety, and state reconstruction) while still reclaiming the heavy signature data (~3KB+ per block).

Change

  • Replaces prune_old_blocks (deleted headers, bodies, and signatures beyond a fixed BLOCKS_TO_KEEP window) with prune_old_block_signatures(finalized_slot, tip_slot).
  • Policy, with cutoff = tip_slot - SIGNATURE_PRUNING_RANGE:
    • healthy finality (cutoff <= finalized_slot): delete signatures for slot < cutoff (entirely within finalized history);
    • deep non-finality (non-finalized range exceeds the window): prune nothing, so non-finalized signatures are never touched.
  • BlockHeaders and BlockBodies are kept forever; all non-finalized signatures are always retained.
  • get_signed_block returns None when a signature is absent, now including a pruned finalized block (deep historical signed-block serving via BlocksByRoot is lost; peers use checkpoint sync).
  • prune_old_data derives the tip slot from the head header and runs signature pruning alongside state pruning.

Key layout (slot-ordered pruning)

  • BlockSignatures is now keyed by slot||root (big-endian slot), reusing the shared encode_slot_root_key codec also used by LiveChain.
  • Pruning iterates in slot order and stops at the first entry past the cutoff (take_while), and no longer does a per-row BlockHeaders lookup to recover the slot. Every read site already has the slot: get_signed_block loads the header first.
  • InMemoryBackend::prefix_iterator now returns keys in sorted order to match the RocksDB backend, which these slot-ordered range scans rely on.

Migration

Changes the BlockSignatures key format: existing root-keyed entries are not read after upgrade (old finalized blocks read as pruned). Fine for fresh / checkpoint-synced nodes; no backfill.

Tests

  • prune_signatures_keeps_recent_window_when_finality_healthy, prune_signatures_noop_when_non_finalized_range_exceeds_window, prune_signatures_noop_when_tip_within_window.
  • Storage lib suite green (42 tests); clippy -D warnings clean.

Context

Split out of #444 (diff-layer state storage), which depends on this: reconstructing historical states needs block headers retained. #444 stacks on top of this PR.

@github-actions

Copy link
Copy Markdown

🤖 Kimi Code Review

The PR changes block pruning to retain headers and bodies indefinitely while only pruning signatures of finalized blocks outside a retention window. This is a safer approach for consensus.

Code correctness and safety:

  1. Finality check logic (line 977-980) — The safety condition if cutoff > finalized_slot { return 0; } correctly prevents pruning of non-finalized signatures. When the non-finalized range exceeds SIGNATURE_PRUNING_RANGE, the function correctly no-ops, preserving safety during deep re-orgs.

  2. Integer safety — Use of saturating_sub (line 976) prevents underflow when tip_slot < SIGNATURE_PRUNING_RANGE.

  3. Database consistency assumption — Lines 985-988 iterate BlockSignatures and assume corresponding BlockHeaders entries exist. The code panics (expect) if lookup fails. While consistent with the existing codebase style, consider whether a corrupted database (missing header for existing signature) should panic or return an error in production.

Performance:

  1. Lookup overhead — The current implementation performs a BlockHeaders lookup for every entry in BlockSignatures during pruning (lines 985-988). For a large signature table, this is O(N) random reads. Since headers are kept forever, consider whether slot information should be cached in the signature key or a separate index to avoid the header deserialization cost during pruning.

Testing:

  1. Test coverage — The three new test cases cover:

    • Healthy finality pruning (cutoff within finalized range)
    • Deep non-finality protection (cutoff beyond finalized slot)
    • Early chain saturation (tip within window)

    This is comprehensive for the state transitions.

Documentation:

  1. CLAUDE.md update — The documentation accurately reflects the new behavior: signatures are pruned for finalized blocks below the retention window while headers/bodies remain.

Nitpick:

  1. Log message consistency — In maybe_prune_old_data (lines 805-809), the log message format changed but the variable name pruned_signatures in the log context is clear and correct.

Conclusion: The logic is sound and safely preserves consensus-critical data. The pruning strategy correctly prioritizes safety (never pruning non-finalized data) over storage efficiency.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  1. crates/storage/src/store.rs:993 introduces a storage contract that the rest of the codebase does not honor. After prune_old_block_signatures() deletes old BlockSignatures rows, get_signed_block() still treats any missing non-slot-0 signature as corruption and returns None at crates/storage/src/store.rs:1096-1119. That propagates directly into block-serving paths at crates/net/p2p/src/req_resp/handlers.rs:210-215 and crates/net/p2p/src/req_resp/handlers.rs:260-295, so once a block ages past the pruning window this node silently stops serving that historical block by root/range even though header/body are still retained. This is a behavioral regression for historical sync/debugging and contradicts the “headers and bodies are always retained” intent. Either keep signatures for every block that may still be served, or change the serving/storage APIs consistently.

  2. crates/storage/src/store.rs:993-1000 regresses prune performance on RocksDB. The new loop scans BlockSignatures and then does a separate view.get(Table::BlockHeaders, &key) for every row to recover the slot. That turns pruning into an iterator plus O(N) point reads over the entire signed-block history. On a long-lived node this will be substantially more expensive than the previous single-table header scan. If the new retention policy stays, it would be better to derive age from a sequential header iteration or maintain a slot-indexed signature key.

I did not run cargo test because the environment’s rustup attempted to write under /home/runner/.rustup and failed with a read-only filesystem error.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR relaxes block pruning by replacing the old count-based prune_old_blocks (which deleted headers, bodies, and signatures) with prune_old_block_signatures, which keeps headers and bodies forever and only removes signatures of finalized blocks beyond a SIGNATURE_PRUNING_RANGE slot window.

  • BlockHeaders and BlockBodies are now retained indefinitely, preserving full block history for state reconstruction and re-org safety.
  • Signatures are pruned only when cutoff (= tip_slot - SIGNATURE_PRUNING_RANGE) <= finalized_slot, ensuring non-finalized signatures are never touched; if the non-finalized range exceeds the window, pruning is skipped entirely.
  • get_signed_block now returns None for deep-historical finalized blocks whose signatures have been pruned, intentionally giving up BlocksByRoot serving for pre-window history.

Confidence Score: 4/5

The change is safe to merge; the pruning logic, boundary conditions, and the guard against touching non-finalized signatures are all correct.

The core logic is sound: the cutoff <= finalized_slot guard correctly ensures only fully-finalized signature ranges are deleted, and the test suite covers the three key pruning scenarios. The two small gaps — orphaned signatures being silently skipped and the prune_old_data integration test not asserting BlockSignatures count — are edge-case quality issues rather than current defects.

crates/storage/src/store.rs — specifically the orphaned-signature skip path and the integration test assertion coverage.

Important Files Changed

Filename Overview
crates/storage/src/store.rs Replaces count-based prune_old_blocks with slot-based prune_old_block_signatures; headers/bodies kept forever, signatures pruned only when the cutoff lies entirely within finalized history. Logic and boundary conditions are correct; two minor gaps: orphaned signatures are silently skipped rather than cleaned up, and the prune_old_data integration test does not assert the BlockSignatures count.
CLAUDE.md Updates block-storage documentation to reflect the new signature-only pruning policy and the fact that get_signed_block returns None for deep historical finalized blocks.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[prune_old_data called] --> B[Compute finalized_slot from latest_finalized checkpoint]
    B --> C[Get tip_slot from head block header]
    C -->|header found| D[tip_slot = header.slot]
    C -->|header missing| E[tip_slot = finalized_slot fallback]
    D --> F[prune_old_block_signatures]
    E --> F
    F --> G[cutoff = tip_slot - SIGNATURE_PRUNING_RANGE]
    G --> H{cutoff > finalized_slot?}
    H -->|Yes: non-finalized range exceeds window| I[Return 0 - prune nothing]
    H -->|No: window is fully within finalized history| J[Scan BlockSignatures table]
    J --> K[For each signature key, fetch header from BlockHeaders]
    K -->|header.slot < cutoff| L[Mark for deletion]
    K -->|header.slot >= cutoff| M[Keep signature]
    K -->|header missing - orphaned| N[Skip silently]
    L --> O[Delete batch from BlockSignatures only]
    O --> P[BlockHeaders and BlockBodies never deleted]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[prune_old_data called] --> B[Compute finalized_slot from latest_finalized checkpoint]
    B --> C[Get tip_slot from head block header]
    C -->|header found| D[tip_slot = header.slot]
    C -->|header missing| E[tip_slot = finalized_slot fallback]
    D --> F[prune_old_block_signatures]
    E --> F
    F --> G[cutoff = tip_slot - SIGNATURE_PRUNING_RANGE]
    G --> H{cutoff > finalized_slot?}
    H -->|Yes: non-finalized range exceeds window| I[Return 0 - prune nothing]
    H -->|No: window is fully within finalized history| J[Scan BlockSignatures table]
    J --> K[For each signature key, fetch header from BlockHeaders]
    K -->|header.slot < cutoff| L[Mark for deletion]
    K -->|header.slot >= cutoff| M[Keep signature]
    K -->|header missing - orphaned| N[Skip silently]
    L --> O[Delete batch from BlockSignatures only]
    O --> P[BlockHeaders and BlockBodies never deleted]
Loading

Comments Outside Diff (1)

  1. crates/storage/src/store.rs, line 1662-1722 (link)

    P2 Integration test for prune_old_data doesn't assert BlockSignatures count

    fallback_pruning_removes_old_states_and_blocks calls prune_old_data() and then asserts BlockHeaders and States counts, but never checks BlockSignatures. In this test tip_slot = STATES_TO_KEEP + 4 = 3004 < SIGNATURE_PRUNING_RANGE, so cutoff saturates to 0 and no signatures are pruned — all 3005 should remain. Without an explicit assertion the new signature-pruning path within prune_old_data goes unvalidated at the integration level.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/storage/src/store.rs
    Line: 1662-1722
    
    Comment:
    **Integration test for `prune_old_data` doesn't assert `BlockSignatures` count**
    
    `fallback_pruning_removes_old_states_and_blocks` calls `prune_old_data()` and then asserts `BlockHeaders` and `States` counts, but never checks `BlockSignatures`. In this test `tip_slot = STATES_TO_KEEP + 4 = 3004 < SIGNATURE_PRUNING_RANGE`, so `cutoff` saturates to 0 and no signatures are pruned — all 3005 should remain. Without an explicit assertion the new signature-pruning path within `prune_old_data` goes unvalidated at the integration level.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/storage/src/store.rs:993-1001
**Orphaned signatures silently skipped, never cleaned up**

When a `BlockSignatures` entry exists but `BlockHeaders` returns `Ok(None)` for the same key, the `?` in the `filter_map` closure short-circuits to `None`, causing the entry to be skipped. Under the new design (headers kept forever) this scenario cannot arise in the forward path, but any signature entries that outlived their headers from prior runs will accumulate in storage indefinitely since nothing else will ever delete them. A defensive pass to collect orphan keys and delete them — or at minimum an instrumented warn — would surface corruption early.

### Issue 2 of 2
crates/storage/src/store.rs:1662-1722
**Integration test for `prune_old_data` doesn't assert `BlockSignatures` count**

`fallback_pruning_removes_old_states_and_blocks` calls `prune_old_data()` and then asserts `BlockHeaders` and `States` counts, but never checks `BlockSignatures`. In this test `tip_slot = STATES_TO_KEEP + 4 = 3004 < SIGNATURE_PRUNING_RANGE`, so `cutoff` saturates to 0 and no signatures are pruned — all 3005 should remain. Without an explicit assertion the new signature-pruning path within `prune_old_data` goes unvalidated at the integration level.

Reviews (1): Last reviewed commit: "feat(storage): prune only old finalized ..." | Re-trigger Greptile

Comment thread crates/storage/src/store.rs Outdated
@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Here is the review:


PR 453 — Review: feat(storage): prune only old finalized block signatures, keep blocks forever

This PR replaces prune_old_blocks (which deleted headers, bodies, and signatures beyond a fixed window) with prune_old_block_signatures (which keeps headers and bodies forever and only prunes signatures for finalized blocks older than ~1 day). The policy is clearly motivated by upcoming state-reconstruction work (#444), and the pruning guard — only prune when the full window is within finalized history — is logically sound.

Three findings survived verification, ranked by severity.


Finding 1 — prune_old_states now scans an unbounded BlockHeaders table on every finalization tick

crates/storage/src/store.rs:926–938

prune_old_states determines which states to prune by iterating all BlockHeaders entries (line 927: prefix_iterator(Table::BlockHeaders, &[]), empty prefix = full table scan), SSZ-decoding every header, and collecting into a Vec. The only early-exit guard is if entries.len() <= STATES_TO_KEEP { return 0; } (line 937), which checks the number of headers, not states. Since STATES_TO_KEEP = 3_000, this guard stops firing after the first ~3.3 hours of operation.

Before this PR, prune_old_blocks kept BlockHeaders bounded at BLOCKS_TO_KEEP = 21_600. Now headers accumulate forever. At 30 days of chain history (~648,000 headers), every prune_old_data call (triggered on each finalization, every few seconds) decodes and allocates ~648,000 BlockHeader structs into memory. At modest RocksDB read latencies, this scan will grow from negligible to several seconds per tick, blocking the store actor.

Fix: Either (a) use the LiveChain table (slot-prefixed keys, already sorted) to walk only the oldest STATES_TO_KEEP entries without reading all headers, or (b) maintain a separate slot-indexed count for the pruning guard independent of BlockHeaders size. The existing prune_live_chain already uses a sorted prefix walk with take_while and is O(pruned), not O(total) — prune_old_states should follow the same pattern.


Finding 2 — BlocksByRange and BlocksByRoot silently return incomplete responses for blocks with pruned signatures

crates/net/p2p/src/req_resp/handlers.rscanonical_blocks_by_range and handle_blocks_by_root_request

Both handlers call store.get_signed_block(root) and silently drop the block when it returns None. The comment in handle_blocks_by_root_request says "Missing blocks are silently skipped (per spec)" — this was previously only for blocks the node genuinely did not have. After this PR, any finalized block older than tip_slot - SIGNATURE_PRUNING_RANGE (~1 day) returns None from get_signed_block despite its header and body being present, because only BlockSignatures is pruned.

Additionally, the range-walk in canonical_blocks_by_range follows parent_root links via get_block_header — which succeeds for all blocks (headers are kept forever). The walk has no depth limit. For a syncing peer requesting a range covering old blocks (e.g., start_slot=0, count=50000 on a week-old node), the walk traverses back through all headers but filter_map drops every block with a pruned signature, returning a response with 0–few blocks and no error signal to the requester.

The PR description acknowledges this ("deep historical signed-block serving via BlocksByRoot is lost; peers use checkpoint sync"), so this is a deliberate tradeoff. Two improvements would make this safer:

  1. The range-walk should be depth-bounded or should stop walking back past tip_slot - SIGNATURE_PRUNING_RANGE, so it doesn't traverse thousands of headers it cannot serve.
  2. The log line in the response path should record how many blocks were found vs. requested so operators can diagnose stalling peers.

Finding 3 — .expect() on get_signed_block in block re-processing path is now unsound

crates/blockchain/src/lib.rs:690–693

let block = self
    .store
    .get_signed_block(&missing_root)
    .expect("header and parent state exist, so the full signed block must too");

This assertion was valid before PR 453. It is now broken: get_signed_block returns None for any non-genesis block with a missing BlockSignatures row (store.rs line 1119: None => return None), and this PR explicitly removes those rows for old finalized blocks. The comment's invariant — "header + parent state ⟹ full signed block" — no longer holds once signatures are pruned.

The panic trigger requires has_state(&header.parent_root) to return true for a block with a pruned signature. Under normal operation this cannot happen simultaneously (STATES_TO_KEEP = 3,000 slots << SIGNATURE_PRUNING_RANGE = 21,600 slots), so states are pruned first. However, a node initialized via checkpoint sync could inject a state at an arbitrary old slot, making the parent-state check pass for a block whose signature has already been pruned — causing a panic on the next block-processing attempt.

The assertion message should at minimum be updated; ideally the code should handle None gracefully with an error log rather than a panic, now that the invariant can legitimately be violated.


Minor — prune_old_block_signatures does O(N) secondary BlockHeaders lookups

crates/storage/src/store.rs:997–1001

The iterator over BlockSignatures does a point lookup into BlockHeaders for every entry (view.get(Table::BlockHeaders, &key)). The BlockSignatures table is bounded (~21,600 entries at steady state), but this is still 21,600 random reads per pruning cycle to resolve slots. The same LiveChain-based approach mentioned in Finding 1 would avoid this: LiveChain keys encode the slot directly, eliminating the secondary lookup.


What looks good

  • The pruning guard (cutoff > finalized_slot → return 0) is correct. It is mathematically impossible for the current finalized block's signature to be pruned, so the /lean/v0/states/finalized RPC endpoint and the justified checkpoint are not affected.
  • The saturating_sub on cutoff correctly handles early-chain nodes where tip_slot < SIGNATURE_PRUNING_RANGE (cutoff = 0, nothing gets pruned).
  • The new tests cover the three key policy branches (healthy finality, deep non-finality, tip within window).
  • The CLAUDE.md update accurately documents the new invariant.

Automated review by Claude (Anthropic) · sonnet · custom prompt

… forever

Replace prune_old_blocks (which deleted headers, bodies, and signatures
beyond a fixed block window) with prune_old_block_signatures, which drops
only the signatures of old finalized blocks while keeping headers and
bodies for the full history.

- BlockSignatures is keyed by slot||root (big-endian slot), so pruning
  iterates in slot order and stops at the first entry past the cutoff,
  without a per-row BlockHeaders lookup to recover the slot. The shared
  slot||root codec (also used by LiveChain) is named encode_slot_root_key.
- With cutoff = tip_slot - SIGNATURE_PRUNING_RANGE: prune signatures for
  slot < cutoff only when cutoff <= finalized_slot (healthy finality);
  during deep non-finality (non-finalized range > window) prune nothing,
  so non-finalized signatures are never touched.
- get_signed_block reads the header first (so it has the slot) and looks
  up the signature by slot||root; it returns None when absent, including a
  pruned finalized block. Headers and bodies are kept forever.
- prune_old_data computes the tip slot from the head header and runs
  signature pruning alongside state pruning.
- InMemoryBackend::prefix_iterator now returns keys in sorted order to
  match the RocksDB backend, which the slot-ordered range scans rely on.

Note: changes the BlockSignatures key format; existing root-keyed entries
are not read after upgrade (old finalized blocks read as pruned). Fine for
fresh/checkpoint-synced nodes; no backfill.
@MegaRedHand MegaRedHand force-pushed the feat/prune-block-signatures branch from 6600cb2 to fb1cf37 Compare June 23, 2026 19:27
@MegaRedHand MegaRedHand merged commit ac727a5 into main Jun 23, 2026
2 checks passed
@MegaRedHand MegaRedHand deleted the feat/prune-block-signatures branch June 23, 2026 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants