feat(storage): diff-layer state storage with an in-memory LRU cache#444
feat(storage): diff-layer state storage with an in-memory LRU cache#444MegaRedHand wants to merge 9 commits into
Conversation
2cb4ca3 to
a342fa7
Compare
6600cb2 to
fb1cf37
Compare
a342fa7 to
34d6345
Compare
… forever (#453) ## Summary Relaxes block pruning: instead of deleting old blocks wholesale, keep block headers and bodies **forever** and prune only the signatures of old finalized blocks. This preserves the full block history (for debugging, re-org safety, and state reconstruction) while still reclaiming the heavy signature data (~3KB+ per block). ## Change - Replaces `prune_old_blocks` (deleted headers, bodies, and signatures beyond a fixed `BLOCKS_TO_KEEP` window) with `prune_old_block_signatures(finalized_slot, tip_slot)`. - Policy, with `cutoff = tip_slot - SIGNATURE_PRUNING_RANGE`: - **healthy finality** (`cutoff <= finalized_slot`): delete signatures for `slot < cutoff` (entirely within finalized history); - **deep non-finality** (non-finalized range exceeds the window): prune nothing, so non-finalized signatures are never touched. - `BlockHeaders` and `BlockBodies` are kept forever; all non-finalized signatures are always retained. - `get_signed_block` returns `None` when a signature is absent, now including a pruned finalized block (deep historical signed-block serving via BlocksByRoot is lost; peers use checkpoint sync). - `prune_old_data` derives the tip slot from the head header and runs signature pruning alongside state pruning. ## Key layout (slot-ordered pruning) - `BlockSignatures` is now keyed by **`slot||root`** (big-endian slot), reusing the shared `encode_slot_root_key` codec also used by `LiveChain`. - Pruning iterates in slot order and **stops at the first entry past the cutoff** (`take_while`), and no longer does a per-row `BlockHeaders` lookup to recover the slot. Every read site already has the slot: `get_signed_block` loads the header first. - `InMemoryBackend::prefix_iterator` now returns keys in sorted order to match the RocksDB backend, which these slot-ordered range scans rely on. ## Migration Changes the `BlockSignatures` key format: existing root-keyed entries are not read after upgrade (old finalized blocks read as pruned). Fine for fresh / checkpoint-synced nodes; no backfill. ## Tests - `prune_signatures_keeps_recent_window_when_finality_healthy`, `prune_signatures_noop_when_non_finalized_range_exceeds_window`, `prune_signatures_noop_when_tip_within_window`. - Storage lib suite green (42 tests); clippy `-D warnings` clean. ## Context Split out of #444 (diff-layer state storage), which depends on this: reconstructing historical states needs block headers retained. #444 stacks on top of this PR. Co-authored-by: Pablo Deymonnaz <pdeymon@fi.uba.ar>
608e97f to
d1a76aa
Compare
Store every non-genesis state as a parent-linked StateDiff (StateDiffs, never pruned) plus a full snapshot (States) written only at 1024-slot anchors (and the bootstrap). Neither table is ever pruned, so the full state history is preserved cheaply. - get_state returns an anchor snapshot directly, else reconstructs by walking base_root back to the nearest anchor and replaying the appended historical_block_hashes tails; config/validators come from the snapshot and latest_block_header from BlockHeaders. - Reconstructed and freshly imported states are memoized in an in-memory LRU (STATE_CACHE_CAPACITY), keyed by block root. States are immutable per root, so the cache never needs invalidation; it keeps recent reads (e.g. a child block's parent state) hot without reconstruction. - DiffBase captures the parent (root, hbh_len, slot) before it is consumed into the post-state. StateDiff/DiffBase live in the storage crate. - No snapshot eviction and no StateAnchors table: anchors are simply the snapshots in States, so the prune-states scan is gone entirely.
d1a76aa to
1a060f2
Compare
…ktree-feat+state-diff-layers
The migrated test built its DiffBase from the target post-state itself (DiffBase::from_state(a, &head_state)), so base.slot/base.hbh_len came from the target rather than the parent. That made the anchor-boundary check always false (no snapshot written, contradicting the comment) and left the diff self-referential, passing only via the cache memoization. Diff against the genesis anchor already present in the store instead, so the base correctly describes the parent state.
The reconstruction and boundary tests built DiffBase struct literals directly from the crate-internal fields instead of the public DiffBase::from_state constructor. Use the constructor so the tests exercise the real construction path (and would catch a regression in from_state itself).
🤖 Kimi Code ReviewOverall this is a well-architected optimization that significantly reduces storage overhead by diffing states against their parents rather than storing full copies. The logic appears correct and the tests are comprehensive. Critical Observations
Code Quality & Safety
Minor Suggestions
Consensus Security The reconstruction logic preserves all consensus-critical fields ( Verdict: Approved with minor defensive programming suggestions. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
…ktree-feat+state-diff-layers
🤖 Codex Code Review
I did not run the test suite successfully here: Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
The table dropped the 4 stale rows (LatestKnown/LatestNewAttestations, GossipSignatures, AggregatedPayloads) that no longer exist as persisted tables: attestations and gossip signatures are now in-memory Store buffers. Also fixes the BlockSignatures key (slot||root, not H256).
Greptile SummaryThis PR replaces aggressive state pruning with a diff-layer storage model: every non-genesis post-state is persisted as a parent-linked
Confidence Score: 4/5The core diff-layer write and reconstruction paths are correct and the tests cover single-diff, multi-diff, anchor-crossing, and bootstrap-snapshot cases with a cold store. No state is silently lost or incorrectly reconstructed under normal operation. The doc comment on prune_old_data still describes snapshot eviction that the new design abandoned entirely, which could mislead future maintainers. Anchor states also get a redundant StateDiffs row that is never read back. The backward walk in reconstruct_state ignores the LRU cache for intermediate steps and lacks any depth bound, so DB corruption could cause an indefinite hang. None of these are correctness issues under well-formed data, but the documentation inaccuracy and unbounded walk are worth addressing before the second anchor soak completes. crates/storage/src/store.rs — specifically the prune_old_data doc comment, insert_state_with_diff (redundant anchor diff), and the reconstruct_state backward walk loop.
|
| Filename | Overview |
|---|---|
| crates/storage/src/state_diff.rs | New file implementing StateDiff/DiffBase types, SSZ-encode/decode, and the reconstruct() function for replaying diffs onto a snapshot. Well-documented with explicit invariant contracts. |
| crates/storage/src/store.rs | Core storage changes: adds LRU state cache, replaces insert_state with insert_state_with_diff (snapshot-on-anchor), adds reconstruct_state for diff-chain replay, removes prune_old_states. Doc comment on prune_old_data misleadingly describes snapshot eviction that no longer happens; anchor states get a redundant StateDiffs entry; the backward walk has no cycle-detection/depth guard. |
| crates/blockchain/src/store.rs | Captures DiffBase before parent is consumed into post-state, then calls insert_state_with_diff. Test updated to use the new diff API correctly. |
| crates/storage/src/api/tables.rs | Adds StateDiffs table variant and updates ALL_TABLES, name(), plus updates rocksdb.rs to delegate cf_name to table.name() for single source of truth. |
| crates/storage/src/lib.rs | Adds state_diff module and re-exports DiffBase for callers. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Caller
participant Store
participant LRU as LRU Cache
participant States as States Table (DB)
participant Diffs as StateDiffs Table (DB)
participant Headers as BlockHeaders Table (DB)
Note over Caller,Headers: get_state(root)
Caller->>Store: get_state(root)
Store->>LRU: get(root)
alt Cache hit
LRU-->>Store: State
Store-->>Caller: Some(State)
else Cache miss
Store->>States: get(root)
alt Snapshot found
States-->>Store: State (full snapshot)
else Not an anchor
Store->>Store: reconstruct_state(root)
loop walk back until snapshot found
Store->>States: get(cursor)
alt Snapshot found
States-->>Store: snapshot State
else
Store->>Diffs: get(cursor)
Diffs-->>Store: StateDiff
Note over Store: cursor = diff.base_root
end
end
Store->>Headers: get(root)
Headers-->>Store: latest_block_header
Store->>Store: reconstruct(snapshot, diffs[], header)
end
Store->>LRU: put(root, state)
Store-->>Caller: Some(State)
end
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Caller
participant Store
participant LRU as LRU Cache
participant States as States Table (DB)
participant Diffs as StateDiffs Table (DB)
participant Headers as BlockHeaders Table (DB)
Note over Caller,Headers: get_state(root)
Caller->>Store: get_state(root)
Store->>LRU: get(root)
alt Cache hit
LRU-->>Store: State
Store-->>Caller: Some(State)
else Cache miss
Store->>States: get(root)
alt Snapshot found
States-->>Store: State (full snapshot)
else Not an anchor
Store->>Store: reconstruct_state(root)
loop walk back until snapshot found
Store->>States: get(cursor)
alt Snapshot found
States-->>Store: snapshot State
else
Store->>Diffs: get(cursor)
Diffs-->>Store: StateDiff
Note over Store: cursor = diff.base_root
end
end
Store->>Headers: get(root)
Headers-->>Store: latest_block_header
Store->>Store: reconstruct(snapshot, diffs[], header)
end
Store->>LRU: put(root, state)
Store-->>Caller: Some(State)
end
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
crates/storage/src/store.rs:848-852
The doc comment says "evicting old state snapshots" but the new design never prunes snapshots — anchors are kept forever. Only block signatures are removed. The second sentence ("only full-state snapshots outside the hot window … are removed") reinforces the misconception. This will mislead future readers who rely on this doc to understand storage bounds.
```suggestion
/// Prune finalized block signatures to keep signature storage bounded.
///
/// State diffs, block headers, block bodies, and full-state snapshots are
/// all retained for the full history and are never pruned. Only signatures
/// of finalized blocks older than the pruning window are removed.
```
### Issue 2 of 4
crates/storage/src/store.rs:1213-1235
**Redundant `StateDiffs` entry written for every anchor state**
`insert_state_with_diff` always writes a `StateDiffs` row, even when `is_anchor` is true and a full `States` snapshot is also written. In `reconstruct_state` the loop checks `States[cursor]` before `StateDiffs[cursor]`, so when reconstruction reaches an anchor it breaks immediately on the snapshot — the diff row is never read. Over a long-running chain each anchor accumulates a useless diff entry (anchor-slot states get two on-disk representations).
### Issue 3 of 4
crates/storage/src/store.rs:1164-1189
**`reconstruct_state` backward walk skips the LRU cache for intermediate states**
The loop only checks `States` (the DB snapshot table) to decide whether to stop. If an intermediate state (e.g. the direct parent, already imported a moment ago) is present in the LRU cache, the walk continues all the way back to the nearest DB snapshot anyway, doing up to `SNAPSHOT_ANCHOR_INTERVAL` extra DB reads. Checking `self.state_cache.lock().unwrap().get(&cursor)` at the top of each iteration would short-circuit the walk and use the cached state as the reconstruction base, which is safe because states are immutable per root.
### Issue 4 of 4
crates/storage/src/store.rs:1164-1175
**Unbounded backward walk with no cycle detection or depth limit**
The `loop` in `reconstruct_state` terminates only when it finds a snapshot in `States` or a missing diff. With DB corruption (e.g. a `StateDiff` whose `base_root` points to itself or forms a multi-hop cycle), the loop runs indefinitely and hangs the calling thread. A simple guard — capping iterations at `SNAPSHOT_ANCHOR_INTERVAL + 1` and returning `None` if exceeded — would bound the worst case and surface corruption as a clean error rather than a hang.
Reviews (1): Last reviewed commit: "docs: correct storage tables list to mat..." | Re-trigger Greptile
| /// Bound storage by evicting old state snapshots and finalized signatures. | ||
| /// | ||
| /// State diffs, block headers, and block bodies are retained for the full | ||
| /// history; only full-state snapshots outside the hot window (diffs remain) | ||
| /// and signatures of finalized blocks are removed. |
There was a problem hiding this comment.
The doc comment says "evicting old state snapshots" but the new design never prunes snapshots — anchors are kept forever. Only block signatures are removed. The second sentence ("only full-state snapshots outside the hot window … are removed") reinforces the misconception. This will mislead future readers who rely on this doc to understand storage bounds.
| /// Bound storage by evicting old state snapshots and finalized signatures. | |
| /// | |
| /// State diffs, block headers, and block bodies are retained for the full | |
| /// history; only full-state snapshots outside the hot window (diffs remain) | |
| /// and signatures of finalized blocks are removed. | |
| /// Prune finalized block signatures to keep signature storage bounded. | |
| /// | |
| /// State diffs, block headers, block bodies, and full-state snapshots are | |
| /// all retained for the full history and are never pruned. Only signatures | |
| /// of finalized blocks older than the pruning window are removed. |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 848-852
Comment:
The doc comment says "evicting old state snapshots" but the new design never prunes snapshots — anchors are kept forever. Only block signatures are removed. The second sentence ("only full-state snapshots outside the hot window … are removed") reinforces the misconception. This will mislead future readers who rely on this doc to understand storage bounds.
```suggestion
/// Prune finalized block signatures to keep signature storage bounded.
///
/// State diffs, block headers, block bodies, and full-state snapshots are
/// all retained for the full history and are never pruned. Only signatures
/// of finalized blocks older than the pruning window are removed.
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| pub fn insert_state_with_diff(&mut self, root: H256, base: DiffBase, state: State) { | ||
| let slot = state.slot; | ||
| let is_anchor = slot / SNAPSHOT_ANCHOR_INTERVAL > base.slot / SNAPSHOT_ANCHOR_INTERVAL; | ||
|
|
||
| // Snapshot only at anchors; serialize before `state` is consumed. | ||
| let snapshot_bytes = is_anchor.then(|| state.to_ssz()); | ||
| // Memoize the post-state for fast reads, then move it into the diff so | ||
| // its multi-MB justification fields are not cloned again. | ||
| self.state_cache.lock().unwrap().put(root, state.clone()); | ||
| let diff_bytes = StateDiff::from_base(base.root, base.hbh_len, state).to_ssz(); | ||
|
|
||
| let key = root.to_ssz(); | ||
| let mut batch = self.backend.begin_write().expect("write batch"); | ||
| let entries = vec![(root.to_ssz(), state.to_ssz())]; | ||
| batch.put_batch(Table::States, entries).expect("put state"); | ||
| batch | ||
| .put_batch(Table::StateDiffs, vec![(key.clone(), diff_bytes)]) | ||
| .expect("put state diff"); | ||
| if let Some(snapshot_bytes) = snapshot_bytes { | ||
| batch | ||
| .put_batch(Table::States, vec![(key, snapshot_bytes)]) | ||
| .expect("put state snapshot"); | ||
| } | ||
| batch.commit().expect("commit"); | ||
| } |
There was a problem hiding this comment.
Redundant
StateDiffs entry written for every anchor state
insert_state_with_diff always writes a StateDiffs row, even when is_anchor is true and a full States snapshot is also written. In reconstruct_state the loop checks States[cursor] before StateDiffs[cursor], so when reconstruction reaches an anchor it breaks immediately on the snapshot — the diff row is never read. Over a long-running chain each anchor accumulates a useless diff entry (anchor-slot states get two on-disk representations).
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 1213-1235
Comment:
**Redundant `StateDiffs` entry written for every anchor state**
`insert_state_with_diff` always writes a `StateDiffs` row, even when `is_anchor` is true and a full `States` snapshot is also written. In `reconstruct_state` the loop checks `States[cursor]` before `StateDiffs[cursor]`, so when reconstruction reaches an anchor it breaks immediately on the snapshot — the diff row is never read. Over a long-running chain each anchor accumulates a useless diff entry (anchor-slot states get two on-disk representations).
How can I resolve this? If you propose a fix, please make it concise.| fn reconstruct_state(&self, root: &H256) -> Option<State> { | ||
| // Walk back collecting diffs until we reach a snapshot. | ||
| let mut diffs: Vec<StateDiff> = Vec::new(); | ||
| let mut cursor = *root; | ||
| let snapshot = loop { | ||
| if let Some(snapshot) = self.get_ssz::<State>(Table::States, &cursor) { | ||
| break snapshot; | ||
| } | ||
| let diff = self.get_ssz::<StateDiff>(Table::StateDiffs, &cursor)?; | ||
| cursor = diff.base_root; | ||
| diffs.push(diff); | ||
| }; | ||
|
|
||
| // `diffs` runs target -> snapshot child; reverse to snapshot child -> target. | ||
| diffs.reverse(); | ||
|
|
||
| // The latest block header lives in BlockHeaders; the stored state caches | ||
| // the real state_root there, so it equals the header byte-for-byte. | ||
| let latest_block_header = self.get_block_header(root)?; | ||
|
|
||
| Some(crate::state_diff::reconstruct( | ||
| snapshot, | ||
| &diffs, | ||
| latest_block_header, | ||
| )) | ||
| } |
There was a problem hiding this comment.
reconstruct_state backward walk skips the LRU cache for intermediate states
The loop only checks States (the DB snapshot table) to decide whether to stop. If an intermediate state (e.g. the direct parent, already imported a moment ago) is present in the LRU cache, the walk continues all the way back to the nearest DB snapshot anyway, doing up to SNAPSHOT_ANCHOR_INTERVAL extra DB reads. Checking self.state_cache.lock().unwrap().get(&cursor) at the top of each iteration would short-circuit the walk and use the cached state as the reconstruction base, which is safe because states are immutable per root.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 1164-1189
Comment:
**`reconstruct_state` backward walk skips the LRU cache for intermediate states**
The loop only checks `States` (the DB snapshot table) to decide whether to stop. If an intermediate state (e.g. the direct parent, already imported a moment ago) is present in the LRU cache, the walk continues all the way back to the nearest DB snapshot anyway, doing up to `SNAPSHOT_ANCHOR_INTERVAL` extra DB reads. Checking `self.state_cache.lock().unwrap().get(&cursor)` at the top of each iteration would short-circuit the walk and use the cached state as the reconstruction base, which is safe because states are immutable per root.
How can I resolve this? If you propose a fix, please make it concise.| fn reconstruct_state(&self, root: &H256) -> Option<State> { | ||
| // Walk back collecting diffs until we reach a snapshot. | ||
| let mut diffs: Vec<StateDiff> = Vec::new(); | ||
| let mut cursor = *root; | ||
| let snapshot = loop { | ||
| if let Some(snapshot) = self.get_ssz::<State>(Table::States, &cursor) { | ||
| break snapshot; | ||
| } | ||
| let diff = self.get_ssz::<StateDiff>(Table::StateDiffs, &cursor)?; | ||
| cursor = diff.base_root; | ||
| diffs.push(diff); | ||
| }; |
There was a problem hiding this comment.
Unbounded backward walk with no cycle detection or depth limit
The loop in reconstruct_state terminates only when it finds a snapshot in States or a missing diff. With DB corruption (e.g. a StateDiff whose base_root points to itself or forms a multi-hop cycle), the loop runs indefinitely and hangs the calling thread. A simple guard — capping iterations at SNAPSHOT_ANCHOR_INTERVAL + 1 and returning None if exceeded — would bound the worst case and surface corruption as a clean error rather than a hang.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/storage/src/store.rs
Line: 1164-1175
Comment:
**Unbounded backward walk with no cycle detection or depth limit**
The `loop` in `reconstruct_state` terminates only when it finds a snapshot in `States` or a missing diff. With DB corruption (e.g. a `StateDiff` whose `base_root` points to itself or forms a multi-hop cycle), the loop runs indefinitely and hangs the calling thread. A simple guard — capping iterations at `SNAPSHOT_ANCHOR_INTERVAL + 1` and returning `None` if exceeded — would bound the worst case and surface corruption as a clean error rather than a hang.
How can I resolve this? If you propose a fix, please make it concise.
🤖 Claude Code ReviewNow I have everything to write the final review. PR Review: feat(storage): diff-layer state storage with an in-memory LRU cacheThis PR replaces time-windowed state pruning with a parent-linked diff-layer model: every non-genesis block gets a Finding 1 — Cache populated before DB commit (liveness hazard on crash)
self.state_cache.lock().unwrap().put(root, state.clone()); // ← cache, in-memory only
let diff_bytes = StateDiff::from_base(..., state).to_ssz();
let mut batch = self.backend.begin_write()...;
batch.put_batch(Table::StateDiffs, ...)...;
batch.commit().expect("commit"); // ← DB commitIf the process is killed (OOM, SIGKILL, panic from Finding 2 — Undocumented write-order precondition for snapshot/reconstruction consistency
post_state.latest_block_header.state_root = state_root; // blockchain/src/store.rs:571
// ...
store.insert_state_with_diff(block_root, diff_base, post_state); // :585If Finding 3 — No depth cap or cycle guard in
|
Closes #238
Summary
Replaces aggressive state pruning with a diff-layer storage model plus an in-memory cache, so the full state history stays available cheaply and no state is ever pruned. Builds on the block-signature pruning from #453 (merged to
main): keeping block headers/bodies forever is what lets historical states be reconstructed.State storage
StateDiff(StateDiffstable, never pruned), preserving full state history.Statestable) are written only at anchors: the bootstrap state plus one perSNAPSHOT_ANCHOR_INTERVAL(1024-slot) window crossing. Also never pruned. There is no separate anchor table; an anchor is just a state whose slot crosses a 1024 boundary relative to its parent.StateDiffstoresslot, both checkpoints, and the justification fields in full, plus the appendedhistorical_block_hashestail.config/validatorscome from the nearest ancestor snapshot (they never change);latest_block_headeris read back fromBlockHeaders(the stored state caches the realstate_rootthere, so it matches byte-for-byte). These assumptions are documented onStateDiff::from_base.get_stateresolves in three tiers: (1) in-memory LRU cache, (2) a full snapshot inStates, else (3) reconstruct by walkingbase_rootback to the nearest ancestor snapshot and replaying appended tails forward. Results are memoized in the cache (STATE_CACHE_CAPACITY = 32) so recent/repeated reads stay hot.DiffBasecaptures the parent(root, hbh_len, slot)before it is consumed into the post-state;StateDiff/DiffBaselive in the storage crate.Why this over snapshot eviction
The earlier approach pruned old snapshots on a hot window and scanned the whole
Statestable to do it. Anchors are kept forever and spread across all slots, so aslot||root-keyed range scan still touches every anchor: bounding the scan doesn't help. Keeping diffs-only (never pruned) plus an LRU for hot reads removes the periodic scan entirely and keeps full history, at the cost of on-demand reconstruction for cold, non-anchor states.Tests
StateDiffbuild/SSZ round-trip; state reconstruction (single-diff and multi-diff) verified via a cold store with an empty cache so the diff walk is actually exercised; snapshot-only-on-boundary-crossing; bootstrap-anchor snapshot.-D warningsclean.Validation