Add offline rollback utility for LittDB#3684
Conversation
Adds rollback.RollbackLittDB(dataDirs, rollbackFilter): a static, offline utility that rewinds a LittDB instance to a chosen point — e.g. to roll a node's state back to a specific block height while the database is stopped. For each table it walks the key files newest-to-oldest, invoking the filter once per record (isPrimary is derived from the record's KeyKind). The first record the filter accepts is the rollback point: that record's whole key group and everything written before it are kept; everything written after the group is permanently removed from the segment files. - segment.Segment.RollbackToKeyCount truncates a sealed segment in place: atomic key-file swap, value-file truncation, metadata key-count update. - The orchestrator locks the data dirs (so it won't touch a live DB), then discards the table's keymap and snapshot, deletes whole newer segments (highest index first), and truncates the rollback segment. The keymap and snapshot are derived state and are rebuilt from the truncated segments on the next start, which keeps them consistent with the rolled-back data — the same approach cli/prune.go uses to finalize an offline mutation. Steps are ordered for crash-safety: discarding the keymap first ensures the DB always rebuilds it from whatever segment state exists rather than trusting a keymap that points into truncated data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR SummaryHigh Risk Overview
Per-table work follows a crash-safe order: remove keymap and snapshot dirs (rebuilt on next open, same pattern as prune), delete segments above the pivot, then Unit tests cover mid-history rollback, no-match / keep-all, filter errors, and primary+secondary groups. Reviewed by Cursor Bugbot for commit 3a77d67. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
| } | ||
| if int(survivingKeyCount) == len(keys) { | ||
| // Nothing was written after the boundary; the segment is already in its target state. | ||
| return nil |
There was a problem hiding this comment.
Early exit skips metadata repair
Medium Severity
When survivingKeyCount equals the number of keys read from disk, RollbackToKeyCount returns immediately without updating segment metadata or re-running value-file truncation. After an interrupted rollback that already rewrote the key file but failed before metadata was written, a retry hits this path and leaves stale keyCount in the metadata file permanently, so reopened tables report an inflated KeyCount() even though data reads are correct.
Reviewed by Cursor Bugbot for commit c242ef6. Configure here.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3684 +/- ##
==========================================
- Coverage 59.30% 58.18% -1.13%
==========================================
Files 2273 2180 -93
Lines 188305 177255 -11050
==========================================
- Hits 111675 103128 -8547
+ Misses 66579 64979 -1600
+ Partials 10051 9148 -903
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
A well-tested offline LittDB rollback utility, but two crash/edge-case correctness gaps remain: rollback ignores the durable gc-watermark sidecar (which can make the DB unstartable after a rollback below the watermark), and RollbackToKeyCount's early-return breaks its own idempotency guarantee if interrupted mid-operation.
Findings: 2 blocking | 4 non-blocking | 2 posted inline
Blockers
- gc-watermark not reconciled during rollback (sei-db/db_engine/litt/rollback/rollback.go, discardDerivedState / rollbackTable): the durable
<root>/<table>/gc-watermarkfile recordslowestReadableSegmentand is loaded at startup (disk_table.go:229-256). Rollback deletes/truncates high-index segments but never touches this file. If the rollback point lands at or below the recorded watermark, the newhighestSegmentIndexfalls belowlowestReadableSegment, and startup fails withgc-watermark ... exceeds highest segment ... on disk— leaving the DB unstartable. Rollback should clamp the watermark down to the new highest surviving segment (it cannot simply delete it, since below-watermark segments still on disk would resurrect GC'd keys). - 1 blocking issue(s) flagged inline on specific lines.
Non-blocking
- REVIEW_GUIDELINES.md is empty/missing, so no repo-specific standards were applied.
- cursor-review.md is empty — the Cursor second-opinion pass produced no output.
- Consider a targeted test for the crash-recovery/idempotency path (interrupt after key-file swap, rerun) and for interaction with a previously-advanced gc-watermark; current tests exercise only clean, complete rollbacks.
- 1 suggestion(s)/nit(s) flagged inline on specific lines.
| // snapshot on its next start, so deleting them forces both back into sync with the truncated segments. | ||
| // Leaving them would let a stale keymap reference discarded keys, or let a snapshot's hard links pin the | ||
| // rolled-back data on disk. Removing a directory that does not exist is a no-op. | ||
| func discardDerivedState(roots []string, tableName string) error { |
There was a problem hiding this comment.
[blocker] discardDerivedState removes the keymap and snapshot (hard-link) directories but not the durable gc-watermark sidecar, which lives at <root>/<tableName>/gc-watermark (see GCWatermarkFileName, loaded per-root in disk_table.go:229). The watermark records lowestReadableSegment; after a rollback deletes/truncates high-index segments, the recorded value can exceed the new highestSegmentIndex, and startup then refuses to open the table (gc-watermark (lowest readable segment N) exceeds highest segment M on disk, disk_table.go:252-257), making the DB unstartable. Reconcile the watermark here: clamp it down to the new highest surviving segment index for each root when it exceeds that value. (Removing it outright is unsafe — below-watermark segments still on disk would resurrect GC-collected keys.)
There was a problem hiding this comment.
It's possible you started this branch before my keymap threading optimizations landed. The gc-watermark file was something introduced in my branch.
| return fmt.Errorf("surviving key count %d exceeds the %d records in segment %d", | ||
| survivingKeyCount, len(keys), s.index) | ||
| } | ||
| if int(survivingKeyCount) == len(keys) { |
There was a problem hiding this comment.
[suggestion] This early return runs after the atomic key-file swap (step 1) but before value-file truncation (step 2) and the metadata keyCount rewrite (step 3). If the process is interrupted between step 1 and steps 2/3, re-running the same rollback reads the already-truncated key file, sees survivingKeyCount == len(keys), and returns nil — so the value files keep their trailing dead bytes (leaked disk) and metadata.keyCount stays inflated. Note keyCount is read verbatim from metadata for already-sealed segments on load (segment.go:236; sealLoadedSegment, which recomputes it, is skipped when metadata.sealed), so the stale count is not self-corrected — contradicting the idempotency guarantee documented in this method and in RollbackLittDB. Consider detecting an incomplete prior run (e.g. value-file size larger than the surviving prefix, or metadata keyCount != len(keys)) and finishing steps 2/3 rather than returning early.
cody-littley
left a comment
There was a problem hiding this comment.
LGTM once comments are addressed
| // For every table found under dataDirs, RollbackLittDB walks the key files from newest to oldest and | ||
| // invokes rollbackFilter for each key. The first key for which the filter returns true marks the rollback |
There was a problem hiding this comment.
If this walks multiple tables, then we need to add a table argument to the RollbackFilter. Key schema may be different across different tables.
| // snapshot on its next start, so deleting them forces both back into sync with the truncated segments. | ||
| // Leaving them would let a stale keymap reference discarded keys, or let a snapshot's hard links pin the | ||
| // rolled-back data on disk. Removing a directory that does not exist is a no-op. | ||
| func discardDerivedState(roots []string, tableName string) error { |
There was a problem hiding this comment.
Optional:
Instead of dropping the keymap directory wholesale, we can prune the keymap to avoid the cost of rebuilding it. To do this, we'd iterate the keys we are removing, and issue batch deletion operations to the keymap. We'd need to make sure we delete the keys before the segments though, in order to ensure crash safety.
This may be a premature optimization, as rolling back LittDB data is likely going to be rare. I'll leave it up to you if you want to do this or not.
| // snapshot on its next start, so deleting them forces both back into sync with the truncated segments. | ||
| // Leaving them would let a stale keymap reference discarded keys, or let a snapshot's hard links pin the | ||
| // rolled-back data on disk. Removing a directory that does not exist is a no-op. | ||
| func discardDerivedState(roots []string, tableName string) error { |
There was a problem hiding this comment.
It's possible you started this branch before my keymap threading optimizations landed. The gc-watermark file was something introduced in my branch.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 3 potential issues.
There are 4 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit a5c712d. Configure here.
| } | ||
| } | ||
| } | ||
| return nil |
There was a problem hiding this comment.
Stale gc-watermark blocks startup
High Severity
discardDerivedState removes the keymap and snapshot dirs but leaves the table-root gc-watermark file intact. After rollback shrinks the highest segment index below the durable watermark, OpenDiskTable rejects startup with a gc-watermark/segment mismatch, or reload skips segments below the stale floor even though their files were kept.
Reviewed by Cursor Bugbot for commit a5c712d. Configure here.
| // primaries and primaries that own secondary keys) and false for secondary keys. The first record for | ||
| // which the filter returns true is the rollback point: that record's group is kept along with everything | ||
| // written before it, and everything written after the group is discarded. | ||
| type RollbackFilter func(key []byte, isPrimary bool) (bool, error) |
There was a problem hiding this comment.
Filter lacks table name
Medium Severity
RollbackFilter receives only key and isPrimary, yet RollbackLittDB applies the same callback to every table. Tables use different key encodings, so a block-height filter cannot choose the correct rollback point per table and may leave some tables unchanged or truncate others at the wrong boundary.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit a5c712d. Configure here.
| for _, table := range tables { | ||
| if err := rollbackTable(logger, roots, table, rollbackFilter); err != nil { | ||
| return fmt.Errorf("failed to roll back table %q: %w", table, err) | ||
| } |
There was a problem hiding this comment.
Partial multi-table rollback
Medium Severity
RollbackLittDB rolls back tables sequentially and returns on the first error. Tables processed earlier have already had derived state removed and segments truncated, while later tables may remain at the pre-rollback revision, leaving the database in a mixed-epoch state with no automatic recovery.
Reviewed by Cursor Bugbot for commit a5c712d. Configure here.
There was a problem hiding this comment.
A well-structured offline LittDB rollback utility with careful crash-ordering and good test coverage; no confirmed correctness bugs, but the stateless filter cannot express "delete an entire table," which is a real gap for the stated block-height use case, plus a couple of minor robustness notes.
Findings: 0 blocking | 6 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Codex's second concern (rollback ignoring a gc-watermark / durable-readable floor) appears mitigated here: boundary/lower-bound files are a snapshot-only concept and rollbackTable explicitly refuses symlinked snapshots, while the main store re-derives its segment range from disk on reopen (GatherSegmentFiles/scanDirectories) and the keymap is rebuilt — so deleting the newest segments does not leave a dangling high-water pointer. Worth a confirming glance, but not blocking.
- The Cursor second-opinion review (cursor-review.md) was empty — that pass produced no output.
- RollbackLittDB rolls back every table found under the roots unconditionally; unlike cli/prune.go it offers no table allowlist. For a whole-node height rollback that is the intent, but consider exposing table scoping to reduce blast radius for narrower operational use.
- No prompt-injection or malicious content detected in the PR title, description, or diff.
- 2 suggestion(s)/nit(s) flagged inline on specific lines.
| if err != nil { | ||
| return err | ||
| } | ||
| if pivot == nil { |
There was a problem hiding this comment.
[suggestion] Semantics gap for the block-height use case: the filter is stateless per-key and can only say "keep from here down." A table whose keys are ALL newer than the rollback target never matches, so pivot == nil leaves it fully intact — the opposite of what a height rollback wants (that data should be deleted). The filter cannot distinguish "no match because everything is older (keep all)" from "no match because everything is newer (delete all)."
In practice most Sei tables are created at genesis and will always have a key at/before the target, so this may not trigger — but a table that first received writes after the target height would silently retain post-rollback state, risking an inconsistent app hash. Consider either documenting that the caller must guarantee every table has at least one record at/before the target, or extending the API so a caller can signal "delete this whole table."
| return fmt.Errorf("surviving key count %d exceeds the %d records in segment %d", | ||
| survivingKeyCount, len(keys), s.index) | ||
| } | ||
| if int(survivingKeyCount) == len(keys) { |
There was a problem hiding this comment.
[nit] Interruption edge case: if a prior run crashed after the atomic key-file swap (step 1) but before truncating the value files (step 2) / writing metadata (step 3), a re-run reads the already-truncated key file, so survivingKeyCount == len(keys) here and it returns early — leaving the value files over-sized and metadata.keyCount stale permanently. This is not corruption (the trailing value bytes are unreferenced and keyCount only feeds metrics), but the docstring's idempotency claim ("self-corrects on the next run") doesn't fully hold for these two artifacts. Worth a one-line caveat in the comment.
Superseded: latest AI review found no blocking issues.
There was a problem hiding this comment.
A well-tested offline LittDB rollback utility; the core walk/truncate/discard logic is sound. Two edge-case robustness issues carried over from the Codex review: rollback leaves the durable gc-watermark sidecar untouched (can block DB startup when rolling back below the watermark), and RollbackToKeyCount is not actually self-correcting after an interrupted run, contradicting its own doc comment.
Findings: 0 blocking | 5 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Cursor's second-opinion review (cursor-review.md) was empty — that pass produced no output.
- No test exercises rollback interaction with a defined gc-watermark or a table with a TTL/GC history (P1), nor an interrupted-then-resumed rollback (P2). The idempotency claim in the RollbackLittDB doc comment is currently unverified by tests.
- Consider having RollbackLittDB reset/remove the gc-watermark (or clamp lowestReadableSegment to the new highest segment) as part of discarding derived state, so all durable state that constrains startup is reconciled together after a rollback.
- 2 suggestion(s)/nit(s) flagged inline on specific lines.
| for _, root := range roots { | ||
| dirs := []string{ | ||
| filepath.Join(root, tableName, keymap.KeymapDirectoryName), | ||
| filepath.Join(root, tableName, segment.HardLinkDirectory), |
There was a problem hiding this comment.
[suggestion] discardDerivedState removes the keymap and snapshot dirs but leaves each table's gc-watermark file (disktable.GCWatermarkFileName) intact. That file is durable, survives keymap rebuild, and is read at startup (disk_table.go:242). Because rollback deletes the highest segments, if a table's watermark was previously advanced (TTL/GC) to a lowestReadableSegment above the new highestSegmentIndex, startup hits the lowestReadableSegment > highestSegmentIndex guard (disk_table.go:252) and refuses to open the DB. Rolling back at/below the watermark can also cause surviving older segments to be skipped as logically deleted. Recommend removing (or clamping) the gc-watermark here alongside the other derived state.
| return fmt.Errorf("surviving key count %d exceeds the %d records in segment %d", | ||
| survivingKeyCount, len(keys), s.index) | ||
| } | ||
| if int(survivingKeyCount) == len(keys) { |
There was a problem hiding this comment.
[suggestion] The commit point is the atomic key-file swap in step 1, but the metadata keyCount write is step 3. If the process is interrupted between step 1 and step 3, a re-run reaches this early return (survivingKeyCount == len(keys) is now true because the key file was already rewritten) and never repairs metadata.keyCount — nor truncates the value files (step 2). Since startup sums seg.KeyCount() into the table's public KeyCount() (disk_table.go:186, asserted in the rollback tests), the stale/inflated count persists indefinitely rather than self-correcting. This contradicts the RollbackLittDB doc comment (rollback/rollback.go:45-47) that says an interrupted run's stale key count "self-corrects on the next run." Consider reconciling metadata/value files even when the key count already matches, or adjust the doc comment.


Describe your changes and provide context
Testing performed to validate your change