Add gov proposal based migration trigger#3650
Conversation
PR SummaryHigh Risk Overview The SC stack gains
AGENTS.md now requires Reviewed by Cursor Bugbot for commit 16c93d0. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3650 +/- ##
==========================================
- Coverage 59.29% 58.79% -0.50%
==========================================
Files 2272 2223 -49
Lines 188210 183096 -5114
==========================================
- Hits 111594 107647 -3947
+ Misses 66565 65839 -726
+ Partials 10051 9610 -441
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| // composite SC backend drives the in-flight memiavl->flatkv migration that | ||
| // BeginBlock paces via the migration gov param. Fail fast if the legacy | ||
| // root multistore is somehow in use. | ||
| rs, ok := app.CommitMultiStore().(*rootmulti.Store) |
There was a problem hiding this comment.
If we need to cast like this, would it make sense to change the type returned by app.CommitMultiStore()?
| genesisImportConfig genesistypes.GenesisImportConfig | ||
|
|
||
| stateStore seidb.StateStore | ||
| rs *rootmulti.Store |
There was a problem hiding this comment.
nit, would it make sense to use a more descriptive name like rootStore? Out of context, it's not obvious what rs stands for.
There was a problem hiding this comment.
good suggestion, will change
| if migrationBatchSize < 0 { | ||
| return nil, fmt.Errorf("migration batch size must not be negative, got %d", migrationBatchSize) |
There was a problem hiding this comment.
Could we make migration batch size an unsigned integer?
There was a problem hiding this comment.
Yeah, that's a good point
There was a problem hiding this comment.
Tried that, looks like it would require a big refactory, will actually do a fallback here to 0 if it's negative
There was a problem hiding this comment.
A well-structured, thoroughly-tested change that moves the SC migration rate from a node-local config to a consensus-deterministic governance param (migration.NumKeysToMigratePerBlock), correctly read once per block in BeginBlock and pushed down to the composite SC store. No blocking correctness/security issues found; a few non-blocking notes, chiefly that the new subspace value is not genesis-exportable.
Findings: 0 blocking | 6 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Migration param is not genesis-exportable (confirms Codex P2).
x/paramsExportGenesis(sei-cosmos/x/params/keeper/genesis.go) only exportsFeesParams/CosmosGasParams, and the newmigrationsubspace has no owning module's InitGenesis/ExportGenesis. After aseid export/import while a migration is in flight,NumKeysToMigratePerBlockis dropped and BeginBlock re-seeds the0(paused) default, halting the drain until governance re-submits a proposal. Data isn't lost (the boundary cursor lives in flatkv) and it's recoverable, but worth either adding genesis plumbing for this subspace or documenting the operational caveat. - Operational behavior change worth calling out in release notes: because the param defaults to
0(paused) and is the sole source of the rate, any chain already running in a migrate write-mode will pause its drain at the upgrade height until a param-change proposal raises the rate. Intentional per the consensus-safety design, but operators must know to submit the proposal. - REVIEW_GUIDELINES.md and cursor-review.md were empty/absent, so no repo-specific guidelines or Cursor second opinion were available; only the Codex pass (one P2, addressed above) contributed.
- Integration test nit (integration_test/gov_module/gov_proposal_test.yaml): the verifier
NEW_PARAM == 12345compares an unquoted numeric against a value the pipeline produces as a string (jq -r .value | tr -d "\""), unlike the sibling assertionNEW_ABCI_PARAM == "true". If the eval engine is type-strict this could mis-compare; quoting"12345"would match the established pattern. - 2 suggestion(s)/nit(s) flagged inline on specific lines.
| // composite SC backend drives the in-flight memiavl->flatkv migration that | ||
| // BeginBlock paces via the migration gov param. Fail fast if the legacy | ||
| // root multistore is somehow in use. | ||
| rs, ok := app.CommitMultiStore().(*rootmulti.Store) |
There was a problem hiding this comment.
[suggestion] This unconditional type assertion now panics in New() if the commit multistore is not *rootmulti.Store. New() is invoked by every seid subcommand that builds the app (export, query tooling, etc.), so please confirm no supported configuration (e.g. a legacy IAVL/storev2-disabled path) can reach here — otherwise this turns a previously-degraded path into a hard crash. If the legacy store is truly unsupported this is fine; a one-line note on why it's unreachable would help.
| } | ||
| numKeys := migration.DefaultNumKeysToMigratePerBlock | ||
| if subspace, ok := app.ParamsKeeper.GetSubspace(migration.SubspaceName); ok { | ||
| // The migration subspace has no owning module to seed it in InitGenesis, |
There was a problem hiding this comment.
[suggestion] The lazy seed correctly handles a fresh/never-set param deterministically, but because this subspace has no module ExportGenesis (x/params only exports Fees/CosmosGas params), the value is lost across a seid export/import: a mid-migration rate resets to the 0 default here and the drain pauses until governance re-raises it. Consider adding genesis import/export plumbing for this subspace, or documenting the caveat.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Want higher recall? High effort reviews run extra passes and find more bugs. A team admin can switch effort levels in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 91e1766. Configure here.
There was a problem hiding this comment.
The PR cleanly moves the SC migration rate from a node-local config to a consensus-safe governance param, with thorough plumbing and tests. However, an accidental rename of the receipt-store backend constant breaks an existing test (and decouples the test harness from the real config key), which blocks merge.
Findings: 1 blocking | 4 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
- 1 blocking issue(s) flagged inline on specific lines.
Non-blocking
- Cursor's second-opinion pass produced no output (cursor-review.md is empty); REVIEW_GUIDELINES.md is also empty, so no repo-specific standards could be applied.
- validateNumKeysToMigratePerBlock accepts any uint64. Because the value is cast to int before NextBatch (migration_manager.go:299), a governance value above math.MaxInt64 becomes negative and halts commit deterministically. Consider bounding the param to a sane operational maximum.
- app/abci_test.go's comment that math.MaxUint64 is 'forwarded verbatim with no conversion or clamping' is misleading — the app layer stores it verbatim, but migration_manager converts it to int downstream.
- 1 suggestion(s)/nit(s) flagged inline on specific lines.
|
|
||
| const ( | ||
| receiptStoreBackendKey = "receipt-store.rs-backend" | ||
| receiptStoreBackendKey = "receipt-store.rootStore-backend" |
There was a problem hiding this comment.
[blocker] This key must stay receipt-store.rs-backend. The parser in sei-db/config/receipt_config.go reads flagRSBackend = "receipt-store.rs-backend" (and the rs-backend mapstructure tag), so renaming this constant to receipt-store.rootStore-backend decouples the app-layer key from what ReadReceiptConfig actually reads. Concretely, TestParseReceiptConfigs_RejectsInvalidBackend (app/seidb_test.go) now sets "rocksdb" under a key the parser ignores, the config falls back to the default pebbledb, and assert.Error fails — breaking CI. This looks like collateral damage from a global rs→rootStore rename (the comment at app/test_helpers.go:486 was hit too). Revert to receipt-store.rs-backend.
| if advanceMigration { | ||
| // Get the next batch of keys to migrate. | ||
| valuesToMigrate, newBoundary, err := m.iterator.NextBatch(m.migrationBatchSize) | ||
| valuesToMigrate, newBoundary, err := m.iterator.NextBatch(int(m.migrationBatchSize)) |
There was a problem hiding this comment.
[suggestion] int(m.migrationBatchSize) truncates a uint64 gov param. On 64-bit, a value above math.MaxInt64 becomes negative, NextBatch rejects size <= 0, and ApplyChangeSets returns an error during commit — a deterministic chain halt from a governance typo. Since validateNumKeysToMigratePerBlock accepts any uint64, consider bounding the param to a sane maximum (well under math.MaxInt) at validation time.
This reverts commit 91e1766.
There was a problem hiding this comment.
Converts the SC migration per-block rate from a consensus-unsafe node-local config into a deterministic governance param read in BeginBlock; the design is sound, AppHash-safe (all nodes read the same value, lazy-default-set runs identically), and very thoroughly tested at unit, store, and integration levels. Only minor, non-blocking observations.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- New() now hard-panics if the commit multistore is not *rootmulti.Store (app/app.go:560). This is intentional fail-fast, but it is a behavior change: any App-construction path that could use a non-storev2 store (offline/CLI commands, tooling) would now crash rather than degrade. Worth confirming every code path that builds App goes through the storev2 rootmulti store.
- The two later casts in app.go (~line 770 and ~2685) still re-derive the store via app.CommitMultiStore().(*rootmulti.Store) instead of reusing the new app.rootStore field; minor redundancy now that the type is guaranteed at construction.
- Cursor's second-opinion review file (cursor-review.md) was empty — that pass produced no output. Codex reported no material issues.
- Consider a brief note/log when the lazy default-set writes the param the first time (app/abci.go), purely for operator visibility on the upgrade block; not required for correctness.
- 1 suggestion(s)/nit(s) flagged inline on specific lines.
Superseded: latest AI review found no blocking issues.
There was a problem hiding this comment.
The plumbing of the per-block migration rate into a governance param is clean, well-documented, and well-tested, but the PR also flips the default WriteMode to migrate_evm and relies on the claim that "paused migrate_evm produces the same app hash as memiavl_only" — which the routing code contradicts: brand-new EVM keys are written to flatkv yet excluded from the AppHash while paused, creating a fork/integrity risk. REVIEW_GUIDELINES.md and cursor-review.md were both empty; only Codex provided a second opinion, and its two P1 findings are corroborated.
Findings: 5 blocking | 4 non-blocking | 3 posted inline
Blockers
- Default WriteMode + migration trigger are not consistently network-wide. The default flips to migrate_evm (sc_config.go) but sc-write-mode remains node-local (app/seidb.go:109 only overrides when the toml key is non-empty). A node that picks up the new default runs migrate_evm while a peer with an explicit
sc-write-mode = memiavl_onlystays memiavl_only; once the gov param is raised, only the migrate_evm nodes drain. Because the paused-migrate_evm AppHash is NOT equivalent to memiavl_only (see inline finding), this mixed fleet can fork even before the migration is triggered. The PR's framing of the gov param as the 'sole' migration trigger overstates this — the write mode is still a consensus-relevant local switch. (Corroborates Codex P1 #2.) - Missing test coverage for the central safety claim: there is no test asserting that a paused migrate_evm store and a memiavl_only store produce identical AppHashes when brand-new EVM keys are written each block. Existing migration tests all set a positive batch size and drive the migration; the steady-state/paused path with new-key writes (the exact rollout state introduced by the default change) is unverified.
- 3 blocking issue(s) flagged inline on specific lines.
Non-blocking
- REVIEW_GUIDELINES.md is empty/missing, so no repo-specific standards were applied. cursor-review.md is empty — the Cursor pass produced no output. Only the Codex pass produced findings (both corroborated here).
- app/app.go: the fail-fast panic message says
expected *storev2_rootmulti.Storebut the import alias was changed torootmultiin this PR — cosmetic mismatch only. - integration_test/gov_module/gov_proposal_test.yaml: the verifier
NEW_PARAM == 12345compares a string (fromjq -r .value | tr -d '"') against an integer literal, whereas the sibling test uses a quoted string (NEW_ABCI_PARAM == "true"). Confirm the eval framework coerces types here, otherwise the assertion may not behave as intended. There is also a strayq gov proposal ... statusquery inserted between the two node votes — harmless but reads oddly. - app/abci.go applyMigrationBatchSize lazily persists the default param via subspace.Set on the first block it sees it unset. This is deterministic across nodes, but it means BeginBlock writes to the params store at the first post-upgrade height; worth a brief note that this is intended and idempotent.
| // all_migrated_but_bank, migrate_bank, flatkv_only, test_only_dual_write, auto. | ||
| // defaults to memiavl_only. | ||
| // defaults to migrate_evm. While the NumKeysToMigratePerBlock gov param is 0 | ||
| // (the default), migrate_evm is paused and produces the same app hash as |
There was a problem hiding this comment.
[blocker] This claim is not borne out by the routing code. In migrate_evm, MigrationManager.shouldForwardWriteToNewDB routes any caller write whose key is absent from memiavl directly to flatkv (migration_manager.go:431-435), and EVM creates brand-new keys (storage slots, accounts, receipts) every block. While paused (batch size 0) the boundary never advances, so shouldAppendLatticeHash returns false (composite/store.go:870-882) and flatkv is excluded from CommitInfo/AppHash. Result: paused migrate_evm commits new EVM state into flatkv that is outside the AppHash, whereas memiavl_only puts that same key into memiavl which IS in the AppHash — so the two AppHashes differ. This breaks the equivalence the rollout depends on (fork/halt risk in a mixed fleet) and leaves committed state uncovered by the merkle root. Please either route brand-new keys to memiavl while paused, or do not enable migrate_evm by default until the param is raised.
| return StateCommitConfig{ | ||
| Enable: true, | ||
| WriteMode: types.MemiavlOnly, | ||
| WriteMode: types.MigrateEVM, |
There was a problem hiding this comment.
[blocker] Changing the default WriteMode from MemiavlOnly to MigrateEVM is a consensus-relevant default flip. Combined with the node-local sc-write-mode override (app/seidb.go:109) and the AppHash non-equivalence of paused migrate_evm (see comment above), nodes that adopt this default will diverge from peers still on memiavl_only. Please confirm the upgrade story: existing nodes with an explicit memiavl_only in app.toml keep that value, but any node without it now silently enters migrate_evm. This needs to be coordinated as a network-wide switch rather than a per-node default.
| // still route below, but no keys are pulled forward and no boundary | ||
| // metadata is rewritten, so the migration holds at its current cursor | ||
| // until the batch size is raised again. | ||
| advanceMigration := firstBatchInBlock && m.migrationBatchSize > 0 |
There was a problem hiding this comment.
[blocker] advanceMigration gates only the boundary advance and metadata writes; caller-write routing below (shouldForwardWriteToNewDB) still runs unconditionally, sending brand-new keys to flatkv even when paused. Since the boundary metadata is never written in the paused state, shouldAppendLatticeHash never latches, so those flatkv writes never enter the AppHash. This is the mechanism behind the AppHash divergence flagged on sc_config.go and matches Codex P1 #1. Consider also pausing the new-key forwarding (route everything to the old DB) while migrationBatchSize == 0, so paused migrate_evm is truly AppHash-identical to memiavl_only.
There was a problem hiding this comment.
The PR cleanly replaces the node-local sc-keys-to-migrate-per-block config with a consensus-deterministic governance param and is well-tested, but the migration kick-off still depends on each node's local sc-write-mode: a node that keeps a fixed memiavl_only config (the value existing app.toml files already carry after upgrade) cannot honor the gov trigger, its failed kick-off is only logged, and it will diverge from Auto-mode nodes once the param is raised. That residual per-node divergence axis plus the silent error handling are blocking for a consensus path.
Findings: 4 blocking | 3 non-blocking | 2 posted inline
Blockers
- Upgrade/operational fork risk:
DefaultStateCommitConfig.WriteModechanges frommemiavl_onlytoauto, but already-running validators keepsc-write-mode = "memiavl_only"baked into their app.toml (rendered from the old default). The default change does not rewrite their config, so after upgrade many nodes will be fixedmemiavl_only. When the gov param is later raised, Auto nodes migrate (changing the memiavl/flatkv contents that feed the AppHash) while fixedmemiavl_onlynodes silently fail the kick-off and never migrate — i.e. the chain forks. The PR's stated goal is to eliminate exactly this per-node divergence, but it introduces a new one (Auto vs fixed). At minimum this needs to be enforced (refuse to start / fail loud if a non-Auto store is in use while migration is/was requested) or made an explicit, hard upgrade prerequisite, not just relied upon operationally. - REVIEW_GUIDELINES.md is empty and codex-review.md is empty/absent, so no repo-specific review standards and no OpenAI Codex second-opinion pass were available for this synthesis (noting per instructions; not itself a code defect).
- 2 blocking issue(s) flagged inline on specific lines.
Non-blocking
- app.go
New()now unconditionallypanics ifCommitMultiStore()is not*rootmulti.Store. This is intentional fail-fast, but confirm no supported code path (export/CLI tooling, tests, legacy-store startup) constructs the App with a different commit multistore, since this turns a previously-guardedif okinto a hard panic. - Cursor P2 (ordering):
applyMigrationBatchSizeruns afterforkInitializerandHardForkManager.ExecuteForTargetHeight. The kick-off'sSetWriteModerejects when any commitment storeHasPendingChanges(). This appears safe because BeginBlock writes are buffered in the cache-multistore and only flushed into the commitment stores at Commit (soHasPendingChanges()should be false here), but it's worth confirming no hard-fork handler writes directly into a commitment store's pending buffer in the same block a positive param first applies. MigrationManager.SetMigrationBatchSizewritesm.migrationBatchSizewithout atomics; this is safe only because the manager is always wrapped bythreadSafeRouter(write lock serializes it againstApplyChangeSets). Fine as-is, but the invariant is load-bearing and worth a brief assertion/test that migration routers are never used unwrapped.
| return fmt.Errorf("failed to set SC store migration batch size: %w", err) | ||
| } | ||
| if batchSize > 0 { | ||
| if mode, ok := rs.GetWriteMode(); ok && mode == sctypes.MemiavlOnly { |
There was a problem hiding this comment.
[blocker] The kick-off fires whenever the effective mode is MemiavlOnly, but a composite store under a fixed memiavl_only config also reports GetWriteMode() == MemiavlOnly (see composite store.go:428-429 / resolveCurrentWriteMode). So for such a node this calls SetWriteMode(MigrateEVM), which returns the "write mode is fixed by configuration" error. Combined with the log-only handling in BeginBlock, that node never migrates while Auto nodes do — an AppHash divergence. The doc comment above ("every other SC store ... never reports a memiavl_only effective mode, so the kick-off is skipped") is inaccurate for fixed memiavl_only. Either special-case/skip non-Auto stores here, or treat the "fixed" failure as fatal so the divergence can't go unnoticed.
There was a problem hiding this comment.
For fixed MemiavlOnly node, we intentionally want to diverge the app hash and skip the migration. So the switch should only work for auto mode
| numKeys = uint64(math.MaxInt64) | ||
| } | ||
| if err := app.rootStore.SetMigrationBatchSize(int(numKeys)); err != nil { | ||
| logger.Error("failed to set SC migration batch size", "err", err) |
There was a problem hiding this comment.
[blocker] A failure to apply the consensus-relevant migration rate is only logged and execution continues. The level-triggered retry justifies swallowing transient failures (e.g. pending changes), but a permanent failure — chiefly a node pinned to a fixed sc-write-mode retained across upgrade — will silently fork from nodes that do migrate, one log line per block. Recommend distinguishing the non-retryable case (e.g. the "write mode is fixed" error) and halting/panicking, or validating at startup that the store is Auto when migration may be driven by governance.
| - type: eval | ||
| expr: NEW_ABCI_PARAM == "true" | ||
|
|
||
| - name: Test migration param change proposal should update NumKeysToMigratePerBlock |
There was a problem hiding this comment.
it looks the migration gov tests triggers a real evm migration but only asserts the param value, not the migration. want to make sure it's on purpose
There was a problem hiding this comment.
There's no good way to check if real migration started or not, I decided to add a check to look for logs signals
There was a problem hiding this comment.
Solidly-engineered change that moves the SC memiavl→flatkv migration rate from a node-local config to a governance param and makes auto the default write mode, with thorough tests and unusually detailed rationale comments. No correctness blockers found; the main risk is the "auto by default" upgrade path silently overriding explicitly-pinned non-memiavl write modes, which is documented but has no runtime guard.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Cursor second-opinion review (cursor-review.md) was empty — that pass produced no output.
- Genesis-export gap (documented in app/migration/params.go): the
migrationsubspace has no owning AppModule, soseid exportomitsNumKeysToMigratePerBlock; a chain bootstrapped from such a genesis silently re-seeds the default (0/paused) and operators must re-issue the proposal. Consider seeding it via the params ExportGenesis path or documenting it in operator runbooks so a mid-migration export doesn't silently reset the drain. - WriteMode resolution is now computed in two places — sei-cosmos/server/config/config.go:GetConfig and app/seidb.go:parseSCConfigs — both calling ApplyWriteModeAuto. They agree today; consider a shared helper or a test asserting parity so the two parse paths can't drift and produce a config divergence.
- Codex finding #2 (app/abci.go applyMigrationBatchSize logs-and-continues on SetMigrationBatchSize failure, which can include the consensus-critical memiavl_only→migrate_evm kick-off): the author's in-code rationale is sound — a partial failure diverges AppHash and Tendermint halts the affected minority, while an all-fail keeps AppHash identical and the level-triggered kick-off re-fires. Acceptable as-is; flagging that Codex raised it and it was consciously resolved.
- 1 suggestion(s)/nit(s) flagged inline on specific lines.
There was a problem hiding this comment.
Well-structured, thoroughly tested, and heavily documented change that moves the consensus-relevant SC migration rate from a node-local config to a governance param read each BeginBlock. No correctness or security blockers found; the concurrency, overflow-clamp, determinism, and crash/restart paths are all covered by tests. Only minor non-blocking notes.
Findings: 0 blocking | 4 non-blocking | 0 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Second-opinion status: Codex reported "no material findings" but explicitly could not run any tests (the sandbox blocked the go1.25.6 toolchain download), so its pass is effectively static-only. The Cursor review file (cursor-review.md) is empty — that pass produced no output. Neither tool contributed findings to merge.
- Log noise on opt-out nodes: applyMigrationBatchSize runs every BeginBlock, so a node pinned to fixed memiavl_only (sc-write-mode-enable-auto=false) hits the "skipping migration kick-off" logger.Info in rootmulti Store.SetMigrationBatchSize on every block once the gov param goes positive. In practice such a node diverges/halts on AppHash mismatch soon after the activation height, so exposure is bounded, but consider rate-limiting or edge-triggering that Info log to avoid per-block spam in the window before divergence.
- Documented export caveat: the new
migrationsubspace has no owning AppModule, so x/params ExportGenesis does not serialize NumKeysToMigratePerBlock. Aseid exporttaken mid-migration omits the rate and a chain bootstrapped from it re-seeds the default (0, paused). This is clearly documented in app/migration/params.go, but it is worth surfacing in operator/runbook docs since a fork/recovery must re-issue the ParameterChangeProposal to resume the drain. - Behavior change: App.New() now panics if CommitMultiStore() is not *storev2/rootmulti.Store. This is intentional (storev2 is the only supported backend) and fine, but it converts a previously-tolerated legacy-store configuration into a hard startup failure — ensure no supported deployment/test path still constructs the legacy commit multistore.
There was a problem hiding this comment.
Solid, well-tested, thoroughly documented change that moves the SC migration rate from a node-local config to a consensus-safe governance param and makes auto the default write mode. No correctness or compile-completeness defects found; the notes below are operational cautions and documented caveats.
Findings: 0 blocking | 4 non-blocking | 0 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Upgrade footgun (auto-default):
WriteModeEnableAutonow defaults to true and an absentsc-write-mode-enable-autokey resolves to true, which silently overrides any explicitsc-write-mode. Old docs told operators that aflatkv_onlynode must stayflatkv_onlyforever; after this upgrade such a node (withoutsc-write-mode-enable-auto = false) flips toauto, which the PR's own toml comment says can fail every commit with a version-mismatch or serve reads from an empty memiavl. Not reachable in production today (no flatkv_only/evm_migrated nodes exist pre-migration) so not a blocker, but the release/upgrade notes MUST call out that any explicitly-pinned node needs to addsc-write-mode-enable-auto = false. - ExportGenesis omission (already documented in app/migration/params.go): the
migrationsubspace has no owning module, soseid exportdoes not serializeNumKeysToMigratePerBlock. A chain bootstrapped from a mid-migration export re-seeds the default (0/paused) and requires re-issuing the proposal. Acceptable and documented; consider a follow-up to include it in export or a runbook note for operators forking/recovering via export. - Second-opinion passes produced no findings: Codex reported no material findings but noted it could not run targeted Go tests (sandbox could not create the module cache / download the Go 1.25.6 toolchain — network blocked), so its pass was static-only; the Cursor review file was empty.
REVIEW_GUIDELINES.md(from the base branch) was also empty, so no repo-specific standards were applied. - Nit:
MigrationManager.SetMigrationBatchSizemutatesm.migrationBatchSizenon-atomically and relies entirely on thethreadSafeRouterwrite lock for safety (correctly documented). SinceApplyChangeSets/Readare only serialized when wrapped byNewThreadSafeRouter, keep the invariant that migration-mode routers are always thread-safe-wrapped enforced/asserted so a future direct use can't introduce a data race under-race.
There was a problem hiding this comment.
Solidly-designed, well-tested change that moves the SC-store migration rate from a node-local config to a governance param read each BeginBlock, correctly eliminating a per-node AppHash divergence risk. No blocking issues; the main note is an upgrade footgun where the new default-true sc-write-mode-enable-auto silently overrides explicitly pinned non-memiavl write modes.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Upgrade footgun (agrees with Codex, narrowed): sc-write-mode-enable-auto defaults to true and treats an absent key as true, so ApplyWriteModeAuto forces any node with an explicit sc-write-mode to auto on upgrade. This is safe for the current all-memiavl fleet and for evm_migrated nodes (auto correctly derives evm_migrated), but a flatkv_only-origin node (history began in flatkv, no memiavl) upgraded without first setting sc-write-mode-enable-auto=false will be flipped to auto — which the toml comment itself warns either fails every commit with a version-mismatch or silently serves reads from an empty memiavl. Recommend an explicit release/upgrade note calling this out, and consider failing loudly (or scoping the auto-fallback to absent/memiavl_only) when an explicit flatkv_only is present without the opt-out key.
- Cursor's second-opinion review file (cursor-review.md) was empty — that pass produced no output. Codex produced a single finding, which is incorporated above.
- app.go New() now hard-panics if the commit multistore is not *rootmulti.Store, whereas sibling call sites (trace snapshot) handle the !ok case gracefully. Verified safe for production since SetupSeiDB always prepends the storev2 rootmulti store as CMS, but the divergence in handling style is worth a brief comment for future maintainers.
- Migration subspace has no owning AppModule, so its param is omitted from x/params ExportGenesis; a chain bootstrapped from a mid-migration
seid exportsilently re-seeds the default (0/paused). This is documented in params.go but operators forking via export must re-issue the ParameterChangeProposal — worth surfacing in operator docs. - 1 suggestion(s)/nit(s) flagged inline on specific lines.
Comments that couldn't be anchored to the diff
integration_test/gov_module/gov_proposal_test.yaml:929-- [nit] Strayseid q gov proposal ... .statusline sits between the node-0 and node-1 votes with noenvcapture and nonodeselector, so it does nothing. Looks like a leftover; remove it to keep the case tidy.
| // sc-write-mode-enable-auto key, and must still resolve to auto so a | ||
| // governance-driven migration can start without an app.toml edit. Only an | ||
| // explicit key flips it. | ||
| if v := appOpts.Get(FlagSCWriteModeEnableAuto); v != nil { |
There was a problem hiding this comment.
[suggestion] sc-write-mode-enable-auto defaults to true and an absent key stays true, so any node carrying an explicit sc-write-mode is forced into auto on upgrade. That's the intended behavior for the memiavl fleet (and evm_migrated is safely derived by auto), but a flatkv_only-origin node upgraded without pre-setting this to false gets silently flipped to auto — which the toml docs describe as unsafe (version-mismatch failures or empty-memiavl reads). Consider failing loudly for an explicit non-derivable mode (notably flatkv_only) that lacks the opt-out key, and/or add an explicit upgrade note. Non-blocking given the current all-memiavl fleet.
There was a problem hiding this comment.
Solid, well-tested refactor that moves the SC migration rate from a node-local config to a deterministic governance param read each BeginBlock; concurrency, clamping, and determinism reasoning all check out. No correctness blockers — only operational and test-coverage notes remain.
Findings: 0 blocking | 7 non-blocking | 3 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Second-opinion passes: Codex reported "No material findings" and the Cursor review file (cursor-review.md) was empty — that pass produced no output.
- Operational upgrade risk: WriteModeEnableAuto now defaults to true, which forces every node to
autoand silently ignores an explicitly pinned sc-write-mode on upgrade. This is safe for the current all-memiavl fleet (auto derives memiavl_only), but a node legitimately pinned to flatkv_only/evm_migrated (e.g. from a flatkv-era state-sync) would break unless it addssc-write-mode-enable-auto = false. It is documented in toml.go/sc_config.go, but deserves an explicit release-note / operator callout since it flips behavior for existing app.toml files with no edit. - Test-coverage gap: the actual production trigger — an auto node kicking off memiavl_only→migrate_evm when the gov param is raised from 0 — is covered only by rootmulti/composite unit tests. verify_flatkv_evm_migrate.sh boots nodes pinned (enable-auto=false) and flips sc-write-mode to migrate_evm via app.toml, then uses the gov proposal only for the rate, so the auto kick-off path is never exercised end-to-end in integration.
- The genesis-export caveat (the migration subspace has no owning module, so
seid exportmid-migration omits NumKeysToMigratePerBlock and a re-import resets it to 0/paused) is acknowledged in params.go and is not consensus-fatal, but operators forking/recovering via export must re-issue the proposal — worth surfacing in operator docs. - 3 suggestion(s)/nit(s) flagged inline on specific lines.
| // AppHash. 0 (the default until a gov proposal raises it) leaves the migration | ||
| // paused; it is the sole source of the rate (there is no node-local fallback). | ||
| func (app *App) applyMigrationBatchSize(ctx sdk.Context) { | ||
| if app.rootStore == nil { |
There was a problem hiding this comment.
[nit] This if app.rootStore == nil guard is effectively dead code: New() panics unless the commit multistore is a *rootmulti.Store, so app.rootStore is always non-nil here. Harmless as defense-in-depth, but consider dropping it or adding a comment noting it can only be nil in a partially-constructed app (never in normal operation).
| return StateCommitConfig{ | ||
| Enable: true, | ||
| WriteMode: types.MemiavlOnly, | ||
| WriteModeEnableAuto: true, |
There was a problem hiding this comment.
[suggestion] Defaulting WriteModeEnableAuto to true (and treating an absent key as true) means every existing node flips to auto on upgrade, ignoring any explicitly pinned sc-write-mode. This is intentional and safe for the current all-memiavl fleet, but it silently overrides a pinned flatkv_only/evm_migrated node — which the toml comment warns will "fail every commit with a version-mismatch error or silently serve reads from an empty memiavl." Please ensure the upgrade release notes call this out prominently so operators of any legitimately-pinned node add sc-write-mode-enable-auto = false before upgrading.
| # sei-node-0 vote yes | ||
| - cmd: seidbin=seid; chainid=sei; source integration_test/utils/_tx_helpers.sh && submit_tx_and_wait node_admin gov vote $PROPOSAL_ID yes --fees 2000usei | ||
| node: sei-node-0 | ||
| # sei-node-1 vote yes |
There was a problem hiding this comment.
[nit] The # sei-node-1 vote yes comment sits above a stray seid q gov proposal ... .status query (line 62); the actual node-1 vote is on line 63. The intervening status query has no env binding so it's a harmless no-op, but the comment is misplaced — move it down one line (or drop the redundant query) for clarity.
## Summary The state-commitment (SC) store's in-flight `memiavl → flatkv` migration was previously paced by a **node-local** config (`sc-keys-to-migrate-per-block`). Because migration writes feed the AppHash, a per-node rate is consensus-relevant and a divergence risk: two validators draining at different rates fork the chain. This PR makes the per-block migration rate a **module-agnostic governance parameter** (`migration.NumKeysToMigratePerBlock`) that every validator reads from chain state each `BeginBlock` and applies identically. The gov param also serves as the migration *trigger*: it defaults to `0` (paused), and raising it via a param-change proposal starts the drain fleet-wide at a deterministic height. The old node-local config and its fallback are removed entirely — the gov param is now the **sole** source of the rate. ## Key changes ### New generic governance parameter - `app/migration/params.go` — new module-agnostic `migration` params subspace defining `NumKeysToMigratePerBlock` (default `0`), its `KeyTable`, and validation. Deliberately *not* EVM-specific so future module migrations can reuse it. - `app/app.go` — registers the `migration` subspace; stores the `*rootmulti.Store` on the app and fails fast if the (unsupported) legacy commit multistore is in use. - `app/abci.go` — `BeginBlock` reads `NumKeysToMigratePerBlock` from chain state and pushes it into the SC store before the block's first write (`applyMigrationBatchSize`). ### Plumbing: push the rate down to the migration router - `sei-db/state_db/sc/types/types.go` — `Committer` interface gains `SetMigrationBatchSize(int) error`. - `sei-cosmos/storev2/rootmulti/store.go` — forwards `SetMigrationBatchSize` to the SC store; adds `GetMigrationBatchSize` for observability/tests. - `sei-db/state_db/sc/composite/store.go` — holds the gov-set batch size (`atomic.Int64`); `buildRouter` and `SetMigrationBatchSize` use it directly. **Removed the `effectiveMigrationBatchSize` config fallback** — `0` means paused, full stop. - `sei-db/state_db/sc/migration/*` — `Router` interface gains `SetMigrationBatchSize`; all router types implement it. Only the migration manager acts on it; every other router (module/passthrough/dual-write/thread-safe) treats it as a no-op. `router_builder.go` now allows a `0` batch size (paused) instead of rejecting it. ### Removed the node-local config + fallback - `sei-db/config/sc_config.go`, `sei-db/config/toml.go`, `docker/localnode/config/app.toml`, `app/seidb.go` — dropped the `sc-keys-to-migrate-per-block` field, default, validation, TOML entry, and flag parsing. ### Tests - `app/migration/params_test.go` — unit tests for the new param (default, key-table registration, validation). - `app/abci_test.go` — subspace registration, `applyMigrationBatchSize` (defaults/set/clamp), and cross-block "param set in block N takes effect in N+1" coverage. - SC/rootmulti unit tests now seed the rate via `SetMigrationBatchSize(...)` after construction instead of the removed config field. The random-migration framework re-applies the rate on every store (re)open, faithfully mirroring how production re-pushes the gov param after each restart. ### Integration test - `integration_test/gov_module/` — new `ParameterChangeProposal` test that raises `NumKeysToMigratePerBlock` and asserts it takes effect. - `integration_test/contracts/verify_flatkv_evm_migrate.sh` — rewritten to drive the migration via a governance param-change proposal (submit → deposit → quorum vote → poll for `PASSED` → verify on-chain) instead of injecting the removed config. ### Docs - `AGENTS.md` — Code style now mandates running both `gofmt` and `goimports` on every touched `.go` file. --------- Co-authored-by: Masih H. Derkani <m@derkani.org>

Summary
The state-commitment (SC) store's in-flight
memiavl → flatkvmigration was previously paced by a node-local config (sc-keys-to-migrate-per-block). Because migration writes feed the AppHash, a per-node rate is consensus-relevant and a divergence risk: two validators draining at different rates fork the chain.This PR makes the per-block migration rate a module-agnostic governance parameter (
migration.NumKeysToMigratePerBlock) that every validator reads from chain state eachBeginBlockand applies identically. The gov param also serves as the migration trigger: it defaults to0(paused), and raising it via a param-change proposal starts the drain fleet-wide at a deterministic height.The old node-local config and its fallback are removed entirely — the gov param is now the sole source of the rate.
Key changes
New generic governance parameter
app/migration/params.go— new module-agnosticmigrationparams subspace definingNumKeysToMigratePerBlock(default0), itsKeyTable, and validation. Deliberately not EVM-specific so future module migrations can reuse it.app/app.go— registers themigrationsubspace; stores the*rootmulti.Storeon the app and fails fast if the (unsupported) legacy commit multistore is in use.app/abci.go—BeginBlockreadsNumKeysToMigratePerBlockfrom chain state and pushes it into the SC store before the block's first write (applyMigrationBatchSize).Plumbing: push the rate down to the migration router
sei-db/state_db/sc/types/types.go—Committerinterface gainsSetMigrationBatchSize(int) error.sei-cosmos/storev2/rootmulti/store.go— forwardsSetMigrationBatchSizeto the SC store; addsGetMigrationBatchSizefor observability/tests.sei-db/state_db/sc/composite/store.go— holds the gov-set batch size (atomic.Int64);buildRouterandSetMigrationBatchSizeuse it directly. Removed theeffectiveMigrationBatchSizeconfig fallback —0means paused, full stop.sei-db/state_db/sc/migration/*—Routerinterface gainsSetMigrationBatchSize; all router types implement it. Only the migration manager acts on it; every other router (module/passthrough/dual-write/thread-safe) treats it as a no-op.router_builder.gonow allows a0batch size (paused) instead of rejecting it.Removed the node-local config + fallback
sei-db/config/sc_config.go,sei-db/config/toml.go,docker/localnode/config/app.toml,app/seidb.go— dropped thesc-keys-to-migrate-per-blockfield, default, validation, TOML entry, and flag parsing.Tests
app/migration/params_test.go— unit tests for the new param (default, key-table registration, validation).app/abci_test.go— subspace registration,applyMigrationBatchSize(defaults/set/clamp), and cross-block "param set in block N takes effect in N+1" coverage.SetMigrationBatchSize(...)after construction instead of the removed config field. The random-migration framework re-applies the rate on every store (re)open, faithfully mirroring how production re-pushes the gov param after each restart.Integration test
integration_test/gov_module/— newParameterChangeProposaltest that raisesNumKeysToMigratePerBlockand asserts it takes effect.integration_test/contracts/verify_flatkv_evm_migrate.sh— rewritten to drive the migration via a governance param-change proposal (submit → deposit → quorum vote → poll forPASSED→ verify on-chain) instead of injecting the removed config.Docs
AGENTS.md— Code style now mandates running bothgofmtandgoimportson every touched.gofile.