Skip to content

feat(rpc): add GET /lean/v0/node/{identity,syncing}#458

Open
MegaRedHand wants to merge 3 commits into
mainfrom
feat/lean-api-node
Open

feat(rpc): add GET /lean/v0/node/{identity,syncing}#458
MegaRedHand wants to merge 3 commits into
mainfrom
feat/lean-api-node

Conversation

@MegaRedHand

Copy link
Copy Markdown
Collaborator

Adds two node-info endpoints:

  • GET /lean/v0/node/identity — returns peer ID and ENR.
  • GET /lean/v0/node/syncing — returns current sync status (head slot, finalized slot, sync lag).

Both are useful for health dashboards and orchestration tooling. Has unit tests and passed clippy.

Stacked on #454.

Split the monolithic lib.rs API router into focused modules: core.rs
holds the finalized-state/finalized-block/justified-checkpoint handlers
and shared response helpers; blocks.rs, fork_choice.rs, and admin.rs
each expose pub(crate) routes() -> Router<Store>. build_api_router
merges them with .with_state(store). No behavior change.
Exposes node identity (build version) and sync status (head_slot,
sync_distance, is_syncing) derived from store.time() / INTERVALS_PER_SLOT.
@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  1. get_syncing is not actually measuring wall-clock lag, so it can report a stale node as synced after restart/offline gaps. In node.rs:24-28 the code derives wall_slot from store.time(), but the consensus code already documents that store.time() lags real wall clock after downtime and is not suitable for this purpose in lib.rs:92-97. A node restarted far behind can therefore answer is_syncing = false until the first tick catches it up. This endpoint should either compute current slot from real time (genesis_time + unix_now_ms) or expose the actor-maintained sync state directly.

  2. /lean/v0/node/syncing diverges from the node’s real sync-status semantics, so operators can get the opposite answer from what the client is using internally. node.rs:20-31 only checks head_slot lag against SYNC_LAG_THRESHOLD, while the real SyncStatusTracker in sync_status.rs:56-62 also applies the network-stall override and hysteresis band. During a chain-wide stall, or while recovering near the threshold, the RPC endpoint can say “syncing” even when the node has switched to Synced and allows duties. If this endpoint is meant for automation, it should share the same source of truth.

  3. /lean/v0/node/identity exposes only the crate semver, not the actual client build identity. node.rs:34-36 returns env!("CARGO_PKG_VERSION"), while the binary and metrics use the richer CLIENT_VERSION string with git SHA/target info in version.rs:1-12 and main.rs:163-174. That is not consensus-critical, but it weakens operational debugging and provenance checks.

No direct fork-choice, attestation, STF, XMSS, or SSZ logic changed in this PR; the risk is mainly incorrect operational signaling via the new RPC surface.

I could not run the added tests here because cargo test hit a sandboxed rustup write failure under /home/runner/.rustup.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Kimi Code Review

The PR introduces RPC endpoints for node syncing status and identity. Overall the implementation is sound, but there are layout, correctness, and testing gaps to address.

1. Import placement breaks documentation association

File: crates/blockchain/src/lib.rs (Line 52)

The new pub use sync_status::SYNC_LAG_THRESHOLD; is inserted between MAX_ATTESTATIONS_DATA and the doc comment for GOSSIP_ATTESTATION_FUTURE_TOLERANCE. This separates the doc comment from its associated constant.

Fix: Move the export to the end of the re-exports block or group it with other sync-related exports.

2. Race condition in sync status calculation

File: crates/net/rpc/src/node.rs (Lines 25-28)

let head_slot = store.head_slot();
let wall_slot = store.time() / INTERVALS_PER_SLOT;
let sync_distance = wall_slot.saturating_sub(head_slot);

head_slot and store.time() are read in separate calls. If the chain advances between these calls, head_slot could exceed wall_slot, causing sync_distance to be 0 (due to saturating_sub) and incorrectly reporting is_syncing: false even when the node is behind.

Fix: Use a single atomic read or snapshot method that returns both values together, or document that this endpoint provides a best-effort snapshot (the current comment alludes to this but the race condition should be explicitly noted).

3. Insufficient test coverage for sync edge cases

File: crates/net/rpc/src/node.rs (Tests)

The node_syncing_reports_head_slot test only validates the fresh store case (slot 0). It does not test:

  • The threshold boundary (exactly 4 slots behind vs 5 slots behind)
  • The case where head_slot > wall_slot (clock skew or reorg)
  • Non-zero sync distances

Fix: Add tests for sync_distance = SYNC_LAG_THRESHOLD and sync_distance = SYNC_LAG_THRESHOLD + 1 to verify the boolean threshold logic.

4. Clock skew vulnerability

File: crates/net/rpc/src/node.rs (Line 28)

If the system clock is slow (wall_slot < head_slot), saturating_sub returns 0, reporting "not syncing". If validators rely on this endpoint (rather than the internal SyncStatusTracker), they might attest to stale heads.

Fix: Consider logging a warning when head_slot > wall_slot to detect clock skew, or return an is_clock_skewed flag in the response.

5. Minor: Unused test import path

File: crates/net/rpc/src/node.rs (Test module)

The import use crate::test_utils::create_test_state; assumes test_utils exists. Ensure this module is compiled only under #[cfg(test)] to avoid leaking test helpers into production builds.

Consensus Safety Note

The endpoint documentation correctly notes this is a "simplified" stateless check without hysteresis. Ensure no validator logic uses this RPC endpoint for attestation decisions; it should only be used for monitoring. The production path should use the SyncStatusTracker (referenced in sync_status.rs) which includes stall-override and hysteresis logic required for consensus safety.

Verdict

Code is functionally correct for a monitoring endpoint. Address Item 1 (layout) and Item 2 (race condition documentation or fix) before merging. Items 3 and 4 are recommended follow-ups.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds two read-only HTTP endpoints — GET /lean/v0/node/syncing and GET /lean/v0/node/identity — backed by the existing Store and wired into the main API router.

  • get_syncing computes a stateless head-vs-wall-clock lag and compares it to SYNC_LAG_THRESHOLD; it intentionally omits the hysteresis and network-stall override logic present in SyncStatusTracker, which means is_syncing can disagree with the operational sync gate in edge cases.
  • get_identity currently returns only the cargo package version; the PR description mentions peer ID and ENR, and the syncing response description mentions a finalized slot, but neither field appears in the implementation.

Confidence Score: 3/5

The new endpoints are additive and read-only, so they cannot corrupt state, but is_syncing can disagree with the node's actual operational sync gate in concrete scenarios.

The main concern is in get_syncing: omitting the network-stall override and hysteresis band means is_syncing diverges from what the node actually uses to gate validator duties — during a network stall the tracker returns Synced while this endpoint may return is_syncing: true. The identity response is also missing the peer_id and enr fields the PR description promises, and the syncing response omits the finalized_slot it claims to include.

crates/net/rpc/src/node.rs — sync logic, identity fields, and syncing response shape all need alignment with either the spec or the PR description.

Important Files Changed

Filename Overview
crates/net/rpc/src/node.rs New file implementing /lean/v0/node/syncing and /lean/v0/node/identity endpoints; sync logic diverges from SyncStatusTracker (no hysteresis/stall override), identity response is missing peer_id/enr, and syncing response is missing finalized_slot
crates/blockchain/src/sync_status.rs SYNC_LAG_THRESHOLD made pub to allow reuse in the RPC layer; no logic changes
crates/blockchain/src/lib.rs Re-exports SYNC_LAG_THRESHOLD at the crate root for downstream consumers
crates/net/rpc/src/lib.rs Registers node::routes() in build_api_router; minimal, low-risk change

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant RPC as axum Router
    participant NodeHandler as node.rs handlers
    participant Store

    Client->>RPC: GET /lean/v0/node/syncing
    RPC->>NodeHandler: get_syncing(State(store))
    NodeHandler->>Store: store.head_slot()
    Store-->>NodeHandler: head_slot: u64
    NodeHandler->>Store: store.time()
    Store-->>NodeHandler: intervals: u64
    NodeHandler->>NodeHandler: "wall_slot = time / INTERVALS_PER_SLOT"
    NodeHandler->>NodeHandler: "sync_distance = wall_slot.saturating_sub(head_slot)"
    NodeHandler->>NodeHandler: "is_syncing = sync_distance > SYNC_LAG_THRESHOLD (4)"
    NodeHandler-->>RPC: "JSON { is_syncing, head_slot, sync_distance }"
    RPC-->>Client: 200 OK

    Client->>RPC: GET /lean/v0/node/identity
    RPC->>NodeHandler: get_identity()
    NodeHandler->>NodeHandler: "version = env!(CARGO_PKG_VERSION)"
    NodeHandler-->>RPC: "JSON { version }"
    RPC-->>Client: 200 OK
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client
    participant RPC as axum Router
    participant NodeHandler as node.rs handlers
    participant Store

    Client->>RPC: GET /lean/v0/node/syncing
    RPC->>NodeHandler: get_syncing(State(store))
    NodeHandler->>Store: store.head_slot()
    Store-->>NodeHandler: head_slot: u64
    NodeHandler->>Store: store.time()
    Store-->>NodeHandler: intervals: u64
    NodeHandler->>NodeHandler: "wall_slot = time / INTERVALS_PER_SLOT"
    NodeHandler->>NodeHandler: "sync_distance = wall_slot.saturating_sub(head_slot)"
    NodeHandler->>NodeHandler: "is_syncing = sync_distance > SYNC_LAG_THRESHOLD (4)"
    NodeHandler-->>RPC: "JSON { is_syncing, head_slot, sync_distance }"
    RPC-->>Client: 200 OK

    Client->>RPC: GET /lean/v0/node/identity
    RPC->>NodeHandler: get_identity()
    NodeHandler->>NodeHandler: "version = env!(CARGO_PKG_VERSION)"
    NodeHandler-->>RPC: "JSON { version }"
    RPC-->>Client: 200 OK
Loading
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
crates/net/rpc/src/node.rs:7-18
**`is_syncing` diverges from `SyncStatusTracker` in two concrete cases**

The `SyncStatusTracker` applies two overrides that this stateless endpoint ignores: (1) **network-stall override** — when `network_lag > 8`, the tracker forces `syncing = false` even if the local head is far behind; this endpoint will still return `is_syncing: true`. (2) **hysteresis** — once syncing, the tracker keeps `syncing = true` until `head_lag <= threshold - band = 2`; this endpoint flips back to `false` at `head_lag <= 4`. Dashboards or orchestrators relying on this field could therefore disagree with the actual operational sync gate (used to suppress validator duties), leading to false alerts or missed gates.

### Issue 2 of 3
crates/net/rpc/src/node.rs:15-18
**`IdentityResponse` missing `peer_id` and `enr` fields**

The PR description states this endpoint "returns peer ID and ENR," but `IdentityResponse` only exposes `version` (the cargo package version). Consumers expecting a peer identifier or ENR for node discovery will find neither. Either the response struct needs `peer_id` and `enr` fields, or the endpoint description should be corrected to reflect its current scope.

### Issue 3 of 3
crates/net/rpc/src/node.rs:8-13
**`finalized_slot` missing from `SyncingResponse`**

The PR description lists "finalized slot" as one of the three pieces of sync info this endpoint returns, but `SyncingResponse` only has `head_slot` and `sync_distance`. If the omission is intentional, the description should be updated; if it was accidentally left out, `store.latest_finalized().slot` is already accessible and the field can be added straightforwardly.

Reviews (1): Last reviewed commit: "feat(rpc): add GET /lean/v0/node/{identi..." | Re-trigger Greptile

Comment on lines +7 to +18

#[derive(Serialize)]
struct SyncingResponse {
is_syncing: bool,
head_slot: u64,
sync_distance: u64,
}

#[derive(Serialize)]
struct IdentityResponse {
version: &'static str,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 is_syncing diverges from SyncStatusTracker in two concrete cases

The SyncStatusTracker applies two overrides that this stateless endpoint ignores: (1) network-stall override — when network_lag > 8, the tracker forces syncing = false even if the local head is far behind; this endpoint will still return is_syncing: true. (2) hysteresis — once syncing, the tracker keeps syncing = true until head_lag <= threshold - band = 2; this endpoint flips back to false at head_lag <= 4. Dashboards or orchestrators relying on this field could therefore disagree with the actual operational sync gate (used to suppress validator duties), leading to false alerts or missed gates.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/net/rpc/src/node.rs
Line: 7-18

Comment:
**`is_syncing` diverges from `SyncStatusTracker` in two concrete cases**

The `SyncStatusTracker` applies two overrides that this stateless endpoint ignores: (1) **network-stall override** — when `network_lag > 8`, the tracker forces `syncing = false` even if the local head is far behind; this endpoint will still return `is_syncing: true`. (2) **hysteresis** — once syncing, the tracker keeps `syncing = true` until `head_lag <= threshold - band = 2`; this endpoint flips back to `false` at `head_lag <= 4`. Dashboards or orchestrators relying on this field could therefore disagree with the actual operational sync gate (used to suppress validator duties), leading to false alerts or missed gates.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +15 to +18
#[derive(Serialize)]
struct IdentityResponse {
version: &'static str,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 IdentityResponse missing peer_id and enr fields

The PR description states this endpoint "returns peer ID and ENR," but IdentityResponse only exposes version (the cargo package version). Consumers expecting a peer identifier or ENR for node discovery will find neither. Either the response struct needs peer_id and enr fields, or the endpoint description should be corrected to reflect its current scope.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/net/rpc/src/node.rs
Line: 15-18

Comment:
**`IdentityResponse` missing `peer_id` and `enr` fields**

The PR description states this endpoint "returns peer ID and ENR," but `IdentityResponse` only exposes `version` (the cargo package version). Consumers expecting a peer identifier or ENR for node discovery will find neither. Either the response struct needs `peer_id` and `enr` fields, or the endpoint description should be corrected to reflect its current scope.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +8 to +13
#[derive(Serialize)]
struct SyncingResponse {
is_syncing: bool,
head_slot: u64,
sync_distance: u64,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 finalized_slot missing from SyncingResponse

The PR description lists "finalized slot" as one of the three pieces of sync info this endpoint returns, but SyncingResponse only has head_slot and sync_distance. If the omission is intentional, the description should be updated; if it was accidentally left out, store.latest_finalized().slot is already accessible and the field can be added straightforwardly.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/net/rpc/src/node.rs
Line: 8-13

Comment:
**`finalized_slot` missing from `SyncingResponse`**

The PR description lists "finalized slot" as one of the three pieces of sync info this endpoint returns, but `SyncingResponse` only has `head_slot` and `sync_distance`. If the omission is intentional, the description should be updated; if it was accidentally left out, `store.latest_finalized().slot` is already accessible and the field can be added straightforwardly.

How can I resolve this? If you propose a fix, please make it concise.

@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

All three top candidates confirmed. Here is the review.


Review: feat(rpc): add GET /lean/v0/node/{identity,syncing}

This PR adds two new HTTP endpoints to the API server: /lean/v0/node/syncing for real-time sync status and /lean/v0/node/identity for node identification. The implementation is clean and well-structured, but three concrete correctness issues were found — all confirmed against the codebase.


crates/net/rpc/src/node.rs:25store.time() is not wall-clock time; syncing status is wrong after any offline gap

get_syncing computes:

let wall_slot = store.time() / INTERVALS_PER_SLOT;

store.time() is a stored counter incremented only on each tick. After an offline gap it freezes at the last pre-gap value — wall_slot ends up equal to head_slot, sync_distance collapses to 0, and the endpoint reports is_syncing: false for a node that is actively catching up. This failure mode is explicitly documented in the codebase: lib.rs:93 has the comment "store.time() doesn't work here: after an offline gap it lags wall-clock by exactly the gap we need to catch up through" — which is why BlockChain::new uses unix_now_ms() instead.

Fix: derive the current slot the same way lib.rs:95–97 does:

let now_ms = unix_now_ms();
let wall_slot = (now_ms.saturating_sub(store.config().genesis_time * 1000)
    / MILLISECONDS_PER_SLOT) as u64;

crates/net/rpc/src/node.rs:14–17/node/identity delivers the wrong data; peer ID and ENR are absent

The PR description states the endpoint "returns peer ID and ENR." The implementation returns only:

struct IdentityResponse { version: &'static str }

The Store type has no peer ID or ENR fields, and start_rpc_server receives no P2P handle — the libp2p PeerId and local ENR live exclusively inside the swarm (built in crates/net/p2p/src/lib.rs), which is never passed to the RPC layer. The test only asserts json["version"].is_string(), so it validates the wrong shape.

Either the endpoint name (/identity) and the PR description should be corrected to reflect that this is a /version endpoint, or the P2P identity must be threaded through to the RPC layer — which requires adding peer_id and enr to BuiltSwarm and passing them into start_rpc_server. Until then, the endpoint is misnamed.


crates/net/rpc/src/node.rs:8–12SyncingResponse is missing finalized_slot promised by the PR description

The PR description says the endpoint returns "head slot, finalized slot, sync lag." The struct has no finalized_slot:

struct SyncingResponse { is_syncing: bool, head_slot: u64, sync_distance: u64 }

store.latest_finalized().slot is directly accessible from the handler — the omission is accidental. Fix: add finalized_slot: u64 to the struct and populate it. The test (node_syncing_reports_head_slot) should also assert json["finalized_slot"] == 0 for the initial state.


Minor — behavioral divergence from SyncStatusTracker is real but documented

The stateless endpoint has no hysteresis band (SYNC_HYSTERESIS_BAND = 2) and no network-stall override (NETWORK_STALL_THRESHOLD = 8). A node at lag = 3–4 slots will see is_syncing flap per-request, while the internal tracker holds its previous state. During a network-wide stall, the tracker correctly reports Synced; this endpoint may report Syncing. The doc-comment on line 21 acknowledges this, but external consumers (dashboards, orchestrators) that mix both signals will see contradictions. Consider either delegating to the tracker's result (store it and expose it) or expand the comment to warn callers about this divergence.


Automated review by Claude (Anthropic) · sonnet · custom prompt

… finalized_slot)

- Replace store.time()-based wall_slot with real SystemTime::now() so
  sync_distance is correct after an offline gap (store.time() freezes
  during downtime, causing false is_syncing=false on restart)
- Add finalized_slot field to SyncingResponse from store.latest_finalized()
- Split node syncing tests: far-behind case (genesis_time=1000) asserts
  is_syncing=true; up-to-date case (genesis_time year 2100) asserts
  is_syncing=false and sync_distance=0
- Add one-line doc comment to SYNC_LAG_THRESHOLD re-export in blockchain lib.rs
Base automatically changed from feat/lean-api-router-scaffold to main June 24, 2026 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant