Skip to content

feat(rpc): add /health endpoint on the metrics server#462

Open
MegaRedHand wants to merge 3 commits into
mainfrom
feat/metrics-health
Open

feat(rpc): add /health endpoint on the metrics server#462
MegaRedHand wants to merge 3 commits into
mainfrom
feat/metrics-health

Conversation

@MegaRedHand

Copy link
Copy Markdown
Collaborator

Adds a plain GET /health liveness probe on the metrics port (5054), alongside /metrics.

The handler reuses the existing get_health function, so the response body and content-type are identical to the API server's /lean/v0/health. This lets k8s/ops probe the metrics port independently of the API server.

  • GET /health → 200 {"status":"healthy","service":"lean-rpc-api"} with application/json
  • GET /metrics unchanged

Two unit tests cover the new route and regression-check /metrics.

Stacked on #454.

Split the monolithic lib.rs API router into focused modules: core.rs
holds the finalized-state/finalized-block/justified-checkpoint handlers
and shared response helpers; blocks.rs, fork_choice.rs, and admin.rs
each expose pub(crate) routes() -> Router<Store>. build_api_router
merges them with .with_state(store). No behavior change.
…eckpoint handler name

- Rename `mod core` → `mod base` to avoid shadowing Rust's built-in `core` crate
- Fix stale comment on `get_latest_finalized_block`: genesis returns 200 with a
  synthesized blank proof, not 404
- Rename handler `get_latest_justified_state` → `get_latest_justified_checkpoint`
  to reflect that it returns a Checkpoint, not a State
@github-actions

Copy link
Copy Markdown

🤖 Kimi Code Review

The PR looks good. This is a straightforward change that wires up an existing get_health handler to the router and adds appropriate integration tests.

Minor observations:

crates/net/rpc/src/metrics.rs

  • Line 38-40: The body_bytes test helper uses .unwrap() which will panic on body collection errors. This is acceptable for tests, though consider returning Result if you want explicit test failure messages rather than panics.

  • Line 60-62: The JSON body assertion uses exact byte string matching (br#"..."#). This is brittle against formatting changes (whitespace, field ordering). Consider parsing the JSON and checking field values, or use serde_json::from_slice to validate structure if flexibility is desired. For a fixed health check response this is fine, but document the expectation that the format is stable.

  • Line 64-77: Good addition of a regression test ensuring /metrics still responds after adding the /health route.

Security/Operational notes:

  • The health endpoint exposes service name ("lean-rpc-api") which is low-risk, but ensure this doesn't leak internal versioning or commit hashes that could aid targeted attacks. The current implementation appears static and safe.
  • Consider adding a readiness probe endpoint (distinct from liveness) if this will run in Kubernetes, though that's outside this PR's scope.

Style:

  • Tests are well-structured with clear naming (health_returns_200_with_json_body).
  • Proper use of tower::ServiceExt::oneshot for axum testing.

Consensus relevance: None (RPC layer change, no consensus logic).

Verdict: Approved with optional suggestion to loosen JSON assertion if future formatting changes are anticipated.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

No findings.

crates/net/rpc/src/metrics.rs:5-31 only adds a /health route to the metrics router and reuses the existing get_health handler already exposed elsewhere, so I don’t see a correctness, security, performance, or consensus-layer risk from this change. The added tests in crates/net/rpc/src/metrics.rs:34-88 are appropriate and cover the new route plus the existing /metrics path.

Residual risk: I could not run the targeted RPC tests in this environment because cargo needed to fetch the lean-multisig git dependency and outbound network access is blocked, so this review is based on static inspection rather than execution.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a GET /health liveness probe to the existing Prometheus metrics server (port 5054) by routing it to the already-present get_health handler, enabling ops/k8s to probe the metrics port independently of the API server.

  • Adds .route("/health", get(get_health)) to start_prometheus_metrics_api(), reusing the handler already used by /lean/v0/health on the API server — no logic duplication.
  • Includes two unit tests: one verifies the new endpoint returns 200 with the correct JSON body and application/json content-type; the other is a regression guard confirming /metrics still returns 200.

Confidence Score: 5/5

One-line router addition reusing an existing, already-tested handler; the change is self-contained and the new tests cover both the new route and the existing one.

The change is a single .route() call wiring an already-exercised handler. Both added tests pass structurally (the /metrics regression test and the full /health test checking status, content-type, and body), and the handler itself was pre-existing and used on the API server path.

No files require special attention.

Important Files Changed

Filename Overview
crates/net/rpc/src/metrics.rs Adds /health route to the metrics router reusing the existing get_health handler, plus two unit tests covering the new route and a regression check on /metrics.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client as k8s / ops
    participant MetricsServer as Metrics Server :5054

    Client->>MetricsServer: GET /health
    MetricsServer-->>Client: "200 application/json {"status":"healthy","service":"lean-rpc-api"}"

    Client->>MetricsServer: GET /metrics
    MetricsServer-->>Client: 200 text/plain (Prometheus exposition)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client as k8s / ops
    participant MetricsServer as Metrics Server :5054

    Client->>MetricsServer: GET /health
    MetricsServer-->>Client: "200 application/json {"status":"healthy","service":"lean-rpc-api"}"

    Client->>MetricsServer: GET /metrics
    MetricsServer-->>Client: 200 text/plain (Prometheus exposition)
Loading

Reviews (1): Last reviewed commit: "feat(rpc): add /health endpoint on the m..." | Re-trigger Greptile

@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Now I have a clear picture. Let me compile the verified findings.

Key observations from the codebase:

  • base.rs:13 already routes /lean/v0/healthcrate::metrics::get_health, so the handler is genuinely shared (PR description is accurate)
  • lib.rs has a test_utils module but no shared body_bytes helper — the pattern is used inline 7+ times
  • When api_port == metrics_port, the merged router has both /lean/v0/health and /health at different paths — no conflict
  • JSON_CONTENT_TYPE is "application/json; charset=utf-8" — all ASCII, from_static is safe

PR Review: feat(rpc): add /health endpoint on the metrics server

This is a small, focused PR: it wires the existing get_health handler onto the metrics router's /health route and adds two unit tests. The change is straightforward and the handler genuinely is shared with the API server's /lean/v0/health. Three points worth addressing:


crates/net/rpc/src/metrics.rs:87metrics_still_returns_200 only checks the HTTP status code

The regression test for /metrics does not validate content-type or body structure. A future change that accidentally returns HTML or an empty body at /metrics (e.g. the unwrap_or_default() path silently returning an empty string) would pass this test. The health_returns_200_with_json_body test in the same module shows the better pattern — at minimum, verify content-type: text/plain and a non-empty body.


crates/net/rpc/src/metrics.rs:43-44body_bytes helper belongs in the shared test_utils module

The exact pattern resp.into_body().collect().await.unwrap().to_bytes() already appears at least 7 times inline in lib.rs tests (lines 226, 272, 330, 351, 367, 479, 527). The new helper in metrics.rs duplicates the concept but lives in a module-local scope. Moving it into the existing crate::test_utils block in lib.rs would let all test modules share it rather than each re-defining or inlining it.


crates/net/rpc/src/metrics.rs:12 — health response body reports "service":"lean-rpc-api" on the metrics port

The health response body is {"status":"healthy","service":"lean-rpc-api"}. When probed on the metrics port (:5054), the service field identifies this as the lean-rpc-api, which is the API server's identity. The PR description calls this an intentional reuse, but for liveness probes keyed on the metrics port the response is semantically misleading — ops tooling or alerting that inspects the service field to distinguish API health from metrics health will get the wrong label. Consider either returning a distinct name (lean-metrics) on this route, or accepting the current behaviour and documenting the deliberate choice.


Automated review by Claude (Anthropic) · sonnet · custom prompt

Adds a plain GET /health liveness probe on the metrics port (5054),
reusing the existing get_health handler. Covers the route with two
unit tests: one asserting status/content-type/body, one regression
check that /metrics still returns 200.
@MegaRedHand MegaRedHand force-pushed the feat/metrics-health branch from f0a68ae to e958de0 Compare June 24, 2026 19:02
Base automatically changed from feat/lean-api-router-scaffold to main June 24, 2026 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant