Skip to content

Add chain_id label to OTel metrics#3692

Merged
amir-deris merged 2 commits into
mainfrom
amir/plt-783-fix-otel-chain-id
Jul 1, 2026
Merged

Add chain_id label to OTel metrics#3692
amir-deris merged 2 commits into
mainfrom
amir/plt-783-fix-otel-chain-id

Conversation

@amir-deris

@amir-deris amir-deris commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Describe your changes

Attaches a chain_id label to every OTel metric series emitted by the chain.

Previously SetupOtelMetricsProvider created a meter provider with no resource
attributes, so metrics scraped via Prometheus carried no way to distinguish
which chain they came from — a problem when multiple chains scrape into the
same backend.

This PR:

  • Adds a chainID parameter to SetupOtelMetricsProvider and wires
    bApp.ChainID in at application construction (app/app.go).
  • Builds an OTel resource with a chain_id attribute and registers it on the
    meter provider.
  • Uses WithResourceAsConstantLabels (scoped to an allow-list of just
    chain_id) so the attribute surfaces as a constant label on every scraped
    Prometheus series.
  • Fails closed: returns an error when chainID is empty rather than emitting
    unlabeled metrics.

Testing

Adds utils/metrics/metrics_util_test.go:

  • TestSetupOtelMetricsProviderRequiresChainID — empty chainID returns an error.
  • TestSetupOtelMetricsProviderAttachesChainID — emits a metric through the
    installed provider and asserts the scraped Prometheus series carries the
    expected chain_id label.

Linear

PLT-783

@amir-deris amir-deris self-assigned this Jul 1, 2026
@amir-deris amir-deris changed the title Added chain-id to otel metrics Add chain_id label to OTel metrics Jul 1, 2026
@cursor

cursor Bot commented Jul 1, 2026

Copy link
Copy Markdown

PR Summary

Low Risk
Observability-only change to metric labeling; no consensus, auth, or state logic. Setup errors on empty chain ID are logged at app startup without blocking node boot.

Overview
OTel metrics now carry a chain_id label so Prometheus series from different chains can be distinguished when they share one scrape backend.

SetupOtelMetricsProvider takes chainID, rejects an empty value, builds an OTel resource with a chain_id attribute, attaches it to the meter provider, and uses WithResourceAsConstantLabels (allow-list chain_id only) so that attribute appears as a constant label on every scraped OTel series. Application startup passes bApp.ChainID into setup at construction.

Tests cover the empty-chainID error path and assert a probe counter scraped via Prometheus includes the expected chain_id label.

Reviewed by Cursor Bugbot for commit 6cbf270. Bugbot is set up for automated code reviews on this repo. Configure here.

@amir-deris amir-deris requested a review from bdchatham July 1, 2026 19:07
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJul 1, 2026, 7:17 PM

@amir-deris amir-deris requested a review from masih July 1, 2026 19:08
@codecov

codecov Bot commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.34%. Comparing base (7529e2e) to head (6cbf270).

Files with missing lines Patch % Lines
utils/metrics/metrics_util.go 89.47% 1 Missing and 1 partial ⚠️
app/app.go 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3692      +/-   ##
==========================================
- Coverage   59.29%   58.34%   -0.95%     
==========================================
  Files        2272     2185      -87     
  Lines      188193   178443    -9750     
==========================================
- Hits       111580   104121    -7459     
+ Misses      66560    65072    -1488     
+ Partials    10053     9250     -803     
Flag Coverage Δ
sei-chain-pr 51.16% <85.00%> (-13.81%) ⬇️
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
app/app.go 71.28% <0.00%> (ø)
utils/metrics/metrics_util.go 88.53% <89.47%> (-0.03%) ⬇️

... and 87 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small, well-scoped change that attaches a chain_id constant label to every OTel/Prometheus metric series, with good tests. No blocking issues found; a couple of minor, non-blocking notes.

Findings: 0 blocking | 4 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • The Cursor second-opinion review file (cursor-review.md) was empty — that pass produced no output. Codex reported no material findings (it also could not run tests due to a Go 1.25.6 toolchain/network restriction).
  • Fail-closed behavior is safe in practice: baseapp.NewBaseApp already panics when ChainID is empty (sei-cosmos/baseapp/baseapp.go:304-306), so bApp.ChainID is guaranteed non-empty by the time SetupOtelMetricsProvider is called. The empty-string guard is effectively unreachable defensive code — fine to keep, but worth knowing it won't silently drop metrics in a running node.
  • The chain_id value comes from the --chain-id CLI flag / client.toml (used to seed bApp.ChainID at construction), not from the genesis ChainId seen at InitChain (abci.go:39). If the two ever diverge, the label reflects the flag value. Likely intended, but worth a comment.
  • Test hygiene nit: TestSetupOtelMetricsProviderAttachesChainID installs a real global otel MeterProvider and registers a collector on prometheus.DefaultGatherer as a persistent side effect. This is fine for the single test that does it, but be mindful if future tests in the same package add another SetupOtelMetricsProvider call (duplicate default-registry registration could error).

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-tested change that attaches a chain_id constant label to every OTel/Prometheus metric series. No blocking issues; a couple of minor observations only.

Findings: 0 blocking | 3 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Behavior change worth noting: SetupOtelMetricsProvider now returns an error path (empty chainID or a future resource.Merge failure), and app.go only logs the error. On error, no meter provider is installed at all, whereas previously one was always set up. In practice this is safe — bApp.ChainID is guaranteed non-empty because NewBaseApp panics when --chain-id is unset (sei-cosmos/baseapp/baseapp.go:305) — so the new empty-string guard is effectively unreachable defensive code.
  • Codex second-opinion review reported no material findings (it also noted it could not run focused tests due to sandboxed network access blocking the Go 1.25.6 toolchain download). The Cursor review file was empty, so that pass produced no output.
  • TestSetupOtelMetricsProviderAttachesChainID relies on global state (otel global MeterProvider + prometheus.DefaultGatherer/DefaultRegisterer). It works today because only one test installs a provider, but registering the OTel Prometheus exporter against the default registerer is process-global and could become order/parallelism-sensitive if more provider tests are added later; consider a dedicated registry if this grows.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward observability change that adds a chain_id label to OTel metrics.

Extended reasoning...

Overview

This PR adds a chain_id label to OTel metrics so Prometheus series from multiple chains can be distinguished when they share the same scrape backend. The change is limited to three files: utils/metrics/metrics_util.go (adds a chainID parameter to SetupOtelMetricsProvider, builds an OTel resource with chain_id, and uses WithResourceAsConstantLabels scoped to just chain_id), app/app.go (passes bApp.ChainID at construction), and a new metrics_util_test.go covering both the empty-chainID error path and the label attachment.

Security risks

None identified. This is observability-only plumbing — no consensus, auth, crypto, or state logic is touched. The empty-chainID path fails closed by returning an error rather than silently emitting unlabeled metrics. The startup call in app/app.go logs the error rather than crashing, which is consistent with the prior behavior of that call site.

Level of scrutiny

Low. The change is small, mechanical, and follows the OTel SDK's documented pattern for constant labels via a resource attribute plus an allow-list filter. The Prometheus namespace and translation strategy are unchanged, so existing series names are unaffected apart from the new constant label.

Other factors

Tests exercise both the error path and the success path via the actual global provider and Prometheus gatherer. The bug-hunting system found no issues. Codecov shows 85% patch coverage with the uncovered lines being the resource.Merge error branch and the app.go error-log branch — both defensive paths.

@amir-deris amir-deris enabled auto-merge July 1, 2026 19:43
@amir-deris amir-deris added this pull request to the merge queue Jul 1, 2026
Merged via the queue into main with commit 1c66d87 Jul 1, 2026
69 of 70 checks passed
@amir-deris amir-deris deleted the amir/plt-783-fix-otel-chain-id branch July 1, 2026 20:15
yzang2019 added a commit that referenced this pull request Jul 2, 2026
* main:
  Add chain_id label to OTel metrics (#3692)
  Close temporary rootmulti store in channel types setup (#3690)
  Update changelog in prep to cut 6.6 RC1 (#3685)
  Pin priority-fee assertions to stable heads (#3680)
  Retry npm install for disable wasm integration test (#3674)
  State Store: Compact pruned key range after each prune (#3675)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants