Skip to content

feat(p2p): experimental ethp2p erasure-coded broadcast (off by default)#466

Draft
pablodeymo wants to merge 1 commit into
mainfrom
ethp2p-broadcast-adapter
Draft

feat(p2p): experimental ethp2p erasure-coded broadcast (off by default)#466
pablodeymo wants to merge 1 commit into
mainfrom
ethp2p-broadcast-adapter

Conversation

@pablodeymo

Copy link
Copy Markdown
Collaborator

🗒️ Description / Motivation

Adds an experimental, feature-gated path that also broadcasts gossip through ethp2p-rs's Reed–Solomon erasure-coded broadcast engine over a parallel QUIC network, alongside libp2p gossipsub. It's a step toward evaluating erasure-coded broadcast for block/data propagation in ethlambda.

  • Gated behind the ethp2p cargo feature, off by default — the normal build/CI is unaffected.
  • ethlambda ↔ ethlambda only; coexists with gossipsub (gossipsub is untouched), so mixed-client devnets keep working.
  • Not spec-conformant and makes no interop/bit-compat claim — the wire/stream model is a provisional Rust-native choice.

Follows the phased plan in ethp2p-rs/docs/ethlambda-integration-plan.md (Phases 1+2).

What Changed

  • crates/net/p2p/src/ethp2p/ (new): Ethp2pBroadcast adapter over ethp2p-rs's QuicNet transport + broadcast Engine; derive_peer_id (sha256(secp256k1 pubkey)[..8]); the engine task (run_engine_task); dispatch_delivered; channel constants + message_id (hex(sha256(ssz))).
  • crates/net/p2p/src/lib.rs: WrappedEthp2pDelivery actor message + Handler; P2PServer.ethp2p_publish; build_swarm derives the ethp2p peer mesh from the static bootnodes (ethp2p QUIC port = gossipsub port + 1); P2P::spawn spawns the engine task.
  • crates/net/p2p/src/gossipsub/handler.rs: publish_block/publish_attestation/publish_aggregated_attestation each tee the same snappy-compressed SSZ to the engine.
  • crates/net/p2p/Cargo.toml / Cargo.lock: the ethp2p feature + optional git deps (ethp2p-broadcast, ethp2p-transport, prost), pinned to ethp2p-rs ba3ed8b.

Correctness / Behavior Guarantees

  • Feature off (default): no behavior change. Every ethp2p reference is #[cfg(feature = "ethp2p")]; gossipsub is byte-for-byte unchanged.
  • Dual delivery is safe. Reconstructed messages flow into the same blockchain.new_block/new_attestation/new_aggregated_attestation handlers gossipsub uses, and block import (on_block_core: if store.has_state(&block_root) { return Ok(()) }) and attestation handling are idempotent.
  • The engine task shuts down cleanly if its command/delivery channels close (no spin).
  • An adversarial self-review (4 dimensions) ran over the wiring; the two real findings (select-loop channel-closure handling; stale module doc) are fixed.

Tests Added / Run

  • New isolated test (ethp2p::tests): peer-id derivation stability + a 64 KiB block-sized round trip over real QUIC. cargo test -p ethlambda-p2p --features ethp2p — green.
  • make fmt, make lint, and build all pass feature-on and feature-off.
  • Not yet runtime-validated on a devnet — end-to-end propagation (a block A→B over ethp2p; mixed-client devnet still finalizes) is the remaining step.

Related Issues / PRs

  • Related to the ethp2p-rs integration effort (ethp2p-rs/docs/ethlambda-integration-plan.md).

✅ Verification Checklist

  • Ran make fmt — clean
  • Ran make lint (clippy with -D warnings) — clean (feature on and off)
  • cargo build/test -p ethlambda-p2p — passing (feature on and off)

Notes for reviewers

ethp2p-rs is an internal repo, so building with the ethp2p feature needs git credentials for it (CI already sets CARGO_NET_GIT_FETCH_WITH_CLI=true) plus protoc on the build host. The default feature-off build needs neither. Reviewer focus: the feature-gating hygiene and the publish-tee / receive-inject symmetry.

Add an experimental, feature-gated ("ethp2p", off by default) path that
also broadcasts gossip through ethp2p-rs's Reed-Solomon broadcast engine
over a parallel QUIC network, alongside libp2p gossipsub (unchanged).
ethlambda <-> ethlambda only; coexists with gossipsub so mixed-client
devnets keep working.

- crates/net/p2p/src/ethp2p: the Ethp2pBroadcast adapter over ethp2p-rs's
  QuicNet transport + broadcast Engine, peer-id derivation
  (sha256(secp256k1 pubkey)[..8]), the engine task, and delivery dispatch.
- publish_block/attestation/aggregated tee the same snappy-compressed SSZ
  to the engine, keyed by sha256(ssz); reconstructed messages are decoded
  and fed into the same blockchain.new_* handlers gossipsub uses (block
  import and attestation handling are idempotent, so dual delivery is safe).
- peers are derived from the static bootnodes (ethp2p QUIC port = gossipsub
  port + 1); the engine runs in a background task that forwards
  reconstructed messages into the P2P actor.

Validated by compilation (feature on and off), clippy -D warnings, fmt, and
an isolated QUIC round-trip test. End-to-end (devnet) validation is a
follow-up. ethp2p-rs (internal repo) is pinned to rev ba3ed8b; building the
feature needs git access to it plus protoc on the build host.
@github-actions

Copy link
Copy Markdown

🤖 Kimi Code Review

Review: Experimental ethp2p broadcast feature (PR #466)

This PR introduces an experimental, feature-gated (ethp2p) erasure-coded broadcast layer using Reed-Solomon over QUIC, running parallel to libp2p gossipsub. The implementation is clean and well-documented for devnet/experimental use. Below are specific concerns and recommendations.


1. Critical: Port arithmetic overflow (Potential runtime misconfiguration)

File: crates/net/p2p/src/lib.rs
Lines: 299-304, 329-334

Using wrapping_add(1) to derive the ethp2p QUIC port from the gossipsub port can silently wrap to port 0 if the gossip port is 65535, or to port 1 if the gossip port is 0 (ephemeral). This breaks the static mesh assumption that peers listen on bootnode.quic_port + 1.

Recommendation: Use checked arithmetic or explicit bounds checking:

let ethp2p_port = bootnode.quic_port.checked_add(1)
    .expect("ethp2p port overflow: gossip port must be < 65535");

2. Consensus safety: Idempotency assumption

File: crates/net/p2p/src/ethp2p/mod.rs
Lines: 385-389

The comment states "Dual delivery is safe: block import and attestation handling are idempotent." This is a critical invariant. If the same attestation arrives via both gossipsub and ethp2p and the import logic is not strictly idempotent (e.g., double-counting in aggregation bits or reward accounting), this could cause consensus failures.

Recommendation: Add a debug assertion or metric to detect duplicate deliveries in the blockchain adapter, validating the idempotency claim during testing.


3. Performance: Unnecessary payload cloning

File: crates/net/p2p/src/gossipsub/handler.rs
Lines: 157-160, 197-200, 239-242

The code clones the compressed payload to create ethp2p_payload before sending to the unbounded channel. This doubles memory pressure temporarily.

Recommendation: Since publish_bytes takes &[u8] and the UnboundedSender owns the data via Vec<u8>, move the original compressed Vec into the PublishCmd instead of cloning, or use Arc<[u8]> if both paths must retain ownership.


4. Error handling: Silent dropping of delivered messages

File: crates/net/p2p/src/ethp2p/mod.rs
Lines: 347-386

In dispatch_delivered, decode failures and blockchain forward failures are logged but dropped. While consistent with gossipsub behavior, ethp2p's Reed-Solomon layer guarantees data integrity, so decode failures indicate corruption or spec version mismatch that might warrant different handling than network noise.

Recommendation: Consider incrementing a metric (e.g., ethp2p_decode_failures_total) to distinguish ethp2p delivery failures from gossipsub noise in monitoring.


5. Code quality: Peer ID derivation

File: crates/net/p2p/src/ethp2p/mod.rs
Line: 66

bytes.copy_from_slice(&digest[..8]);

This panics if digest is shorter than 8 bytes (impossible for SHA-256, but fragile).

Recommendation: Use explicit conversion for clarity:

u64::from_be_bytes(digest[..8].try_into().expect("sha256 output length"))

6. Code quality: Hex encoding inefficiency

File: crates/net/p2p/src/ethp2p/mod.rs
Lines: 293-296

digest.iter().map(|b| format!("{b:02x}")).collect()

This allocates a String per byte and is O(n²) due to repeated concatenation.

Recommendation: Use hex::encode(digest) (already a dependency via ethrex-common or similar) or const-hex.


7. Build/Dependency concerns

File: crates/net/p2p/Cargo.toml
Lines: 48-58

The feature depends on an internal git repo (ethp2p-rs) requiring credentials and protoc at build time. The lockfile pins to rev = "ba3ed8b".

Recommendation: Document the protoc requirement in the crate-level README or building instructions. Ensure CI has CARGO_NET_GIT_FETCH_WITH_CLI=true set as noted.


8. Minor: Static channel strings

File: crates/net/p2p/src/ethp2p/mod.rs
Lines: 24-26, 30-35

Channel IDs ("block", "aggregation", "attestation") are hardcoded strings. If these drift from the gossipsub topic names, messages will be delivered to the wrong handlers.

Recommendation: Define these as constants shared with the gossipsub module or derive them from the topic generators to ensure they remain synchronized.


Summary

The implementation is sound for an experimental feature. Item 1 (port overflow) should be fixed before any production-adjacent deployment. Item 2 (idempotency) should be verified with a test or assertion. The remaining items are optimizations or hardening suggestions.

The feature gating is correct, memory safety is maintained, and the integration with the actor model is idiomatic.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  1. A single dead bootnode disables the entire ethp2p transport. In ethp2p/mod.rs:138 / ethp2p/mod.rs:140, startup awaits net.connect(...)? for every configured peer and returns on the first failure. lib.rs:438 then drops the task entirely if start() fails, so one offline peer silently turns off the whole experimental path. For a static bootnode set, partial outage is normal; this should degrade per-peer, not globally.

  2. The new publish path is unbounded and can grow memory without limit under backpressure. lib.rs:412 creates an unbounded_channel, and every publish clones the compressed payload before enqueueing it in gossipsub/handler.rs:157, gossipsub/handler.rs:199, and gossipsub/handler.rs:239. If the QUIC engine stalls or falls behind, blocks and attestations accumulate indefinitely in heap memory. For consensus code, that is a straightforward DoS vector; use a bounded channel and an explicit drop/backpressure policy.

  3. The ethp2p path collapses all attestation subnets into one global channel, removing the existing subnet fanout guard. Gossipsub intentionally uses per-subnet topics in gossipsub/messages.rs:12 / gossipsub/messages.rs:35 and only subscribes selected subnets in lib.rs:338. The new adapter subscribes every node to a single "attestation" channel in ethp2p/mod.rs:48 / ethp2p/mod.rs:54 and publishes every attestation there in gossipsub/handler.rs:164. That means every node now verifies and forwards every attestation, including subnets it deliberately did not subscribe to. With XMSS verification on receipt, this is a significant CPU-amplification regression.

  4. Port derivation uses wrapping_add(1), which can silently turn port 65535 into 0. See lib.rs:301 and lib.rs:381. On the listener side that binds an ephemeral port; on the dial side it targets UDP/0. This is an avoidable correctness footgun; a checked add with a clear config error would be safer.

I did not run a full build in this environment: cargo check -p ethlambda-p2p --all-targets failed before compilation because rustup could not create temp files under /home/runner/.rustup due the read-only filesystem.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@pablodeymo pablodeymo marked this pull request as draft June 24, 2026 20:36
@greptile-apps

greptile-apps Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds an experimental, feature-gated (ethp2p) path that tees gossipsub broadcasts through ethp2p-rs's Reed-Solomon erasure-coded broadcast engine over a parallel QUIC network; the default build is completely unaffected.

  • New ethp2p module: Ethp2pBroadcast adapter, derive_peer_id (SHA-256 of secp256k1 pubkey → u64), run_engine_task (long-lived select loop driving the engine), dispatch_delivered (decompresses and routes reconstructed payloads to the same blockchain handlers gossipsub uses), plus an isolated round-trip test over real QUIC.
  • gossipsub/handler.rs: each of the three publish handlers tees the already-compressed SSZ bytes to the engine task via an unbounded channel, wrapped in #[cfg(feature = "ethp2p")] blocks that are completely inert in the default build.
  • lib.rs: build_swarm derives ethp2p peer IDs and addresses from the static bootnode list (ethp2p QUIC port = gossipsub port + 1 via wrapping_add); P2P::spawn creates the publish channel and spawns run_engine_task; Handler<WrappedEthp2pDelivery> forwards reconstructed messages into the blockchain pipeline.

Confidence Score: 4/5

Safe to merge for the default (feature-off) build; the ethp2p feature path has a few rough edges worth addressing before enabling in a real devnet.

The gossipsub path is byte-for-byte unchanged and the feature gate is consistent throughout. The three issues found — silent port wraparound via wrapping_add(1), double RS-encoding per publish, and the unguarded spin risk on persistent run_one_step errors — all live exclusively inside the ethp2p feature path that is off by default.

crates/net/p2p/src/ethp2p/mod.rs (double RS encode, select-loop spin risk) and crates/net/p2p/src/lib.rs (port wraparound in both the local bind address and bootnode peer addresses).

Important Files Changed

Filename Overview
crates/net/p2p/src/ethp2p/mod.rs New experimental module wiring ethp2p-rs Reed-Solomon broadcast; contains double RS-encode in publish_bytes and a select! loop with cancel-safety and spin-on-error concerns.
crates/net/p2p/src/lib.rs Wires the ethp2p engine into the P2P actor; uses wrapping_add(1) for QUIC port derivation, which silently wraps to 0 at port 65535.
crates/net/p2p/src/gossipsub/handler.rs Adds tee of snappy-compressed SSZ to ethp2p engine via cfg-gated blocks in publish_block/attestation/aggregated_attestation; gossipsub path is byte-for-byte unchanged.
crates/net/p2p/Cargo.toml Adds optional ethp2p feature with git-pinned ethp2p-broadcast/transport deps; prost version-matched to ethp2p-broadcast to avoid duplication; no changes to default build.
Cargo.lock Lockfile updated with new ethp2p-rs crates and transitive deps (quinn, rcgen, rustls, reed-solomon-erasure, prost 0.13.5); existing deps disambiguated with version suffixes.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant App as Application
    participant GH as gossipsub/handler.rs
    participant P2PS as P2PServer (actor)
    participant ETX as ethp2p_tx (UnboundedSender)
    participant ET as run_engine_task
    participant ENG as Ethp2pBroadcast (Engine)
    participant QUIC as QUIC Network
    participant ACT as P2PServer actor (delivery)
    participant BC as Blockchain

    App->>P2PS: PublishBlock / PublishAttestation / PublishAggregatedAttestation
    P2PS->>GH: publish_block / publish_attestation / publish_aggregated_attestation
    GH->>GH: SSZ encode + snappy compress
    GH-->>QUIC: swarm_handle.publish() (gossipsub, unchanged)
    GH->>ETX: "send(PublishCmd{channel, message_id, payload})"

    ETX-->>ET: publish_rx.recv()
    ET->>ENG: publish_bytes(channel, msg_id, payload)
    ENG->>ENG: rs_encode (preamble) + RsStrategy::new_origin (encode again)
    ENG->>QUIC: broadcast RS shards via QUIC

    QUIC-->>ENG: inbound RS shards from peers
    ET->>ENG: run_one_step() [drives engine]
    ENG-->>ET: DeliveredMessage (on delivery_rx)
    ET->>ACT: actor.send(WrappedEthp2pDelivery)
    ACT->>ACT: dispatch_delivered: decompress + SSZ decode
    ACT->>BC: blockchain.new_block / new_attestation / new_aggregated_attestation (idempotent)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant App as Application
    participant GH as gossipsub/handler.rs
    participant P2PS as P2PServer (actor)
    participant ETX as ethp2p_tx (UnboundedSender)
    participant ET as run_engine_task
    participant ENG as Ethp2pBroadcast (Engine)
    participant QUIC as QUIC Network
    participant ACT as P2PServer actor (delivery)
    participant BC as Blockchain

    App->>P2PS: PublishBlock / PublishAttestation / PublishAggregatedAttestation
    P2PS->>GH: publish_block / publish_attestation / publish_aggregated_attestation
    GH->>GH: SSZ encode + snappy compress
    GH-->>QUIC: swarm_handle.publish() (gossipsub, unchanged)
    GH->>ETX: "send(PublishCmd{channel, message_id, payload})"

    ETX-->>ET: publish_rx.recv()
    ET->>ENG: publish_bytes(channel, msg_id, payload)
    ENG->>ENG: rs_encode (preamble) + RsStrategy::new_origin (encode again)
    ENG->>QUIC: broadcast RS shards via QUIC

    QUIC-->>ENG: inbound RS shards from peers
    ET->>ENG: run_one_step() [drives engine]
    ENG-->>ET: DeliveredMessage (on delivery_rx)
    ET->>ACT: actor.send(WrappedEthp2pDelivery)
    ACT->>ACT: dispatch_delivered: decompress + SSZ decode
    ACT->>BC: blockchain.new_block / new_attestation / new_aggregated_attestation (idempotent)
Loading

Comments Outside Diff (2)

  1. crates/net/p2p/src/lib.rs, line 1287-1290 (link)

    P2 Silent port-zero wraparound for QUIC bind and dial addresses

    Both the local bind address and each bootnode's dial address use wrapping_add(1) on a u16 port. If a gossipsub port is configured as 65535, the ethp2p port becomes 0. For the local bind (SocketAddr::new(ip, 0)), the OS interprets port 0 as "assign an ephemeral port", so the node silently listens on an unknown, unannounced port. For bootnode dial addresses, port 0 as a destination will fail to connect. Both failures are silent — no error is returned from build_swarm. A checked_add(1) with an explicit panic or propagated error would make misconfiguration immediately visible rather than degrading quietly.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/net/p2p/src/lib.rs
    Line: 1287-1290
    
    Comment:
    **Silent port-zero wraparound for QUIC bind and dial addresses**
    
    Both the local bind address and each bootnode's dial address use `wrapping_add(1)` on a `u16` port. If a gossipsub port is configured as `65535`, the ethp2p port becomes `0`. For the local bind (`SocketAddr::new(ip, 0)`), the OS interprets port 0 as "assign an ephemeral port", so the node silently listens on an unknown, unannounced port. For bootnode dial addresses, port 0 as a destination will fail to connect. Both failures are silent — no error is returned from `build_swarm`. A `checked_add(1)` with an explicit panic or propagated error would make misconfiguration immediately visible rather than degrading quietly.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. crates/net/p2p/src/ethp2p/mod.rs, line 989-1031 (link)

    P2 Two concerns in the select! loop: cancel safety and persistent error spin

    First, tokio::select! cancels the non-selected futures on each iteration. If run_one_step is not cancel-safe — for example, if it reads a QUIC chunk and then yields before integrating it into the engine state — cancellation when a PublishCmd or a delivered message arrives could silently drop in-flight chunks and corrupt RS reassembly. tokio::select! documentation requires callers to verify cancel-safety; this should be confirmed against the ethp2p-rs engine's guarantee.

    Second, if run_one_step returns errors synchronously (e.g., after the QUIC transport is permanently broken), the loop will spin at full CPU speed while printing warn! on every iteration, because there is no break or backoff path for repeated errors — only the publish_rx and delivered_rx None branches exit the loop. Adding a consecutive-error counter with a break or an exponential back-off would avoid CPU saturation on transport failure.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/net/p2p/src/ethp2p/mod.rs
    Line: 989-1031
    
    Comment:
    **Two concerns in the `select!` loop: cancel safety and persistent error spin**
    
    First, `tokio::select!` cancels the non-selected futures on each iteration. If `run_one_step` is not cancel-safe — for example, if it reads a QUIC chunk and then yields before integrating it into the engine state — cancellation when a `PublishCmd` or a delivered message arrives could silently drop in-flight chunks and corrupt RS reassembly. `tokio::select!` documentation requires callers to verify cancel-safety; this should be confirmed against the `ethp2p-rs` engine's guarantee.
    
    Second, if `run_one_step` returns errors synchronously (e.g., after the QUIC transport is permanently broken), the loop will spin at full CPU speed while printing `warn!` on every iteration, because there is no `break` or backoff path for repeated errors — only the `publish_rx` and `delivered_rx` `None` branches exit the loop. Adding a consecutive-error counter with a break or an exponential back-off would avoid CPU saturation on transport failure.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
crates/net/p2p/src/lib.rs:1287-1290
**Silent port-zero wraparound for QUIC bind and dial addresses**

Both the local bind address and each bootnode's dial address use `wrapping_add(1)` on a `u16` port. If a gossipsub port is configured as `65535`, the ethp2p port becomes `0`. For the local bind (`SocketAddr::new(ip, 0)`), the OS interprets port 0 as "assign an ephemeral port", so the node silently listens on an unknown, unannounced port. For bootnode dial addresses, port 0 as a destination will fail to connect. Both failures are silent — no error is returned from `build_swarm`. A `checked_add(1)` with an explicit panic or propagated error would make misconfiguration immediately visible rather than degrading quietly.

### Issue 2 of 3
crates/net/p2p/src/ethp2p/mod.rs:188-199
**Payload is Reed-Solomon encoded twice per `publish_bytes` call**

`rs_encode(payload, &self.config)` encodes the full payload into preamble + shards, but only `preamble` is used — `_shards` is dropped. Then `RsStrategy::new_origin(payload, self.config)` is called on the same `payload`, which internally performs the same RS encoding again to produce the shards it will actually broadcast. For a 64 KiB block this doubles the encoding CPU cost per publish. Consider whether `ethp2p-rs` exposes a constructor that accepts an already-encoded `(preamble, shards)` pair so the work can be shared, or whether `new_origin` can return the preamble directly.

### Issue 3 of 3
crates/net/p2p/src/ethp2p/mod.rs:989-1031
**Two concerns in the `select!` loop: cancel safety and persistent error spin**

First, `tokio::select!` cancels the non-selected futures on each iteration. If `run_one_step` is not cancel-safe — for example, if it reads a QUIC chunk and then yields before integrating it into the engine state — cancellation when a `PublishCmd` or a delivered message arrives could silently drop in-flight chunks and corrupt RS reassembly. `tokio::select!` documentation requires callers to verify cancel-safety; this should be confirmed against the `ethp2p-rs` engine's guarantee.

Second, if `run_one_step` returns errors synchronously (e.g., after the QUIC transport is permanently broken), the loop will spin at full CPU speed while printing `warn!` on every iteration, because there is no `break` or backoff path for repeated errors — only the `publish_rx` and `delivered_rx` `None` branches exit the loop. Adding a consecutive-error counter with a break or an exponential back-off would avoid CPU saturation on transport failure.

Reviews (1): Last reviewed commit: "feat(p2p): experimental ethp2p erasure-c..." | Re-trigger Greptile

Comment on lines +188 to +199
let (preamble, _shards) =
rs_encode(payload, &self.config).map_err(|e| Ethp2pError::Encode(format!("{e:?}")))?;
let mut preamble_bytes = Vec::with_capacity(preamble.encoded_len());
preamble
.encode(&mut preamble_bytes)
.map_err(|e| Ethp2pError::Encode(e.to_string()))?;
let strategy = RsStrategy::new_origin(payload, self.config)
.map_err(|e| Ethp2pError::Encode(format!("{e:?}")))?;
self.engine
.publish(
&channel.to_string(),
message_id.to_string(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Payload is Reed-Solomon encoded twice per publish_bytes call

rs_encode(payload, &self.config) encodes the full payload into preamble + shards, but only preamble is used — _shards is dropped. Then RsStrategy::new_origin(payload, self.config) is called on the same payload, which internally performs the same RS encoding again to produce the shards it will actually broadcast. For a 64 KiB block this doubles the encoding CPU cost per publish. Consider whether ethp2p-rs exposes a constructor that accepts an already-encoded (preamble, shards) pair so the work can be shared, or whether new_origin can return the preamble directly.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/net/p2p/src/ethp2p/mod.rs
Line: 188-199

Comment:
**Payload is Reed-Solomon encoded twice per `publish_bytes` call**

`rs_encode(payload, &self.config)` encodes the full payload into preamble + shards, but only `preamble` is used — `_shards` is dropped. Then `RsStrategy::new_origin(payload, self.config)` is called on the same `payload`, which internally performs the same RS encoding again to produce the shards it will actually broadcast. For a 64 KiB block this doubles the encoding CPU cost per publish. Consider whether `ethp2p-rs` exposes a constructor that accepts an already-encoded `(preamble, shards)` pair so the work can be shared, or whether `new_origin` can return the preamble directly.

How can I resolve this? If you propose a fix, please make it concise.

@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Now I have enough data to synthesize. Let me produce the final review.


Review: feat(p2p): experimental ethp2p erasure-coded broadcast (off by default)

This PR adds a feature-gated (ethp2p cargo feature, off by default) adapter that tees each gossipsub publish to a parallel Reed–Solomon broadcast engine running over its own QUIC network. The feature-gating hygiene is solid — every #[cfg(feature = "ethp2p")] guard is present and the default build is untouched. The select-loop shutdown on channel close is correct. The dispatch symmetry (gossipsub path → same blockchain handlers) and idempotent block/attestation import correctly protect against dual delivery.

Four findings survive verification:


crates/net/p2p/src/ethp2p/mod.rs:882–900publish_bytes encodes the payload twice, discarding the first result

rs_encode(payload, &self.config) is called and _shards is immediately discarded; then RsStrategy::new_origin(payload, self.config) is called on the same payload, which must re-encode internally (it builds an origin session from the raw bytes). The discarded _shards variable name confirms this is unintentional dead work. For a 64 KB block this is 2× the GF(2⁸) RS computation on every publish, paid in the actor task context. Depending on whether engine.publish expects the preamble to match what new_origin will produce independently, there may also be a latent consistency hazard if the two encode calls disagree on shard count.


crates/net/p2p/src/ethp2p/mod.rs:989–1031run_one_step errors spin the select loop at 100% CPU indefinitely

When broadcast.run_one_step() returns Err, the loop logs a warning and immediately re-enters tokio::select!. If the engine enters a persistent error state where run_one_step completes without suspending (e.g. internal QUIC socket closed, internal channel broken), the run_one_step arm is always immediately ready and wins the select on every iteration. publish_rx and delivered_rx are starved; the task burns 100% of a tokio worker thread logging warnings with no backoff, no break, and no exit until the P2P actor drops publish_rx. Add a consecutive-error counter with a cap and break (or an exponential backoff sleep) to bound the damage.


crates/net/p2p/src/lib.rs:1286–1306wrapping_add(1) silently produces port 0 when the gossipsub port is 65535

Both the local bind address and each peer's dial address are derived as gossipsub_port.wrapping_add(1). On a u16, 65535u16.wrapping_add(1) == 0. Port 0 is valid for OS-assigned ephemeral binding but is an illegal destination for an explicit QUIC connect. The engine would attempt to dial peer_ip:0 and either fail silently or connect to the wrong endpoint. Use checked arithmetic (port.checked_add(1).expect("ethp2p port overflow")) or validate at config parse time.


crates/net/p2p/src/gossipsub/handler.rs:1165–1171, 1208–1214, 1248–1254 — silent channel-closed failure violates CLAUDE.md observability pattern

All three publish tee paths do:

let _ = tx.send(crate::ethp2p::PublishCmd { ... });

UnboundedSender::send returns Err only when the engine task has exited. Discarding that error means: if the engine crashes (QUIC bind failure, internal panic, etc.), every subsequent publish silently fails with zero log output on the sender side. The node continues healthy on gossipsub while ethp2p is dead and invisible to operators.

CLAUDE.md (project root) is explicit: "AVOID: Using if let Err when only performing side effects" and "Use inspect_err for side-effect-only error handling." The inbound dispatch_delivered path in the same file correctly uses inspect_err for all three message types. Fix:

tx.send(crate::ethp2p::PublishCmd { ... })
    .inspect_err(|_| warn!("ethp2p: publish channel closed, engine stopped"));

Minor — crates/net/p2p/src/ethp2p/mod.rs:982–987 — verbose tracing field form (CLAUDE.md)

info!(
    local_peer = params.local_peer,
    peers = params.peers.len(),
    ...

CLAUDE.md quotes slot = %slot as the ❌ BAD verbose form. Extract locals before the macro call:

let local_peer = params.local_peer;
let peer_count = params.peers.len();
info!(local_peer, peer_count, bind = %params.bind_addr, "ethp2p broadcast engine started");

Automated review by Claude (Anthropic) · sonnet · custom prompt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant