Benchmark comparison gates by dahlia · Pull Request #804 · fedify-dev/fedify

dahlia · 2026-06-13T05:33:28Z

This adds fedify bench compare, a CI-oriented command for checking benchmark regressions between two Git refs. The command checks out the base/head refs in temporary worktrees, starts each target with the same command, waits for a ready URL, runs the same benchmark suite, and reports whether the head result stays within the configured regression allowance.

The comparison uses the explicit --max-regression value plus a measured noise band from repeated runs, so the gate can be stricter than a broad absolute threshold without failing on small local variance. When a scenario does not name performance metrics in expect, the comparison falls back to latency.p95 and throughputPerSec.

This also makes repeated runs real in fedify bench. Scenarios now run three times by default, aggregate latency and throughput with median values, keep correctness gates pessimistic, and preserve per-run data in the JSON report. Dry-run output now shows the run count, total scheduled duration, and open-loop request volume so users can review benchmark impact before sending load.

The report format is versioned as benchmark report schema v3 and comparison report schema v1. The published schemas live in schema/bench/report-v3.json and schema/bench/compare-report-v1.json, with user documentation in docs/manual/benchmarking.md and a release note in CHANGES.md.

Refs #744. Fixes #786.

Implement repeated benchmark runs and the fedify bench compare command so CI can compare a head ref against a base ref on the same runner. The comparison path checks out temporary worktrees, starts each target sequentially, and fails when performance regressions exceed the configured tolerance plus the measured noise band. Publish benchmark report schema version 3 and comparison report schema version 1, including fixtures and schema drift guards for both formats. fedify-dev#744 fedify-dev#786 Assisted-by: Codex:gpt-5.5

Explain repeated benchmark runs, the fedify bench compare workflow, and the CI versus controlled-runner profile split. Record the CLI changelog entry for the comparison command and the report schema updates. fedify-dev#744 fedify-dev#786 Assisted-by: Codex:gpt-5.5

Keep benchmark target output away from stdout, enforce readiness timeouts, terminate Windows target process trees, and avoid infinite noise allowances in comparison reports. fedify-dev#744 fedify-dev#786 Assisted-by: Codex:gpt-5.5

Treat missing server-side measurements in any repeated benchmark run as missing from the aggregate so signature and queue expectations cannot pass on a partial subset of runs. fedify-dev#744 fedify-dev#786 Assisted-by: Codex:gpt-5.5

Show repeated run counts, total duration, and open-loop request volume in benchmark dry-run plans. Compare benchmark reports by scenario position plus name and type so duplicate scenario names cannot be matched to the wrong base result. fedify-dev#744 fedify-dev#786 Assisted-by: Codex:gpt-5.5

coderabbitai · 2026-06-13T05:33:42Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a bench "compare" mode, enables repeated scenario runs (default 3), aggregates per-run metrics into report-v3, computes noise-adjusted regressions, implements worktree/target orchestration and rendering, and updates schemas, fixtures, tests, docs, and changelog.

Changes

Benchmark Compare and Repeated Runs

Layer / File(s)	Summary
Documentation and changelog `CHANGES.md`, `docs/manual/benchmarking.md`	Documents `fedify bench compare`, repeated-runs default/aggregation, report-v3 and compare-report schema references, CI instructions, and adds changelog references.
CLI Command Structure: Run and Compare Modes `packages/cli/src/bench/command.ts`, `packages/cli/src/bench/command.test.ts`	Refactors benchmark command parser to `or(compareParser, runParser)` with `mode: "run"`/`"compare"`, adds typed variants and parsing tests for compare flags.
Repeated Runs Foundation: Scenario Normalization `packages/cli/src/bench/scenario/normalize.ts`, `packages/cli/src/bench/scenario/normalize.test.ts`	Defaults `runs` to `3`, removes rejection of `runs > 1`, and updates tests for defaults and overrides.
Repeated Runs Execution: Action Loop and Dry-run Plan `packages/cli/src/bench/action.ts`, `packages/cli/src/bench/action.test.ts`	Changes `runBench` to accept `BenchRunCommand`, executes each scenario `scenario.runs` times collecting per-run measurements, and expands dry-run plan output to include runs and total duration; adds related tests.
Result Aggregation and Benchmark Report v3 `packages/cli/src/bench/result/build.ts`, `packages/cli/src/bench/result/model.ts`, `packages/cli/src/bench/result/schema.ts`, `packages/cli/src/bench/result/build.test.ts`, `packages/cli/src/bench/render/render.test.ts`	Aggregates repeated-run measurements (medians for throughput/latency, pessimistic successRate), adds `runCount` and optional `runs[]` to scenario results, bumps report `schemaVersion` to 3, updates schemas and validators, and adds aggregation tests including missing server stats scenarios.
Benchmark Compare: Orchestration and Process Management `packages/cli/src/bench/compare.ts`	Implements `runBenchCompare`: create/remove worktrees, start/stop target processes, waitReadyUrl polling, run benchmarks in worktrees, compute noise-aware comparisons, render/write outputs, and set exit codes.
Compare Helpers: start/stop, waitReadyUrl, worktree/run helpers `packages/cli/src/bench/compare.ts`	Exports `startBenchmarkTarget`, `stopTargetProcess` (platform-aware), `waitReadyUrl` (fetch with abort/backoff), `defaultRunBenchInWorktree`, git worktree helpers, and `defaultWriteOutput`.
Compare Report Rendering `packages/cli/src/bench/compare.ts`	Renders compare reports in `json`, `markdown`, or `text` with formatted numeric/regression/noise fields and pass/fail markers.
Compare Report Schema `packages/cli/src/bench/compare/schema.ts`, `schema/bench/compare-report-v1.json`	Adds compare-report schema (compare-report-v1) validating envelope, sides (embedded report-v3), comparisons array, and numeric constraints for noise/allowedRegression.
Published JSON Schema Definitions & Registry `packages/cli/src/bench/result/schema.ts`, `packages/cli/src/bench/schemas.ts`, `schema/bench/report-v3.json`	Introduces report-v3 schema and registers `report-v3.json` and `compare-report-v1.json` in PUBLISHED_SCHEMAS; adjusts v2/v3 IDs and schema objects.
Tests: compare, command, action, result, render, schema `packages/cli/src/bench/compare.test.ts`, `packages/cli/src/bench/command.test.ts`, `packages/cli/src/bench/action.test.ts`, `packages/cli/src/bench/result/build.test.ts`, `packages/cli/src/bench/render/render.test.ts`, `packages/cli/src/bench/schema.test.ts`	Adds comprehensive tests for compare mode, tolerance parsing, noise-band logic, process lifecycle (including Windows), waitReadyUrl behavior, command parsing, action dry-run and runs behavior, result aggregation, and schema guards.
Fixtures and Examples `packages/cli/src/bench/__fixtures__/reports/inbox-report.json`, `packages/cli/src/bench/__fixtures__/compare-reports/basic.json`	Updates inbox-report fixture to schema v3 with `runCount: 1` and adds a compare-report fixture demonstrating a single comparison entry.

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant GitWorktree
  participant TargetProcess
  participant BenchRunner
  participant ReportWriter

  CLI->>GitWorktree: create worktree(base)
  CLI->>GitWorktree: create worktree(head)
  GitWorktree-->>CLI: worktrees ready
  CLI->>TargetProcess: startBenchmarkTarget (base worktree)
  TargetProcess-->>CLI: started (pid)
  CLI->>CLI: waitReadyUrl(base.readyUrl)
  CLI->>BenchRunner: defaultRunBenchInWorktree(base)
  BenchRunner-->>CLI: BenchReport(base)
  CLI->>TargetProcess: stopTargetProcess(base)
  CLI->>TargetProcess: startBenchmarkTarget (head worktree)
  CLI->>CLI: waitReadyUrl(head.readyUrl)
  CLI->>BenchRunner: defaultRunBenchInWorktree(head)
  BenchRunner-->>CLI: BenchReport(head)
  CLI->>ReportWriter: buildCompareReport(base, head) -> render/write (json/markdown/text)
  ReportWriter-->>CLI: written output
  CLI->>GitWorktree: remove worktrees

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

fedify-dev/fedify#795: Overlaps at run-mode dry-run/plan generation and safety gating around run execution.
fedify-dev/fedify#802: Related changes to benchmark result building and metric representations; both touch aggregation and schema boundaries.
fedify-dev/fedify#791: Earlier bench engine changes that this PR extends (run behavior and report aggregation).

Suggested reviewers

sij411
2chanhaeng

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.39% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Benchmark comparison gates' clearly and concisely summarizes the main feature addition of a CI-oriented comparison command for benchmark regression detection.
Description check	✅ Passed	The description provides comprehensive context about the new `fedify bench compare` command, repeated run aggregation, noise-aware gating, and all associated documentation/schema changes.
Linked Issues check	✅ Passed	The PR fully implements the acceptance criteria from issue `#786`: same-runner compare with noise-aware gating, median-of-N aggregation for latency/throughput, and documented CI-safe/perf-lab profiles.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the benchmark comparison feature: new CLI command, schema versioning, documentation, aggregation logic, and report fixtures—no unrelated modifications present.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

dahlia · 2026-06-13T05:33:56Z

@codex review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command to enable CI-friendly performance regression testing by comparing base and head git revisions. It also adds support for repeated benchmark runs, aggregates metrics (such as median latency and pessimistic success rates), and updates the benchmark report schema to version 3 alongside a new comparison report schema. The review feedback highlights opportunities to improve robustness, specifically by adding defensive checks in metricValue to prevent null pointer exceptions, implementing a secondary timeout in stopTargetProcess to avoid indefinite hangs, and prioritizing abort signal reasons over transport-level errors in waitReadyUrl.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector · 2026-06-13T05:40:54Z

Codex Review: Didn't find any major issues. What shall we delve into next?

Reviewed commit: b18c937e5e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/bench/command.test.ts`:
- Around line 108-120: Add a regression test that asserts a bare "bench compare"
subcommand without any flags or refs fails parsing to guard against parser
disambiguation; specifically, in the existing test file add a new test that
calls parse(benchCommand, [COMMAND, "compare"]) and asserts result.success is
false (using the same assert.ok(!result.success) style), ensuring you reference
the existing benchCommand, parse, and COMMAND symbols so this covers the bare
subcommand case.

In `@packages/cli/src/bench/compare/schema.ts`:
- Around line 83-106: The benchmarkReport schema currently marks "target" as
required but doesn't define its shape, allowing invalid values; update the
benchmarkReport.properties to include a "target" entry (e.g., "target": {
"$ref": "`#/`$defs/target" } or an appropriate explicit object/type schema) and
add a corresponding $defs.target definition (or refine the existing $defs) that
constrains allowed fields/types for the target object so null/number/other
invalid types are rejected; ensure the new target schema matches the project's
expected target fields and validation style used elsewhere (e.g., other $defs
like tool/environment) so the required "target" is properly typed and validated.

In `@packages/cli/src/bench/mod.ts`:
- Around line 7-11: Add a JSDoc comment above the exported runBench function
describing its purpose (dispatching to runBenchCompare or runBenchSuite based on
the command.mode), document the parameter (BenchCommand) including expected
properties used (e.g., mode values "compare" vs others), and specify the return
type Promise<void>; reference the exported symbol runBench so the comment is
placed immediately above that function and include any relevant tags like `@param`
and `@returns`.

In `@packages/cli/src/bench/result/schema.ts`:
- Around line 353-367: The schema for scenarioResult in reportSchemaV2 currently
adds "runCount" but leaves "runs" optional; update the scenarioResult required
list to include "runs" as well so that whenever runCount is present the runs
array is required — specifically modify the spread of
reportSchemaV2.$defs.scenarioResult.required inside the scenarioResult
definition to add "runs" (in addition to "runCount") so the properties runCount
and runs are both required in the scenarioResult schema.

In `@schema/bench/report-v3.json`:
- Around line 392-478: The schema allows runCount>1 without requiring per-run
data; update the "scenarioResult" schema so that when "runCount" is >= 2 the
"runs" property is required. Add an if/then at the same level as "properties" in
the "scenarioResult" definition: if {"properties":{"runCount":{"minimum":2}}}
then {"required":["runs"]}. Target the "scenarioResult" object and its
"runCount" and "runs" properties to enforce this invariant.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74e2c8ec-7ff9-44f4-8596-5b8b851aff26

📥 Commits

Reviewing files that changed from the base of the PR and between d351e55 and b18c937.

📒 Files selected for processing (23)

CHANGES.md
docs/manual/benchmarking.md
packages/cli/src/bench/__fixtures__/compare-reports/basic.json
packages/cli/src/bench/__fixtures__/reports/inbox-report.json
packages/cli/src/bench/action.test.ts
packages/cli/src/bench/action.ts
packages/cli/src/bench/command.test.ts
packages/cli/src/bench/command.ts
packages/cli/src/bench/compare.test.ts
packages/cli/src/bench/compare.ts
packages/cli/src/bench/compare/schema.ts
packages/cli/src/bench/mod.ts
packages/cli/src/bench/render/render.test.ts
packages/cli/src/bench/result/build.test.ts
packages/cli/src/bench/result/build.ts
packages/cli/src/bench/result/model.ts
packages/cli/src/bench/result/schema.ts
packages/cli/src/bench/scenario/normalize.test.ts
packages/cli/src/bench/scenario/normalize.ts
packages/cli/src/bench/schema.test.ts
packages/cli/src/bench/schemas.ts
schema/bench/compare-report-v1.json
schema/bench/report-v3.json

codecov · 2026-06-13T05:44:13Z

Codecov Report

❌ Patch coverage is 78.22129% with 311 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
packages/cli/src/bench/compare.ts	67.47%	195 Missing and 61 partials ⚠️
packages/cli/src/bench/result/build.ts	84.97%	12 Missing and 14 partials ⚠️
packages/cli/src/bench/load/generator.ts	55.81%	15 Missing and 4 partials ⚠️
packages/cli/src/bench/action.ts	90.36%	4 Missing and 4 partials ⚠️
packages/cli/src/bench/load/clock.ts	90.00%	0 Missing and 2 partials ⚠️

Files with missing lines	Coverage Δ
packages/cli/src/bench/command.ts	`100.00% <100.00%> (ø)`
packages/cli/src/bench/compare/schema.ts	`100.00% <100.00%> (ø)`
packages/cli/src/bench/result/schema.ts	`100.00% <100.00%> (ø)`
packages/cli/src/bench/safety/gate.ts	`100.00% <100.00%> (ø)`
packages/cli/src/bench/scenario/normalize.ts	`100.00% <100.00%> (ø)`
packages/cli/src/bench/scenarios/failure.ts	`84.45% <100.00%> (+0.03%)`	⬆️
packages/cli/src/bench/scenarios/fanout.ts	`87.93% <100.00%> (+0.03%)`	⬆️
packages/cli/src/bench/scenarios/inbox.ts	`89.87% <100.00%> (+0.06%)`	⬆️
packages/cli/src/bench/scenarios/read.ts	`95.07% <100.00%> (+0.03%)`	⬆️
packages/cli/src/bench/scenarios/runner.ts	`95.12% <ø> (ø)`
... and 7 more

... and 17 files with indirect coverage changes

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Bound target shutdown after SIGKILL, keep readiness timeout errors tied back to the abort reason, and make malformed comparison metrics fail instead of throwing. Tighten benchmark report schemas so embedded targets are typed and repeated reports must include per-run data. Add parser and public-export coverage requested in review. fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

netlify · 2026-06-13T05:57:01Z

✅ Deploy Preview for fedify-json-schema ready!

Name	Link
🔨 Latest commit	`356539c`
🔍 Latest deploy log	https://app.netlify.com/projects/fedify-json-schema/deploys/6a2d3f5280431d0008503090
😎 Deploy Preview	https://deploy-preview-804--fedify-json-schema.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

dahlia · 2026-06-13T05:57:55Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new fedify bench compare command for CI-friendly performance regression gates, allowing sequential benchmark runs of base and head git revisions inside temporary worktrees. It also adds support for repeated benchmark runs (defaulting to 3) and aggregates their results (median latency/throughput, pessimistic success rate, merged histograms) into a new benchmark report schema version 3. The review feedback focuses on improving the robustness of the comparison and process management logic in packages/cli/src/bench/compare.ts. Specifically, the reviewer suggests matching scenarios by name rather than relying on exact ordering, cleaning up partial git worktrees on checkout failures to prevent leaks, and resolving target process termination immediately if the process ID is null to avoid hanging.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

coderabbitai

♻️ Duplicate comments (1)

packages/cli/src/bench/mod.ts (1)

7-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Expand JSDoc to document parameters, return type, and dispatch behavior.

The JSDoc is too brief for a public API. It should document the command parameter (including the mode discriminator), the Promise<void> return, and clarify the dispatch behavior (compare vs. run modes).

📝 Proposed expanded JSDoc

-/** Runs a parsed benchmark command. */
+/**
+ * Runs a parsed benchmark command.
+ *
+ * Dispatches to {`@link` runBenchCompare} when `command.mode === "compare"`,
+ * otherwise dispatches to the suite runner for `mode === "run"`.
+ *
+ * `@param` command The parsed benchmark command with mode discriminator.
+ * `@returns` A promise that resolves when the benchmark completes.
+ */
 export function runBench(command: BenchCommand): Promise<void> {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/bench/mod.ts` at line 7, Replace the terse JSDoc comment
above the benchmark entry function with a full docblock that documents the
parameter `command` (describe its shape and the mode discriminator values, e.g.,
"run" vs "compare"), the return type (Promise<void>), and the dispatch behavior
(explain that when command.mode === "compare" it dispatches comparison logic and
when command.mode === "run" it executes a single benchmark run, including any
side effects such as logging or process exit). Mention any thrown errors or
rejection conditions and note that the function performs asynchronous work. Keep
the docblock adjacent to the same function so the symbols (the `command`
parameter and the function itself) are clearly described.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@packages/cli/src/bench/mod.ts`:
- Line 7: Replace the terse JSDoc comment above the benchmark entry function
with a full docblock that documents the parameter `command` (describe its shape
and the mode discriminator values, e.g., "run" vs "compare"), the return type
(Promise<void>), and the dispatch behavior (explain that when command.mode ===
"compare" it dispatches comparison logic and when command.mode === "run" it
executes a single benchmark run, including any side effects such as logging or
process exit). Mention any thrown errors or rejection conditions and note that
the function performs asynchronous work. Keep the docblock adjacent to the same
function so the symbols (the `command` parameter and the function itself) are
clearly described.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c2b9d19a-c0a4-4840-9225-3826a851781c

📥 Commits

Reviewing files that changed from the base of the PR and between b18c937 and 662c265.

📒 Files selected for processing (9)

packages/cli/src/bench/command.test.ts
packages/cli/src/bench/compare.test.ts
packages/cli/src/bench/compare.ts
packages/cli/src/bench/compare/schema.ts
packages/cli/src/bench/mod.ts
packages/cli/src/bench/result/schema.ts
packages/cli/src/bench/schema.test.ts
schema/bench/compare-report-v1.json
schema/bench/report-v3.json

Document every published benchmark schema on the schema landing page, including the current scenario and report versions plus the comparison report schema. Keep the schema directory README aligned with the same published files and source locations. Assisted-by: Codex:gpt-5.5

Match scenarios by name, type, and occurrence so reordered suites do not produce false missing-scenario comparisons. Clean partial worktree registrations after failed checkout and treat child processes without a pid as already stopped. fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-06-13T06:12:49Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command to enable CI-friendly performance regression checks by comparing base and head git revisions. It also updates the benchmark report schema to version 3 and scenario schema to version 2 to support and aggregate repeated runs. Review feedback suggests gracefully handling cases where the base metric is zero in regressionRatio for 'higher-is-better' metrics, and ensuring that the Windows process tree is only force-killed with /F when the signal is SIGKILL to allow graceful shutdowns on SIGTERM.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

dahlia · 2026-06-13T06:16:19Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b880cc7c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Keep compare's derived target separate from the explicit CLI target signal, so unsafe overrides still require --target. Also handle zero throughput baselines as improvements and let Windows targets receive a graceful taskkill before forced cleanup. fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-06-13T06:35:39Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command to facilitate CI-friendly performance regression comparisons between Git revisions, updating the benchmark report schema to version 3 to support repeated runs and adding a new comparison report schema. The review feedback suggests merging duplicate imports from node:fs/promises, adding a JSDoc comment to the public windowsTaskkillArgs function to adhere to the repository style guide, and using safer optional chaining when canceling response bodies to prevent potential runtime crashes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab34e0e986

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

The public-target unsafe override already required an explicit target, load, and duration. Include the repeated-run count in that contract so the default run multiplier cannot expand public benchmark traffic without an explicit operator choice. fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

Merge the duplicate fs/promises imports, document the taskkill argument helper, and tolerate custom response bodies without cancel methods while polling readiness endpoints. fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-06-13T07:04:38Z

@codex review

dahlia · 2026-06-13T07:04:38Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command to enable comparing performance regression between base and head Git revisions in CI environments. It also implements support for repeated benchmark runs (defaulting to 3 runs) and aggregates repeated runs using median metrics for latency/throughput and pessimistic correctness. This change updates the benchmark report schema to version 3 and introduces the comparison report schema version 1. Feedback on the changes highlights two key improvement opportunities: first, regressionRatio returns null when base is 0 and the direction is "lower-is-better", which can cause microbenchmarks with millisecond precision to fail on negligible differences; second, the partialField aggregation discards standard-specific or queue metrics entirely if they are missing in a single run, whereas it should filter out undefined values and compute the median of the present values.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82c39d39c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Treat one-millisecond latency movement from a zero baseline as noise, preserve by-standard report detail across repeated runs, and avoid publishing a pooled aggregate histogram that disagrees with the reported median run latency fields. fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-06-13T07:26:05Z

@codex review

dahlia · 2026-06-13T07:26:05Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command to facilitate CI-friendly performance regression testing by checking out base and head refs into temporary worktrees, running the benchmark suite sequentially, and failing if regression exceeds a configured threshold. Additionally, benchmark scenarios now run three times by default to aggregate repeated runs with median latency/throughput and pessimistic correctness results, updating the report schema to version 3 and introducing comparison report schema version 1. Feedback on the implementation highlights a compatibility issue with toReversed() in Node.js 18, a potential CI failure when comparing newly introduced metrics without a baseline, and a TOCTOU security vulnerability caused by deleting temporary directories immediately after creation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector · 2026-06-13T07:34:11Z

Codex Review: Didn't find any major issues. Delightful!

Reviewed commit: de36221102

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Avoid Node 18-incompatible reverse iteration, let newly introduced head-side metrics pass when the baseline lacks that measurement, and keep temporary worktree directories intact until git owns them. fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-06-13T09:16:40Z

@codex review

dahlia · 2026-06-13T09:16:40Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command to compare performance metrics between two git revisions (base and head) in a CI-friendly manner, failing if the head revision regresses beyond a specified threshold plus measured noise. It updates the benchmark report schema to version 3 to support repeated runs (defaulting to 3 runs) and aggregates their results, and adds a new comparison report schema version 1. Feedback on the changes highlights two key improvements: wrapping killTargetProcess in a try...catch block inside stopTargetProcess to prevent unhandled exceptions from crashing the CLI, and registering SIGINT/SIGTERM signal handlers to clean up temporary worktrees and stop active target processes upon user interruption.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d2edea8be

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Race target exits against readiness, make interrupted comparisons stop the active target and clean temporary worktrees, and reject cleanly if forced target termination throws. fedify-dev#804 (comment) fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-06-13T10:00:14Z

@codex review

dahlia · 2026-06-13T10:00:14Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command to enable CI-friendly performance regression gates by comparing base and head git revisions. It also implements support for repeated benchmark runs (defaulting to three runs), aggregating their results (using median values for latency/throughput and pessimistic correctness results), and updates the benchmark report schema to version 3. Feedback on the changes suggests using a type assertion to cast fieldValues to readonly number[] in packages/cli/src/bench/result/build.ts to resolve a TypeScript compilation error under strict null checks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41d38f42bc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Carry interrupt and target-exit cancellation through compare readiness, in-process benchmark runs, runner fetches, and load-generator sleeps so raced benchmark work does not continue after cleanup starts. fedify-dev#804 (comment) fedify-dev#804 (comment) Assisted-by: Codex:gpt-5.5

dahlia · 2026-06-13T11:32:19Z

@codex review

dahlia · 2026-06-13T11:32:19Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench compare command, which enables CI-friendly performance regression testing by comparing a base and head git revision on the same runner. It automates checking out revisions into temporary worktrees, starting target processes, waiting for readiness, and running the benchmark suite. Additionally, the PR implements support for repeated benchmark runs (defaulting to three) and aggregates their metrics (using median values for latency/throughput and pessimistic success rates) to ensure stable results. To support these features, the benchmark report schema has been updated to version 3, a new comparison report schema version 1 has been introduced, and abort signals have been integrated to cancel in-flight benchmark tasks. Since no review comments were provided, there is no further feedback to address.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

chatgpt-codex-connector · 2026-06-13T11:42:10Z

Codex Review: Didn't find any major issues. Nice work!

Reviewed commit: 356539c6c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

dahlia added 6 commits June 13, 2026 11:35

Harden benchmark comparison runtime

a4b2a5a

Keep benchmark target output away from stdout, enforce readiness timeouts, terminate Windows target process trees, and avoid infinite noise allowances in comparison reports. fedify-dev#744 fedify-dev#786 Assisted-by: Codex:gpt-5.5

Preserve missing repeated server metrics

675e939

Treat missing server-side measurements in any repeated benchmark run as missing from the aggregate so signature and queue expectations cannot pass on a partial subset of runs. fedify-dev#744 fedify-dev#786 Assisted-by: Codex:gpt-5.5

Add a PR link to the changelog

b18c937

dahlia added this to the Fedify 2.3 milestone Jun 13, 2026

dahlia self-assigned this Jun 13, 2026

dahlia added the component/cli CLI tools related label Jun 13, 2026

dahlia added this to STF 2025–2026 Milestone 6: Observability and Monitoring (OpenTelemetry) Jun 13, 2026

dahlia added the component/otel OpenTelemetry integration label Jun 13, 2026