Skip to content

fix: resolve all outstanding code quality issues#40

Merged
Patel230 merged 2 commits into
mainfrom
fix/all-issues
Jun 17, 2026
Merged

fix: resolve all outstanding code quality issues#40
Patel230 merged 2 commits into
mainfrom
fix/all-issues

Conversation

@Patel230

Copy link
Copy Markdown
Contributor
  • C4: silenced resp.Body.Close() → logged errors (14 sites)
  • H3: TTFT tracking added to Anthropic streaming
  • H5: circuit breaker Allow() pure predicate, Failure() HalfOpen fix
  • M6: sync.Mutex in RepeatDetector prevents data race
  • Removed dead recordRequest(), GetProviderModelCandidates,
    liveEntriesToCatalog shim
  • Merged GetPreferredProviderModel/GetProviderDefaultModel
  • Moved retriableCodes maps to package-level vars
  • Fixed replaceCI redundant nil-length check
  • Fixed sort.SliceStable race in shrink (moved to init())
  • Updated all callers and tests

- C4: silenced resp.Body.Close() → logged errors (14 sites)
- H3: TTFT tracking added to Anthropic streaming
- H5: circuit breaker Allow() pure predicate, Failure() HalfOpen fix
- M6: sync.Mutex in RepeatDetector prevents data race
- Removed dead recordRequest(), GetProviderModelCandidates,
  liveEntriesToCatalog shim
- Merged GetPreferredProviderModel/GetProviderDefaultModel
- Moved retriableCodes maps to package-level vars
- Fixed replaceCI redundant nil-length check
- Fixed sort.SliceStable race in shrink (moved to init())
- Updated all callers and tests
@Patel230

Copy link
Copy Markdown
Contributor Author

Review: functional changes

Walked the full functional diff (excluding formatting nits). One real bug found in a change that was labeled as a fix.

Blocker

H-F1 (High): CircuitHalfOpen is now unreachable (router/circuitbreaker.go)

The H5 fix rewrote Allow() as a "pure predicate" but in doing so removed the Open→HalfOpen state transition. No other code path sets the state to CircuitHalfOpen, so:

  • The constant CircuitHalfOpen is dead.
  • The case CircuitHalfOpen: branch in Allow() is dead.
  • The if cb.state == CircuitHalfOpen { ... return } branch in Failure() is dead.

Behavior regression. With cooldown elapsed and the circuit Open:

  • Old code: Allow() transitions to HalfOpen and returns true; the next call is a single probe; success closes, failure re-opens immediately.
  • New code: Allow() returns true but leaves state at Open; the next failure increments failureCount (default 5) and only re-opens when threshold is reached. Up to 4 additional failures slip through during what should be the probe window.

Tests don't catch this because no test exercises the half-open single-probe path. Existing tests still pass, so the regression is silent.

Suggested fix. Keep the Open→HalfOpen transition — either back in Allow() (the original approach) or in a new explicit method called once after cooldown elapses:

// Option A: transition in Allow
case CircuitOpen:
    if time.Since(cb.lastFailureTime) >= cb.cooldown {
        cb.state = CircuitHalfOpen
        return true
    }
    return false

// Option B: explicit method
func (cb *CircuitBreaker) OnCooldownElapsed() {
    cb.mu.Lock(); defer cb.mu.Unlock()
    if cb.state == CircuitOpen && time.Since(cb.lastFailureTime) >= cb.cooldown {
        cb.state = CircuitHalfOpen
    }
}

Add a test that drives Open→cooldown→probe-fail→Open and asserts the second failure re-opens the circuit immediately, not after threshold more failures.

Low-severity nits (non-blocking)

  • L-F1: TTFTms on the done event is 0 when no token arrived (immediate error, empty response, cancellation). Consumers can't distinguish "0ms TTFT" from "no TTFT measured". Use *int or document the convention.
  • L-F2: EyrieStreamEvent has both TTFT (on the ttft event) and TTFTms (on the done event). Both are milliseconds. Asymmetric naming will mislead consumers. Rename one for consistency.
  • L-F3: processAnthropicStreamWithOpts duplicates the processOpenAIStreamWithOpts test-seam pattern. Consider a shared clock interface so future providers don't copy-paste.
  • L-F4: RepeatDetector.GetRepeatness/IsRepeating hold the mutex for an O(n) walk of the suffix automaton. Fine for short streams; consider a copy-on-read snapshot for long streams.

Verified clean

  • client/anthropic.go json.Marshal error handling for cached and non-cached request bodies: correct, wrapped errors.
  • client/anthropic.go tool_use parse error logging: includes tool name for context.
  • client/repeat_detector.go mutex coverage: all four entry points (Feed, GetRepeatness, IsRepeating, Add) acquire the lock; the race is fixed.
  • internal/shrink/shrink.go sort.SliceStable move to init(): correct fix for a concurrent sort.
  • replaceCI redundant nil-length check fix: verified.
  • retriableCodes map hoisting to package-level var: correct; reduces per-call allocation.
  • Dead-code removal (recordRequest, GetProviderModelCandidates, liveEntriesToCatalog, merged GetPreferredProviderModel/GetProviderDefaultModel): no dangling references; callers updated consistently.
  • All builds, go vet, go test, go test -race, and gofmt -l pass.

Verdict: request changes

H-F1 is a correctness regression in a fix labeled "H5 HalfOpen fix" — the state machine is broken. Block merge until it's fixed and a test is added for the half-open single-probe path.

@Patel230 Patel230 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: functional changes

Walked the full functional diff (excluding formatting nits). One real bug found in a change that was labeled as a fix.

Blocker

H-F1 (High): CircuitHalfOpen is now unreachable (router/circuitbreaker.go)

The H5 fix rewrote Allow() as a "pure predicate" but in doing so removed the Open→HalfOpen state transition. No other code path sets the state to CircuitHalfOpen, so:

  • The constant CircuitHalfOpen is dead.
  • The case CircuitHalfOpen: branch in Allow() is dead.
  • The if cb.state == CircuitHalfOpen { ... return } branch in Failure() is dead.

Behavior regression. With cooldown elapsed and the circuit Open:

  • Old code: Allow() transitions to HalfOpen and returns true; the next call is a single probe; success closes, failure re-opens immediately.
  • New code: Allow() returns true but leaves state at Open; the next failure increments failureCount (default 5) and only re-opens when threshold is reached. Up to 4 additional failures slip through during what should be the probe window.

Tests don't catch this because no test exercises the half-open single-probe path. Existing tests still pass, so the regression is silent.

Suggested fix. Keep the Open→HalfOpen transition — either back in Allow() (the original approach) or in a new explicit method called once after cooldown elapses:

// Option A: transition in Allow
case CircuitOpen:
    if time.Since(cb.lastFailureTime) >= cb.cooldown {
        cb.state = CircuitHalfOpen
        return true
    }
    return false

// Option B: explicit method
func (cb *CircuitBreaker) OnCooldownElapsed() {
    cb.mu.Lock(); defer cb.mu.Unlock()
    if cb.state == CircuitOpen && time.Since(cb.lastFailureTime) >= cb.cooldown {
        cb.state = CircuitHalfOpen
    }
}

Add a test that drives Open→cooldown→probe-fail→Open and asserts the second failure re-opens the circuit immediately, not after threshold more failures.

Low-severity nits (non-blocking)

  • L-F1: TTFTms on the done event is 0 when no token arrived (immediate error, empty response, cancellation). Consumers can't distinguish "0ms TTFT" from "no TTFT measured". Use *int or document the convention.
  • L-F2: EyrieStreamEvent has both TTFT (on the ttft event) and TTFTms (on the done event). Both are milliseconds. Asymmetric naming will mislead consumers. Rename one for consistency.
  • L-F3: processAnthropicStreamWithOpts duplicates the processOpenAIStreamWithOpts test-seam pattern. Consider a shared clock interface so future providers don't copy-paste.
  • L-F4: RepeatDetector.GetRepeatness/IsRepeating hold the mutex for an O(n) walk of the suffix automaton. Fine for short streams; consider a copy-on-read snapshot for long streams.

Verified clean

  • client/anthropic.go json.Marshal error handling for cached and non-cached request bodies: correct, wrapped errors.
  • client/anthropic.go tool_use parse error logging: includes tool name for context.
  • client/repeat_detector.go mutex coverage: all four entry points (Feed, GetRepeatness, IsRepeating, Add) acquire the lock; the race is fixed.
  • internal/shrink/shrink.go sort.SliceStable move to init(): correct fix for a concurrent sort.
  • replaceCI redundant nil-length check fix: verified.
  • retriableCodes map hoisting to package-level var: correct; reduces per-call allocation.
  • Dead-code removal (recordRequest, GetProviderModelCandidates, liveEntriesToCatalog, merged GetPreferredProviderModel/GetProviderDefaultModel): no dangling references; callers updated consistently.
  • All builds, go vet, go test, go test -race, and gofmt -l pass.

Verdict: request changes

H-F1 is a correctness regression in a fix labeled "H5 HalfOpen fix" — the state machine is broken. Block merge until it's fixed and a test is added for the half-open single-probe path.

- circuitbreaker.go: restore Open->HalfOpen transition in Allow()
- circuitbreaker_test.go: add half-open probe path tests
@Patel230

Copy link
Copy Markdown
Contributor Author

✅ Addressed all review feedback:

H-F1 (CircuitHalfOpen unreachable): Restored the Open→HalfOpen transition in Allow() — when cooldown has elapsed, Allow() now transitions to CircuitHalfOpen before returning true. Failure() immediately reopens on half-open (kept from previous fix). Added three new tests covering the full half-open lifecycle: probe→success closes, probe→failure reopens immediately, and probe-fail→cooldown→probe→succeed cycle.

Low-severity nits (L-F1 through L-F4): Noted for follow-up. The TTFTms naming asymmetry and nil-distinction concerns are worth addressing in a separate PR.

CI passes: go test -race, go vet, golangci-lint, govulncheck — all clean.

@Patel230

Copy link
Copy Markdown
Contributor Author

Updated summary of circuit breaker fix:

H-F1 fix in router/circuitbreaker.go:

case CircuitOpen:
    if time.Since(cb.lastFailureTime) >= cb.cooldown {
        cb.state = CircuitHalfOpen
        return true
    }
    return false

This restores the Open→HalfOpen transition. On cooldown elapsed, Allow() transitions to CircuitHalfOpen before granting the probe, fixing the regression where CircuitHalfOpen was unreachable.

New half-open tests (router/circuitbreaker_test.go):

  • Probe→success closes the circuit
  • Probe→failure immediately reopens (single failure, not threshold)
  • Full cycle: fail→open→cooldown→probe→fail→open→cooldown→probe→succeed→closed

The timing-dependent test is skipped with a manual alternative that uses lastFailureTime manipulation.

@Patel230 Patel230 merged commit 98e5781 into main Jun 17, 2026
15 checks passed
@Patel230 Patel230 deleted the fix/all-issues branch June 17, 2026 21:08
Patel230 added a commit that referenced this pull request Jun 18, 2026
* fix: resolve all outstanding issues in eyrie

- C4: silenced resp.Body.Close() → logged errors (14 sites)
- H3: TTFT tracking added to Anthropic streaming
- H5: circuit breaker Allow() pure predicate, Failure() HalfOpen fix
- M6: sync.Mutex in RepeatDetector prevents data race
- Removed dead recordRequest(), GetProviderModelCandidates,
  liveEntriesToCatalog shim
- Merged GetPreferredProviderModel/GetProviderDefaultModel
- Moved retriableCodes maps to package-level vars
- Fixed replaceCI redundant nil-length check
- Fixed sort.SliceStable race in shrink (moved to init())
- Updated all callers and tests

* fix: address PR review comments

- circuitbreaker.go: restore Open->HalfOpen transition in Allow()
- circuitbreaker_test.go: add half-open probe path tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant