fix(harbor): warn that sample_timeout is not enforced in Mode B by shehabyasser-scale · Pull Request #20 · scaleapi/vero

shehabyasser-scale · 2026-07-03T10:25:08Z

Stacked on #19. Found live: a Mode B config carried sample_timeout: 1200 through several optimization trials and it never did anything. In Mode B the nested harbor run applies each task's own harbor-configured timeouts; the only vero-side cap is timeout on the whole nested run. TB2 tasks can legitimately run 30-40 minutes, so an author relying on a silent no-op cap gets surprising wall-clock and cost.

The sidecar now warns at startup when sample_timeout is explicitly set (via model_fields_set, so the default doesn't warn) alongside a Mode B config, pointing at harbor.extra_args (e.g. --agent-timeout-multiplier) as the real lever. Mode A behavior unchanged.

1 test (warns in Mode B when set; silent for default and for Mode A). 13 pass.

🤖 Generated with Claude Code

Greptile Summary

Adds a startup warning when sample_timeout is explicitly set alongside a Mode B (harbor) config, where it has no effect. The check uses Pydantic v2's model_fields_set so only explicit author-supplied values trigger the warning, not the 180 s default.

_warn_mode_b_sample_timeout is a small, focused function placed in build_components after the existing Mode A integrity guard; it correctly short-circuits on both the absence of harbor and the absence of an explicit sample_timeout.
The new test covers all three cases: warns in Mode B when set explicitly, silent when using the default in Mode B, and silent for Mode A with harbor=None.

Confidence Score: 5/5

Safe to merge — adds a diagnostic-only warning with no changes to runtime behavior or data paths.

The change is purely additive: a warning emitted at startup when a no-op config key is detected. No evaluation logic, budget accounting, or API surface is modified. The model_fields_set gate correctly distinguishes explicit author intent from the default, and the test validates all three cases.

No files require special attention.

Important Files Changed

Filename	Overview
vero/src/vero/harbor/serve.py	Adds `_warn_mode_b_sample_timeout` helper and calls it in `build_components`; logic is correct, uses lazy `%s` log formatting, and `model_fields_set` detection is idiomatic Pydantic v2.
vero/tests/test_harbor_serve.py	New `test_mode_b_sample_timeout_warns` covers warn/no-warn paths correctly and uses `caplog.messages` (not `r.message`) per the existing review thread fix.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_components called] --> B{config.harbor is None?}
    B -- Yes, Mode A --> C{task_project set?}
    C -- No --> D[raise ValueError]
    C -- Yes --> E[_warn_mode_b_sample_timeout]
    B -- No, Mode B --> E
    E --> F{"sample_timeout in\nmodel_fields_set?"}
    F -- Yes --> G[logger.warning:\nsample_timeout not enforced\nin Mode B]
    F -- No --> H[Silent — default value,\nno author intent]
    G --> I[Continue: build workspace,\nledger, evaluator, engine]
    H --> I

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[build_components called] --> B{config.harbor is None?}
    B -- Yes, Mode A --> C{task_project set?}
    C -- No --> D[raise ValueError]
    C -- Yes --> E[_warn_mode_b_sample_timeout]
    B -- No, Mode B --> E
    E --> F{"sample_timeout in\nmodel_fields_set?"}
    F -- Yes --> G[logger.warning:\nsample_timeout not enforced\nin Mode B]
    F -- No --> H[Silent — default value,\nno author intent]
    G --> I[Continue: build workspace,\nledger, evaluator, engine]
    H --> I

_{Reviews (2): Last reviewed commit: "test(harbor): use caplog.messages instea..." | Re-trigger Greptile}

Found live: a Mode B config carried sample_timeout: 1200 through several optimization trials and it never did anything. The nested harbor run applies each task's own harbor-configured timeouts; the only vero-side cap is timeout on the whole nested run. An author who set sample_timeout expecting a per-task cap silently gets none, which matters when nested tasks can legitimately run 30-40 minutes (TB2). The sidecar now warns at startup when sample_timeout is explicitly set alongside a Mode B config, pointing at harbor.extra_args (e.g. --agent-timeout-multiplier) as the real lever. Mode A is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Greptile on #20: .message is only set by Formatter.format(), which pytest's capture handler never calls; caplog.messages goes through getMessage() and is stable across pytest versions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps Bot reviewed Jul 3, 2026

View reviewed changes

Comment thread vero/tests/test_harbor_serve.py Outdated

shehabyasser-scale and others added 2 commits July 3, 2026 15:58

shehabyasser-scale force-pushed the harbor-2-mode-b-sample-timeout-warning branch from ee72f91 to f93e25f Compare July 3, 2026 12:59

shehabyasser-scale mentioned this pull request Jul 3, 2026

feat(harbor): the agent's first baseline eval is budget-free #25

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(harbor): warn that sample_timeout is not enforced in Mode B#20

fix(harbor): warn that sample_timeout is not enforced in Mode B#20
shehabyasser-scale wants to merge 2 commits into
harbor-2-baseline-at-finalizefrom
harbor-2-mode-b-sample-timeout-warning

shehabyasser-scale commented Jul 3, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

shehabyasser-scale commented Jul 3, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shehabyasser-scale commented Jul 3, 2026 •

edited by greptile-apps Bot

Loading