Skip to content

fix(harbor): warn that sample_timeout is not enforced in Mode B#20

Open
shehabyasser-scale wants to merge 2 commits into
harbor-2-baseline-at-finalizefrom
harbor-2-mode-b-sample-timeout-warning
Open

fix(harbor): warn that sample_timeout is not enforced in Mode B#20
shehabyasser-scale wants to merge 2 commits into
harbor-2-baseline-at-finalizefrom
harbor-2-mode-b-sample-timeout-warning

Conversation

@shehabyasser-scale

@shehabyasser-scale shehabyasser-scale commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

Stacked on #19. Found live: a Mode B config carried sample_timeout: 1200 through several optimization trials and it never did anything. In Mode B the nested harbor run applies each task's own harbor-configured timeouts; the only vero-side cap is timeout on the whole nested run. TB2 tasks can legitimately run 30-40 minutes, so an author relying on a silent no-op cap gets surprising wall-clock and cost.

The sidecar now warns at startup when sample_timeout is explicitly set (via model_fields_set, so the default doesn't warn) alongside a Mode B config, pointing at harbor.extra_args (e.g. --agent-timeout-multiplier) as the real lever. Mode A behavior unchanged.

1 test (warns in Mode B when set; silent for default and for Mode A). 13 pass.

🤖 Generated with Claude Code

Greptile Summary

Adds a startup warning when sample_timeout is explicitly set alongside a Mode B (harbor) config, where it has no effect. The check uses Pydantic v2's model_fields_set so only explicit author-supplied values trigger the warning, not the 180 s default.

  • _warn_mode_b_sample_timeout is a small, focused function placed in build_components after the existing Mode A integrity guard; it correctly short-circuits on both the absence of harbor and the absence of an explicit sample_timeout.
  • The new test covers all three cases: warns in Mode B when set explicitly, silent when using the default in Mode B, and silent for Mode A with harbor=None.

Confidence Score: 5/5

Safe to merge — adds a diagnostic-only warning with no changes to runtime behavior or data paths.

The change is purely additive: a warning emitted at startup when a no-op config key is detected. No evaluation logic, budget accounting, or API surface is modified. The model_fields_set gate correctly distinguishes explicit author intent from the default, and the test validates all three cases.

No files require special attention.

Important Files Changed

Filename Overview
vero/src/vero/harbor/serve.py Adds _warn_mode_b_sample_timeout helper and calls it in build_components; logic is correct, uses lazy %s log formatting, and model_fields_set detection is idiomatic Pydantic v2.
vero/tests/test_harbor_serve.py New test_mode_b_sample_timeout_warns covers warn/no-warn paths correctly and uses caplog.messages (not r.message) per the existing review thread fix.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_components called] --> B{config.harbor is None?}
    B -- Yes, Mode A --> C{task_project set?}
    C -- No --> D[raise ValueError]
    C -- Yes --> E[_warn_mode_b_sample_timeout]
    B -- No, Mode B --> E
    E --> F{"sample_timeout in\nmodel_fields_set?"}
    F -- Yes --> G[logger.warning:\nsample_timeout not enforced\nin Mode B]
    F -- No --> H[Silent — default value,\nno author intent]
    G --> I[Continue: build workspace,\nledger, evaluator, engine]
    H --> I
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[build_components called] --> B{config.harbor is None?}
    B -- Yes, Mode A --> C{task_project set?}
    C -- No --> D[raise ValueError]
    C -- Yes --> E[_warn_mode_b_sample_timeout]
    B -- No, Mode B --> E
    E --> F{"sample_timeout in\nmodel_fields_set?"}
    F -- Yes --> G[logger.warning:\nsample_timeout not enforced\nin Mode B]
    F -- No --> H[Silent — default value,\nno author intent]
    G --> I[Continue: build workspace,\nledger, evaluator, engine]
    H --> I
Loading

Reviews (2): Last reviewed commit: "test(harbor): use caplog.messages instea..." | Re-trigger Greptile

Comment thread vero/tests/test_harbor_serve.py Outdated
shehabyasser-scale and others added 2 commits July 3, 2026 15:58
Found live: a Mode B config carried sample_timeout: 1200 through several
optimization trials and it never did anything. The nested harbor run
applies each task's own harbor-configured timeouts; the only vero-side
cap is timeout on the whole nested run. An author who set sample_timeout
expecting a per-task cap silently gets none, which matters when nested
tasks can legitimately run 30-40 minutes (TB2).

The sidecar now warns at startup when sample_timeout is explicitly set
alongside a Mode B config, pointing at harbor.extra_args (e.g.
--agent-timeout-multiplier) as the real lever. Mode A is unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile on #20: .message is only set by Formatter.format(), which
pytest's capture handler never calls; caplog.messages goes through
getMessage() and is stable across pytest versions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant