fix(harbor): allow viewable splits to carry budgets by shehabyasser-scale · Pull Request #13 · scaleapi/vero

shehabyasser-scale · 2026-07-03T03:03:22Z

Stacked on #7. Driven by a live experiment, not by review: a GAIA prompt-optimization run needed a train split that was both budgeted and viewable, so the optimizer could get per-task feedback. The aggregate-only signal over 6 single-rollout tasks was too noisy for the optimizer to climb: the trajectory went 0.5 -> 0.33 -> 0.33, and it gave up with 7/10 of the budget unspent.

The old check conflated two orthogonal axes:

- Information: what the agent sees per eval (the access tier's job)
- Compute: how many paid evals the agent may run (the budget's job)

In Mode B each eval is a real nested benchmark run costing real money, so metering a fully-visible split is legitimate and often the correct experimental design (classic ML train visibility, bounded spend).

Changes:

- no_access + budget still raises (an unusable budget is a config bug).
- viewable + budget now warns loudly instead of raising.
- Auto-tiering of unlisted budgeted splits to non_viewable is unchanged, so the safe default stands; only an explicit author choice unlocks viewable.
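The rules above can be sketched roughly as follows. This is a minimal illustration, not the code in `policy.py`: the free function `validate_budget_splits`, the enum member names, the dict shapes, the message wording, and the choice to treat a missing split as `no_access` are all assumptions made for the example. Only the three tier names and the `vero.policy` logger name come from this PR.

```python
import logging
from enum import Enum

logger = logging.getLogger("vero.policy")


class SplitAccessLevel(str, Enum):
    # Member names are assumed; the three tier values are the ones named in the PR.
    NO_ACCESS = "no_access"
    NON_VIEWABLE = "non_viewable"
    VIEWABLE = "viewable"


def validate_budget_splits(budgets, split_accesses):
    """Hypothetical stand-in for Policy._validate_budget_splits.

    budgets: mapping of split name -> number of paid evals allowed.
    split_accesses: mapping of split name -> SplitAccessLevel.
    """
    for split in budgets:
        # Assumption: a split with no explicit tier counts as no_access here.
        tier = split_accesses.get(split, SplitAccessLevel.NO_ACCESS)
        if tier is SplitAccessLevel.NO_ACCESS:
            # The agent can never evaluate this split, so the budget is unusable.
            raise ValueError(f"budget on split {split!r} is unusable (no_access)")
        if tier is SplitAccessLevel.VIEWABLE:
            # Allowed, but flagged loudly: this is an explicit author opt-in.
            logger.warning(
                "budgeted split %r is viewable: labels are visible to the agent",
                split,
            )
        # non_viewable: the silent, safe path.
```

The point of the two separate branches is that the access tier and the budget are checked independently: only the combination that makes the budget unusable is an error.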

Tests updated + new warning assertion. 12 policy/engine tests pass.

🤖 Generated with Claude Code

Greptile Summary

This PR relaxes the budget/split-access validation in Policy._validate_budget_splits so that a viewable split can carry a budget (with a loud warning logged), while a no_access + budget combination still raises. The _ensure_budgeted_splits_tiered auto-tiering behaviour is unchanged: unlisted splits still default to non_viewable.

- policy.py: The single `if tier != non_viewable: raise` guard is replaced by two targeted checks (no_access raises, viewable warns), so non_viewable remains the silent/safe path and viewable is an explicit, loudly-flagged opt-in.
- test_policy_budget_splits.py: The former `pytest.raises(ValueError)` test is flipped to assert no exception plus a warning emission; the no_access rejection test is updated to match the new error message.
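The unchanged auto-tiering default described in the summary can be illustrated with a small sketch. The function below is a hypothetical stand-in for `_ensure_budgeted_splits_tiered`; its signature, the plain-string tiers, and the dict shapes are assumptions for the example, not the actual implementation.

```python
def ensure_budgeted_splits_tiered(budgets, split_accesses):
    """Return a copy of split_accesses in which every budgeted split has a tier.

    Budgeted splits missing from split_accesses default to "non_viewable";
    tiers the author set explicitly are left untouched.
    """
    tiered = dict(split_accesses)
    for split in budgets:
        tiered.setdefault(split, "non_viewable")
    return tiered


# "train" was explicitly marked viewable; "val" was not listed at all.
accesses = ensure_budgeted_splits_tiered(
    {"train": 10, "val": 5},
    {"train": "viewable"},
)
print(accesses)  # {'train': 'viewable', 'val': 'non_viewable'}
```

Under these assumptions, a budget is never unusable by omission and a split never becomes fully visible by omission; viewable only appears when the author writes it.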

Confidence Score: 4/5

Safe to merge — the logic change is narrow, well-tested, and the engine's no_access guard is unchanged.

The two changed files are small and self-contained. The new two-branch check correctly maps all three SplitAccessLevel values, the auto-tiering default is untouched, and the existing test suite covers the key scenarios. The only things worth a second look are the stale first sentence in _ensure_budgeted_splits_tiered's docstring and the missing logger argument in caplog.at_level — neither affects runtime behaviour.

The _ensure_budgeted_splits_tiered docstring in policy.py now contradicts the new policy; minor but worth updating before the code drifts further.

Important Files Changed

| Filename | Overview |
| --- | --- |
| vero/src/vero/policy.py | Logic in `_validate_budget_splits` correctly relaxed to allow `viewable`+budget (warn) while keeping `no_access`+budget as a hard error; `_ensure_budgeted_splits_tiered` docstring is now slightly stale |
| vero/tests/test_policy_budget_splits.py | Test correctly flipped from `pytest.raises` to a warning assertion; `caplog.at_level` would be more robust with a logger name |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_validate_budget_splits called] --> B{budget empty?}
    B -- yes --> Z[return]
    B -- no --> C[resolve split tier]
    C --> D{tier == no_access?}
    D -- yes --> E[raise ValueError 'budget unusable']
    D -- no --> F{tier == viewable?}
    F -- yes --> G[logger.warning 'labels visible to agent']
    F -- no --> H[non_viewable silent]
    G --> I[continue loop]
    H --> I

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
vero/src/vero/policy.py:512-518
The `_ensure_budgeted_splits_tiered` docstring still says a budgeted split "must be agent-evaluable with per-sample labels hidden, i.e. `non_viewable`", but after this PR `viewable` is also permitted (with a warning). The first sentence now contradicts the new policy and could mislead a future reader into thinking the auto-tiering logic should also gate out `viewable`.

```suggestion
        A budgeted split is one the agent may evaluate, so it must be
        agent-evaluable (``non_viewable`` or ``viewable``; ``no_access`` is
        rejected by the engine). Any budgeted split missing from
        ``split_accesses`` is auto-tiered ``non_viewable`` here, so a budget
        is never silently unusable (``no_access`` by omission) nor accidentally
        fully visible (``viewable`` by omission). Explicit author tiers are
        left untouched and checked in :meth:`_validate_budget_splits`.
```

### Issue 2 of 2
vero/tests/test_policy_budget_splits.py:56-58
`caplog.at_level("WARNING")` without a `logger` argument sets the level on the root logger but does not guarantee the `vero.policy` named logger emits at WARNING if it ever receives an explicit level configuration. Passing the logger name makes the test self-sufficient regardless of external log-level wiring.

```suggestion
    with caplog.at_level("WARNING", logger="vero.policy"):
        p._validate_budget_splits()  # no raise
    assert any("viewable" in r.message for r in caplog.records)
```
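The failure mode behind Issue 2 can be reproduced with the standard `logging` module alone. The sketch below is an illustration under stated assumptions: the `ListHandler` class is a hypothetical substitute for pytest's capture handler, and the explicit `ERROR` level on `vero.policy` stands in for some external log configuration that is not part of this PR.

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Hypothetical stand-in for pytest's caplog handler."""

    def emit(self, record):
        records.append(record.getMessage())


root = logging.getLogger()
root.addHandler(ListHandler())
# Roughly what caplog.at_level("WARNING") does with no logger argument:
# the level is set on the root logger only.
root.setLevel(logging.WARNING)

named = logging.getLogger("vero.policy")
# Assumed external configuration that gives the named logger its own level.
named.setLevel(logging.ERROR)
named.warning("viewable split carries a budget")
dropped = len(records)  # the record is filtered at the named logger

# Roughly what caplog.at_level("WARNING", logger="vero.policy") does:
# the level is set on the named logger itself.
named.setLevel(logging.WARNING)
named.warning("viewable split carries a budget")
captured = len(records)
```

A logger with an explicit level ignores the root logger's level when deciding whether to create a record, which is why naming the logger in the test removes the dependency on outside configuration.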


A budget meters compute, not information: in Mode B every eval is a real, paid benchmark run (nested harbor job), so metering a fully-visible train split is legitimate and often exactly what an experimenter wants (classic ML train visibility with per-task feedback, bounded spend).

Found operating the system: a GAIA optimization experiment needed 'access: viewable' on the budgeted train split so the optimizer gets per-task credit assignment (aggregate-only feedback was too noisy to climb: one number per round over 6 single-rollout tasks). The previous check rejected the config outright.

- no_access + budget still raises (budget would be unusable).
- viewable + budget now warns loudly instead of raising.
- Auto-tiering of unlisted budgeted splits to non_viewable is unchanged (safe default; only an explicit author choice unlocks viewable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

shehabyasser-scale wants to merge 1 commit into harbor-1-core-fixes from harbor-1-core-viewable-budget
