fix(harbor): allow viewable splits to carry budgets#13
Open
shehabyasser-scale wants to merge 1 commit into
Open
fix(harbor): allow viewable splits to carry budgets#13shehabyasser-scale wants to merge 1 commit into
shehabyasser-scale wants to merge 1 commit into
Conversation
A budget meters compute, not information: in Mode B every eval is a real, paid benchmark run (nested harbor job), so metering a fully-visible train split is legitimate and often exactly what an experimenter wants (classic ML train visibility with per-task feedback, bounded spend). Found operating the system: a GAIA optimization experiment needed 'access: viewable' on the budgeted train split so the optimizer gets per-task credit assignment (aggregate-only feedback was too noisy to climb: one number per round over 6 single-rollout tasks). The previous check rejected the config outright. - no_access + budget still raises (budget would be unusable). - viewable + budget now warns loudly instead of raising. - Auto-tiering of unlisted budgeted splits to non_viewable is unchanged (safe default; only an explicit author choice unlocks viewable). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #7. Driven by a live experiment, not review: a GAIA prompt-optimization run needed a budgeted and viewable train split (per-task feedback; aggregate-only signal over 6 single-rollout tasks was too noisy for the optimizer to climb: trajectory 0.5 -> 0.33 -> 0.33, gave up with 7/10 budget unspent).
The old check conflated two orthogonal axes:
In Mode B each eval is a real nested benchmark run costing real money, so metering a fully-visible split is legitimate and often the correct experimental design (classic ML train visibility, bounded spend).
Changes:
no_access+ budget still raises (unusable budget = config bug).viewable+ budget now warns loudly instead of raising. Auto-tiering of unlisted budgeted splits tonon_viewableis unchanged, so the safe default stands; only an explicit author choice unlocks viewable.Tests updated + new warning assertion. 12 policy/engine tests pass.
🤖 Generated with Claude Code
Greptile Summary
This PR relaxes the budget/split-access validation in
Policy._validate_budget_splitsso that aviewablesplit can carry a budget (logs a loud warning) while ano_access+budget combination still raises. The_ensure_budgeted_splits_tieredauto-tiering behaviour is unchanged — unlisted splits still default tonon_viewable.policy.py: The singleif tier != non_viewable: raiseguard is replaced by two targeted checks —no_accessraises,viewablewarns — sonon_viewableremains the silent/safe path andviewableis an explicit, loudly-flagged opt-in.test_policy_budget_splits.py: The formerpytest.raises(ValueError)test is flipped to assert no exception plus a warning emission; theno_accessrejection test is updated to match the new error message.Confidence Score: 4/5
Safe to merge — the logic change is narrow, well-tested, and the engine's no_access guard is unchanged.
The two changed files are small and self-contained. The new two-branch check correctly maps all three SplitAccessLevel values, the auto-tiering default is untouched, and the existing test suite covers the key scenarios. The only things worth a second look are the stale first sentence in _ensure_budgeted_splits_tiered's docstring and the missing logger argument in caplog.at_level — neither affects runtime behaviour.
The _ensure_budgeted_splits_tiered docstring in policy.py now contradicts the new policy; minor but worth updating before the code drifts further.
Important Files Changed
_validate_budget_splitscorrectly relaxed to allowviewable+budget (warn) while keepingno_access+budget as a hard error;_ensure_budgeted_splits_tiereddocstring is now slightly stalepytest.raisesto a warning assertion;caplog.at_levelwould be more robust with a logger nameFlowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[_validate_budget_splits called] --> B{budget empty?} B -- yes --> Z[return] B -- no --> C[resolve split tier] C --> D{tier == no_access?} D -- yes --> E[raise ValueError 'budget unusable'] D -- no --> F{tier == viewable?} F -- yes --> G[logger.warning 'labels visible to agent'] F -- no --> H[non_viewable silent] G --> I[continue loop] H --> I%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[_validate_budget_splits called] --> B{budget empty?} B -- yes --> Z[return] B -- no --> C[resolve split tier] C --> D{tier == no_access?} D -- yes --> E[raise ValueError 'budget unusable'] D -- no --> F{tier == viewable?} F -- yes --> G[logger.warning 'labels visible to agent'] F -- no --> H[non_viewable silent] G --> I[continue loop] H --> IPrompt To Fix All With AI
Reviews (1): Last reviewed commit: "fix(harbor): allow viewable splits to ca..." | Re-trigger Greptile