feat(harbor): expose score_baseline in the build YAML by shehabyasser-scale · Pull Request #24 · scaleapi/vero

shehabyasser-scale · 2026-07-03T12:25:26Z

Companion to #19 (sidecar stack); based on the compiler stack.

ServeConfig.score_baseline (admin-score the unmodified baseline at finalize and write <admin_volume>/baseline.json, so regressions are visible) exists on the sidecar branch but was unreachable: BuildConfig had no such field, pydantic silently dropped it from YAML, and the compiler never emitted the key, so it was always False in a compiled task. The regression it detects is real: in our weak-model trial, auto_best's winner scored 0.2 on validation against the untouched repo's 0.3, and nothing surfaced the drop.
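The silent drop can be reproduced in a few lines. This is a minimal sketch assuming pydantic v2, whose default `extra="ignore"` discards unknown keys without raising. `BuildConfigBefore`, `BuildConfigAfter`, and the `task_name` field are hypothetical stand-ins, not the real `BuildConfig`.

```python
from pydantic import BaseModel


class BuildConfigBefore(BaseModel):
    """Hypothetical stand-in for BuildConfig before this PR (no score_baseline)."""
    task_name: str = "demo"  # hypothetical placeholder field


class BuildConfigAfter(BaseModel):
    """Hypothetical stand-in for BuildConfig with this PR's field added."""
    task_name: str = "demo"  # hypothetical placeholder field
    score_baseline: bool = False


# Roughly what yaml.safe_load returns for a build.yaml that sets the knob.
raw = {"task_name": "demo", "score_baseline": True}

before = BuildConfigBefore.model_validate(raw)
after = BuildConfigAfter.model_validate(raw)

# With the default extra="ignore", the unknown key vanishes with no error.
print("score_baseline" in before.model_dump())  # False
print(after.score_baseline)  # True
```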

This adds the field to BuildConfig and emits it into serve.json. Merge-order safe in both directions: where ServeConfig predates the field, the extra key is ignored (pydantic extra=ignore, pinned by a raw-JSON test); once #19 lands, the knob becomes live with no further change.
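A sketch of why merge order does not matter, assuming pydantic v2. `ServeConfigOld` and `ServeConfigNew` are hypothetical models, and their `submit_enabled` field is illustrative rather than the real schema. The same serve.json validates against both: the older model ignores the extra key and the newer one picks it up. The PR's own raw-JSON test pins the same `extra="ignore"` assumption; this is not that test.

```python
import json

from pydantic import BaseModel, ConfigDict


class ServeConfigOld(BaseModel):
    """Hypothetical ServeConfig on a stack that predates score_baseline."""
    model_config = ConfigDict(extra="ignore")  # explicit; also the pydantic default
    submit_enabled: bool


class ServeConfigNew(BaseModel):
    """Hypothetical ServeConfig once the sidecar branch (#19) lands."""
    model_config = ConfigDict(extra="ignore")
    submit_enabled: bool
    score_baseline: bool = False


# serve.json as the patched compiler would write it.
serve_json = json.dumps({"submit_enabled": True, "score_baseline": True})

old = ServeConfigOld.model_validate_json(serve_json)
new = ServeConfigNew.model_validate_json(serve_json)

print(old.model_dump())  # {'submit_enabled': True}
print(new.model_dump())  # {'submit_enabled': True, 'score_baseline': True}
```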

Tests: default false reaches serve.json through a full compile_task; score_baseline: true travels the actual YAML path (yaml.safe_load -> model_validate -> _serve_config).
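The YAML-path test can be sketched as follows, assuming PyYAML and pydantic v2. `BuildConfig` here is a trimmed hypothetical model and `serve_config` is a hypothetical stand-in for the compiler's `_serve_config`. The real tests in vero/tests/test_harbor_build.py may be structured differently, and the real default-False test goes through a full `compile_task` rather than this shortcut.

```python
import yaml
from pydantic import BaseModel


class BuildConfig(BaseModel):
    """Hypothetical, trimmed-down BuildConfig with only the new field."""
    score_baseline: bool = False


def serve_config(config: BuildConfig) -> dict:
    """Hypothetical stand-in for the compiler's _serve_config()."""
    return {"score_baseline": config.score_baseline}


def test_score_baseline_default_false() -> None:
    # An empty YAML mapping leaves the field at its default.
    config = BuildConfig.model_validate(yaml.safe_load("{}"))
    assert serve_config(config)["score_baseline"] is False


def test_score_baseline_true_via_yaml() -> None:
    # Parse real YAML text, not a hand-built dict, so the loader is exercised too.
    config = BuildConfig.model_validate(yaml.safe_load("score_baseline: true\n"))
    assert serve_config(config)["score_baseline"] is True
```

Run under pytest; both functions follow its `test_` naming convention.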

Follow-ups deliberately not in scope: extra="forbid" on BuildConfig (a typo'd key anywhere in build.yaml is still silently dropped, worth its own PR), and documenting the knob in the harbor-4 docs branch.
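For the `extra="forbid"` follow-up, a sketch of what the stricter model would do with a typo'd key, assuming pydantic v2. `StrictBuildConfig` is hypothetical and is not part of this PR.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictBuildConfig(BaseModel):
    """Hypothetical BuildConfig with the follow-up's extra="forbid" applied."""
    model_config = ConfigDict(extra="forbid")
    score_baseline: bool = False


# A typo'd key that extra="ignore" would silently drop.
raw = {"score_basline": True}

try:
    StrictBuildConfig.model_validate(raw)
except ValidationError as exc:
    # Each error reports the offending key and the "extra_forbidden" type.
    for err in exc.errors():
        print(err["type"], err["loc"])  # extra_forbidden ('score_basline',)
```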

🤖 Generated with Claude Code

Greptile Summary

This PR exposes score_baseline from build.yaml through the full compiler pipeline into serve.json, fixing a silent Pydantic drop that had kept the feature permanently False in compiled tasks.

config.py: Adds score_baseline: bool = False to BuildConfig with an explanatory comment.
compiler.py: Emits "score_baseline": config.score_baseline into the _serve_config dict, wiring it into the compiler-to-sidecar contract.
test_harbor_build.py: Adds three tests (default-False via raw JSON, True through the YAML→model_validate→_serve_config path, and True through the full compile_task pipeline), addressing the previous review comment that the True path lacked end-to-end coverage.

Confidence Score: 5/5

Safe to merge: the change is a minimal, additive field that defaults to False, leaving all existing behavior unchanged.

The diff is a one-field addition to BuildConfig, one-line addition to _serve_config, and three well-layered tests. The default is False, so no existing compiled tasks change behavior. Forward compatibility is covered by pydantic extra=ignore on ServeConfig. The previously flagged gap (True path not tested through compile_task) is now closed by test_score_baseline_true_through_compile_task.

No files require special attention.

Important Files Changed

Filename	Overview
vero/src/vero/harbor/build/config.py	Adds `score_baseline: bool = False` field to `BuildConfig` with a clear inline comment explaining its purpose; minimal and correct change.
vero/src/vero/harbor/build/compiler.py	Emits `score_baseline` into the `_serve_config` dict alongside `submit_enabled`; one-line change, correctly placed in the compiler-to-serve contract dict.
vero/tests/test_harbor_build.py	Adds three new tests covering default-False in raw JSON, True through YAML→`_serve_config`, and True through the full `compile_task` pipeline (addressing the previous review comment).

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant YAML as build.yaml
    participant BC as BuildConfig
    participant SC as _serve_config()
    participant SJ as serve.json
    participant SVC as ServeConfig (sidecar)

    YAML->>BC: "yaml.safe_load → model_validate<br/>(score_baseline: true)"
    BC->>SC: config.score_baseline passed in
    SC->>SJ: "{"score_baseline": true, ...}"
    SJ->>SVC: "parsed at sidecar runtime<br/>(extra=ignore if field absent)"

Reviews (2): Last reviewed commit: "test(harbor): drive score_baseline=True ..."

ServeConfig.score_baseline (baseline scored at finalize so regressions are visible) existed but was unreachable: BuildConfig had no such field, pydantic silently dropped it from YAML, and the compiler never emitted the key, so it was always False in a compiled task. The regression it detects is real: in a live weak-model trial, auto_best's winner scored 0.2 on validation against the untouched repo's 0.3, invisibly. Adds the field to BuildConfig and emits it into serve.json. On stacks where ServeConfig predates the field, the extra key is ignored (pydantic extra=ignore), so this composes with the sidecar-side feature branch in either merge order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Greptile on #24: only the default-False case went through the full pipeline; a compile_task refactor dropping the field would have kept the True test green. Now both paths hit the written serve.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
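The fix in this commit is to assert on the file the compiler actually wrote rather than on an in-memory dict. A minimal sketch under stated assumptions: `compile_task` below is a hypothetical stand-in that only writes `{"score_baseline": ...}` to serve.json; the real function's signature and output layout are not shown in this PR.

```python
import json
import tempfile
from pathlib import Path

from pydantic import BaseModel


class BuildConfig(BaseModel):
    """Hypothetical, trimmed-down BuildConfig."""
    score_baseline: bool = False


def compile_task(config: BuildConfig, out_dir: Path) -> Path:
    """Hypothetical stand-in for compile_task: writes serve.json to disk."""
    path = out_dir / "serve.json"
    path.write_text(json.dumps({"score_baseline": config.score_baseline}))
    return path


def read_written_flag(value: bool) -> bool:
    """Compile with the given flag, then read it back from the file on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = compile_task(BuildConfig(score_baseline=value), Path(tmp))
        return json.loads(path.read_text())["score_baseline"]


# Reading the written file means a refactor that stops emitting the key
# raises KeyError for both values, not just for the default one.
for flag in (False, True):
    assert read_written_flag(flag) is flag
```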

greptile-apps Bot reviewed Jul 3, 2026

Comment thread vero/tests/test_harbor_build.py

shehabyasser-scale wants to merge 2 commits into harbor-3-compiler-fixes from harbor-3-score-baseline-yaml

shehabyasser-scale commented Jul 3, 2026 • edited by greptile-apps Bot
