feat(harbor): expose score_baseline in the build YAML #24

Open
shehabyasser-scale wants to merge 2 commits into harbor-3-compiler-fixes from harbor-3-score-baseline-yaml

shehabyasser-scale (Collaborator) commented Jul 3, 2026

Companion to #19 (sidecar stack); based on the compiler stack.

ServeConfig.score_baseline (admin-score the unmodified baseline at finalize and write <admin_volume>/baseline.json, so regressions are visible) exists on the sidecar branch but was unreachable: BuildConfig had no such field, so pydantic silently dropped the key when validating the YAML, and the compiler never emitted it; in a compiled task it was therefore always False. The regression it detects is real: in our weak-model trial, auto_best's winner scored 0.2 on validation against the untouched repo's 0.3, and nothing surfaced that regression.
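
A minimal sketch of how the key disappeared, assuming pydantic v2 and a hypothetical `BuildConfig` trimmed to one illustrative field (`submit_enabled`, made required here only so the example does not guess at the real default). Pydantic's default `extra="ignore"` discards the unknown key without raising, so nothing downstream can see it:

```python
from pydantic import BaseModel


class BuildConfig(BaseModel):
    """Hypothetical pre-PR BuildConfig: note there is no score_baseline field."""

    submit_enabled: bool


# Roughly what yaml.safe_load returns for a build.yaml that sets the knob.
raw = {"submit_enabled": True, "score_baseline": True}

config = BuildConfig.model_validate(raw)

# Under pydantic's default extra="ignore", the unknown key is dropped with
# no error or warning, so the compiler has nothing to emit.
print(config.model_dump())  # {'submit_enabled': True}
print(hasattr(config, "score_baseline"))  # False
```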

This adds the field to BuildConfig and emits it into serve.json. Merge-order safe in both directions: where ServeConfig predates the field, the extra key is ignored (pydantic extra=ignore, pinned by a raw-JSON test); once #19 lands, the knob becomes live with no further change.

Tests: default false reaches serve.json through a full compile_task; score_baseline: true travels the actual YAML path (yaml.safe_load -> model_validate -> _serve_config).
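
A sketch of what the YAML-path test exercises, assuming pydantic v2 and PyYAML; `BuildConfig`, `_serve_config`, and the helper `serve_config_from_yaml` are hypothetical stand-ins, and the test names are illustrative rather than the ones in `test_harbor_build.py`. The point is to start from YAML text instead of a Python dict, so the test covers the same parse that used to drop the key:

```python
import yaml
from pydantic import BaseModel


class BuildConfig(BaseModel):
    """Hypothetical, trimmed BuildConfig carrying the new field."""

    submit_enabled: bool
    score_baseline: bool = False


def _serve_config(config: BuildConfig) -> dict:
    """Hypothetical compiler helper producing the serve.json payload."""
    return {
        "submit_enabled": config.submit_enabled,
        "score_baseline": config.score_baseline,
    }


def serve_config_from_yaml(text: str) -> dict:
    # Same three steps as the real path: yaml.safe_load -> model_validate -> _serve_config.
    return _serve_config(BuildConfig.model_validate(yaml.safe_load(text)))


def test_score_baseline_true_via_yaml() -> None:
    serve = serve_config_from_yaml("submit_enabled: true\nscore_baseline: true\n")
    assert serve["score_baseline"] is True


def test_score_baseline_defaults_false_via_yaml() -> None:
    serve = serve_config_from_yaml("submit_enabled: true\n")
    assert serve["score_baseline"] is False


if __name__ == "__main__":
    test_score_baseline_true_via_yaml()
    test_score_baseline_defaults_false_via_yaml()
```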

Follow-ups deliberately not in scope: extra="forbid" on BuildConfig (a typo'd key anywhere in build.yaml is still silently dropped, worth its own PR), and documenting the knob in the harbor-4 docs branch.
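
For the deferred `extra="forbid"` follow-up, a small sketch of the behavioural difference on a typo'd key (pydantic v2 assumed; both models are hypothetical, one-field stand-ins):

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LenientBuildConfig(BaseModel):
    """Current behaviour: pydantic's default extra="ignore"."""

    score_baseline: bool = False


class StrictBuildConfig(BaseModel):
    """Possible follow-up: reject unknown keys instead of dropping them."""

    model_config = ConfigDict(extra="forbid")

    score_baseline: bool = False


# A typo'd key, as it might come out of yaml.safe_load on build.yaml.
raw = {"score_baselin": True}

# Lenient: the typo vanishes and the knob silently stays at its default.
print(LenientBuildConfig.model_validate(raw).score_baseline)  # False

# Strict: the same typo fails loudly at validation time.
try:
    StrictBuildConfig.model_validate(raw)
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # extra_forbidden
```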

🤖 Generated with Claude Code

Greptile Summary

This PR exposes score_baseline from build.yaml through the full compiler pipeline into serve.json, fixing a silent Pydantic drop that had kept the feature permanently False in compiled tasks.

  • config.py: Adds score_baseline: bool = False to BuildConfig with an explanatory comment.
  • compiler.py: Emits "score_baseline": config.score_baseline into the _serve_config dict, wiring it into the compiler-to-sidecar contract.
  • test_harbor_build.py: Adds three tests: default-False via raw JSON, True through the yaml.safe_load → model_validate → _serve_config path, and True through the full compile_task pipeline. The last one addresses the previous review comment that the True path lacked end-to-end coverage.

Confidence Score: 5/5

Safe to merge — the change is a minimal, additive field that defaults to False, leaving all existing behavior unchanged.

The diff is a one-field addition to BuildConfig, one-line addition to _serve_config, and three well-layered tests. The default is False, so no existing compiled tasks change behavior. Forward compatibility is covered by pydantic extra=ignore on ServeConfig. The previously flagged gap (True path not tested through compile_task) is now closed by test_score_baseline_true_through_compile_task.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| vero/src/vero/harbor/build/config.py | Adds score_baseline: bool = False field to BuildConfig with a clear inline comment explaining its purpose; minimal and correct change. |
| vero/src/vero/harbor/build/compiler.py | Emits score_baseline into the _serve_config dict alongside submit_enabled; one-line change, correctly placed in the compiler-to-serve contract dict. |
| vero/tests/test_harbor_build.py | Adds three new tests covering default-False in raw JSON, True through YAML→_serve_config, and True through the full compile_task pipeline (addressing the previous review comment). |

Sequence Diagram

```mermaid
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant YAML as build.yaml
    participant BC as BuildConfig
    participant SC as _serve_config()
    participant SJ as serve.json
    participant SVC as ServeConfig (sidecar)

    YAML->>BC: "yaml.safe_load → model_validate<br/>(score_baseline: true)"
    BC->>SC: config.score_baseline passed in
    SC->>SJ: "{"score_baseline": true, ...}"
    SJ->>SVC: "parsed at sidecar runtime<br/>(extra=ignore if field absent)"
```

Reviews (2): Last reviewed commit: "test(harbor): drive score_baseline=True ..."

ServeConfig.score_baseline (baseline scored at finalize so regressions
are visible) existed but was unreachable: BuildConfig had no such field,
pydantic silently dropped it from YAML, and the compiler never emitted
the key, so it was always False in a compiled task. The regression it
detects is real: in a live weak-model trial, auto_best's winner scored
0.2 on validation against the untouched repo's 0.3, invisibly.

Adds the field to BuildConfig and emits it into serve.json. On stacks
where ServeConfig predates the field the extra key is ignored (pydantic
extra=ignore), so this composes with the sidecar-side feature branch in
either merge order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile on #24: only the default-False case went through the full
pipeline; a compile_task refactor dropping the field would have kept the
True test green. Now both paths hit the written serve.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
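
The reasoning in this commit can be made concrete with a sketch in which both values are read back from the written serve.json. `compile_task` here is a hypothetical stand-in that only validates the config and writes one key (pydantic v2 assumed); the real one does much more:

```python
import json
import tempfile
from pathlib import Path

from pydantic import BaseModel


class BuildConfig(BaseModel):
    """Hypothetical, trimmed BuildConfig."""

    score_baseline: bool = False


def _serve_config(config: BuildConfig) -> dict:
    """Hypothetical compiler helper: only the key under test."""
    return {"score_baseline": config.score_baseline}


def compile_task(raw: dict, out_dir: Path) -> Path:
    """Hypothetical stand-in for compile_task: validate, then write serve.json."""
    config = BuildConfig.model_validate(raw)
    path = out_dir / "serve.json"
    path.write_text(json.dumps(_serve_config(config)))
    return path


def read_written_flag(raw: dict) -> bool:
    """Compile into a scratch dir and read the flag back from the file on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        serve_json = compile_task(raw, Path(tmp))
        return json.loads(serve_json.read_text())["score_baseline"]


# Both values are checked against the written file, not just _serve_config's
# return value, so the pipeline itself is under test for each case.
for raw, expected in [({}, False), ({"score_baseline": True}, True)]:
    assert read_written_flag(raw) is expected
```

Checking only the default cannot distinguish "field forwarded" from "field ignored and the default written", which is why the True case has to go through the file as well.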