Fix BED-8727: Add a default schema contract for DLT collection pipeline#32

Open
d3vzer0 wants to merge 10 commits into main from fix/schemacontracts

@d3vzer0 d3vzer0 commented Jun 22, 2026

Contributor

This change sets the following (default) DLT schema contracts:

| entity | new | old | description |
|---|---|---|---|
| new tables | evolve | evolve | Not changed, already default DLT behaviour |
| new columns | evolve | evolve | Not changed, already default DLT behaviour |
| data type changes | discard_row | freeze | Instead of failing the pipeline, skip/discard the row |

This prevents the pipeline from failing when the data type of a single column does not match our expected schema. By default, the offending row is now discarded and the pipeline continues processing. The previous strict behaviour can be restored by passing the `--data-type-contract=freeze` CLI option when strict data checks are needed:

╭─ Options ─────────────────────────────────────────────────────────────────
│ --progress                  [tqdm|log|alive_progress]    Select progress tracker option [default: tqdm]
│ --tables-contract           [evolve|freeze|discard_row]  DLT contract applied when data contains newly seen resources/tables previously not collected [default: evolve]
│ --columns-contract          [evolve|freeze|discard_row]  DLT contract applied when data contains values/keys not found in the Pydantic model [default: evolve]
│ --data-type-contract        [evolve|freeze|discard_row]  DLT contract applied when fields do not match the data types defined in the Pydantic model [default: discard_row]
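
For reviewers, here is a minimal sketch of how the three option values could be combined into the dictionary form that DLT accepts for `schema_contract` (keys `tables`, `columns`, `data_type`). The helper name `build_schema_contract` is hypothetical and the enum only mirrors the choices listed in the help output above; the actual wiring in `app.py` may differ.

```python
from enum import Enum


class Contract(str, Enum):
    # Mirrors the choices shown in the CLI help output above.
    evolve = "evolve"
    freeze = "freeze"
    discard_row = "discard_row"


def build_schema_contract(
    tables: Contract = Contract.evolve,
    columns: Contract = Contract.evolve,
    data_type: Contract = Contract.discard_row,
) -> dict[str, str]:
    """Hypothetical helper: map the three CLI options onto a DLT-style
    schema contract dictionary, using the defaults from this PR."""
    return {
        "tables": tables.value,
        "columns": columns.value,
        "data_type": data_type.value,
    }


# Defaults proposed in this PR.
default_contract = build_schema_contract()

# Equivalent of passing --data-type-contract=freeze for strict checks.
strict_contract = build_schema_contract(data_type=Contract("freeze"))
```

A dictionary of this shape can be passed as the `schema_contract` argument when running a DLT pipeline or declaring a resource.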

d3vzer0 changed the title from "Add a default schema contract for DLT collection pipeline" to "BED-8727: Add a default schema contract for DLT collection pipeline" Jun 22, 2026
d3vzer0 changed the title from "BED-8727: Add a default schema contract for DLT collection pipeline" to "Fix BED-8727: Add a default schema contract for DLT collection pipeline" Jun 22, 2026
Comment thread src/openhound/core/app.py
class Contract(str, Enum):
    evolve = "evolve"
    freeze = "freeze"
    discard_value = "discard_value"

Contributor Author

`discard_value` is a supported DLT contract, but it is not supported for data types when using Pydantic models, which is what we use for all the collectors.
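
To make the difference between the modes concrete, the following pure-Python sketch simulates how `freeze`, `discard_row` and `discard_value` treat a row whose field has the wrong type. It is an illustration of the semantics only, not DLT's or Pydantic's implementation; the function `apply_data_type_contract` and the sample rows are made up for this example.

```python
from typing import Any


def apply_data_type_contract(
    rows: list[dict[str, Any]],
    expected_types: dict[str, type],
    mode: str,
) -> list[dict[str, Any]]:
    """Simulate a data type contract on a list of rows (illustration only)."""
    kept = []
    for row in rows:
        # Fields present in the row whose value has an unexpected type.
        bad = [
            key
            for key, expected in expected_types.items()
            if key in row and not isinstance(row[key], expected)
        ]
        if not bad:
            kept.append(row)
        elif mode == "freeze":
            # Strict mode: the first mismatch aborts processing.
            raise TypeError(f"type mismatch in fields {bad}: {row}")
        elif mode == "discard_row":
            # Drop the whole row and keep going.
            continue
        elif mode == "discard_value":
            # Keep the row, but without the offending values.
            kept.append({k: v for k, v in row.items() if k not in bad})
        else:
            raise ValueError(f"unknown mode: {mode}")
    return kept


rows = [{"id": 1, "name": "a"}, {"id": "oops", "name": "b"}]
expected = {"id": int, "name": str}

rows_discard_row = apply_data_type_contract(rows, expected, "discard_row")
rows_discard_value = apply_data_type_contract(rows, expected, "discard_value")
```

With `discard_row` only the first row survives, while `discard_value` would keep the second row minus its `id` field, which is the mode that is unavailable for Pydantic-backed collectors.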

@d3vzer0 d3vzer0 marked this pull request as ready for review June 22, 2026 23:41