From 51415b2bd960183cc49df0bec296b7e2d6bbb0da Mon Sep 17 00:00:00 2001
From: stacknil <stacknil@proton.me>
Date: Fri, 19 Jun 2026 19:52:18 +0800
Subject: [PATCH] docs(sbom): document risk model boundary

---
 docs/risk-model-boundary.md | 94 +++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100644 docs/risk-model-boundary.md

diff --git a/docs/risk-model-boundary.md b/docs/risk-model-boundary.md
new file mode 100644
index 0000000..6c33fc6
--- /dev/null
+++ b/docs/risk-model-boundary.md
@@ -0,0 +1,94 @@
+# Risk model boundary
+
+This document defines the SBR-02 boundary for the SBOM risk model: which inputs
+can change risk findings, which inputs are context only, and which conclusions the
+tool must never infer.
+
+The current risk model is a deterministic heuristic layer implemented in
+`tools/sbom-diff-and-risk/src/sbom_diff_risk/risk.py`. It is not a vulnerability
+scanner, malware detector, legal reviewer, or package trust oracle.
+
+## Risk-affecting inputs
+
+Only the following inputs may affect emitted risk buckets.
+
+| Input | Risk effect | Boundary |
+| --- | --- | --- |
+| Diff category: added component | Emits `new_package`. | The component exists in the after input and not in the before input. This is a change signal only. |
+| Diff category: changed component | Enables version, hygiene, and stale-evaluation findings on the after component. | Removed and unchanged components are not evaluated by `evaluate_risks`. |
+| `ComponentChange.before.version` and `ComponentChange.after.version` | Emit `major_upgrade` or `version_change_unclassified`. | `major_upgrade` requires both versions to parse as strict SemVer `x.y.z` and the after major version to be higher. If both versions are present and changed but do not qualify, the finding is `version_change_unclassified`. |
+| `Component.license_id` | Emits `unknown_license`. | Missing, empty, `UNKNOWN`, and `NOASSERTION` are unknown. Other license strings are not interpreted for compliance or risk severity. |
+| `Component.purl` | Participates in `suspicious_source`. | If both `purl` and `source_url` are missing, source provenance is suspicious. If `purl` exists and `source_url` is missing, the source is not suspicious solely for missing `source_url`. |
+| `Component.source_url` | Emits `suspicious_source` when the value is missing with no `purl`, local, non-HTTPS, or otherwise suspicious. | Suspicious examples include `http://`, `git+`, `git://`, `ssh://`, `file://`, relative paths, absolute local paths, missing URL host, IP-address hosts, `localhost`, `localdomain`, and `.local` hosts. |
+| Source allowlist | Participates in `suspicious_source` for single-label hosts. | The allowlist is not a general denylist. In the current implementation, an unallowlisted host is suspicious only when an allowlist is configured and the host has no dot. |
+| `stale_enrichment_enabled` | Controls `not_evaluated` for stale package checks. | When false, the model emits `not_evaluated` instead of guessing staleness. When true, the placeholder finding is suppressed; the current model still does not infer `stale_package`. |
+
+## Context-only inputs
+
+These fields may be useful for display, parsing, reporting, policy evaluation, or
+future enrichment, but they do not currently select a risk bucket in the risk
+model.
+
+| Input | Current role |
+| --- | --- |
+| `Component.name` | Report identity and stable finding ordering. It does not by itself imply risk. |
+| `Component.ecosystem` | Normalized package context used elsewhere in the toolchain. It does not currently change risk buckets. |
+| `Component.supplier` | Context only. The risk model does not infer trust, ownership, or maintainer identity from it. |
+| `Component.bom_ref` | SBOM identity context only. |
+| `Component.raw_type` | Parser/source context only. |
+| `Component.evidence` | Parser evidence context only. |
+| `Component.provenance` | Enrichment evidence for reporting or policy layers only. The risk model does not convert provenance availability, attestation availability, or enrichment errors into risk buckets. |
+| `Component.scorecard` | Scorecard evidence for reporting or policy layers only. The risk model does not convert score, checks, or repository mapping into risk buckets. |
+| `ComponentChange.key` | Finding identity for changed components. It does not decide the bucket. |
+| `ComponentChange.classification` | Diff context only. The version values drive version-related risk findings. |
+| `ReportEnrichmentMetadata` | Report context only. Network flags, candidate counts, and status counts do not change risk buckets. |
+
+Policy evaluation is a separate layer. A policy may warn, fail, or suppress based
+on findings or enrichment evidence, but that does not change what the risk model
+itself is allowed to infer.
+
+## Never infer
+
+The risk model must never infer or imply any of the following unless a future,
+explicitly documented feature adds a dedicated evidence source and tests.
+
+- A package is vulnerable, exploitable, compromised, malicious, or safe.
+- A package has or does not have CVEs, advisories, exploit chains, or reachable
+  vulnerable code.
+- A package is trustworthy because it has a familiar name, domain, supplier,
+  repository, PyPI provenance record, or Scorecard result.
+- Missing metadata, missing provenance, missing attestations, or enrichment
+  errors prove compromise.
+- License compliance, legal acceptability, or redistribution permission beyond
+  the narrow `unknown_license` metadata check.
+- Maintainer identity, project ownership, organization affiliation, or source
+  authenticity from package names, supplier strings, URLs, or repository mapping.
+- Runtime reachability, deployment exposure, production usage, or transitive
+  impact.
+- Package staleness when stale enrichment is disabled. The correct output is
+  `not_evaluated`, not an invented stale or fresh conclusion.
+- Risk severity beyond the emitted bucket name and rationale.
+- Network-derived facts when enrichment has not explicitly performed network
+  access.
+
+## Bucket boundaries
+
+| Bucket | Allowed basis | Not a claim of |
+| --- | --- | --- |
+| `new_package` | Component appears only in the after input. | Vulnerability, maliciousness, or policy failure. |
+| `major_upgrade` | Strict SemVer major version increased. | Breaking change certainty or security risk. |
+| `version_change_unclassified` | Version changed but was not a parseable strict SemVer major upgrade. | Minor risk, safe upgrade, or unknown vulnerability state. |
+| `unknown_license` | License metadata is missing, empty, `UNKNOWN`, or `NOASSERTION`. | Legal non-compliance or prohibited redistribution. |
+| `suspicious_source` | Source provenance is missing or uses a suspicious scheme, host, or local path pattern. | Malware, compromise, or unsafe package content. |
+| `not_evaluated` | A check was intentionally not answered, currently stale-package evaluation in offline mode. | Safe, unsafe, stale, or fresh. |
+| `stale_package` | Reserved for future explicit stale-package enrichment. | Must not be emitted from missing data or guesswork. |
+
+## Maintenance checklist
+
+Update this document and the focused risk tests when any of these change:
+
+- `tools/sbom-diff-and-risk/src/sbom_diff_risk/risk.py`
+- `tools/sbom-diff-and-risk/src/sbom_diff_risk/models.py`
+- parser normalization that changes the populated `Component` fields
+- enrichment behavior that becomes a direct risk-model input
+- policy behavior that might be confused with risk-model bucket generation