diff --git a/config/navigation.json b/config/navigation.json index d73bf63..06a34ac 100644 --- a/config/navigation.json +++ b/config/navigation.json @@ -113,7 +113,8 @@ { "group": "Security", "pages": [ - "tutorials/unauthorized_iac_changes", + "tutorials/detecting_unexpected_statefile_changes", + "tutorials/detecting_non_terraform_changes", "tutorials/rotating_api_keys" ] }, diff --git a/config/redirects.json b/config/redirects.json index a88b2a0..a04693a 100644 --- a/config/redirects.json +++ b/config/redirects.json @@ -1,4 +1,12 @@ [ + { + "source": "/tutorials/unauthorized_iac_changes", + "destination": "/tutorials/detecting_unexpected_statefile_changes" + }, + { + "source": "/tutorials/terraform_drift_detection", + "destination": "/tutorials/detecting_non_terraform_changes" + }, { "source": "/getting_started/approvals", "destination": "/getting_started/attestations" diff --git a/images/authorized-iac-change.png b/images/authorized-iac-change.png deleted file mode 100644 index aee2786..0000000 Binary files a/images/authorized-iac-change.png and /dev/null differ diff --git a/images/unauthorized-iac-change.png b/images/unauthorized-iac-change.png deleted file mode 100644 index 2cfb4f3..0000000 Binary files a/images/unauthorized-iac-change.png and /dev/null differ diff --git a/tutorials/detecting_non_terraform_changes.mdx b/tutorials/detecting_non_terraform_changes.mdx new file mode 100644 index 0000000..b3757bc --- /dev/null +++ b/tutorials/detecting_non_terraform_changes.mdx @@ -0,0 +1,130 @@ +--- +title: Detecting non-Terraform changes +description: Detect infrastructure changes made outside Terraform — console, API, or CLI edits — with a scheduled plan whose result is attested into a Kosli Environment. +--- + +Terraform drift comes in two distinct types, and each is invisible to a detector built for the other: + +1. **Unexpected statefile changes** — someone runs `terraform apply` outside your pipeline, so the statefile and the world still agree and a plan comes back empty. See [Detecting unexpected statefile changes](/tutorials/detecting_unexpected_statefile_changes). +2. **Non-Terraform changes** — someone edits the world directly via the cloud console, API, or CLI: a hotfix in the console, a partial apply failure, an out-of-band automation. Reality no longer matches the statefile, so a `terraform plan` catches it. This page covers detecting this type. + +Both pages implement Kosli's [Drift Detection](https://sdlc.kosli.com/controls/runtime/drift_detection/) control (SDLC-CTRL-0018), a detective control that mitigates configuration drift risk under our secure SDLC framework. + +## How the detection works + +The detector is a scheduled `terraform plan` against the last-applied git SHA, with the result recorded in a small marker file that Kosli watches for tampering: + +- **At apply time**, the pipeline writes a fresh marker — `drift.plan.json`, stored next to the statefile — recording the applied SHA with `drift: false`, and attests it into your [Kosli Environment](/getting_started/environments): + + ```json + { + "sha": "abc123def456...", + "drift": false + } + ``` + +- **On a schedule**, the detector reads the marker, checks out the recorded SHA, and runs a read-only plan. The cleanest machine-readable signal is the plan exit code: + + ```shell + terraform plan -input=false -lock=false -detailed-exitcode -no-color -out=tfplan + # exit 0 -> no changes (no drift) + # exit 2 -> changes present (DRIFT) + # exit 1 -> error + ``` + + `-lock=false` means the read-only drift plan never contends with a real apply; `-input=false` means it can never hang waiting for a prompt. + +- **When drift is found**, the detector overwrites the marker in S3 with `{sha, drift: }` — fresh, un-attested content. On its next snapshot, the Kosli reporter Lambda sees a marker that no longer matches its attestation, and the Environment reports itself as **non-compliant**. + + + The detector never calls the Kosli API. It just rewrites the marker in S3; the reporter Lambda does the detection on its next snapshot. Detection and evidence stay decoupled — fewer moving parts, one less credential in the detector, and a single place (the Environment) that tells you whether the world still matches what was approved. The Environment's compliance state, backed by attested artifacts linked to the git SHA that produced them, is exactly the kind of evidence an auditor wants for SOC 2 (CC7.2, CC8.1) and NIST SP 800-53 (CM-2, CM-3, SI-7). + + +## Plan against the applied SHA, not against `main` + +This is the single most common false-positive source. If changes are merged to `main` but not yet applied — because the apply is gated behind a manual approval, or batched into a release — then planning against `main` shows a non-empty plan that reflects pending intentional changes, not drift. The marker exists precisely to record the *applied* SHA, and the detector always checks out that commit before planning. + +## Latch, don't spam + +Once drift is flagged, you usually don't want to re-plan and re-alert every cycle until someone acts. The marker doubles as a latch: the detector only plans while `drift` is `false`, and the next successful apply writes a fresh `{sha, drift: false}` marker to reset it. + +## Prerequisites + +- Terraform is applied through CI/CD, not from laptops, as the normal path — with remote, locked state (for example, an S3 backend with the native S3 lockfile or DynamoDB). +- Keyless CI authentication to your cloud (for example, GitHub OIDC) with a dedicated, read-capable role for the detector. The detector never needs apply permissions. +- A [Kosli account and API token](/getting_started/authenticating_to_kosli). +- A Kosli [Environment](/getting_started/environments) for each Terraform environment you want to protect. +- The Kosli reporter Lambda deployed to snapshot the drift marker (and statefile) into that Environment on a schedule. + + + Drift detection on top of an undisciplined apply process produces mostly noise. Fix the pipeline first. + + +## Setting it up with `kosli-dev/tf` + +Everything above is implemented at [github.com/kosli-dev/tf](https://github.com/kosli-dev/tf): a thin Terraform wrapper (`tf`) and a set of reusable GitHub Actions workflows, both open source under the MIT license. Two of the workflows carry this control: + +- **`apply.yml`** — the plan steps plus `tf apply`, then a reset-drift-detection job that writes a fresh `{sha, drift: false}` marker to S3 (the known-good baseline for the next drift run) and attests it, along with the plan, apply log, and statefile, into your Kosli Environment. See [Detecting unexpected statefile changes](/tutorials/detecting_unexpected_statefile_changes) for the caller workflow and flow template — the same apply setup covers both drift types. +- **`detect-drift.yml`** — the detector. Reads the baseline marker, and only if `drift == false` runs a plan against the baseline SHA. A non-empty plan overwrites the marker with `{sha, drift: }`; otherwise it records a no-drift summary. + +A scheduled caller that runs the detector (use a matrix to fan out across environments): + +```yaml +name: Drift +on: + schedule: + - cron: "*/15 * * * *" + workflow_dispatch: + +jobs: + drift: + uses: kosli-dev/tf/.github/workflows/detect-drift.yml@main + permissions: + id-token: write + contents: write + with: + aws_region: eu-west-1 + aws_role_arn: arn:aws:iam::111122223333:role/my-role + environment: production +``` + +## Hardening + +A detector that runs once and alerts once is easy. A detector you can depend on for an audit needs to handle the failure modes below. + + + + This is the most dangerous failure mode. If the scheduled job silently stops running, no new evidence arrives to contradict the last result — so the environment looks green forever, even as drift accumulates. Treating "the dashboard is green" as proof of cleanliness, without also verifying the underlying job is running on schedule, is a misuse of the control. Add a heartbeat or alert on "job has not run in N intervals" for both the detector workflow and the reporter Lambda. + + + + `terraform plan` can only see resources Terraform manages. A resource created entirely outside Terraform — say, an IAM user added by hand in the console with no corresponding Terraform resource — is invisible to this control. Closing that gap is the job of an Infrastructure-as-Code coverage policy (everything in production must be defined as code in the first place); drift detection assumes that policy holds and does not substitute for it. + + + + Worst-case detection latency is the check interval **plus the reporter Lambda's snapshot interval**. A ten-minute check with a five-minute reporter Lambda surfaces drift within fifteen minutes. Set the schedule from each environment's rate-of-change and blast radius rather than using one global value. + + + + Guard against overlapping runs for the same environment with a concurrency group. Scope the detector's cloud role tightly: it needs to read state and plan, plus write the marker file — nothing more. It must never hold apply permissions. + + + +## Implementation checklist + +- [ ] Terraform is applied through CI/CD, with remote, locked state. +- [ ] Each apply writes a fresh `{sha, drift: false}` marker and attests it into a Kosli Environment. +- [ ] A scheduled job plans against the applied SHA — not against `main` — using a read-only, lock-free plan. +- [ ] A non-empty plan overwrites the marker; the result latches until the next apply resets it. +- [ ] The Kosli reporter Lambda snapshots the marker from S3 into the Environment on a schedule. +- [ ] Both the detector workflow and the reporter Lambda are monitored for silent failure. +- [ ] The detector's cloud role can read and plan only — never apply. +- [ ] Cadence and concurrency are tuned per environment. + +## Related + +- [Drift Detection (SDLC-CTRL-0018)](https://sdlc.kosli.com/controls/runtime/drift_detection/) — the control both drift-detection tutorials implement. +- [Detecting unexpected statefile changes](/tutorials/detecting_unexpected_statefile_changes) — the other drift type: out-of-CI applies a plan can *never* catch. +- [`kosli-dev/tf`](https://github.com/kosli-dev/tf) — the reference wrapper and reusable workflows. +- [Environments](/getting_started/environments) — the Kosli primitive that carries the compliance signal. +- [Terraform `-detailed-exitcode`](https://developer.hashicorp.com/terraform/cli/commands/plan#detailed-exitcode) and [remote/locked S3 backends](https://developer.hashicorp.com/terraform/language/backend/s3). diff --git a/tutorials/detecting_unexpected_statefile_changes.mdx b/tutorials/detecting_unexpected_statefile_changes.mdx new file mode 100644 index 0000000..ef6bb5d --- /dev/null +++ b/tutorials/detecting_unexpected_statefile_changes.mdx @@ -0,0 +1,134 @@ +--- +title: Detecting unexpected statefile changes +description: Detect Terraform applies that bypass CI by attesting statefile provenance into a Kosli Environment — the class of drift a scheduled plan can never see. +--- + +Terraform drift comes in two distinct types, and each is invisible to a detector built for the other: + +1. **Unexpected statefile changes** — someone runs `terraform apply` outside your pipeline. A laptop apply updates the statefile and the world *together*, so they still agree and a scheduled `terraform plan` comes back empty. This page covers detecting this type. +2. **Non-Terraform changes** — someone edits the world directly via the cloud console, API, or CLI, so reality no longer matches the statefile. See [Detecting non-Terraform changes](/tutorials/detecting_non_terraform_changes). + +Both pages implement Kosli's [Drift Detection](https://sdlc.kosli.com/controls/runtime/drift_detection/) control (SDLC-CTRL-0018), a detective control that mitigates configuration drift risk under our secure SDLC framework. + +## Why a plan can never catch this + +`terraform plan` compares the statefile to the world. An out-of-CI apply changes both in lockstep, so the comparison stays clean — the plan is structurally blind to it. What *has* changed is the statefile itself: it is now a file your pipeline never produced. Detecting that requires a record of where every statefile came from — a provenance system. + +## How Kosli detects it + +The mechanism is **attestation plus continuous reporting** against a [Kosli Environment](/getting_started/environments): + +- **At apply time**, the pipeline attests the Terraform statefile as an artifact into the Kosli Environment. Attestation fingerprints the file and records that your pipeline produced it, linked to the git SHA — establishing its provenance. +- **Continuously**, a scheduled Kosli reporter Lambda snapshots the live statefile from S3 into the same Environment. The Environment's policy requires every artifact to have known provenance. + +The moment the statefile in S3 no longer matches an attestation — because an apply outside CI rewrote it — the next snapshot sees an unrecognized artifact and the Environment reports itself as **non-compliant**. No scheduled plan is involved; the reporter Lambda detects this entirely on its own. + + + "The plan was clean yesterday" is not evidence that the environment is clean today. A green dashboard can be stale (the job stopped running) or unverifiable (who can prove the statefile wasn't swapped?). Only a **current, verifiable** signal counts as evidence — which is why the signal here is a Kosli Environment's compliance state, backed by attested artifacts. That is exactly the kind of evidence an auditor wants for SOC 2 (CC7.2, CC8.1) and NIST SP 800-53 (CM-2, CM-3, SI-7). + + +## Prerequisites + +- Terraform is applied through CI/CD, not from laptops, as the normal path — with remote, locked state (for example, an S3 backend with the native S3 lockfile or DynamoDB). +- Keyless CI authentication to your cloud (for example, GitHub OIDC). +- A [Kosli account and API token](/getting_started/authenticating_to_kosli). +- A Kosli [Environment](/getting_started/environments) for each Terraform environment you want to protect. +- The Kosli reporter Lambda deployed to snapshot the statefile into that Environment on a schedule. + +## Setting it up with `kosli-dev/tf` + +Everything above is implemented at [github.com/kosli-dev/tf](https://github.com/kosli-dev/tf): a thin Terraform wrapper (`tf`) and a set of reusable GitHub Actions workflows, both open source under the MIT license. You can call the workflows directly, or borrow their shape for your own CI. + +### The wrapper + +`tf` is a drop-in replacement for `terraform` that removes the manual bookkeeping. It selects the correct `-var-file` for your active AWS profile and region, and injects the S3 backend config so you never hand-manage it. The backend is derived deterministically: + +```text +bucket = terraform-state- +key = terraform// # default: main.tfstate +region = encrypt = true # native S3 lockfile by default +``` + +`tf plan` saves a binary plan for later inspection; `tf apply` appends `-auto-approve` (the plan has already been reviewed, and CI has no interactive prompt). Locally you wrap it in your credential helper, for example `aws-vault exec staging -- tf plan`. + +### The apply workflow + +The reusable `apply.yml` workflow runs the plan steps plus `tf apply`, then attests the plan, the apply log, and the statefile into your Kosli Environment. A caller workflow that applies on merge: + +```yaml +name: Apply + +on: + push: + branches: [main] + +jobs: + apply: + uses: kosli-dev/tf/.github/workflows/apply.yml@main + permissions: + id-token: write + contents: write + pull-requests: read # needed by the PR-attestation step + with: + aws_region: eu-west-1 + aws_role_arn: arn:aws:iam::111122223333:role/my-role + environment: production + tf_version: v1.14.6 + kosli_template_file: kosli-apply-template.yml + secrets: + kosli_api_token: ${{ secrets.KOSLI_API_TOKEN }} + kosli_github_token: ${{ secrets.GITHUB_TOKEN }} +``` + +The Kosli [flow template](/template-reference/flow_template) declares every attestation and artifact the workflow emits: + +```yaml +# kosli-apply-template.yml +version: 1 +trail: + attestations: + - name: terraform-plan + type: generic + - name: terraform-apply + type: generic + artifacts: + - name: terraform-state + - name: drift-plan +``` + + + The `drift-plan` artifact belongs to the second drift type — the marker file used by the scheduled plan loop in [Detecting non-Terraform changes](/tutorials/detecting_non_terraform_changes). The same apply workflow attests both, so one setup covers both types. + + +## What a detection looks like + +Someone runs `terraform apply` from a laptop. The statefile in S3 is rewritten with content your pipeline never attested. On its next snapshot the Kosli reporter Lambda finds an artifact with no known provenance, and the Environment turns non-compliant. The Environment's snapshot history shows exactly when the unrecognized statefile appeared and what its fingerprint is — a concrete starting point for the investigation, and a durable record for the audit trail. + +## Hardening + + + + This is the most dangerous failure mode. If the reporter Lambda silently stops running, no new evidence arrives to contradict the last snapshot — so the Environment looks green forever, even as unattested statefiles accumulate. Treating "the dashboard is green" as proof of cleanliness, without also verifying the Lambda is running on schedule, is a misuse of the control. Add a heartbeat or alert on "no snapshot in N intervals". + + + + The reporter Lambda needs read access to the state bucket and the ability to report snapshots to Kosli — nothing more. It must never hold apply permissions. + + + +## Implementation checklist + +- [ ] Terraform is applied through CI/CD, with remote, locked state. +- [ ] Each apply attests the statefile (plus plan and apply log) into a Kosli Environment. +- [ ] The Kosli reporter Lambda snapshots the live statefile from S3 into the Environment on a schedule. +- [ ] The Environment's policy requires known provenance for every artifact. +- [ ] The reporter Lambda is monitored for silent failure (heartbeat / not-run alert). +- [ ] Snapshot cadence is tuned per environment. + +## Related + +- [Drift Detection (SDLC-CTRL-0018)](https://sdlc.kosli.com/controls/runtime/drift_detection/) — the control both drift-detection tutorials implement. +- [Detecting non-Terraform changes](/tutorials/detecting_non_terraform_changes) — the other drift type: console and API edits a plan *can* catch. +- [`kosli-dev/tf`](https://github.com/kosli-dev/tf) — the reference wrapper and reusable workflows. +- [Environments](/getting_started/environments) — the Kosli primitive that carries the compliance signal. +- [Flow template reference](/template-reference/flow_template) — declaring attestations and artifacts. diff --git a/tutorials/unauthorized_iac_changes.md b/tutorials/unauthorized_iac_changes.md deleted file mode 100644 index fceed5b..0000000 --- a/tutorials/unauthorized_iac_changes.md +++ /dev/null @@ -1,158 +0,0 @@ ---- -title: "Detecting unauthorized Terraform changes" -description: "Learn how to use Kosli to detect unauthorized Terraform infrastructure changes — changes made outside your approved CI process." ---- - -By the end of this tutorial, you will have set up Kosli to track authorized Terraform changes and detect when an unauthorized change slips through. - - -This tutorial focuses on detecting changes made by bypassing the approved Terraform process (e.g. a developer running `terraform apply` directly from their machine). Detecting infrastructure drift is a separate concern covered by [Terraform drift detection](https://developer.hashicorp.com/terraform/tutorials/state/resource-drift). - - -## Prerequisites - -* [Install Terraform](https://developer.hashicorp.com/terraform/install). -* [Install Snyk CLI](https://docs.snyk.io/snyk-cli/getting-started-with-the-snyk-cli#install-the-snyk-cli-and-authenticate-your-machine) (optional — needed for the security scan step). -* [Install Kosli CLI](/getting_started/install). -* [Get a Kosli API token](/getting_started/authenticating_to_kosli). - -## Setup - -```shell -export KOSLI_ORG= -export KOSLI_API_TOKEN= -``` - -Clone the tutorial repository: - -```shell -git clone https://github.com/kosli-dev/iac-changes-tutorial.git -cd iac-changes-tutorial -``` - -## Create a Kosli flow - -Create a Kosli flow to represent the approved process for Terraform changes. Using --use-empty-template keeps things simple for this tutorial: - -```shell -kosli create flow tf-tutorial --use-empty-template -``` - -## Make and track an authorized change - - -In production, an authorized change goes through CI. In this tutorial, you run those commands locally to simulate the process. - - -Begin a trail to represent a single authorized change: - -```shell -kosli begin trail authorized-1 --flow=tf-tutorial -``` - -Optionally, scan your Terraform config for security issues and attest the SARIF output to Kosli: - -```shell -snyk iac test main.tf --sarif-file-output=sarif.json -kosli attest snyk --name=security --flow=tf-tutorial --trail=authorized-1 --scan-results=sarif.json -``` - -Create a Terraform plan, save it to a file, and attest it to Kosli: - -```shell -terraform init -terraform plan -out=tf.plan -kosli attest generic --name=tf-plan --flow=tf-tutorial --trail=authorized-1 --attachments=tf.plan -``` - -Apply the plan and attest the resulting state file as an artifact. Kosli calculates a fingerprint from the state file contents — this fingerprint is how it later detects unauthorized changes: - - -This tutorial uses a local state file for simplicity. In production, the state file is typically stored in cloud storage (e.g. AWS S3) and you would download it after the authorized change. Note that `--build-url` and `--commit-url` are set to placeholder URLs here — in CI these are set automatically. - - -```shell -terraform apply -auto-approve tf.plan -kosli attest artifact terraform.tfstate --name=state-file --artifact-type=file --flow=tf-tutorial --trail=authorized-1 \ - --build-url=https://example.com --commit-url=https://example.com --commit=HEAD -``` - -## Monitor the state file - -To detect unauthorized changes, Kosli monitors the state file for changes by tracking it in an environment. Create a `server` environment: - -```shell -kosli create env terraform-state --type=server -``` - -Report the current state file to the environment: - - -In production, configure environment reporting to run periodically or on state file changes. See [reporting AWS environments](/tutorials/report_aws_envs) if you use S3 as your Terraform backend. - - -```shell -kosli snapshot path terraform-state --name=tf-state --path=terraform.tfstate -``` - -Check the latest snapshot: - -```shell -kosli get snapshot terraform-state -``` - -You should see: - -```plaintext -COMMIT ARTIFACT FLOW COMPLIANCE RUNNING_SINCE REPLICAS -d881b2f Name: tf-state tf-tutorial COMPLIANT 28 minutes ago 1 - Fingerprint: a57667a7b921b91d438631afa1a1fe35300b4da909a19d2b61196580f30f1d0c -``` - -The `FLOW` column shows `tf-tutorial` — Kosli has provenance for this change. In the Kosli UI under **Environments > terraform-state**, the artifact shows as compliant. - -Environment shows an authorized change - -## Introduce an unauthorized change - -Simulate an unauthorized change by modifying line 6 of `main.tf` — change `random_pet_result` to `random_pet_name` — then apply directly without going through the approved process: - -```shell -terraform apply --auto-approve -``` - -Report the updated state file to Kosli: - - -In production this step is not needed — environment reporting runs automatically on change or on a schedule. - - -```shell -kosli snapshot path terraform-state --name=tf-state --path=terraform.tfstate -``` - -Check the snapshot again: - -```shell -kosli get snapshot terraform-state -``` - -You should see: - -```plaintext -COMMIT ARTIFACT FLOW COMPLIANCE RUNNING_SINCE REPLICAS -N/A Name: tf-state N/A NON-COMPLIANT 8 minutes ago 1 - Fingerprint: edd93dcde27718ed493222ceb218275655555f3f3bfefa95628c599e678ac325 -``` - -The `FLOW` is now `N/A` — Kosli has no provenance for this state file fingerprint. It was not attested through any known flow, which means the change was unauthorized. The environment page reflects this: - -Environment shows an unauthorized change - -## What you've accomplished - -You have used Kosli to track authorized Terraform changes and detect an unauthorized one. By fingerprinting the Terraform state file and comparing it against attested artifacts, Kosli can tell whether a running infrastructure state came from an approved process or not. - -From here you can: -* Set up alerts and automated responses when unauthorized changes are detected using [Kosli actions](/integrations/actions) -* See how to report S3-backed state files automatically in the [Report AWS environments](/tutorials/report_aws_envs) tutorial