From dd2c92cf4bad03d0d7840a13d88cb812ece8fd36 Mon Sep 17 00:00:00 2001
From: stacknil <stacknil@proton.me>
Date: Sun, 21 Jun 2026 01:35:55 +0800
Subject: [PATCH] docs(performance): add local envelope benchmark

---
 README.md                    |  2 +-
 docs/performance-envelope.md | 96 ++++++++++++++++++++++++++++++++++++
 docs/reviewer-path.md        |  1 +
 3 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 docs/performance-envelope.md
diff --git a/README.md b/README.md
index a18025e..3c9dbb9 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ It parses `auth.log` / `secure`-style syslog input and `journalctl --output=shor
 
 LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.
 
-Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md) and [`docs/reviewer-brief.md`](./docs/reviewer-brief.md). For detection reasoning, read the forensic-style [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md) and the [`rule catalog`](./docs/rule-catalog.md).
+Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md) and [`docs/reviewer-brief.md`](./docs/reviewer-brief.md). For detection reasoning, read the forensic-style [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md) and the [`rule catalog`](./docs/rule-catalog.md). For local scale expectations, see the [`performance envelope`](./docs/performance-envelope.md).
 
 ## Why This Project Exists
 
diff --git a/docs/performance-envelope.md b/docs/performance-envelope.md
new file mode 100644
index 0000000..fe3424a
--- /dev/null
+++ b/docs/performance-envelope.md
@@ -0,0 +1,96 @@
+# Performance Envelope
+
+This document records a local performance envelope for LogLens. It is a
+reviewer aid, not a throughput guarantee or service-level objective.
+
+The benchmark measures the offline CLI path:
+
+- parse sanitized `syslog_legacy` input
+- normalize events and parser warnings
+- run the default detector configuration
+- write `report.md` and `report.json`
+
+CSV export was not enabled.
+
+## Benchmark Platform
+
+| Field | Value |
+| --- | --- |
+| Date | 2026-06-21 |
+| OS | Microsoft Windows 11, version `10.0.26200`, build `26200` |
+| CPU | AMD Ryzen 9 7940HX with Radeon Graphics |
+| Logical processors | 32 |
+| RAM | 31.2 GB |
+| Shell | PowerShell 7.5.5 |
+| Build | CMake Release build |
+| Executable | `build\Release\loglens.exe` |
+
+## Workload Shape
+
+The input corpus was generated locally under `build/performance-envelope/`.
+Generated files are not committed.
+
+The synthetic input uses sanitized syslog-style records only:
+
+- `bench-host-*` hostnames
+- documentation-range `203.0.113.x` source IPs
+- synthetic `userNNN` usernames
+- timestamps one second apart, starting at `2026-03-10 00:00:00`
+- an eight-line cycle of SSH failure, SSH success, sudo, PAM auth failure,
+  unsupported SSH preauth close, unsupported SSH timeout, session-opened, and
+  `su` failure evidence
+
+The resulting report shape is intentionally mixed:
+
+- 75% parsed lines (six of the eight lines in each cycle)
+- 25% parser warnings (the two unsupported SSH lines in each cycle)
+- stable parser warning buckets for unsupported SSH preauth and timeout lines
+- 50 top-level findings at each measured size
+
+This shape exercises parser coverage telemetry and report writing without using
+real authentication data.
+
+## Method
+
+Command shape:
+
+```powershell
+build\Release\loglens.exe --mode syslog --year 2026 <input.log> <output-dir>
+```
+
+For each line count:
+
+- one warmup run was performed and excluded from the table
+- five measured runs were recorded
+- elapsed time is the wall-clock duration of the whole process, not CPU time
+- peak memory is the maximum observed process working set sampled by the
+  benchmark harness
+- input generation time is excluded
+
+## Results
+
+| Input lines | Parsed lines | Parser warnings | Findings | Median elapsed | Elapsed range | Peak working set |
+| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| 1,000 | 750 | 250 | 50 | 44.66 ms | 44.47-64.96 ms | 3.10 MB |
+| 10,000 | 7,500 | 2,500 | 50 | 104.01 ms | 91.36-107.15 ms | 13.82 MB |
+| 100,000 | 75,000 | 25,000 | 50 | 635.69 ms | 588.39-796.45 ms | 99.77 MB |
+
+## Interpretation
+
+The measured envelope is comfortably interactive for 100k-line local review on
+this machine. Every measured 100,000-line run finished in under 0.8 seconds
+(median 635.69 ms), with a peak working set of 99.77 MB.
+
+The numbers should be read as a regression reference for this input shape. They
+are not a claim about all Linux authentication logs. Runtime and memory can
+change with:
+
+- larger finding evidence windows
+- substantially different unsupported-line ratios
+- CSV export
+- slower storage
+- debug builds
+- background load on the host
+
+Parser observability remains part of the measured path: unsupported lines are
+reported as warnings and telemetry rather than being silently dropped.
diff --git a/docs/reviewer-path.md b/docs/reviewer-path.md
index 1a15441..38e465d 100644
--- a/docs/reviewer-path.md
+++ b/docs/reviewer-path.md
@@ -12,6 +12,7 @@ This path is for reviewers who want to understand LogLens quickly without readin
 | How do rules use evidence? | [`docs/rule-catalog.md`](./rule-catalog.md) | Can explain grouping keys, windows, thresholds, and unsupported-evidence boundaries |
 | Can the parser behavior be trusted? | Parser contract, fixture matrix, and parser coverage fields | Can see known, unknown, and malformed line handling |
 | How should a finding be interpreted? | [`docs/case-study-linux-auth-bruteforce.md`](./case-study-linux-auth-bruteforce.md) | Can trace raw evidence to normalized events, findings, warnings, and non-goals |
+| How does it behave on larger local inputs? | [`docs/performance-envelope.md`](./performance-envelope.md) | Can state the local 1k/10k/100k-line envelope and its caveats |
 
 ## 30-second orientation