From dd2c92cf4bad03d0d7840a13d88cb812ece8fd36 Mon Sep 17 00:00:00 2001
From: stacknil
Date: Sun, 21 Jun 2026 01:35:55 +0800
Subject: [PATCH] docs(performance): add local envelope benchmark

---
 README.md                    |  2 +-
 docs/performance-envelope.md | 96 ++++++++++++++++++++++++++++++++++++
 docs/reviewer-path.md        |  1 +
 3 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 docs/performance-envelope.md

diff --git a/README.md b/README.md
index a18025e..3c9dbb9 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ It parses `auth.log` / `secure`-style syslog input and `journalctl --output=shor
 
 LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.
 
-Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md) and [`docs/reviewer-brief.md`](./docs/reviewer-brief.md). For detection reasoning, read the forensic-style [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md) and the [`rule catalog`](./docs/rule-catalog.md).
+Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md) and [`docs/reviewer-brief.md`](./docs/reviewer-brief.md). For detection reasoning, read the forensic-style [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md) and the [`rule catalog`](./docs/rule-catalog.md). For local scale expectations, see the [`performance envelope`](./docs/performance-envelope.md).
 
 ## Why This Project Exists
 
diff --git a/docs/performance-envelope.md b/docs/performance-envelope.md
new file mode 100644
index 0000000..fe3424a
--- /dev/null
+++ b/docs/performance-envelope.md
@@ -0,0 +1,96 @@
+# Performance Envelope
+
+This document records a local performance envelope for LogLens. It is a
+reviewer aid, not a throughput guarantee or service-level objective.
+
+The benchmark measures the offline CLI path:
+
+- parse sanitized `syslog_legacy` input
+- normalize events and parser warnings
+- run the default detector configuration
+- write `report.md` and `report.json`
+
+CSV export was not enabled.
+
+## Benchmark Platform
+
+| Field | Value |
+| --- | --- |
+| Date | 2026-06-21 |
+| OS | Microsoft Windows 11, version `10.0.26200`, build `26200` |
+| CPU | AMD Ryzen 9 7940HX with Radeon Graphics |
+| Logical processors | 32 |
+| RAM | 31.2 GB |
+| Shell | PowerShell 7.5.5 |
+| Build | CMake Release build |
+| Executable | `build\Release\loglens.exe` |
+
+## Workload Shape
+
+The input corpus was generated locally under `build/performance-envelope/`.
+Generated files are not committed.
+
+The synthetic input uses sanitized syslog-style records only:
+
+- `bench-host-*` hostnames
+- documentation-range `203.0.113.x` source IPs
+- synthetic `userNNN` usernames
+- timestamps one second apart, starting at `2026-03-10 00:00:00`
+- an eight-line cycle of SSH failure, SSH success, sudo, PAM auth failure,
+  unsupported SSH preauth close, unsupported SSH timeout, session-opened, and
+  `su` failure evidence
+
+The resulting report shape is intentionally mixed:
+
+- 75% parsed lines
+- 25% parser warnings
+- stable parser warning buckets for unsupported SSH preauth and timeout lines
+- 50 top-level findings at each measured size
+
+This shape exercises parser coverage telemetry and report writing without using
+real authentication data.
+
+## Method
+
+Command shape:
+
+```powershell
+build\Release\loglens.exe --mode syslog --year 2026
+```
+
+For each line count:
+
+- one warmup run was excluded from the table
+- five measured runs were recorded
+- elapsed time is wall-clock process time
+- peak memory is the maximum observed process working set sampled by the
+  benchmark harness
+- input generation time is excluded
+
+## Results
+
+| Input lines | Parsed lines | Parser warnings | Findings | Median elapsed | Elapsed range (min-max) | Peak working set |
+| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| 1,000 | 750 | 250 | 50 | 44.66 ms | 44.47-64.96 ms | 3.10 MB |
+| 10,000 | 7,500 | 2,500 | 50 | 104.01 ms | 91.36-107.15 ms | 13.82 MB |
+| 100,000 | 75,000 | 25,000 | 50 | 635.69 ms | 588.39-796.45 ms | 99.77 MB |
+
+## Interpretation
+
+The measured envelope is comfortably interactive for 100k-line local review on
+this machine. All five measured 100k-line runs finished in under one second
+(slowest: 796.45 ms) and peaked at 99.77 MB working set, just under 100 MB.
+
+The numbers should be read as a regression reference for this input shape. They
+are not a claim about all Linux authentication logs. Runtime and memory can
+change with:
+
+- larger finding evidence windows
+- substantially different unsupported-line ratios
+- CSV export
+- slower storage
+- debug builds
+- background load on the host
+
+Parser observability remains part of the measured path: unsupported lines are
+reported as warnings and telemetry rather than being silently dropped.
diff --git a/docs/reviewer-path.md b/docs/reviewer-path.md
index 1a15441..38e465d 100644
--- a/docs/reviewer-path.md
+++ b/docs/reviewer-path.md
@@ -12,6 +12,7 @@ This path is for reviewers who want to understand LogLens quickly without readin
 | How do rules use evidence? | [`docs/rule-catalog.md`](./rule-catalog.md) | Can explain grouping keys, windows, thresholds, and unsupported-evidence boundaries |
 | Can the parser behavior be trusted? | Parser contract, fixture matrix, and parser coverage fields | Can see known, unknown, and malformed line handling |
 | How should a finding be interpreted? | [`docs/case-study-linux-auth-bruteforce.md`](./case-study-linux-auth-bruteforce.md) | Can trace raw evidence to normalized events, findings, warnings, and non-goals |
+| How does it behave on larger local inputs? | [`docs/performance-envelope.md`](./performance-envelope.md) | Can state the local 1k/10k/100k-line envelope and its caveats |
 
 ## 30-second orientation
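
The Workload Shape section of `docs/performance-envelope.md` describes the synthetic corpus, but this patch adds no generator for it. As a hedged illustration only, the sketch below produces a corpus of the same outline:

- one-second spacing from 2026-03-10 00:00:00
- `bench-host-*` hosts
- `userNNN` users
- `203.0.113.x` source IPs
- the eight-line cycle with two intentionally unsupported SSH lines per cycle

Several parts are hypothetical assumptions, not LogLens code:

- the message templates, which are generic OpenSSH/sudo/PAM/su-style lines
- the pool sizes
- the `generate` and `legacy_stamp` names

Whether LogLens actually parses six of every eight lines, or yields 50 findings, depends on its parser contract and rules, which this sketch does not model.

```python
"""Hypothetical generator for a sanitized, LogLens-style benchmark corpus.

Assumption: the message templates, pool sizes, and function names here are
illustrative only; they are not taken from the LogLens repository.
"""
from datetime import datetime, timedelta

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
START = datetime(2026, 3, 10, 0, 0, 0)

# Eight-line cycle in the order the envelope doc lists it. Entries 4 and 5
# (0-based) are the unsupported SSH lines meant to surface as parser warnings.
CYCLE = (
    "sshd[{pid}]: Failed password for {user} from {ip} port {port} ssh2",
    "sshd[{pid}]: Accepted password for {user} from {ip} port {port} ssh2",
    "sudo: {user} : TTY=pts/0 ; PWD=/home/{user} ; USER=root ; COMMAND=/usr/bin/id",
    "sshd[{pid}]: pam_unix(sshd:auth): authentication failure; logname= uid=0 "
    "euid=0 tty=ssh ruser= rhost={ip} user={user}",
    "sshd[{pid}]: Connection closed by {ip} port {port} [preauth]",
    "sshd[{pid}]: Timeout, client not responding from {ip} port {port}",
    "sshd[{pid}]: pam_unix(sshd:session): session opened for user {user} by (uid=0)",
    "su[{pid}]: pam_unix(su:auth): authentication failure; logname={user} "
    "uid=1000 euid=0 tty=pts/0 ruser={user} rhost= user=root",
)


def legacy_stamp(ts: datetime) -> str:
    """Format a BSD-syslog style timestamp such as 'Mar 10 00:00:00'.

    The year is omitted, as in legacy syslog; presumably this is why the
    CLI command shape passes --year explicitly.
    """
    return f"{MONTHS[ts.month - 1]} {ts.day:2d} {ts:%H:%M:%S}"


def generate(n_lines: int, hosts: int = 8, users: int = 100, ips: int = 254) -> list[str]:
    """Return n_lines synthetic records spaced one second apart."""
    lines = []
    for i in range(n_lines):
        ts = START + timedelta(seconds=i)
        fields = {
            "pid": 2000 + i % 30000,
            "user": f"user{i % users:03d}",
            # RFC 5737 documentation range only: 203.0.113.1 .. 203.0.113.254.
            "ip": f"203.0.113.{i % ips + 1}",
            "port": 40000 + i % 20000,
        }
        # str.format ignores keyword arguments a template does not use.
        message = CYCLE[i % len(CYCLE)].format(**fields)
        lines.append(f"{legacy_stamp(ts)} bench-host-{i % hosts:02d} {message}")
    return lines


if __name__ == "__main__":
    # Print one full cycle for inspection.
    print("\n".join(generate(len(CYCLE))))
```

Calling `generate` with 1,000, 10,000, or 100,000 lines matches the input sizes in the Results table. Any timings taken on such a corpus describe that corpus and host, not the runs recorded in the envelope.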