Merged
7 changes: 6 additions & 1 deletion README.md
@@ -83,12 +83,17 @@ LogLens also tracks parser coverage telemetry for unsupported or malformed lines
- `parsed_lines`
- `unparsed_lines`
- `parse_success_rate`
- `failure_categories`
- `top_unknown_patterns`

Common unsupported-pattern buckets include `sshd_connection_closed_preauth`,
`sshd_timeout_or_disconnection`, `sshd_negotiation_failure`,
`pam_faillock_account_locked`, and `pam_unix_session_closed`. These buckets keep
non-finding evidence reviewable without counting it as detector evidence.
Failure categories group unsupported lines into reviewer-facing parser boundary
classes: `unknown_timestamp`, `unknown_program`,
`known_program_unknown_message`, `malformed_source_ip`, and
`unsupported_pam_variant`.

For rule-by-rule semantics and signal boundaries, see [`docs/rule-catalog.md`](./docs/rule-catalog.md). For a forensic-style evidence walkthrough, see [`docs/case-study-linux-auth-bruteforce.md`](./docs/case-study-linux-auth-bruteforce.md). For the parser behavior contract, supported modes, and fixture map, see [`docs/parser-contract.md`](./docs/parser-contract.md). For the deliberately noisy parser-coverage sample, see [`docs/parser-coverage-notes.md`](./docs/parser-coverage-notes.md).

@@ -142,7 +147,7 @@ When you add `--csv`, LogLens also writes:
The CSV schema is intentionally small and stable:

- `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary`
- `warnings.csv`: `kind`, `line_number`, `message`
- `warnings.csv`: `kind`, `line_number`, `category`, `message`

Without `--csv`, LogLens does not create, overwrite, or delete any existing CSV files in the output directory.

8 changes: 4 additions & 4 deletions docs/case-study-linux-auth-bruteforce.md
@@ -110,10 +110,10 @@ The sudo finding is adjacent but separate. It is not joined to the SSH failure c

The parser warnings are:

| Line | Unknown-pattern bucket | Evidence interpretation |
| ---: | --- | --- |
| 15 | `sshd_connection_closed_preauth` | preauth connection-close noise was observed but not promoted to a typed event |
| 16 | `sshd_timeout_or_disconnection` | timeout/disconnection noise was observed but not promoted to a typed event |
| Line | Failure category | Unknown-pattern bucket | Evidence interpretation |
| ---: | --- | --- | --- |
| 15 | `known_program_unknown_message` | `sshd_connection_closed_preauth` | preauth connection-close noise was observed but not promoted to a typed event |
| 16 | `known_program_unknown_message` | `sshd_timeout_or_disconnection` | timeout/disconnection noise was observed but not promoted to a typed event |

These warnings are useful because they prevent silent overconfidence. A reviewer can see both the finding-producing evidence and the unsupported surrounding records.

62 changes: 32 additions & 30 deletions docs/parser-conformance-matrix.md
@@ -8,7 +8,7 @@ corpus.
The parser contract is intentionally conservative:

- recognized evidence emits a normalized `Event`
- unsupported evidence emits a parser warning and an unknown-pattern bucket
- unsupported evidence emits a parser warning, a failure category, and an unknown-pattern bucket
- unsupported evidence does not become detector input

## Input Format Matrix
@@ -57,42 +57,43 @@ event type in both formats.
Unsupported buckets are warning labels, not normalized events. The expected
normalized event is always `none`.

| Unsupported evidence | Input formats | Expected unsupported line bucket | Expected normalized event |
| --- | --- | --- | --- |
| `sshd` preauth connection closed or reset, including `Connection closed by ... [preauth]`, `Connection closed by authenticating user ... [preauth]`, and `Connection reset by ... [preauth]` | `syslog_legacy`, `journalctl_short_full` | `sshd_connection_closed_preauth` | none |
| `sshd` timeout, disconnection, or disconnect notice, including `Timeout, client not responding`, `Disconnected from ...`, and `Received disconnect ...` | `syslog_legacy`, `journalctl_short_full` | `sshd_timeout_or_disconnection` | none |
| `sshd` negotiation failure such as `Unable to negotiate with ...` | `syslog_legacy`, `journalctl_short_full` | `sshd_negotiation_failure` | none |
| Other well-formed but unsupported `sshd` messages | `syslog_legacy`, `journalctl_short_full` | `sshd_other` | none |
| `pam_unix(...:session)` session closed | `syslog_legacy`, `journalctl_short_full` | `pam_unix_session_closed` | none |
| Other unsupported `pam_unix(...)` messages | `syslog_legacy`, `journalctl_short_full` | `pam_unix_other` | none |
| `pam_faillock(...:auth)` account temporarily locked | `syslog_legacy`, `journalctl_short_full` | `pam_faillock_account_locked` | none |
| `pam_faillock(...:auth)` successful authentication telemetry | `syslog_legacy`, `journalctl_short_full` | `pam_faillock_authsucc` | none |
| Other unsupported `pam_faillock(...)` messages | `syslog_legacy`, `journalctl_short_full` | `pam_faillock_other` | none |
| `pam_sss(...:auth)` user not known to underlying authentication module | `syslog_legacy`, `journalctl_short_full` | `pam_sss_unknown_user` | none |
| `pam_sss(...:auth)` authentication service cannot retrieve authentication info | `syslog_legacy`, `journalctl_short_full` | `pam_sss_authinfo_unavail` | none |
| Other unsupported `pam_sss(...)` messages | `syslog_legacy`, `journalctl_short_full` | `pam_sss_other` | none |
| Well-formed `sudo` line that is not command, incorrect-password, or policy-denial evidence | `syslog_legacy`, `journalctl_short_full` | `sudo_other` | none |
| Well-formed `su` line that is not recognized as success or failure audit evidence | `syslog_legacy`, `journalctl_short_full` | `su_other` | none |
| Well-formed unsupported program tag | `syslog_legacy`, `journalctl_short_full` | `program_<sanitized_program>` | none |
| Unsupported evidence | Input formats | Failure category | Expected unsupported line bucket | Expected normalized event |
| --- | --- | --- | --- | --- |
| `sshd` preauth connection closed or reset, including `Connection closed by ... [preauth]`, `Connection closed by authenticating user ... [preauth]`, and `Connection reset by ... [preauth]` | `syslog_legacy`, `journalctl_short_full` | `known_program_unknown_message` | `sshd_connection_closed_preauth` | none |
| `sshd` timeout, disconnection, or disconnect notice, including `Timeout, client not responding`, `Disconnected from ...`, and `Received disconnect ...` | `syslog_legacy`, `journalctl_short_full` | `known_program_unknown_message` | `sshd_timeout_or_disconnection` | none |
| `sshd` negotiation failure such as `Unable to negotiate with ...` | `syslog_legacy`, `journalctl_short_full` | `known_program_unknown_message` | `sshd_negotiation_failure` | none |
| Other well-formed but unsupported `sshd` messages | `syslog_legacy`, `journalctl_short_full` | `known_program_unknown_message` | `sshd_other` | none |
| `pam_unix(...:session)` session closed | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_unix_session_closed` | none |
| Other unsupported `pam_unix(...)` messages | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_unix_other` | none |
| `pam_faillock(...:auth)` account temporarily locked | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_faillock_account_locked` | none |
| `pam_faillock(...:auth)` successful authentication telemetry | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_faillock_authsucc` | none |
| Other unsupported `pam_faillock(...)` messages | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_faillock_other` | none |
| `pam_sss(...:auth)` user not known to underlying authentication module | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_sss_unknown_user` | none |
| `pam_sss(...:auth)` authentication service cannot retrieve authentication info | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_sss_authinfo_unavail` | none |
| Other unsupported `pam_sss(...)` messages | `syslog_legacy`, `journalctl_short_full` | `unsupported_pam_variant` | `pam_sss_other` | none |
| Well-formed `sudo` line that is not command, incorrect-password, or policy-denial evidence | `syslog_legacy`, `journalctl_short_full` | `known_program_unknown_message` | `sudo_other` | none |
| Well-formed `su` line that is not recognized as success or failure audit evidence | `syslog_legacy`, `journalctl_short_full` | `known_program_unknown_message` | `su_other` | none |
| Well-formed unsupported program tag | `syslog_legacy`, `journalctl_short_full` | `unknown_program` | `program_<sanitized_program>` | none |
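
The bucket-to-category pairing in this matrix is regular enough to sketch in a few lines. The following is a minimal illustration, not LogLens's actual classifier: the function name `failure_category_for_bucket`, the helper `has_prefix`, and the prefix-based matching are assumptions made for this example. Only the bucket and category strings come from the matrix.

```cpp
#include <string>

// Hypothetical helper: true when `text` begins with `prefix`.
inline bool has_prefix(const std::string& text, const std::string& prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical mapping from an unknown-pattern bucket to its coarse failure
// category, following the message-level rows of the matrix. Structural
// (header-level) buckets are not handled here and yield an empty string.
inline std::string failure_category_for_bucket(const std::string& bucket) {
    if (has_prefix(bucket, "pam_")) {
        return "unsupported_pam_variant";
    }
    if (has_prefix(bucket, "sshd_") || bucket == "sudo_other" || bucket == "su_other") {
        return "known_program_unknown_message";
    }
    if (has_prefix(bucket, "program_")) {
        return "unknown_program";
    }
    return "";  // bucket is outside the message-level matrix
}
```

A real implementation would more likely assign the category at the point where the parser rejects the line, rather than deriving it from the bucket name afterwards.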

## Header And Structural Warning Matrix

Structural failures do not reach the authentication message classifier. They
still produce parser warnings and unknown-pattern buckets through the same
coverage telemetry path.

| Failure class | Input formats | Expected bucket | Expected normalized event |
| --- | --- | --- | --- |
| Missing syslog assumed year | `syslog_legacy` | `syslog_legacy_mode_requires_assume_year` | none |
| Missing syslog header fields | `syslog_legacy` | `missing_syslog_header_fields` | none |
| Invalid syslog month token | `syslog_legacy` | `invalid_month_token` | none |
| Invalid syslog day token | `syslog_legacy` | `invalid_day_token` | none |
| Invalid time token | `syslog_legacy`, `journalctl_short_full` | `invalid_time_token` | none |
| Invalid calendar date | `syslog_legacy`, `journalctl_short_full` | `invalid_calendar_date` | none |
| Missing journalctl short-full header fields | `journalctl_short_full` | `missing_journalctl_short_full_header_fields` | none |
| Invalid journalctl date token | `journalctl_short_full` | `invalid_journalctl_date_token` | none |
| Invalid journalctl timezone token | `journalctl_short_full` | `invalid_timezone_token` | none |
| Missing program/message delimiter | `syslog_legacy`, `journalctl_short_full` | `missing_program_message_delimiter` | none |
| Failure class | Input formats | Failure category | Expected bucket | Expected normalized event |
| --- | --- | --- | --- | --- |
| Missing syslog assumed year | `syslog_legacy` | `unknown_timestamp` | `syslog_legacy_mode_requires_assume_year` | none |
| Missing syslog header fields | `syslog_legacy` | `unknown_timestamp` | `missing_syslog_header_fields` | none |
| Invalid syslog month token | `syslog_legacy` | `unknown_timestamp` | `invalid_month_token` | none |
| Invalid syslog day token | `syslog_legacy` | `unknown_timestamp` | `invalid_day_token` | none |
| Invalid time token | `syslog_legacy`, `journalctl_short_full` | `unknown_timestamp` | `invalid_time_token` | none |
| Invalid calendar date | `syslog_legacy`, `journalctl_short_full` | `unknown_timestamp` | `invalid_calendar_date` | none |
| Missing journalctl short-full header fields | `journalctl_short_full` | `unknown_timestamp` | `missing_journalctl_short_full_header_fields` | none |
| Invalid journalctl date token | `journalctl_short_full` | `unknown_timestamp` | `invalid_journalctl_date_token` | none |
| Invalid journalctl timezone token | `journalctl_short_full` | `unknown_timestamp` | `invalid_timezone_token` | none |
| Missing program/message delimiter | `syslog_legacy`, `journalctl_short_full` | `unknown_program` | `missing_program_message_delimiter` | none |
| Malformed source IP token | `syslog_legacy`, `journalctl_short_full` | `malformed_source_ip` | `malformed_source_ip` | none |

## Fixture Anchors

@@ -113,4 +114,5 @@ these places:
- normalized event expectation in `tests/test_parser.cpp`
- supported fixture line under `assets/`
- unsupported warning bucket expectation
- parser failure category expectation
- report-contract fixture if the visible report shape changes
13 changes: 11 additions & 2 deletions docs/parser-contract.md
@@ -36,11 +36,20 @@ Recognized success or audit families include accepted password, accepted publick
| --- | --- | --- |
| Recognized auth line | Emits a typed `Event` with timestamp, hostname, program, optional pid, message, source IP, username, event type, and line number | Can contribute to summaries, reports, and configured detection signals |
| Blank line | Skips the line and increments `skipped_blank_lines` | Does not become a warning or parsed event |
| Malformed header | Emits a parser warning with the original line number and structural reason | Counts toward `unparsed_lines` and `top_unknown_patterns` |
| Well-formed but unsupported auth pattern | Emits a parser warning with an unknown-pattern bucket | Stays visible as telemetry instead of being silently ignored |
| Malformed header | Emits a parser warning with the original line number, structural reason, and `unknown_timestamp` category | Counts toward `unparsed_lines`, `failure_categories`, and `top_unknown_patterns` |
| Well-formed but unsupported auth pattern | Emits a parser warning with a failure category and unknown-pattern bucket | Stays visible as telemetry instead of being silently ignored |

This is the main trust boundary: unsupported input should remain inspectable, even when it does not produce a finding.

Parser failure categories are intentionally coarser than unknown-pattern
buckets:

- `unknown_timestamp`
- `unknown_program`
- `known_program_unknown_message`
- `malformed_source_ip`
- `unsupported_pam_variant`

Stable unsupported-pattern buckets currently exercised by the fixture corpus include
`sshd_connection_closed_preauth`, `sshd_timeout_or_disconnection`,
`sshd_negotiation_failure`, `pam_faillock_account_locked`, and
3 changes: 2 additions & 1 deletion docs/parser-coverage-notes.md
@@ -20,10 +20,11 @@ The locked expected coverage summary lives in [`tests/fixtures/parser_matrix/noi
- `parsed_lines`: 8
- `unparsed_lines`: 16
- `parse_success_rate`: 0.3333333333
- `failure_categories`: coarse parser boundary categories for unsupported lines
- `top_unknown_patterns`: the five most common unsupported-pattern buckets
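
The fixture values are consistent with a rate computed as parsed lines divided by parsed plus unparsed lines (8 / 24). A minimal sketch under that assumption follows. The helper name `parse_success_rate` and the zero-input convention are hypothetical, and the denominator is inferred from the numbers above rather than confirmed by the source.

```cpp
#include <cstddef>

// Hypothetical helper: share of counted lines that became typed events.
// Assumes blank lines are excluded and the denominator is
// parsed_lines + unparsed_lines, which matches the fixture values.
inline double parse_success_rate(std::size_t parsed_lines, std::size_t unparsed_lines) {
    const std::size_t total = parsed_lines + unparsed_lines;
    if (total == 0) {
        return 0.0;  // assumed convention for empty input
    }
    return static_cast<double>(parsed_lines) / static_cast<double>(total);
}
```

With the fixture counts, `parse_success_rate(8, 16)` evaluates to one third, which rounds to the `0.3333333333` shown in the summary.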

## Reading the numbers

A low parse success rate is not automatically a bug for this fixture. The sample is deliberately noisy, and the useful property is that unsupported evidence remains explainable through `warnings` and `top_unknown_patterns`.
A low parse success rate is not automatically a bug for this fixture. The sample is deliberately noisy, and the useful property is that unsupported evidence remains explainable through `warnings`, `failure_categories`, and `top_unknown_patterns`.

The matrix should stay defensive and public-safe: use documentation IP ranges, synthetic hostnames, and synthetic usernames only.
12 changes: 10 additions & 2 deletions docs/report-artifacts.md
@@ -28,6 +28,7 @@ The JSON report keeps parser observability visible next to findings:
- `parser_quality.parsed_lines`
- `parser_quality.unparsed_lines`
- `parser_quality.parse_success_rate`
- `parser_quality.failure_categories`
- `parser_quality.top_unknown_patterns`
- `parsed_event_count`
- `warning_count`
@@ -41,14 +42,21 @@ Finding objects contain `rule_id`, `rule`, `subject_kind`, `subject`, `grouping_

`evidence_event_ids` are deterministic local event identifiers derived from the source line number, formatted as `line:<number>`. They let reviewers trace a finding back to the normalized input events that satisfied the rule window without implying global event identity.
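
As a small illustration of the `line:<number>` format, the identifier can be derived from the source line number alone. The function name `evidence_event_id` is hypothetical and is not taken from the LogLens sources.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a local event identifier from a source line
// number, using the documented `line:<number>` format.
inline std::string evidence_event_id(std::size_t line_number) {
    return "line:" + std::to_string(line_number);
}
```

Because the identifier depends only on the line number, the same input file always yields the same identifiers, and identifiers from different files are not comparable.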

Warning objects contain the original `line_number` and the parser `reason`.
Warning objects contain the original `line_number`, parser `category`, and parser `reason`.

Parser failure categories are stable reviewer-facing buckets for unsupported
lines: `unknown_timestamp`, `unknown_program`,
`known_program_unknown_message`, `malformed_source_ip`, and
`unsupported_pam_variant`. They complement `top_unknown_patterns`: categories
explain the parser boundary class, while unknown-pattern buckets preserve the
more specific unsupported message shape.
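
One way to picture how warning categories could roll up into `failure_categories` is a per-category tally. This is a sketch under assumptions: the `ParserWarning` struct is a simplified stand-in with only the three documented fields, the function name is hypothetical, and the source does not state that `failure_categories` is a count map.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for a warning object.
struct ParserWarning {
    std::size_t line_number;
    std::string category;
    std::string reason;
};

// Hypothetical helper: counts warnings per failure category.
inline std::map<std::string, std::size_t> tally_failure_categories(
    const std::vector<ParserWarning>& warnings) {
    std::map<std::string, std::size_t> counts;
    for (const ParserWarning& warning : warnings) {
        ++counts[warning.category];
    }
    return counts;
}
```

Using `std::map` keeps the categories in a deterministic (sorted) order, which helps when the output is compared against a locked fixture.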

## CSV Contract

The optional CSV exports intentionally stay small:

- `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary`
- `warnings.csv`: `kind`, `line_number`, `message`
- `warnings.csv`: `kind`, `line_number`, `category`, `message`

Formula-like CSV text fields are neutralized with a leading single quote so spreadsheet tools treat them as text.
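
A minimal sketch of that neutralization step follows. It assumes "formula-like" means the field starts with `=`, `+`, `-`, or `@`, which is the set commonly cited for CSV injection; the source does not list the exact trigger characters, and the function name `neutralize_csv_field` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: prefixes formula-like fields with a single quote so
// spreadsheet tools treat them as text. The trigger set (= + - @) is an
// assumption, not taken from the LogLens sources.
inline std::string neutralize_csv_field(const std::string& field) {
    if (!field.empty()) {
        const char first = field.front();
        if (first == '=' || first == '+' || first == '-' || first == '@') {
            return "'" + field;
        }
    }
    return field;
}
```

This covers only the neutralization; quoting of commas, double quotes, and newlines would be a separate step in a CSV writer.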

1 change: 1 addition & 0 deletions docs/reviewer-path.md
@@ -60,6 +60,7 @@ Look for parser coverage fields:
- `parsed_lines`
- `unparsed_lines`
- `parse_success_rate`
- `failure_categories`
- `top_unknown_patterns`

Good stopping point: the reviewer can explain what LogLens parses, how rules count supported evidence, what the reports contain, and how unsupported lines remain visible without becoming findings.
2 changes: 1 addition & 1 deletion docs/rule-catalog.md
@@ -111,7 +111,7 @@ The finding is a triage signal. It is not a compromise verdict, attribution clai

### Why unsupported evidence is not counted

Unsupported lines are parser warnings, not `AuthSignal` records. They may appear in `top_unknown_patterns`, but they do not carry the `counts_as_terminal_auth_failure` flag required by this rule.
Unsupported lines are parser warnings, not `AuthSignal` records. They may appear in `failure_categories` and `top_unknown_patterns`, but they do not carry the `counts_as_terminal_auth_failure` flag required by this rule.

This prevents unsupported preauth noise, malformed lines, and unmodeled auth-family messages from silently increasing brute-force counts.
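
The separation can be pictured as a count that only ever sees `AuthSignal` records. The struct below is a simplified, hypothetical stand-in that keeps just the `counts_as_terminal_auth_failure` flag, and the function name is an assumption for illustration. It is not the rule's actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-in for an AuthSignal record.
struct AuthSignal {
    bool counts_as_terminal_auth_failure;
};

// Hypothetical helper: counts only signals that carry the flag. Parser
// warnings are a different type and cannot be passed in, so unsupported
// lines never contribute to the total.
inline std::size_t count_terminal_auth_failures(const std::vector<AuthSignal>& signals) {
    std::size_t count = 0;
    for (const AuthSignal& signal : signals) {
        if (signal.counts_as_terminal_auth_failure) {
            ++count;
        }
    }
    return count;
}
```

Keeping warnings and signals as distinct types makes the boundary a compile-time property rather than a runtime filter.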
