docs: add logo and shields.io badges to README #16
Merged
analyze.md is a session-scoped working-notes file (~240 KB of audit/plan material) that does not belong in the public SDK repo. Remove from version control but keep on disk for the author's reference.
- git rm --cached analyze.md: drop from index, file stays on disk
- add analyze.md to .gitignore so it isn't accidentally re-added
- drop the self-referential '.gitignore' entry from .gitignore so future edits don't need 'git add -f'
- Add docs/nullrun-logo.png (NullRun NR logo) and render it centered at the top of README.md via raw.githubusercontent.com
- Add shields.io badges in two rows:
  * Release: PyPI version, Python versions, License, Downloads
  * Quality/Project: CI, Coverage, Stars, Documentation
- All badges use https:// (PyPI readme sanitizer strips http://)
- No classifier changes (left as-is per project decision)
Co-Authored-By: Claude <noreply@anthropic.com>
maltsev-dev added a commit that referenced this pull request on Jun 19, 2026
Closes the P0/P1/P2/P3 issues from the security review (plan §10/§11.4).
Security / PCI-DSS / GDPR
- P0-1: Mask positional PII in `_enforce_sensitive_tool` by introspecting
the wrapped function's signature and applying `SENSITIVE_ARG_KEYS` to
positional params. Pre-fix, `charge("4111-…-1111", 50)` forwarded the
PAN into `/execute` and the audit log.
- P0-6 / P3-3: `_safe_repr` now redacts BEFORE truncating. The pre-fix
order truncated first, so `details={…}` past position 50 leaked
verbatim. `_safe_repr` is now the single source of truth for the
redact-then-truncate flow.
Cost-audit / reliability
- P0-3: Bounded chunked reads on the sync + async httpx transports
(`MAX_RESPONSE_BYTES`, default 16 MiB, `NULLRUN_MAX_RESPONSE_BYTES`
env override). Above the cap, tracking is skipped and
`_coverage_streaming_skipped` is incremented. Replaces the
`response.read()` / `await response.aread()` unbounded buffer that
held entire LLM streaming bodies in memory.
- P0-4: `_do_flush_locked` re-queue on CB OPEN now drops the NEWEST
non-critical events instead of the oldest. The oldest events
(incident start, billing-period start) are exactly what a billing
investigator needs; losing them silently broke monthly rollups.
Control-plane events (`state_change`, `kill_received`,
`policy_invalidated`, `key_rotated`) are preserved unconditionally
so the dashboard KILL switch lands even under sustained backend
outage.
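A rough sketch of both reliability rules, under stated assumptions: the helper names, the `None` return for an oversized body, and the event shape (a dict with a `"type"` key, ordered oldest first) are hypothetical. Only the 16 MiB default and the four control-plane event names come from the text above.

```python
MAX_RESPONSE_BYTES = 16 * 1024 * 1024  # default named in the commit message

# Control-plane event types listed in the commit message.
CRITICAL_TYPES = {"state_change", "kill_received", "policy_invalidated", "key_rotated"}


def read_bounded(chunks, max_bytes=MAX_RESPONSE_BYTES):
    """Join chunks up to max_bytes; return None once the body exceeds the cap."""
    buffered = []
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > max_bytes:
            return None  # caller would skip tracking and bump a "skipped" counter
        buffered.append(chunk)
    return b"".join(buffered)


def requeue_drop_newest(events, capacity):
    """Keep every critical event; fill the remaining room with the OLDEST others."""
    critical_count = sum(1 for event in events if event["type"] in CRITICAL_TYPES)
    room = max(capacity - critical_count, 0)
    kept = []
    for event in events:  # assumed to be ordered oldest first
        if event["type"] in CRITICAL_TYPES:
            kept.append(event)
        elif room > 0:
            kept.append(event)
            room -= 1
    return kept
```

With a capacity of 3 and four queued events of which one is `kill_received`, the newest non-critical event is the one dropped.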
Identity
- S-8 / P2-4: `agent()` now emits `str(uuid.uuid4())` (with dashes).
Pre-fix the format was `f"agent-{uuid.uuid4().hex}"` — 32 hex chars,
no dashes — and backend UUID-typed columns dropped these to NULL
on insert. User-supplied names are still preserved verbatim.
- §7.2 #16: `workflow()` context manager now resets `span_id` (not
only `workflow_id` / `trace_id`) so nested `with span()` blocks
don't leave the inner span_id visible inside the workflow scope.
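To make the ID format difference concrete, here is a small sketch. The helper names are hypothetical, and the check only shows why the old string is not a canonical UUID; it does not reproduce the backend's insert behaviour.

```python
import uuid


def new_agent_id():
    """Canonical 36-character UUID string, dashes included."""
    return str(uuid.uuid4())


def is_canonical_uuid(value):
    """True only for the dashed, lowercase form that str(uuid.UUID) produces."""
    try:
        return str(uuid.UUID(value)) == value
    except ValueError:
        return False


old_style = f"agent-{uuid.uuid4().hex}"  # pre-fix format quoted above
```

`uuid.UUID` rejects the prefixed form outright, and a bare 32-character hex string parses but does not round-trip to the same text.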
Resource leaks
- S-9: `_active_runs` on `NullRunCallback` is now an `OrderedDict`
capped at 4096 with FIFO eviction. Pre-fix the dict grew
unbounded when `on_chain_end` did not fire (some LangChain
versions short-circuit the end hook on chain-body errors).
- S-10: WebSocket reconnect loop is now capped at 10 consecutive
failures, then falls back to HTTP-poll. Pre-fix the loop ran
forever when the backend was permanently down, leaking the
WS thread.
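The capped `OrderedDict` idea can be sketched like this. `BoundedRuns` and its methods are hypothetical names for illustration; only the 4096 cap and FIFO eviction come from the text above.

```python
from collections import OrderedDict


class BoundedRuns:
    """Tracks in-flight runs, evicting the oldest insertion once the cap is hit."""

    def __init__(self, cap=4096):
        self._cap = cap
        self._runs = OrderedDict()

    def start(self, run_id, info):
        self._runs[run_id] = info
        while len(self._runs) > self._cap:
            self._runs.popitem(last=False)  # FIFO: drop the oldest entry

    def end(self, run_id):
        # Returns None if the run was already evicted or never started.
        return self._runs.pop(run_id, None)

    def __len__(self):
        return len(self._runs)
```

A run whose end hook never fires is eventually pushed out by newer runs instead of staying in memory forever.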
Transport
- §7.2 #6: Separate `hmac_verify_expired_total` counter so SRE can
distinguish clock-skew (NTP drift) from forged packets. Mirrored
in both the HTTP and WebSocket verify paths.
- §7.2 #35: `CircuitBreaker.call` now dispatches the OPEN→HALF_OPEN
jitter through `_maybe_apply_open_jitter_sync` /
`_maybe_apply_open_jitter_async`. Pre-fix the jitter used
`time.sleep` before dispatching to async, which blocked the
caller's event loop on every transition.
- P2-1: `_coverage_seen` now bumps in the httpx path (sync + async).
Pre-fix the counter was only bumped by the `requests` transport,
so the dashboard's coverage view was empty for the dominant
OpenAI / Anthropic / Gemini / Mistral / Cohere traffic.
- P2-3: `is_sensitive_tool` match is case-insensitive. Pre-fix
`"stripe.charge"` did not match `"Stripe.Charge"`, bypassing the
sensitive gate.
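The sync/async jitter split might look roughly like the sketch below. The function names and the uniform distribution are assumptions; the 5-second cap is taken from the Tests note in this commit. The point is that the async path awaits `asyncio.sleep` and so yields to the event loop, where `time.sleep` would block it.

```python
import asyncio
import random
import time

MAX_JITTER_SECONDS = 5.0  # cap mentioned in the Tests section


def jitter_delay(max_seconds=MAX_JITTER_SECONDS):
    """Pick a random delay in [0, max_seconds]."""
    return random.uniform(0.0, max_seconds)


def apply_jitter_sync(max_seconds=MAX_JITTER_SECONDS):
    """Blocking variant, only safe to call from synchronous code."""
    delay = jitter_delay(max_seconds)
    time.sleep(delay)
    return delay


async def apply_jitter_async(max_seconds=MAX_JITTER_SECONDS):
    """Non-blocking variant: other tasks keep running while this one waits."""
    delay = jitter_delay(max_seconds)
    await asyncio.sleep(delay)
    return delay
```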
Concurrency
- §7.2 #39: New `_tools_lock` guards every mutation of
`_strict_mode_tools` / `_sensitive_tools`. Same lock guards the
coverage-counter bump+prune sequence (§7.2 #33) so two threads
can't both observe the dict at length 4095 and both grow it to
4097 before either prune lands.
- §7.2 #47: New `_langchain_lock` / `_langgraph_lock` guard the
patch sequences end-to-end. Pre-fix two threads racing through
`auto_instrument` could both pass the early `_x_patched` check
and double-wrap `BaseCallbackManager` / `Pregel`.
- §7.2 #33: `_COVERAGE_CAP` (4096) bounds the per-host coverage
dicts.
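A minimal sketch of the bump+prune sequence under a single lock. The `bump` helper and the evict-oldest prune policy are assumptions; the source only states that the dict is capped at 4096 and that bump and prune happen under the same lock.

```python
import threading

COVERAGE_CAP = 4096
_lock = threading.Lock()
_coverage = {}


def bump(host, cap=COVERAGE_CAP, counts=_coverage):
    """Increment a per-host counter and prune, atomically with respect to other threads."""
    with _lock:
        counts[host] = counts.get(host, 0) + 1
        while len(counts) > cap:
            # dicts preserve insertion order, so this evicts the oldest host
            # (assumed policy; the real prune rule is not shown in the source).
            counts.pop(next(iter(counts)))
```

Because the growth check and the prune sit inside the same critical section, no thread can observe the dict above the cap.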
Webhook delivery
- P3-2: Exponential backoff (0.5s, 1s, 2s, 4s, 8s, 16s, 30s cap)
replaces the previous linear schedule. Linear didn't back off
fast enough under sustained outage — each KILL/PAUSE spawned
its own delivery thread, producing 1000+ spinning threads
hammering the dead endpoint.
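The schedule above is what doubling from 0.5s with a 30s ceiling produces. A sketch, with `backoff_delay` as a hypothetical helper name:

```python
def backoff_delay(attempt, base=0.5, cap=30.0):
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(base * (2 ** attempt), cap)


schedule = [backoff_delay(n) for n in range(7)]
```

The seventh value would be 32s uncapped, which is where the 30s ceiling takes over.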
WAL crash-recovery
- P1-5b: Atomic WAL writes (tmp + `fsync` + `os.replace`), 64 MiB
rotation with `os.replace(wal, wal.1)`, replay drains both
`wal.1` and `wal`. New `NULLRUN_WAL_PATH` / `NULLRUN_WAL_MAX_BYTES`
env overrides for containers with `readOnlyRootFilesystem: true`.
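The write, rotate, and replay steps could be sketched as below. Function names and the newline-delimited record format are assumptions for the example; the tmp + `fsync` + `os.replace` sequence, the `wal.1` suffix, and the 64 MiB threshold follow the description above.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to a temp file in the same directory, fsync, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".wal-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def rotate_if_needed(path, max_bytes=64 * 1024 * 1024):
    """Move the WAL aside as <path>.1 once it reaches max_bytes."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".1")
        return True
    return False


def replay(path):
    """Read the rotated file first, then the live one, so order is preserved."""
    records = []
    for candidate in (path + ".1", path):
        if os.path.exists(candidate):
            with open(candidate, "rb") as handle:
                records.extend(handle.read().splitlines())
    return records
```

Creating the temp file next to the target keeps `os.replace` on one filesystem, which is what makes the swap atomic.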
Tests
8 new regression test files (57 tests total):
test_agent_id_uuid.py, test_args_pii_masked.py,
test_streaming_oom_cap.py, test_lru_active_runs.py,
test_reconnect_cap.py, test_coverage_seen_httpx.py,
test_webhook_backoff.py, test_redact.py
`test_buffer_invariants.py` extended with drop-newest +
critical-event preservation cases. `test_release_polish.py`
updated to pin the 5s cap on both the sync and async jitter
helpers (post §7.2 #35 split).
Full incident write-ups in CHANGELOG.md under the same P0/S/P tags.
maltsev-dev added a commit that referenced this pull request on Jun 19, 2026
* fix: P0 security/stability hardening bundle
* fix: address ruff lint findings from CI
Three CI lint failures on `ruff check src/` — fixes only, no
behavioural changes:
- **B905** (`src/nullrun/decorators.py:162`): `zip(bound_params,
args)` now passes `strict=False` explicitly. The two iterables can
legitimately differ in length: `bound_params` is sliced to
`[: len(args)]`, but the function may have fewer positional
parameters than the args provided (e.g. `*args`-style callables), in
which case the trailing loop below handles the excess. Leaving
`strict=` implicit triggered B905; making it explicit documents the
intent in code.
- **I001** (`src/nullrun/instrumentation/auto.py:1146`): the late
`import os as _os` was moved to the top-of-file import block as
`import os` (alphabetical order: hashlib, json, logging, os,
threading). The `_os` alias was only there to avoid shadowing —
there is no top-level `os` in scope, so the plain name is fine.
Call site updated to use `os.environ.get(...)`.
- **S108** (`src/nullrun/transport.py:632`): replaced the
hardcoded `/tmp/nullrun.wal` with
`os.path.join(tempfile.gettempdir(), "nullrun.wal")`. The
hardcoded `/tmp` flagged S108 (insecure / non-portable temp
path) and would have broken the SDK on Windows out of the box.
`gettempdir()` returns the OS-appropriate temp dir
(`/tmp` on Linux, `/var/folders/...` on macOS, `%TEMP%` on
Windows). `NULLRUN_WAL_PATH` env override still wins, so
containers with `readOnlyRootFilesystem: true` are unaffected.
Added `import tempfile` to the top-of-file imports.
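The S108 fix amounts to the following lookup order. `default_wal_path` is a hypothetical helper name; `NULLRUN_WAL_PATH`, `tempfile.gettempdir()` and the `nullrun.wal` file name come from the text above.

```python
import os
import tempfile


def default_wal_path(environ=os.environ):
    """Env override wins; otherwise fall back to the OS temp dir."""
    override = environ.get("NULLRUN_WAL_PATH")
    if override:
        return override
    return os.path.join(tempfile.gettempdir(), "nullrun.wal")
```

Passing the environment in as a parameter is only a convenience for testing the sketch.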
Verified:
- `ruff check src/` → All checks passed!
- `mypy src/` → Success: no issues found in 23 source files
- `pytest` → 493 passed, 13 skipped (CI default, no `-W error`)
maltsev-dev added a commit that referenced this pull request on Jun 19, 2026
* fix: P0 security/stability hardening bundle
* fix: address ruff lint findings from CI
* chore(release): bump to 0.5.2
- Promote [Unreleased] to [0.5.2] — 2026-06-19; merge the two
[Unreleased] sections that had drifted during Sprint 2.5 +
Phase 0 development so release tooling scanning for the
[Unreleased] anchor picks up the complete change set exactly
once.
- Add PEP 561 marker (py.typed) — the package ships inline type
annotations; the marker tells mypy / pyright / pylance to honour
them.
- runtime.py (S-4): case-insensitive state compare in
check_control_plane. Defensive against any backend casing drift
beyond the current PascalCase (handlers.rs:9258). Pinned by
tests/test_state_compare_case_insensitive.py (10 cases covering
PascalCase / UPPERCASE / lowercase / mixed-case).
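A case-insensitive state comparison can be as small as the sketch below. The helper name and the example state value `"Killed"` are hypothetical; the source only says the backend currently sends PascalCase and that other casings should compare equal.

```python
def state_matches(actual, expected):
    """Compare two state names ignoring case; non-strings never match."""
    if not isinstance(actual, str) or not isinstance(expected, str):
        return False
    return actual.casefold() == expected.casefold()
```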
Working-notes file docs/integration-baseline-2026-06-19.md is
deliberately left untracked, matching the analyze.md pattern from
d74712e.
Summary
Adds the NullRun logo and a set of shields.io badges to the top of README.md so the package page on PyPI renders the same visual identity as the brand.
Changes
- `docs/nullrun-logo.png`: the NullRun NR mark, hosted in-repo so it is reachable from the rendered README on PyPI via raw.githubusercontent.com (…https… only; PyPI strips http… images).
- All badges link to the corresponding real page (PyPI, GitHub Actions, Codecov, docs.nullrun.io).
Why
- […] `nullout-mcp` and other ecosystem packages present themselves).
- https://-only (PyPI sanitiser requirement).
Not changed
- `pyproject.toml` classifiers: left as-is per project decision.
- `@protect` example form: kept exactly as in the current README.
Verification
- `twine check dist/*`: README.md renders correctly as the package long description.
- […] `nullrun` will show the logo + badges above the heading.