fix: P0 security/stability hardening bundle by maltsev-dev · Pull Request #22 · nullrunio/nullrun-sdk-python

maltsev-dev · 2026-06-19T10:13:54Z

Closes the P0/P1/P2/P3 issues from the security review (plan §10/§11.4).

Security / PCI-DSS / GDPR

P0-1: Mask positional PII in _enforce_sensitive_tool by introspecting the wrapped function's signature and applying SENSITIVE_ARG_KEYS to positional params. Pre-fix, charge("4111-…-1111", 50) forwarded the PAN into /execute and the audit log.
P0-6 / P3-3: _safe_repr now redacts BEFORE truncating. The pre-fix order truncated first, so details={…} past position 50 leaked verbatim. _safe_repr is now the single source of truth for the redact-then-truncate flow.
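
A minimal sketch of the two fixes above, not the SDK's actual code. `mask_call_args`, `safe_repr`, the contents of `SENSITIVE_ARG_KEYS`, the PAN regex, the `charge` stand-in, and the made-up card number are all assumptions for illustration. It binds positional arguments to parameter names with `inspect.signature` so key-based masking applies to them, and it redacts before truncating.

```python
import inspect
import re

# Hypothetical key set for illustration; the SDK's SENSITIVE_ARG_KEYS may differ.
SENSITIVE_ARG_KEYS = frozenset({"card_number", "pan", "cvv", "ssn"})
MASK = "***"

# Illustrative PAN pattern (assumption): 13 to 19 digits, optionally
# separated by single spaces or dashes.
_PAN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def mask_call_args(func, args, kwargs):
    """Map every argument to its parameter name, masking sensitive ones."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    return {
        name: MASK if name.lower() in SENSITIVE_ARG_KEYS else value
        for name, value in bound.arguments.items()
    }


def safe_repr(value, limit=50):
    """Redact first, then truncate, so patterns are matched on the full text."""
    text = _PAN_RE.sub(MASK, repr(value))
    return text if len(text) <= limit else text[:limit] + "..."


def charge(card_number, amount):  # stand-in for a wrapped sensitive tool
    return amount


# Made-up test card number, passed positionally.
print(mask_call_args(charge, ("4242-4242-4242-4242", 50), {}))
print(safe_repr("order 42 card 4242-4242-4242-4242 for customer"))
```

Redacting first matters because a secret that straddles the truncation point can be cut into a fragment that no longer matches the pattern and so slips through.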

Cost-audit / reliability

P0-3: Bounded chunked reads on the sync + async httpx transports (MAX_RESPONSE_BYTES, default 16 MiB, NULLRUN_MAX_RESPONSE_BYTES env override). Above the cap, tracking is skipped and _coverage_streaming_skipped is incremented. Replaces the response.read() / await response.aread() unbounded buffer that held entire LLM streaming bodies in memory.
P0-4: The _do_flush_locked re-queue that runs while the circuit breaker (CB) is OPEN now drops the NEWEST non-critical events instead of the oldest. The oldest events (incident start, billing-period start) are exactly what a billing investigator needs; losing them silently broke monthly rollups. Control-plane events (state_change, kill_received, policy_invalidated, key_rotated) are preserved unconditionally so the dashboard KILL switch lands even under a sustained backend outage.
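
The two reliability changes above can be sketched as follows. This is an illustration under assumptions, not the SDK's implementation: `read_bounded` and `requeue_drop_newest` are hypothetical helpers, and events are assumed to be dicts with a `type` key, ordered oldest first. With httpx the chunks could come from `response.iter_bytes()` on a streamed response.

```python
import os

# Default mirrors the 16 MiB cap; the env var name is the one listed in this PR.
MAX_RESPONSE_BYTES = int(os.environ.get("NULLRUN_MAX_RESPONSE_BYTES", 16 * 1024 * 1024))

# Control-plane event types that are never dropped.
CRITICAL_TYPES = frozenset(
    {"state_change", "kill_received", "policy_invalidated", "key_rotated"}
)


def read_bounded(chunks, max_bytes=MAX_RESPONSE_BYTES):
    """Accumulate chunks; return None as soon as the cap would be exceeded."""
    buf = bytearray()
    for chunk in chunks:
        if len(buf) + len(chunk) > max_bytes:
            return None  # caller skips tracking and bumps its skip counter
        buf.extend(chunk)
    return bytes(buf)


def requeue_drop_newest(events, capacity):
    """Trim an oldest-first list to capacity by dropping the newest
    non-critical events. Critical events are kept even if that leaves
    the result above capacity."""
    overflow = len(events) - capacity
    if overflow <= 0:
        return list(events)
    drop = set()
    for idx in range(len(events) - 1, -1, -1):  # walk newest to oldest
        if overflow == 0:
            break
        if events[idx]["type"] not in CRITICAL_TYPES:
            drop.add(idx)
            overflow -= 1
    return [event for i, event in enumerate(events) if i not in drop]
```

Walking from the tail keeps the earliest records intact, which is the property the billing rollups depend on.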

Identity

S-8 / P2-4: agent() now emits str(uuid.uuid4()) (with dashes). Pre-fix the format was f"agent-{uuid.uuid4().hex}" — 32 hex chars, no dashes — and backend UUID-typed columns dropped these to NULL on insert. User-supplied names are still preserved verbatim.
§7.2 #16: workflow() context manager now resets span_id (not only workflow_id / trace_id) so nested with span() blocks don't leave the inner span_id visible inside the workflow scope.
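
To illustrate the ID format change, here is a small sketch. `make_agent_id` and `legacy_agent_id` are hypothetical names, and Python's `uuid.UUID` parser stands in for a strict UUID-typed backend column, which is an assumption about how the backend behaves.

```python
import uuid


def make_agent_id(name=None):
    """Keep a user-supplied name verbatim; otherwise emit a dashed UUID4."""
    return name if name else str(uuid.uuid4())


def legacy_agent_id():
    """Pre-fix format: 'agent-' followed by 32 hex characters, no dashes."""
    return f"agent-{uuid.uuid4().hex}"


def parses_as_uuid(value):
    """True when a strict UUID parser accepts the value."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


print(parses_as_uuid(make_agent_id()))
print(parses_as_uuid(legacy_agent_id()))
```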

Resource leaks

S-9: _active_runs on NullRunCallback is now an OrderedDict capped at 4096 with FIFO eviction. Pre-fix the dict grew unbounded when on_chain_end did not fire (some LangChain versions short-circuit the end hook on chain-body errors).
S-10: WebSocket reconnect loop is now capped at 10 consecutive failures, then falls back to HTTP-poll. Pre-fix the loop ran forever when the backend was permanently down, leaking the WS thread.
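
A rough sketch of the capped run registry described in S-9. `BoundedRuns` and its method names are hypothetical; only the `OrderedDict`, the 4096 cap, and FIFO eviction come from this PR.

```python
from collections import OrderedDict


class BoundedRuns:
    """Track in-flight runs, evicting the oldest entry once the cap is hit."""

    def __init__(self, cap=4096):
        self._cap = cap
        self._runs = OrderedDict()

    def start(self, run_id, info):
        self._runs[run_id] = info
        while len(self._runs) > self._cap:
            self._runs.popitem(last=False)  # FIFO: drop the oldest insertion

    def end(self, run_id):
        # Returns None if the run was already evicted or never started.
        return self._runs.pop(run_id, None)

    def __len__(self):
        return len(self._runs)
```

With this shape, a missing end hook costs at most one slot until eviction instead of growing the dict forever.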

Transport

§7.2 #6: Separate hmac_verify_expired_total counter so SRE can distinguish clock-skew (NTP drift) from forged packets. Mirrored in both the HTTP and WebSocket verify paths.
§7.2 #35: CircuitBreaker.call now dispatches the OPEN→HALF_OPEN jitter through _maybe_apply_open_jitter_sync / _maybe_apply_open_jitter_async. Pre-fix the jitter used time.sleep before dispatching to async, which blocked the caller's event loop on every transition.
P2-1: _coverage_seen now bumps in the httpx path (sync + async). Pre-fix the counter was only bumped by the requests transport, so the dashboard's coverage view was empty for the dominant OpenAI / Anthropic / Gemini / Mistral / Cohere traffic.
P2-3: is_sensitive_tool match is case-insensitive. Pre-fix "stripe.charge" did not match "Stripe.Charge", bypassing the sensitive gate.
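
The jitter split in §7.2 #35 can be sketched roughly like this. `jitter_sync`, `jitter_async`, and `call_with_jitter` are hypothetical stand-ins for the SDK helpers, and the dispatch on `inspect.iscoroutinefunction` is an assumption about how the sync and async paths are told apart.

```python
import asyncio
import inspect
import random
import time

MAX_JITTER_S = 5.0  # the 5s cap pinned by test_release_polish.py


def jitter_sync(max_s=MAX_JITTER_S):
    time.sleep(random.uniform(0.0, max_s))  # blocks only the calling thread


async def jitter_async(max_s=MAX_JITTER_S):
    await asyncio.sleep(random.uniform(0.0, max_s))  # yields to the event loop


def call_with_jitter(fn, *args, max_s=MAX_JITTER_S):
    """Apply the jitter helper that matches fn.

    For a coroutine function this returns a coroutine, so the delay runs
    inside the event loop instead of blocking it with time.sleep.
    """
    if inspect.iscoroutinefunction(fn):
        async def _run():
            await jitter_async(max_s)
            return await fn(*args)
        return _run()
    jitter_sync(max_s)
    return fn(*args)
```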

Concurrency

§7.2 #39: New _tools_lock guards every mutation of _strict_mode_tools / _sensitive_tools. Same lock guards the coverage-counter bump+prune sequence (§7.2 #33) so two threads can't both observe the dict at length 4095 and both grow it to 4097 before either prune lands.
§7.2 #47: New _langchain_lock / _langgraph_lock guard the patch sequences end-to-end. Pre-fix two threads racing through auto_instrument could both pass the early _x_patched check and double-wrap BaseCallbackManager / Pregel.
§7.2 #33: _COVERAGE_CAP (4096) bounds the per-host coverage dicts.
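
The double-wrap race in §7.2 #47 comes from a check-then-act sequence that is not atomic. Below is a minimal sketch of the guarded version. `patch_once`, the `_demo_patched` flag, and the `Manager` class are hypothetical, standing in for the real patch targets.

```python
import functools
import threading

_patch_lock = threading.Lock()


def patch_once(cls, method_name, flag="_demo_patched"):
    """Wrap cls.method_name exactly once, even when threads race.

    The flag check and the patch happen under one lock, so a second
    thread cannot pass the check before the first has finished.
    Returns True only for the call that installed the wrapper.
    """
    with _patch_lock:
        if cls.__dict__.get(flag, False):
            return False
        original = getattr(cls, method_name)

        @functools.wraps(original)
        def wrapper(self, *args, **kwargs):
            wrapper.calls += 1  # simple instrumentation for the demo
            return original(self, *args, **kwargs)

        wrapper.calls = 0
        setattr(cls, method_name, wrapper)
        setattr(cls, flag, True)
        return True


class Manager:  # stand-in for a class being instrumented
    def run(self):
        return "ok"
```

Holding the lock across the whole sequence is what the "end-to-end" wording above refers to; locking only the flag write would still leave the race open.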

Webhook delivery

P3-2: Exponential backoff (0.5s, 1s, 2s, 4s, 8s, 16s, 30s cap) replaces the previous linear schedule. Linear didn't back off fast enough under sustained outage — each KILL/PAUSE spawned its own delivery thread, producing 1000+ spinning threads hammering the dead endpoint.
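
The schedule above corresponds to doubling from 0.5s with a 30s ceiling. A small sketch, assuming a hypothetical `backoff_delays` helper and no added jitter:

```python
def backoff_delays(attempts, base=0.5, cap=30.0):
    """Delay before each retry: base * 2**n, clamped to cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


print(backoff_delays(7))
```

The seventh delay would be 32s unclamped, which is where the 30s cap takes effect.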

WAL crash-recovery

P1-5b: Atomic WAL writes (tmp + fsync + os.replace), 64 MiB rotation with os.replace(wal, wal.1), replay drains both wal.1 and wal. New NULLRUN_WAL_PATH / NULLRUN_WAL_MAX_BYTES env overrides for containers with readOnlyRootFilesystem: true.
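
A sketch of the tmp + fsync + os.replace pattern with rotation and two-file replay. The helper names, the `.1` suffix handling, and the line-per-record layout are assumptions; how the SDK applies this to individual appends is not shown in this PR.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to a temp file in the same directory, fsync, then swap in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".wal-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # force bytes to disk before the rename
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def rotate_if_needed(path, max_bytes=64 * 1024 * 1024):
    """Move path to path.1 once it reaches max_bytes; return True if rotated."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".1")
        return True
    return False


def replay(path):
    """Yield records from the rotated file first, then the live file."""
    for candidate in (path + ".1", path):
        if os.path.exists(candidate):
            with open(candidate, "rb") as handle:
                yield from handle.read().splitlines()
```

Creating the temp file next to the target keeps the rename on one filesystem, which is also why a writable `NULLRUN_WAL_PATH` matters on read-only root filesystems.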

Tests

8 new regression test files (57 tests total):
test_agent_id_uuid.py, test_args_pii_masked.py, test_streaming_oom_cap.py, test_lru_active_runs.py, test_reconnect_cap.py, test_coverage_seen_httpx.py, test_webhook_backoff.py, test_redact.py

test_buffer_invariants.py extended with drop-newest + critical-event preservation cases. test_release_polish.py updated to pin the 5s cap on both the sync and async jitter helpers (post §7.2 #35 split).

Full incident write-ups in CHANGELOG.md under the same P0/S/P tags.

codecov · 2026-06-19T10:18:02Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

maltsev-dev merged commit 87b1e6a into master Jun 19, 2026
3 of 6 checks passed

maltsev-dev deleted the fix/p0-security-stability-bundle branch June 19, 2026 10:14
