
fix(agent): auto-retry mid-stream + tighten system prompt #13

Merged
code-crusher merged 5 commits into main from release/v0.3.4 on Jul 2, 2026

Conversation

@code-crusher

Member

Summary

Two changes shipped on release/v0.3.4 and rolled up into one PR against main.

1. fix(agent): allow auto-retry mid-stream when partial output can be rolled back

Previously streamWithRetry only retried before the first chunk, because once any text or reasoning had streamed, re-issuing the request would duplicate on-screen output. The user-visible effect was that a dropped connection after partial progress surfaced as a failed step, even though re-issuing the request would likely have succeeded.

This adds an optional onRestart callback to streamWithRetry. If the caller can cleanly undo the partial output (clear buffers, reset accumulators), the callback returns true and the stream is re-issued; if it declines, the error propagates as before.

  • The main agent loop installs rollbackForRetry, which resets assistantText, reasoningOpen, reasoningStart, reasoningDetails, and pending tool calls, then emits a new stream-reset event. It declines the restart if a reasoning row was already committed to the transcript, since that cannot be undone.
  • The compaction path installs a simpler reset that just clears its in-memory summary buffer (compaction only streams text and commits once at the end).
  • The UI handler for stream-reset clears textBufferRef, streamingText, reasoningBufferRef, and streamingReasoning, and resets the busy label to Working so the spinner reflects the restarted attempt.

Files: src/core/events.ts, src/core/agent.ts, src/ui/App.tsx.
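
For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of the retry decision described above. It is not the code in src/core/agent.ts: the `StreamFactory` and `RetryOptions` types, the `open` / `onChunk` parameters, and the `maxRetries` option are assumptions made for illustration, and backoff, abort handling, and error classification are left out.

```typescript
// Hypothetical shapes: the real streamWithRetry signature is not shown in this PR.
type StreamFactory<T> = () => AsyncIterable<T>;

interface RetryOptions {
  maxRetries: number;
  // Called when a failure happens after output was already produced.
  // Must return true only if the partial output was fully undone.
  onRestart?: () => boolean;
}

async function streamWithRetry<T>(
  open: StreamFactory<T>,
  onChunk: (chunk: T) => void,
  opts: RetryOptions,
): Promise<void> {
  let attempt = 0;
  for (;;) {
    let produced = false;
    try {
      for await (const chunk of open()) {
        produced = true;
        onChunk(chunk);
      }
      return; // stream finished cleanly
    } catch (err) {
      attempt += 1;
      if (attempt > opts.maxRetries) throw err;
      // Before any output, a retry cannot duplicate anything. After output,
      // retry only if the caller confirms the partial output was rolled back.
      if (produced && !(opts.onRestart?.() ?? false)) throw err;
    }
  }
}
```

Without `onRestart`, this sketch keeps the old behavior: a failure after the first chunk is rethrown instead of retried.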

2. refactor(prompts): rewrite system prompt for speed and editing discipline

Replaces the "always gather exhaustive context" guidance with a "gather enough, then act" principle. The model is now told that a small, localized change typically needs about 3-6 tool calls and that further exploration after the edit point is identified is waste. The TODO list rule is scoped to multi-step tasks (3+ steps) instead of being mandatory for any work.

The file_edit / multi_file_edit section adds an explicit editing-discipline block: copy old_string verbatim from a same-turn read, treat earlier reads as stale after a successful edit, and never guess at a corrected old_string when a multi_file_edit batch fails.

The read_file and search_files sections collapse their repetitive parameter tables and examples into a short reference plus a "Reading Strategy" / "Search Hygiene" ruleset (read whole regions in one call, budget re-reads, verify the output matches the parameters sent, exclude test/spec/mock paths by default, scope path narrowly).

Two new cross-cutting sections are added: Verifying tool results and avoiding loops (check that outputs match the sent parameters; do not repeat an identical failing call) and Plan before editing (write the full change plan once, then execute edits in one batched pass with a single typecheck/build at the end).

Also fixes a "prefer to let the user to that" typo and a few list-formatting inconsistencies in the TODO list section.

Files: src/prompts/system.ts.

Test plan

  • Start a long generation, drop the network mid-stream, confirm the agent auto-retries without duplicating output and finishes the step.
  • Confirm a reasoning row that has already been committed is not silently lost on a mid-stream retry (it should surface as an error rather than a duplicated row).
  • Run the compaction path against a large transcript and confirm a mid-stream drop still produces a single committed summary.
  • Spot-check the system prompt on a 1-line edit (no todo list), a 5-step task (todo list), and a wide refactor (no exhaustive re-search after the edit point is identified).

matterai-app Bot added 5 commits July 1, 2026 17:17
Bump @matterailab/orbcode from 0.3.3 to 0.3.4 in package.json and
fold the AGENTS.md context cap bump (~60 -> ~150 lines, covering
project structure, architecture, business-logic mapping, and code
patterns/conventions without truncation) into the 0.3.3 changelog
entry it shipped under.
…oning phase on first content

Two related correctness/resilience fixes in the agent's per-turn
streaming pipeline:

1. Transient stream failures are now retried automatically. Connection
   drops before the first chunk (DNS/socket reset/TLS, plus 5xx, 408,
   429) are retried up to 3 times with exponential backoff capped at
   8s. Real 4xx client errors are not retried. Retries only apply
   before any output is produced — once chunks have streamed we
   can't safely retry without duplicating on-screen content, so the
   error propagates. A user abort is never retried, and the backoff
   delay is interruptible so Ctrl+C doesn't get stuck waiting it out.
   A 'Connection to the model failed (...). Retrying n/3 in Ns…' line
   is emitted via the system event channel so the user sees progress.

2. The 'Thought for Ns' timer now reflects only the thinking phase.
   Previously, a single boolean 'hadReasoning' flag was set on the
   first reasoning delta and only checked after the stream ended,
   so a reasoning segment followed by text would report the entire
   reasoning+answer span as thinking time. Reasoning is now modeled
   as an open/close segment: it opens on the first reasoning delta
   and closes on the first text delta, tool call, or stream end —
   matching the on-screen 'Thinking' block behavior and supporting
   interleaved reasoning/content correctly.
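
The open/close model in item 2 can be sketched as a small tracker. This is an illustration only: the `Delta` type, the `ReasoningTimer` class, and passing timestamps in explicitly are assumptions, not the structures used in the agent.

```typescript
// Hypothetical delta shape; the real stream events carry more fields.
type Delta = { kind: "reasoning" | "text" | "tool_call" };

class ReasoningTimer {
  private openedAt: number | null = null;
  // Duration in milliseconds of each closed reasoning segment.
  readonly segmentsMs: number[] = [];

  onDelta(delta: Delta, now: number): void {
    if (delta.kind === "reasoning") {
      // Open a segment on the first reasoning delta only.
      if (this.openedAt === null) this.openedAt = now;
    } else {
      // The first text delta or tool call closes the open segment.
      this.close(now);
    }
  }

  onStreamEnd(now: number): void {
    this.close(now);
  }

  private close(now: number): void {
    if (this.openedAt === null) return;
    this.segmentsMs.push(now - this.openedAt);
    this.openedAt = null;
  }
}
```

Because a segment closes on the first non-reasoning delta, time spent streaming the answer is never counted as thinking time, and a later reasoning delta simply opens a new segment.
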
Adds a '/task' slash command that lets the user pull a prior session
from the same directory into the current conversation as context.

Behavior:
- '/task' (no argument) opens a SessionPicker over all sessions for
  the current cwd except the active one.
- On selection, the previous task's user/assistant messages are
  extracted (user messages unwrapped from <user_query> tags) and
  wrapped in a <previous_task title='...'> block inside a prompt
  asking the model to summarize it. The summary is then presented
  in the current conversation as the reference.
- Conversations longer than ~8000 chars are truncated with a
  marker so the prompt stays well under context limits.
- If no previous tasks exist in this directory, a friendly info
  row is shown instead of opening an empty picker.

Implementation:
- New 'taskPickerSessions' state in App holds the candidate list
  when the picker is open; it's added to the existing 'no-modal'
  guard so other modals (MCP picker, link manager, etc.) don't
  stack.
- 'handleTaskSelect' reuses the existing 'runTurn' path — the
  prompt is the user message, and the model produces the summary.
- SessionPicker gains an optional 'title' prop (default unchanged)
  so the same component reads correctly for both '/resume' and
  '/task'.
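
A rough sketch of the prompt assembly described under Behavior, for illustration only. The `Message` type, the helper names, the exact 8000 limit, the marker text, the instruction wording, and the title escaping are all assumptions; only the <user_query> unwrapping, the truncation, and the <previous_task title='...'> wrapper come from the description above.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

const MAX_CHARS = 8000; // the commit says "~8000 chars"; exact value assumed
const TRUNCATION_MARKER = "\n[... truncated ...]"; // assumed marker text

// Strip the <user_query> wrapper if present, otherwise return the text as-is.
function unwrapUserQuery(text: string): string {
  const match = text.match(/<user_query>([\s\S]*?)<\/user_query>/);
  return match ? match[1].trim() : text;
}

function buildTaskPrompt(title: string, messages: Message[]): string {
  let body = messages
    .map((m) => {
      const text = m.role === "user" ? unwrapUserQuery(m.content) : m.content;
      return `${m.role}: ${text}`;
    })
    .join("\n\n");
  if (body.length > MAX_CHARS) {
    body = body.slice(0, MAX_CHARS) + TRUNCATION_MARKER;
  }
  // Escaping single quotes keeps the title attribute well-formed (assumption).
  const safeTitle = title.replace(/'/g, "&apos;");
  return [
    "Summarize the previous task below so it can be used as reference.",
    `<previous_task title='${safeTitle}'>`,
    body,
    "</previous_task>",
  ].join("\n");
}
```
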
…lled back

Previously streamWithRetry only retried before the first chunk, because
once any text or reasoning had streamed, re-issuing the request would
duplicate on-screen output. The user-visible effect was that a dropped
connection after partial progress surfaced as a failed step, even
though the model was happy to continue.

This adds an optional onRestart callback to streamWithRetry. When the
caller can cleanly undo the partial output (cleared buffers, reset
accumulators) it returns true and the stream is re-issued. The main
agent loop installs a rollbackForRetry handler that:

  - resets assistantText, reasoningOpen, reasoningStart, reasoningDetails
  - clears pending tool calls
  - emits a new 'stream-reset' event so the UI can drop its partial
    streaming/reasoning buffers
  - declines the restart if a reasoning row was already committed to
    the transcript, since that cannot be undone

The compaction path installs a simpler reset that just clears its
in-memory summary buffer, because compaction only streams text and
commits once at the end.

The UI handler for 'stream-reset' clears textBufferRef, streamingText,
reasoningBufferRef, and streamingReasoning, then resets the busy label
back to 'Working' so the spinner reflects the restarted attempt.
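
To make the rollback contract concrete, here is a minimal sketch of a handler along the lines of rollbackForRetry. The `TurnState` interface, the `reasoningRowCommitted` flag, the `makeRollbackForRetry` factory, and the `emit` signature are assumptions; the field names follow the list above, but the real types are not shown in this commit message.

```typescript
// Hypothetical per-turn state holding the accumulators named in the commit.
interface TurnState {
  assistantText: string;
  reasoningOpen: boolean;
  reasoningStart: number | null;
  reasoningDetails: unknown[];
  pendingToolCalls: unknown[];
  reasoningRowCommitted: boolean; // assumed flag for "already in the transcript"
}

type StreamResetEvent = { type: "stream-reset" };

function makeRollbackForRetry(
  state: TurnState,
  emit: (event: StreamResetEvent) => void,
): () => boolean {
  return () => {
    // A committed reasoning row cannot be undone, so decline the restart
    // and leave the state untouched.
    if (state.reasoningRowCommitted) return false;
    state.assistantText = "";
    state.reasoningOpen = false;
    state.reasoningStart = null;
    state.reasoningDetails = [];
    state.pendingToolCalls = [];
    // Tell the UI to drop its partial streaming and reasoning buffers.
    emit({ type: "stream-reset" });
    return true;
  };
}
```

Returning false leaves everything as it was, so the caller can surface the original error rather than risk a duplicated row.
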
…line

Replaces the 'always gather exhaustive context' guidance with a
'gather enough context, then act' principle. The model is now told
that a small, localized change typically needs about 3-6 tool calls
and that further exploration after the edit point is identified is
waste. This also tightens the TODO list rule to multi-step tasks
(3+ steps) instead of mandating one for any size of work.

The file_edit / multi_file_edit section adds an explicit editing
discipline block: copy old_string verbatim from a same-turn read,
treat earlier reads as stale after a successful edit, and never guess
at a corrected old_string when a multi_file_edit batch fails.

The read_file and search_files sections collapse their repetitive
parameter tables and examples into a short reference plus a 'Reading
Strategy' / 'Search Hygiene' set of rules (read whole regions in one
call, budget re-reads, verify the output matches the parameters sent,
exclude test/spec/mock paths by default, scope path narrowly).

Two new cross-cutting sections are added: 'Verifying tool results
and avoiding loops' (check that outputs match the sent parameters,
do not repeat an identical failing call) and 'Plan before editing'
(write the full change plan once, then execute edits in one batched
pass with a single typecheck/build at the end).

Also fixes two minor copy issues: 'prefer to let the user to that'
typo and a few list-formatting inconsistencies in the TODO list
section.
@matterai-app

matterai-app Bot commented Jul 2, 2026

Contributor

Summary By MatterAI

🔄 What Changed

Implemented automatic retry logic for transient LLM streaming failures with exponential backoff and state rollback. Introduced a new /task slash command to reference and summarize previous conversations. Significantly refactored and tightened the system prompt to improve context gathering and editing discipline.

🔍 Impact of the Change

Improves agent resilience against network instability and API timeouts. Enhances user experience by allowing seamless context injection from past tasks. Reduces token usage and improves instruction following through a more concise system prompt.

📁 Total Files Changed

| Change | File | ChangeLog |
| --- | --- | --- |
| Version Bump | package.json | Incremented version to 0.3.4. |
| Retry Logic | src/core/agent.ts | Added streamWithRetry with backoff and state rollback safeguards. |
| Event Type | src/core/events.ts | Added stream-reset event to handle UI buffer clearing during retries. |
| Prompt Refactor | src/prompts/system.ts | Tightened instructions, emphasizing context gathering and verbatim editing. |
| Task Command | src/ui/App.tsx | Implemented /task command and logic to summarize previous sessions. |
| UI Component | src/ui/components/SessionPicker.tsx | Added customizable title prop to the session selection component. |

🧪 Test Added/Recommended

Recommended

  • Unit tests for isRetryableStreamError to verify status code filtering.
  • Integration tests for streamWithRetry to ensure state rollbacks (text/reasoning) function correctly.
  • Mock tests for the /task command to verify conversation truncation logic.
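
As a starting point for the first recommendation, the status-code rules from the commit message (retry on 5xx, 408, and 429; do not retry other 4xx) and the capped backoff could be pinned down roughly as follows. `isRetryableStatus` and `backoffMs` are hypothetical stand-ins, not the real isRetryableStreamError, and the 1s base delay is an assumption; only the 8s cap and the status list come from the PR.

```typescript
// Hypothetical stand-in for the status filtering inside isRetryableStreamError.
function isRetryableStatus(status: number | undefined): boolean {
  // No HTTP status means a network-level failure (DNS, socket reset, TLS).
  if (status === undefined) return true;
  if (status === 408 || status === 429) return true;
  return status >= 500 && status <= 599;
}

// Exponential backoff capped at 8s; attempt is 1-based. Base delay assumed.
function backoffMs(attempt: number, baseMs = 1000, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// Table-driven cases a unit test could assert on.
const statusCases: Array<[number | undefined, boolean]> = [
  [undefined, true],
  [408, true],
  [429, true],
  [500, true],
  [503, true],
  [400, false],
  [401, false],
  [404, false],
];
const failingCases = statusCases.filter(
  ([status, expected]) => isRetryableStatus(status) !== expected,
);
```

With the assumed 1s base, the three retries wait 1s, 2s, and 4s, so the 8s cap only matters if the base delay or retry count is raised.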

🔒 Security Vulnerabilities

No critical security vulnerabilities detected. Input validation for the /task command is handled via internal session listing.

@code-crusher code-crusher merged commit f0b29b4 into main Jul 2, 2026
1 check was pending
@code-crusher code-crusher deleted the release/v0.3.4 branch July 2, 2026 09:04