fix(agent): auto-retry mid-stream + tighten system prompt #13
Merged
Bump @matterailab/orbcode from 0.3.3 to 0.3.4 in package.json and fold the AGENTS.md context cap bump (~60 -> ~150 lines, covering project structure, architecture, business-logic mapping, and code patterns/conventions without truncation) into the 0.3.3 changelog entry it shipped under.
…oning phase on first content

Two related correctness/resilience fixes in the agent's per-turn streaming pipeline:

1. Transient stream failures are now retried automatically. Connection drops before the first chunk (DNS/socket reset/TLS, plus 5xx, 408, 429) are retried up to 3 times with exponential backoff capped at 8s. Real 4xx client errors are not retried. Retries only apply before any output is produced: once chunks have streamed we can't safely retry without duplicating on-screen content, so the error propagates. A user abort is never retried, and the backoff delay is interruptible so Ctrl+C doesn't get stuck waiting it out. A 'Connection to the model failed (...). Retrying n/3 in Ns…' line is emitted via the system event channel so the user sees progress.

2. The 'Thought for Ns' timer now reflects only the thinking phase. Previously, a single boolean 'hadReasoning' flag was set on the first reasoning delta and only checked after the stream ended, so a reasoning segment followed by text would report the entire reasoning+answer span as thinking time. Reasoning is now modeled as an open/close segment: it opens on the first reasoning delta and closes on the first text delta, tool call, or stream end. This matches the on-screen 'Thinking' block behavior and handles interleaved reasoning/content correctly.
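The two rules above can be sketched as follows. This is a minimal illustration, not the project's code: the helper names (`isRetryableStatus`, `backoffDelayMs`, `ReasoningTimer`), the `Delta` shape, and the 1s base delay are all assumptions. Only the 3-attempt limit, the 8s cap, the retryable status set, and the open/close rule come from the description.

```typescript
// Hypothetical constants: only the 8s cap and the 3 retries are stated in the source.
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 1000; // assumed base delay
const MAX_DELAY_MS = 8000;

/** Transient statuses: 5xx, 408, and 429. Other 4xx client errors are not retried. */
function isRetryableStatus(status: number): boolean {
  return (status >= 500 && status <= 599) || status === 408 || status === 429;
}

/** Exponential backoff for a 1-based attempt number, capped at MAX_DELAY_MS. */
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS);
}

// Hypothetical delta shape; "end" stands for the end of the stream.
type Delta = { kind: "reasoning" | "text" | "tool_call" | "end"; atMs: number };

/** Models reasoning as an open/close segment instead of a single boolean flag. */
class ReasoningTimer {
  private openedAt: number | null = null;

  /** Returns the duration of a segment when this delta closes it, otherwise null. */
  handle(delta: Delta): number | null {
    if (delta.kind === "reasoning") {
      // Only the first reasoning delta of a segment opens it.
      if (this.openedAt === null) this.openedAt = delta.atMs;
      return null;
    }
    // The first text delta, tool call, or stream end closes an open segment.
    if (this.openedAt !== null) {
      const elapsed = delta.atMs - this.openedAt;
      this.openedAt = null;
      return elapsed;
    }
    return null;
  }
}
```

With this shape, answer text that streams after the reasoning segment has closed no longer counts toward the reported thinking time, and a later reasoning delta simply opens a new segment.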
Adds a '/task' slash command that lets the user pull a prior session from the same directory into the current conversation as context.

Behavior:
- '/task' (no argument) opens a SessionPicker over all sessions for the current cwd except the active one.
- On selection, the previous task's user/assistant messages are extracted (user messages unwrapped from <user_query> tags) and wrapped in a <previous_task title='...'> block inside a prompt asking the model to summarize it. The summary is then presented in the current conversation as the reference.
- Conversations longer than ~8000 chars are truncated with a marker so the prompt stays well under context limits.
- If no previous tasks exist in this directory, a friendly info row is shown instead of opening an empty picker.

Implementation:
- New 'taskPickerSessions' state in App holds the candidate list when the picker is open; it's added to the existing 'no-modal' guard so other modals (MCP picker, link manager, etc.) don't stack.
- 'handleTaskSelect' reuses the existing 'runTurn' path: the prompt is the user message, and the model produces the summary.
- SessionPicker gains an optional 'title' prop (default unchanged) so the same component reads correctly for both '/resume' and '/task'.
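The prompt construction described above might look roughly like this. It is a sketch under stated assumptions: `buildTaskPrompt`, `unwrapUserQuery`, the `Message` shape, the instruction wording, the marker text, and the choice to keep the head of the transcript when truncating are all hypothetical. Only the `<user_query>` unwrapping, the `<previous_task title='...'>` wrapper, and the ~8000 character limit come from the description.

```typescript
// Hypothetical message shape for a stored session.
interface Message {
  role: "user" | "assistant";
  content: string;
}

const MAX_CHARS = 8000; // "~8000 chars" per the description
const TRUNCATION_MARKER = "\n[... truncated ...]"; // assumed marker text

/** Strips the <user_query> wrapper from a user message if present. */
function unwrapUserQuery(content: string): string {
  const match = content.match(/<user_query>([\s\S]*?)<\/user_query>/);
  return match ? match[1].trim() : content.trim();
}

/** Builds the summarization prompt sent as the user message for the turn. */
function buildTaskPrompt(title: string, messages: Message[]): string {
  const transcript = messages
    .map((m) => {
      const text = m.role === "user" ? unwrapUserQuery(m.content) : m.content.trim();
      return m.role + ": " + text;
    })
    .join("\n\n");

  // Assumption: keep the beginning of the conversation and mark the cut.
  const body =
    transcript.length > MAX_CHARS
      ? transcript.slice(0, MAX_CHARS) + TRUNCATION_MARKER
      : transcript;

  // Escape single quotes so the title cannot break the attribute.
  const safeTitle = title.replace(/'/g, "&apos;");

  return [
    "Summarize the previous task below so it can serve as reference for the current conversation.",
    "<previous_task title='" + safeTitle + "'>",
    body,
    "</previous_task>",
  ].join("\n");
}
```

Because the result is an ordinary user message, it can be passed through the same turn-running path as any other prompt, which is the design the commit describes for 'handleTaskSelect'.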
…lled back
Previously streamWithRetry only retried before the first chunk, because
once any text or reasoning had streamed, re-issuing the request would
duplicate on-screen output. The user-visible effect was that a dropped
connection after partial progress surfaced as a failed step, even
though the model was happy to continue.
This adds an optional onRestart callback to streamWithRetry. When the
caller can cleanly undo the partial output (cleared buffers, reset
accumulators) it returns true and the stream is re-issued. The main
agent loop installs a rollbackForRetry handler that:
- resets assistantText, reasoningOpen, reasoningStart, reasoningDetails
- clears pending tool calls
- emits a new 'stream-reset' event so the UI can drop its partial
streaming/reasoning buffers
- declines the restart if a reasoning row was already committed to
the transcript, since that cannot be undone
The compaction path installs a simpler reset that just clears its
in-memory summary buffer, because compaction only streams text and
commits once at the end.
The UI handler for 'stream-reset' clears textBufferRef, streamingText,
reasoningBufferRef, and streamingReasoning, then resets the busy label
back to 'Working' so the spinner reflects the restarted attempt.
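The restart contract can be illustrated with a simplified, synchronous model. Everything here is an assumption made for the sketch: the real `streamWithRetry` is presumably asynchronous and applies backoff between attempts, and the signatures, the `TurnState` shape, and the `reasoningCommitted` field are hypothetical. The field names being reset and the 'stream-reset' event name are taken from the description above.

```typescript
// Hypothetical per-turn state; reasoningCommitted marks a reasoning row already in the transcript.
interface TurnState {
  assistantText: string;
  reasoningOpen: boolean;
  reasoningStart: number | null;
  reasoningDetails: string[];
  pendingToolCalls: string[];
  reasoningCommitted: boolean;
}

/** Undoes partial output if possible; returns false when a restart must be declined. */
function rollbackForRetry(state: TurnState, emitEvent: (name: string) => void): boolean {
  if (state.reasoningCommitted) return false; // a committed row cannot be undone
  state.assistantText = "";
  state.reasoningOpen = false;
  state.reasoningStart = null;
  state.reasoningDetails = [];
  state.pendingToolCalls = [];
  emitEvent("stream-reset"); // lets the UI drop its partial buffers
  return true;
}

// A stream is modeled as a function that emits chunks and may throw part-way through.
type OpenStream = (emit: (chunk: string) => void) => void;

/** Synchronous model of retrying a stream, with an optional mid-stream restart hook. */
function streamWithRetrySketch(
  openStream: OpenStream,
  onChunk: (chunk: string) => void,
  maxRetries: number,
  onRestart?: () => boolean,
): void {
  let retries = 0;
  for (;;) {
    let producedOutput = false;
    try {
      openStream((chunk) => {
        producedOutput = true;
        onChunk(chunk);
      });
      return; // stream completed
    } catch (err) {
      if (retries >= maxRetries) throw err;
      // After output has streamed, retry only if the caller rolled it back.
      if (producedOutput && !(onRestart !== undefined && onRestart())) throw err;
      retries += 1; // backoff between attempts is omitted in this sketch
    }
  }
}
```

The retry budget is checked before `onRestart` is called, so the caller's state is never rolled back for a retry that will not happen.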
…line

Replaces the 'always gather exhaustive context' guidance with a 'gather enough context, then act' principle. The model is now told that a small, localized change typically needs about 3-6 tool calls and that further exploration after the edit point is identified is waste. This also tightens the TODO list rule to multi-step tasks (3+ steps) instead of mandating one for any size of work.

The file_edit / multi_file_edit section adds an explicit editing discipline block: copy old_string verbatim from a same-turn read, treat earlier reads as stale after a successful edit, and never guess at a corrected old_string when a multi_file_edit batch fails.

The read_file and search_files sections collapse their repetitive parameter tables and examples into a short reference plus a 'Reading Strategy' / 'Search Hygiene' set of rules (read whole regions in one call, budget re-reads, verify the output matches the parameters sent, exclude test/spec/mock paths by default, scope path narrowly).

Two new cross-cutting sections are added: 'Verifying tool results and avoiding loops' (check that outputs match the sent parameters, do not repeat an identical failing call) and 'Plan before editing' (write the full change plan once, then execute edits in one batched pass with a single typecheck/build at the end).

Also fixes two minor copy issues: the 'prefer to let the user to that' typo and a few list-formatting inconsistencies in the TODO list section.
Summary
Two changes shipped on `release/v0.3.4` and rolled up into one PR against `main`.

1. fix(agent): allow auto-retry mid-stream when partial output can be rolled back

Previously `streamWithRetry` only retried before the first chunk, because once any text or reasoning had streamed, re-issuing the request would duplicate on-screen output. The user-visible effect was that a dropped connection after partial progress surfaced as a failed step, even though the model was happy to continue.

This adds an optional `onRestart` callback to `streamWithRetry`. When the caller can cleanly undo the partial output (cleared buffers, reset accumulators) it returns `true` and the stream is re-issued.

- The main agent loop installs `rollbackForRetry`, which resets `assistantText`, `reasoningOpen`, `reasoningStart`, `reasoningDetails`, and pending tool calls, then emits a new `stream-reset` event. It declines the restart if a reasoning row was already committed to the transcript, since that cannot be undone.
- The UI handler for `stream-reset` clears `textBufferRef`, `streamingText`, `reasoningBufferRef`, and `streamingReasoning`, and resets the busy label to `Working` so the spinner reflects the restarted attempt.

Files: `src/core/events.ts`, `src/core/agent.ts`, `src/ui/App.tsx`.

2. refactor(prompts): rewrite system prompt for speed and editing discipline

Replaces the "always gather exhaustive context" guidance with a "gather enough, then act" principle. The model is now told that a small, localized change typically needs about 3-6 tool calls and that further exploration after the edit point is identified is waste. The TODO list rule is scoped to multi-step tasks (3+ steps) instead of being mandatory for any work.

The `file_edit` / `multi_file_edit` section adds an explicit editing-discipline block: copy `old_string` verbatim from a same-turn read, treat earlier reads as stale after a successful edit, and never guess at a corrected `old_string` when a `multi_file_edit` batch fails.

The `read_file` and `search_files` sections collapse their repetitive parameter tables and examples into a short reference plus a "Reading Strategy" / "Search Hygiene" ruleset (read whole regions in one call, budget re-reads, verify the output matches the parameters sent, exclude test/spec/mock paths by default, scope `path` narrowly).

Two new cross-cutting sections are added: Verifying tool results and avoiding loops (check that outputs match the sent parameters; do not repeat an identical failing call) and Plan before editing (write the full change plan once, then execute edits in one batched pass with a single typecheck/build at the end).

Also fixes a "prefer to let the user to that" typo and a few list-formatting inconsistencies in the TODO list section.

Files: `src/prompts/system.ts`.

Test plan