Support codex as an architect (PIR #929)#1059
Conversation
…dex/gemini architects Fixes a latent crash-loop where a non-Claude architect/builder + a stale Claude .jsonl built an invalid `<cmd> --resume <claude-uuid>` invocation and shellper restart-looped to death. Session resume is now gated on the configured HarnessProvider: - harness.ts: add optional buildResume (bundled discovery + Node-argv + shell-escaped script fragment) and getArchitectFiles. Only CLAUDE_HARNESS implements buildResume; GEMINI_HARNESS writes .gemini/settings.json → AGENTS.md so gemini launches with project context. - tower-instances.ts: architect resume gated on getArchitectHarness().buildResume, preserving the safeToResume sibling guard; getArchitectFiles written if-absent. - spawn.ts / spawn-worktree.ts: discoverResumeSession takes the builder harness and returns the bundled resume object; the launch script consumes the pre-escaped scriptFragment instead of re-deriving the flag. - doctor.ts / arch.md: affirm codex/gemini architect support; document that conversation resume is Claude-main-only and selection is config-driven. Tests: codex/gemini return no --resume with a stale jsonl (architect + builder); claude still resumes; gemini settings.json write-if-missing + no-clobber. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…are harness + centralized context files BLOCKER B: getArchitectHarness/getBuilderHarness now auto-detect the harness from the override-aware resolved command (getResolvedCommands honoring TOWER_ARCHITECT_CMD / --architect-cmd / --builder-cmd), so a non-claude command override no longer resolves the claude harness and re-arms the resume crash-loop. BLOCKER A: centralize getArchitectFiles writing into the shared buildArchitectArgs (exported writeArchitectContextFiles) and route the no-Tower architect command through it, so sibling / no-Tower / reconnect gemini launches all get .gemini/settings.json — not just first-activation. Nits: gate the resume-skip WARN on buildResume support; distinguish "harness does not support resume" in discoverResumeSession; gitignore .gemini/settings.json (repo + managed adopter list). Adds config.test.ts (override-aware resolution) and tower-utils.test.ts (writeArchitectContextFiles) regression coverage. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tmatter rebuttal Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cluesmith#1059, no self-merge) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Architect Integration RE-REVIEW — APPROVE (with one documented caveat)3-way CMAP re-run on the post-rework diff ( Verdicts: gemini Verified fixed
codex's finding — verified, dispositioned as documented + follow-up (not blocking)
Tracked for hardening in #1062; an in-repo known-limitation note is being added to this PR's docs. Net: ready to merge. (Contribution PR — merge is a cluesmith maintainer's action.) Architect integration re-review |
…ult to claude harness (cluesmith#1062) 3-way re-review APPROVE-with-caveat. Codex's verified finding: resolveHarness defaults an unrecognized override command (e.g. TOWER_ARCHITECT_CMD=bash) to CLAUDE_HARNESS when no explicit shell.architectHarness/builderHarness is set, so it can still attempt resume against a stale claude jsonl. Pre-existing and narrow (not a cluesmith#929 regression). Documented in arch.md + review; follow-up cluesmith#1062. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…self-merged (awaiting maintainer on cluesmith#1059) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…#1063, gemini builder-only) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…only) Delete the getArchitectFiles HarnessProvider method (gemini was its only implementer) and writeArchitectContextFiles (its only consumer), along with their wiring in buildArchitectArgs. claude/codex read AGENTS.md natively, so no architect context-file seam is needed. The buildResume crash-loop seam and codex architect parity are untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rop .gemini/settings.json gitignore doctor now affirms codex and warns when gemini is configured as an architect (the Gemini CLI is retiring, cluesmith#778 — gemini stays a builder harness). Removes the now-unwritten .gemini/settings.json entry from the root .gitignore and the managed CODEV_GITIGNORE_ENTRIES backfill. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…op gemini-architect cases) Remove the writeArchitectContextFiles and gemini .gemini/settings.json architect tests, retarget the TOWER_ARCHITECT_CMD env-override resolution test to codex, and drop .gemini/settings.json from the gitignore-backfill expectations. Keep gemini *builder* tests untouched. arch.md cluesmith#929 subsection now documents claude+codex architects, gemini builder-only, and the removed seam. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ody) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tions (2 APPROVE, gemini agy unavailable) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pstream maintainer merges cluesmith#1059) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Architect Integration ReviewScope (final, codex-only): brings What lands:
Independent verification (architect):
3-way CMAP (whole-PR, full-file, advisory single pass):
Deferred non-blocking nit: the Verdict: APPROVE for merge. Low-risk subtractive change with strong coverage. Architect integration review |
…self-merged — maintainer merges cluesmith#1059) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PIR Review: Support
codexas an architect (codex-only)Fixes #929
Summary
Brings the OpenAI
codexCLI to parity withclaudeas a Codev architect, selectable via.codev/config.json(shell.architect/shell.architectHarness). The core fix routes session-discovery +--resumeargument construction behind a new optionalHarnessProvider.buildResumecapability, eliminating a latent crash-loop where a non-Claude architect (or resumed builder) with any stale Claude.jsonlbuilt an invalid<cmd> --resume <claude-uuid>invocation and shellper restart-looped to death. Gemini is builder-only — the Gemini CLI is retiring (#778), so the originally-scoped gemini-architect support was removed; gemini'sGEMINI_SYSTEM_MDbuilder surface is untouched. (agy, the gemini successor, is deferred as an architect to follow-up #1063 — its only role-injection channel is a visible first user turn.)Scope note (codex-only rescope, 2026-06-19)
This branch was originally scoped as codex + gemini architects (the earlier #1059 implementation). It was rescoped mid-review to codex-only: the gemini-architect additions (the
getArchitectFilescontext-file seam,writeArchitectContextFiles, the doctor affirmation, the.gemini/settings.jsongitignore entry, and gemini-architect tests) were reverted in a surgical, subtractive pass. The engine-neutralbuildResumecrash-loop fix and codex architect parity — the durable deliverables — are fully intact. The net diff vsmaintherefore reflects codex architect parity + the crash-loop fix, with no gemini-architect surface.Files Changed
Net vs
main(git diff --stat <merge-base>):packages/codev/src/agent-farm/utils/harness.ts(+27) —buildResume?on theHarnessProviderinterface;CLAUDE_HARNESS.buildResume(delegates tofindLatestSessionId, returns bundled{ sessionId, args, scriptFragment }); codex/gemini/opencode leave it undefined → fresh launchpackages/codev/src/agent-farm/utils/config.ts(+16) —getArchitectHarness/getBuilderHarnessresolve from the override-aware command (getResolvedCommands);getResolvedCommands.architecthonorsTOWER_ARCHITECT_CMDpackages/codev/src/agent-farm/servers/tower-instances.ts(+35/-…) — architect resume gated ongetArchitectHarness(...).buildResume?.(), composed with the pre-existingsafeToResumesibling-collision guard; WARN gated on resume supportpackages/codev/src/agent-farm/servers/tower-utils.ts(+3/-1) —buildArchitectArgsinjects the architect role via the shared helper (the gemini-onlywriteArchitectContextFileswas added then removed in the rescope)packages/codev/src/agent-farm/commands/spawn.ts(+46/-…) —discoverResumeSessiontakes the builder harness, returns the bundled resume object; both call sites passgetBuilderHarness(...); distinct "harness does not support resume" logpackages/codev/src/agent-farm/commands/spawn-worktree.ts(+25/-…) —startBuilderSession'sresumeSessionId?: string→resume?: { sessionId, scriptFragment }; script emits the pre-escaped fragmentpackages/codev/src/agent-farm/commands/architect.ts(+…/-…) — no-Tower path calls the sharedbuildArchitectArgs(was duplicating role injection)packages/codev/src/commands/doctor.ts(+24) — affirm codex architect support; warn that gemini is builder-only (not an architect).gitignore,packages/codev/src/lib/gitignore.ts— net unchanged (the.gemini/settings.jsonentry was added then removed in the rescope)Tests:
packages/codev/src/agent-farm/__tests__/tower-instances.test.ts(+102) — architect resume-skip regression guard (codex + stale Claude jsonl → fresh, no--resume; claude still resumes)packages/codev/src/agent-farm/__tests__/spawn-worktree.test.ts(+75) — builder resume script uses escapedscriptFragment; codex/gemini builders → fresh script, no--resumepackages/codev/src/agent-farm/__tests__/discover-resume-session.test.ts(+45) — harness-arg threading; codex/gemini null-return + claude bundled-object casespackages/codev/src/agent-farm/__tests__/config.test.ts(+53) — override-aware harness resolution (TOWER_ARCHITECT_CMD/--architect-cmd/--builder-cmd→ non-claude harness, nobuildResume)packages/codev/src/agent-farm/__tests__/af-architect.test.ts(+17) —buildArchitectArgsdelegation mockspackages/codev/src/agent-farm/__tests__/tower-utils.test.ts(+1) — minorDocs / artifacts:
codev/resources/arch.md— "Supported Architect Harnesses & Conversation Resume (Supportcodexas an architect #929)" subsection: claude + codex architects, gemini builder-only, Claude-only resume, override-aware resolution, Harden harness resolution: unrecognized override command defaults to claude harness (residual --resume crash-loop) #1062 caveatcodev/resources/lessons-learned.md— one Architecture lesson (audit all invocation seams when extending a provider abstraction)codev/plans/...,codev/reviews/...,codev/state/pir-929_thread.md,codev/projects/929-*/status.yamlCommits
Net vs
main(excluding porch chores + the main-merge commit):53c9f037[PIR Supportcodexas an architect #929] Thread: implement phase complete (codex-only)be84f75a[PIR Supportcodexas an architect #929] Tests + arch.md: claude+codex architects only (drop gemini-architect cases)0d920884[PIR Supportcodexas an architect #929] doctor: bar gemini as architect (builder-only); drop .gemini/settings.json gitignorea6b42daa[PIR Supportcodexas an architect #929] Remove gemini-architect context-file seam (codex-only)db88349c[PIR Supportcodexas an architect #929] Plan revised: codex-only (agy dropped → Supportagy(Antigravity CLI) as an architect #1063, gemini builder-only)1f4bfd30[PIR Supportcodexas an architect #929] Doc: caveat — unrecognized override commands default to claude harness (Harden harness resolution: unrecognized override command defaults to claude harness (residual --resume crash-loop) #1062)9b615cdf[PIR Supportcodexas an architect #929] Address architect integration review: override-aware harness + centralized context files69cf20de[PIR 929][Phase: implement] feat: harness-gated session resume for codex/gemini architectsTest Results
pnpm build: ✓ pass (clean TS types)pnpm test(full suite): ✓ 3332 passed, 48 skipped, 0 failuresafx send, reconnect, affinity, builder--resume) was exercised by the human at thedev-approvalgate against the running worktree — the reason PIR was chosen over AIR/BUGFIX.Architecture Updates
COLD (
codev/resources/arch.md) — updated. The "Supported Architect Harnesses & Conversation Resume (#929)" subsection documents: (1) claude + codex are supported architects selected via.codev/config.json; gemini is builder-only (Gemini CLI retiring, #778); (2) conversation resume is Claude-main-only viaHarnessProvider.buildResume, and the crash-loop it fixes; (3) override-aware harness resolution and the #1062 unrecognized-command caveat; (4) no architect context-file seam exists — claude/codex readAGENTS.mdnatively, and the gemini-onlygetArchitectFilesseam was removed with gemini's architect support.No HOT (
arch-critical.md) change: this PR extends an existing abstraction (Spec 591) rather than introducing a new always-on invariant, so a cold-tier reference detail is the correct routing.Lessons Learned Updates
COLD (
codev/resources/lessons-learned.md, Architecture section) — one lesson retained: when abstracting per-CLI behavior behind a provider, every call site that builds a CLI invocation must route through the provider — including resume/restart paths, not just the obvious fresh-launch path. The resume seam was the one path Spec 591 left harness-blind, and it only crash-loops on the--resumebranch (fresh launches were already correct), which is why "builders already prove the path" didn't cover it.No HOT (
lessons-critical.md) change: this is a spec-narrow recipe better suited to the cold archive.Things to Look At During PR Review
buildResumebundling decision (harness.ts): one method returns both the Node-argvargs(for thespawn()architect site) and a shell-escapedscriptFragment(for the builder bash generator), mirroringbuildRoleInjection/buildScriptRoleInjection. Avoids a second independently-optional method (which would force a!non-null assertion) and avoids.join(' ')-ing a raw argv into bash (word-split/quoting bug).safeToResumeinteraction (tower-instances.ts): the new harness gate composes with the pre-existing sibling-collision guard (safeToResume, Multi-architect conversation resume: disambiguate via per-architect session UUID #832) — resume happens only when both the harness implementsbuildResumeand no persisted siblings exist.getResolvedCommands.architectiscliOverrides.architect || TOWER_ARCHITECT_CMD || config. Within the Tower processcliOverridesis empty (set in the spawningafxprocess, not the long-lived server), so this matches the launch site'sTOWER_ARCHITECT_CMD || config. An explicitshell.architectHarness/shell.builderHarnessstill wins over auto-detection by design.getArchitectFiles/writeArchitectContextFilesseam was deleted (gemini was its only implementer;buildArchitectArgswas its only consumer). ConfirmbuildArchitectArgsstill resolvesgetArchitectHarness(...)(used bybuildRoleInjection) and that codex needs no context file (readsAGENTS.mdnatively). Grep-verified: zero residualgetArchitectFiles/writeArchitectContextFiles/.gemini/settings.jsonreferences.resolveHarnessstill falls through toCLAUDE_HARNESSfor an unrecognized override (e.g.TOWER_ARCHITECT_CMD=bash) with no explicitshell.architectHarness. Pre-existing, narrow, separable (not a Supportcodexas an architect #929 regression). Mitigation: set an explicit harness. Documented inarch.md; tracked in Harden harness resolution: unrecognized override command defaults to claude harness (residual --resume crash-loop) #1062.Consultation Findings & Dispositions
A full 3-way advisory CMAP (gemini, codex, claude) ran on this codex-only PR as a single pass, instructed to read every changed file in full plus callers/dependencies and assess the whole PR (not the unified diff). Verdicts:
buildResume. Three non-blocking coverage nits (below).--resume <claude-uuid>at either the architect or builder site, that the gemini-architect seam removal is complete (zero dangling refs), and that the crash-loop regression is pinned from four independent angles (discover-resume-session,tower-instances,config,spawn-worktreetests). Four non-blocking observations (below).agy(Antigravity CLI) as an architect #1063). Not a review of the code; recorded as unavailable, not as an APPROVE or REQUEST_CHANGES.Net: 2/2 substantive reviewers APPROVE, zero REQUEST_CHANGES, zero blocking defects → no code change required.
Non-blocking nits (consensus, accepted as a low-priority follow-up): both codex and claude noted that the
doctorarchitect-shell branch (the codex-affirm / gemini-builder-only-warning logic) and the no-Towerafx architectcodex path lack direct unit tests, and that the "explicitshell.architectHarnesswins" rule isn't pinned. Disposition: deferred. The branch logic is trivial and both reviewers independently verified it reads correctly; there is currently no test harness for thedoctorshell.architectblock at all (the pre-existing opencode-architect branch is likewise untested), so adding the first test for it is net-new infrastructure beyond this PR's subtractive scope — tracked as a follow-up rather than expanded here. claude also re-flagged the knowndoctor-reads-config-not-overrides cosmetic discrepancy (same class as the #1062 caveat) and thesafeToResumestale-removed-sibling jsonl gap (documented "acceptable until #832"). No action; both pre-existing and separable.Full reviewer outputs:
/tmp/cmap-929-{codex,claude,gemini}.md.How to Test Locally
For reviewers pulling the branch:
pir-929→ Review Diffafx dev pir-929shell.architect: "codex"):afx workspace startmain architect launches with a stale Claude.jsonlpresent in~/.claude/projects/<encoded-cwd>/— must NOT crash-loop, and no--resumein the launched command (primary regression target)afx architect(no-Tower) +afx workspace add-architectlaunch with role injectedafx sendsingle-line / multi-line (>3 lines) /--interrupt/ while streamingafx spawn <id> --resumeon a non-Claude builder → fresh launch + resume notice, inspect.builder-start.shfor no--resume <claude-id>codev doctorwithshell.architect: "codex"→ affirms codex; withgemini→ warns builder-only-not-architect