GLM-5.2 schema-swapped tool_use blocks — detect, break loop, surface to user instead of presenting as 'corruption'

## Summary

When running on `zai-org/GLM-5.2-FP8`, the model occasionally emits a `tool_use` block whose `input` matches a **different tool's** schema than the one named in the call. NCode's Zod validator rejects it, the model retries in a loop, and the turn exhausts its budget. To the user this shows up as garbled thinking and torn output that looks like transcript corruption.

This is a model-side argument-shape confusion, but NCode could do more to prevent it, detect it, and help the user recover.

## Observed mismatches (one query chain)

- `Bash` called with `{content, file_path}` (Write's shape) — missing `command`
- `Write` called with `{command, description, timeout}` (Bash's shape) — missing `file_path`, `content`
- `Write` called with `content` as an **object** instead of a string

Counts on a single affected chain: 12 `Error normalizing tool input` events, ~16 tool rejections. The API returned 200 / `stop_reason=tool_use` every time — no overload, no content filter. The on-disk JSONL transcript is fully valid (0 malformed lines), so this is **not** a storage/harness corruption bug.
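
For anyone wanting to repeat the "0 malformed lines" check on their own transcript, here is a minimal sketch of how it can be done. `checkJsonl` and `JsonlReport` are hypothetical names for illustration, not part of NCode, and the sketch assumes the transcript is plain JSONL with one JSON value per line.

```typescript
// Hypothetical helper (not an NCode API): reports which lines of a JSONL
// transcript fail to parse as JSON.
interface JsonlReport {
  total: number; // number of non-blank lines inspected
  malformed: number[]; // 1-based line numbers that failed JSON.parse
}

function checkJsonl(text: string): JsonlReport {
  const malformed: number[] = [];
  let total = 0;
  text.split(/\r?\n/).forEach((line, index) => {
    // Blank lines (such as the one after a trailing newline) are skipped
    // rather than counted as corruption.
    if (line.trim() === "") return;
    total += 1;
    try {
      JSON.parse(line);
    } catch {
      malformed.push(index + 1);
    }
  });
  return { total, malformed };
}
```

An empty `malformed` array means every line parsed, which points away from storage corruption and toward the content of the `tool_use` blocks themselves.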

## What NCode could do to help

1. **Schema-swap detection.** When a `tool_use` is rejected for `unrecognized_keys` or `invalid_type` on a required field, check whether the *rejected keys* match another tool's required params. If so, log a distinct event (`ncode_tool_use_schema_swap`) separate from generic `ncode_tool_use_error`, so this class of glitch is observable in telemetry instead of buried in `Error normalizing tool input`.

2. **Break the retry loop.** When the same `tool_use` call fails the same schema validation N times in a row (say, N ≥ 3) on the same `queryChainId`, stop re-emitting the tool result and inject a structured system message telling the model that its last tool call used the wrong argument shape, instead of letting Zod's raw error bounce around the context until the turn budget is gone.

3. **User-visible signal.** When this happens, surface a one-line hint to the user in the UI (e.g. "model emitted a malformed tool call N times — start a fresh turn or /compact"). Right now the only evidence is buried in the debug log; the user just sees the model spinning.

4. **Resume guardrail.** On `--resume`, if the last assistant message in the transcript contains a `tool_use` whose `input` fails the named tool's schema, warn the user before sending the next prompt — the malformed message stays in history and can re-trigger the loop on resume. Optionally offer to strip it.

5. **Stricter input coercion at the boundary.** For tools where a field has a known wrong-shape pattern (e.g. `content` arriving as an object instead of a string), consider a targeted coercion or a clearer error message that tells the model *which* field it got wrong, rather than the full Zod dump.
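
Items 1 and 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not NCode's actual code: `ToolShape`, `TOOLS`, `detectSchemaSwap`, and `RetryBreaker` are hypothetical names, the tool table is reduced to required key names only (taken from the mismatches listed above), and a real implementation would presumably derive them from the existing Zod schemas.

```typescript
// Hypothetical, simplified view of a tool: its name and required keys only.
interface ToolShape {
  name: string;
  required: string[];
}

// Required keys as implied by the observed mismatches in this issue.
const TOOLS: ToolShape[] = [
  { name: "Bash", required: ["command"] },
  { name: "Write", required: ["file_path", "content"] },
];

function hasAllKeys(input: Record<string, unknown>, keys: string[]): boolean {
  return keys.every((k) => k in input);
}

// Returns the name of another tool whose required keys the input satisfies,
// or null if the input fits the named tool or fits no known tool.
function detectSchemaSwap(
  calledTool: string,
  input: Record<string, unknown>,
  tools: ToolShape[] = TOOLS,
): string | null {
  const called = tools.find((t) => t.name === calledTool);
  if (called && hasAllKeys(input, called.required)) return null;
  const match = tools.find(
    (t) => t.name !== calledTool && hasAllKeys(input, t.required),
  );
  return match ? match.name : null;
}

// Tracks consecutive identical validation failures per query chain.
class RetryBreaker {
  private readonly limit: number;
  private streaks = new Map<string, { signature: string; count: number }>();

  constructor(limit: number = 3) {
    this.limit = limit;
  }

  // Call on each rejected tool_use. Returns true once the same failure has
  // repeated `limit` times in a row on this chain.
  recordFailure(chainId: string, tool: string, errorCode: string): boolean {
    const signature = `${tool}:${errorCode}`;
    const prev = this.streaks.get(chainId);
    const count = prev && prev.signature === signature ? prev.count + 1 : 1;
    this.streaks.set(chainId, { signature, count });
    return count >= this.limit;
  }

  // Call when a tool_use validates, so the streak does not carry over.
  recordSuccess(chainId: string): void {
    this.streaks.delete(chainId);
  }
}
```

With this table, `Bash` called with `{content, file_path}` would be flagged as a swap with `Write`, and `Write` called with `{command, description, timeout}` as a swap with `Bash`. The third observed case (`content` as an object) has the right keys and so is not a swap; it would need the type-level check described in item 5. When `recordFailure` returns true, the caller could emit the distinct telemetry event, inject the structured message, and show the UI hint.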

## Why this matters

The current UX presents as "corruption in thinking + output," which sent me down a rabbit hole auditing the JSONL and debug log before realizing the transcript was clean. A distinct event + an early break + a UI hint would have made this a one-line diagnosis instead of a multi-hour forensic dig.

## Environment

- Model: `/data/models/hf/zai-org__GLM-5.2-FP8`
- Permission mode: `bypassPermissions`
- Affects: resumed sessions (the malformed `tool_use` persists in history)
- Session ID (for correlating against telemetry, if useful): `cfc2b511-c223-42ef-a5b8-c1b5c4ef3380`
- queryChainId: `76ebaf5a-3915-4da0-ac2e-e2bcd3e2975c`

Happy to provide a redacted excerpt of the relevant debug-log lines if helpful — I'm not attaching the raw log since NCode debug logs contain env vars / config snapshots / file contents and shouldn't be uploaded to a public tracker.

