GLM-5.2 schema-swapped tool_use blocks — detect, break loop, surface to user instead of presenting as 'corruption' #48

Description

@RasputinKaiser

Summary

When running on zai-org/GLM-5.2-FP8, the model occasionally emits a tool_use block whose input matches a different tool's schema than the one named in the call. NCode's Zod validator rejects it, the model retries in a loop, and the turn exhausts itself — visible to the user as garbled thinking and torn output that looks like transcript corruption.

This is a model-side argument-shape confusion, but NCode could do more to prevent it, detect it, and help the user recover.

Observed mismatches (one query chain)

  • Bash called with {content, file_path} (Write's shape) — missing command
  • Write called with {command, description, timeout} (Bash's shape) — missing file_path, content
  • Write called with content as an object instead of a string
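To make the three mismatches above concrete, here is a minimal sketch that checks an input against a simplified required-field table and reports whether it fits a different tool's shape instead. The table `REQUIRED_FIELDS`, the `Verdict` type, and `classifyToolInput` are hypothetical names for illustration only; NCode's real schemas are Zod objects and are not reproduced here.

```typescript
// Hypothetical, simplified stand-in for the real tool schemas: only the
// required fields mentioned in this issue, each with its expected primitive type.
type FieldType = "string" | "number";

const REQUIRED_FIELDS: Record<string, Record<string, FieldType>> = {
  Bash: { command: "string" },
  Write: { file_path: "string", content: "string" },
};

type Verdict =
  | { kind: "ok" }
  | { kind: "schema_swap"; named: string; looksLike: string; fields: string[] }
  | { kind: "invalid"; named: string; fields: string[] };

function isKnownTool(tool: string): boolean {
  return Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, tool);
}

// Returns the required fields of `tool` that are missing from `input`
// or present with the wrong primitive type.
function badFields(tool: string, input: Record<string, unknown>): string[] {
  if (!isKnownTool(tool)) return [];
  const fields = REQUIRED_FIELDS[tool];
  return Object.keys(fields).filter((key) => typeof input[key] !== fields[key]);
}

function classifyToolInput(named: string, input: Record<string, unknown>): Verdict {
  if (!isKnownTool(named)) return { kind: "invalid", named, fields: [] };

  const bad = badFields(named, input);
  if (bad.length === 0) return { kind: "ok" };

  // The named tool's required fields are wrong: check whether the input
  // satisfies every required field of some other tool instead.
  const lookalike = Object.keys(REQUIRED_FIELDS).find(
    (other) => other !== named && badFields(other, input).length === 0,
  );
  if (lookalike !== undefined) {
    return { kind: "schema_swap", named, looksLike: lookalike, fields: bad };
  }
  return { kind: "invalid", named, fields: bad };
}

// The first and third mismatches from the list above:
console.log(classifyToolInput("Bash", { content: "hi", file_path: "/tmp/a.txt" }).kind); // "schema_swap"
console.log(classifyToolInput("Write", { file_path: "/tmp/a.txt", content: { text: "hi" } }).kind); // "invalid"
```

Under these assumptions the first two mismatches classify as swaps, while the object-valued content case is an ordinary type error on a single field.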

Counts on a single affected chain: 12 "Error normalizing tool input" events and ~16 tool rejections. The API returned 200 with stop_reason=tool_use every time, with no overload and no content filter. The on-disk JSONL transcript is fully valid (0 malformed lines), so this is not a storage or harness corruption bug.
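For anyone who wants to repeat the transcript check, this is roughly how the malformed-line count can be obtained. It is a sketch rather than the exact script used: `countMalformedJsonlLines` is a hypothetical helper that takes the file contents as a string, and reading the file from disk is left out.

```typescript
// Counts the non-blank lines of a JSONL document and how many of them
// fail to parse as JSON.
function countMalformedJsonlLines(text: string): { total: number; malformed: number } {
  let total = 0;
  let malformed = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines, e.g. after a trailing newline
    total += 1;
    try {
      JSON.parse(line);
    } catch {
      malformed += 1;
    }
  }
  return { total, malformed };
}

// One valid line and one truncated line:
console.log(countMalformedJsonlLines('{"type":"assistant"}\n{"type":\n'));
// { total: 2, malformed: 1 }
```

A result with `malformed` equal to 0 only shows that every line is well-formed JSON; it says nothing about whether the tool_use inputs inside those lines match their schemas.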

What NCode could do to help

  1. Schema-swap detection. When a tool_use is rejected for unrecognized_keys or invalid_type on a required field, check whether the rejected keys match another tool's required params. If so, log a distinct event (ncode_tool_use_schema_swap) separate from generic ncode_tool_use_error, so this class of glitch is observable in telemetry instead of buried in "Error normalizing tool input".

  2. Break the retry loop. When the same tool_use call fails the same schema validation N times in a row on the same queryChainId (say 3+), stop re-emitting the tool result and inject a structured system message telling the model its last tool call used the wrong argument shape — instead of letting Zod's raw error bounce around the context until the turn budget is gone.

  3. User-visible signal. When this happens, surface a one-line hint to the user in the UI (e.g. "model emitted a malformed tool call N times — start a fresh turn or /compact"). Right now the only evidence is buried in the debug log; the user just sees the model spinning.

  4. Resume guardrail. On --resume, if the last assistant message in the transcript contains a tool_use whose input fails the named tool's schema, warn the user before sending the next prompt — the malformed message stays in history and can re-trigger the loop on resume. Optionally offer to strip it.

  5. Stricter input coercion at the boundary. For tools where a field has a known wrong-shape pattern (e.g. content arriving as an object instead of a string), consider a targeted coercion or a clearer error message that tells the model which field it got wrong, rather than the full Zod dump.
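Items 2, 3 and 5 could share one small piece of state. The sketch below counts consecutive identical validation failures per queryChainId and, once a threshold is reached, returns a short system message naming the bad fields plus a one-line user hint. Everything here is an assumption for illustration: `ValidationFailure`, `LoopAction` and `RetryLoopBreaker` are hypothetical names, not existing NCode types, and the default threshold of 3 is just the value suggested above.

```typescript
// Hypothetical shape of a rejected tool call; the real NCode event will differ.
interface ValidationFailure {
  queryChainId: string;
  toolName: string;
  badFields: string[]; // required fields that were missing or had the wrong type
}

type LoopAction =
  | { kind: "retry" }
  | { kind: "break"; systemMessage: string; userHint: string };

class RetryLoopBreaker {
  private readonly maxRepeats: number;
  // queryChainId -> signature of the last failure and its consecutive count
  private readonly streaks = new Map<string, { signature: string; count: number }>();

  constructor(maxRepeats: number = 3) {
    this.maxRepeats = maxRepeats;
  }

  recordFailure(failure: ValidationFailure): LoopAction {
    const signature = failure.toolName + ":" + failure.badFields.slice().sort().join(",");
    const previous = this.streaks.get(failure.queryChainId);
    // Only identical failures extend the streak; a different one starts over.
    const count =
      previous !== undefined && previous.signature === signature ? previous.count + 1 : 1;
    this.streaks.set(failure.queryChainId, { signature, count });

    if (count < this.maxRepeats) return { kind: "retry" };
    return {
      kind: "break",
      // Short, field-specific message in place of the full validator dump.
      systemMessage:
        `Your last ${count} calls to ${failure.toolName} used the wrong argument shape. ` +
        `Missing or mistyped fields: ${failure.badFields.join(", ")}.`,
      userHint: `model emitted a malformed tool call ${count} times; start a fresh turn or /compact`,
    };
  }

  // A successful tool call ends the streak for that chain.
  recordSuccess(queryChainId: string): void {
    this.streaks.delete(queryChainId);
  }
}
```

Keying the streak on tool name plus bad fields, rather than on the raw error text, means small differences in the validator's wording do not reset the counter.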

Why this matters

The current UX presents as "corruption in thinking + output," which sent me down a rabbit hole auditing the JSONL and debug log before realizing the transcript was clean. A distinct event + an early break + a UI hint would have made this a one-line diagnosis instead of a multi-hour forensic dig.

Environment

  • Model: /data/models/hf/zai-org__GLM-5.2-FP8
  • Permission mode: bypassPermissions
  • Affects: resumed sessions (the malformed tool_use persists in history)
  • Session ID (for correlating against telemetry, if useful): cfc2b511-c223-42ef-a5b8-c1b5c4ef3380
  • queryChainId: 76ebaf5a-3915-4da0-ac2e-e2bcd3e2975c
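Because the malformed tool_use persists in history on resumed sessions, a check along these lines could run on --resume before the next prompt is sent. This is only a sketch: `ContentBlock` and `TranscriptMessage` are simplified, hypothetical shapes rather than NCode's actual transcript types, and `isValidInput` stands in for whatever schema validation the real tool registry performs.

```typescript
// Simplified, hypothetical transcript shapes.
interface ContentBlock {
  type: string;
  name?: string;
  input?: unknown;
}

interface TranscriptMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Returns the names of the tools in the LAST assistant message whose
// tool_use input fails the supplied validator. An empty array means
// there is nothing to warn about.
function findMalformedTailToolUses(
  transcript: TranscriptMessage[],
  isValidInput: (toolName: string, input: unknown) => boolean,
): string[] {
  for (let i = transcript.length - 1; i >= 0; i--) {
    const message = transcript[i];
    if (message === undefined || message.role !== "assistant") continue;

    const malformed: string[] = [];
    for (const block of message.content) {
      if (block.type !== "tool_use" || block.name === undefined) continue;
      if (!isValidInput(block.name, block.input)) malformed.push(block.name);
    }
    return malformed; // only the most recent assistant message is inspected
  }
  return [];
}
```

If the returned list is non-empty, the caller could warn the user and offer to strip the offending message before continuing.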

Happy to provide a redacted excerpt of the relevant debug-log lines if helpful — I'm not attaching the raw log since NCode debug logs contain env vars / config snapshots / file contents and shouldn't be uploaded to a public tracker.
