Skip to content

Support tool callbacks in MCP sampling#2998

Draft
EronWright wants to merge 4 commits into
docker:mainfrom
EronWright:sampling-tools
Draft

Support tool callbacks in MCP sampling#2998
EronWright wants to merge 4 commits into
docker:mainfrom
EronWright:sampling-tools

Conversation

@EronWright

@EronWright EronWright commented Jun 4, 2026

Copy link
Copy Markdown

Summary

Closes the tool callbacks functional gap in MCP sampling support — a follow-up to #2815, addressing one of the remaining items from #2809.

When an MCP server includes a tools array in a sampling/createMessage request, the host now drives its model with those tools and returns any tool_use blocks back to the server as ToolUseContent. The server remains responsible for executing the tool and continuing the loop in a follow-up sampling request.

sequenceDiagram
     participant H as cagent
      participant S as MCP Server
      participant L as LLM

      activate H
      H->>+S: tools/call {name, arguments}

      note over S: needs LLM inference

      S->>+H: sampling/createMessage<br/>{messages, tools: [...]}
      H->>+L: chat completion
      L-->>-H: ToolUseContent<br/>stopReason: "toolUse"
      H-->>-S: CreateMessageResult<br/>{tool_use, stopReason: "toolUse"}

      note over S: executes tool locally

      S->>+H: sampling/createMessage<br/>{messages + tool_use + tool_result, tools: [...]}
      H->>+L: chat completion
      L-->>-H: TextContent<br/>stopReason: "endTurn"
      H-->>-S: CreateMessageResult<br/>{text, stopReason: "endTurn"}

      S-->>-H: tool result
      deactivate H
Loading

What's new

  • New SamplingWithToolsHandler type and SampleableWithTools interface — additive, parallel to the existing SamplingHandler / Sampleable. No breaking changes to the basic sampling path merged in feat(mcp): add sampling/createMessage support #2815.
  • MCP toolset wires both handler types. At Initialize, exactly one of the SDK's mutually exclusive ClientOptions.CreateMessage* fields is populated — prefer with-tools when registered, fall back to basic.
  • Capability handshake advertises sampling.tools so servers know the host can receive tool-enabled requests.
  • Runtime handler (pkg/runtime/sampling.go):
    • Converts V2 multi-block messages: text, image/audio, tool_use → assistant ToolCalls, tool_resultMessageRoleTool rows (parallel tool_results expand to multiple chat.Message rows).
    • Converts []*mcp.Tool[]tools.Tool with a no-op handler (the server, not the host, executes).
    • Drives model.CreateChatCompletionStream, aggregates streamed tool calls.
    • Builds result Content with TextContent + ToolUseContent blocks; stopReason: "toolUse" when tool calls are present.
  • New limits: maxSamplingTools=64, maxSamplingToolCalls=32.
  • End-to-end test (e2e/sampling_test.go): mounts an in-process gomcp.NewServer on an httptest server via StreamableHTTPHandler. The server exposes one tool (ask_with_calculator) whose handler drives a real sampling-with-tools loop against the connecting cagent. The Gemini side is recorded once and replayed on subsequent runs, so the test runs offline in CI.

Out of scope (separate gaps from #2809)

  • Human-in-the-loop approval UI
  • Model-preference hints

Test plan

  • Unit tests pass: `go test ./pkg/runtime/... ./pkg/tools/...`
  • `go build ./...` clean
  • `go vet ./...` clean
  • `task lint` (0 offenses)
  • `gofmt` clean on all changed files
  • End-to-end via `TestExec_Gemini_SamplingWithTools` (cassette replays offline; verified once against live Gemini):
    • Handshake includes `sampling.tools` capability (implicit — `ServerSession.CreateMessageWithTools` would refuse otherwise)
    • LLM receives the server-supplied tools
    • Response contains `ToolUseContent` with `stopReason: "toolUse"` when the model emits a tool_use
    • Follow-up sampling request with `tool_result` blocks is converted correctly and the loop terminates with `endTurn`

@aheritier aheritier added area/agent For work that has to do with the general agent loop/agentic features of the app area/tools For features/issues/fixes related to the usage of built-in and MCP tools area/mcp MCP protocol, MCP tool servers, integration kind/feat PR adds a new feature (maps to feat: commit prefix) labels Jun 4, 2026
@EronWright EronWright marked this pull request as ready for review June 7, 2026 00:56
@EronWright EronWright requested a review from a team as a code owner June 7, 2026 00:56
@aheritier aheritier added the area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) label Jun 7, 2026
@aheritier

Copy link
Copy Markdown
Contributor

/review

@docker-agent docker-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Two medium-severity findings in the newly-added sampling-with-tools code. The core stream-aggregation logic, capability handshake, content building, and limits enforcement are all well-structured and correctly tested.

Comment thread pkg/runtime/sampling.go Outdated
Comment thread pkg/tools/mcp/remote.go
@aheritier aheritier removed the area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) label Jun 8, 2026
@EronWright EronWright marked this pull request as draft June 8, 2026 15:03
@EronWright EronWright requested a review from docker-agent June 8, 2026 20:35
@EronWright EronWright marked this pull request as ready for review June 8, 2026 20:38
@aheritier aheritier added area/testing Test infrastructure, CI/CD, test runners, evaluation area/providers/gemini Google Gemini provider support labels Jun 8, 2026
@dgageot

dgageot commented Jun 10, 2026

Copy link
Copy Markdown
Member

Sorry @EronWright, our project requires all the commit to be signed and verified. Could you sign them? Thanks!

@aheritier aheritier added area/testing Test infrastructure, CI/CD, test runners, evaluation status/blocked Author can't proceed and needs external help to get unblocked and removed area/testing Test infrastructure, CI/CD, test runners, evaluation area/providers/gemini Google Gemini provider support labels Jun 12, 2026
@aheritier aheritier marked this pull request as draft June 12, 2026 14:38
@aheritier

Copy link
Copy Markdown
Contributor

I moved it back to draft @EronWright the time you rewrite the history with signed commits. Sorry for the issue

@EronWright EronWright marked this pull request as ready for review June 15, 2026 15:06
@EronWright

Copy link
Copy Markdown
Author

@aheritier sorry for delay, all commits are now signed/verified.

Adds a parallel SamplingWithToolsHandler alongside the existing
SamplingHandler so MCP servers can include a tools array in
sampling/createMessage requests. The host drives its model with those
tools and returns any tool_use blocks as ToolUseContent; the server
remains responsible for executing the tool and continuing the loop in a
follow-up sampling request.

The initialize handshake now advertises sampling.tools capability, and
the MCP toolset selects the appropriate go-sdk handler (basic vs.
with-tools) based on which handler is registered.

Signed-off-by: Eron Wright <eronwright@gmail.com>
Mounts an in-process gomcp.NewServer on an httptest server via
StreamableHTTPHandler. Its one tool, ask_with_calculator, runs a
sampling loop: sends sampling/createMessage with a calculator tool,
gets a tool_use back from the host LLM, "executes" the calculator,
sends a follow-up sampling request carrying the tool_result, and
returns the final text. The Gemini side is recorded once and replayed
on subsequent runs, so the test runs offline in CI.

Signed-off-by: Eron Wright <eronwright@gmail.com>
- Reject tool_use blocks under a non-assistant role explicitly rather
  than silently dropping the message. The MCP spec places tool_use on
  assistant turns, but a malformed server would previously have its
  message disappear from the converted chat history with no error.
- Document the ordering dependency between SetSampling*Handler and
  Initialize in both stdio and remote MCP client setups, so future
  maintainers don't reorder them and silently lose the sampling
  handler on reconnect.

Signed-off-by: Eron Wright <eronwright@gmail.com>
Use errors.New for the static loop-exhausted message and strconv.FormatInt
for the calculator result instead of fmt.Errorf / fmt.Sprintf %d.

Signed-off-by: Eron Wright <eronwright@gmail.com>
@aheritier

Copy link
Copy Markdown
Contributor

/review

@docker-agent docker-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Two medium-severity issues found in the new sampling-with-tools path. Both relate to edge cases in the streaming and reconnect logic.

Comment thread pkg/runtime/sampling.go
if choice.FinishReason != "" {
providerFinishReason = choice.FinishReason
}
if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Early break on FinishReasonStop may drop tool-call chunks from later SSE events

drainSamplingStreamWithTools breaks out of the stream-read loop as soon as it sees FinishReasonStop or FinishReasonLength:

if choice.FinishReason == chat.FinishReasonStop || choice.FinishReason == chat.FinishReasonLength {
    break
}

Tool-call deltas that appear in the same chunk as the finish-reason are captured correctly (they're processed in the for _, delta := range choice.Delta.ToolCalls loop a few lines earlier). However, if a provider splits its response so that tool-call argument chunks arrive in subsequent SSE frames after the finish-reason frame, those deltas are silently discarded.

The analogous handleStream in streaming.go guards against this with a continue that skips the early-exit when the current chunk contains only tool-call deltas (no finish reason yet). That pattern is missing here.

Impact: Incomplete or missing tool calls are returned to the MCP server as a ToolUseContent block with truncated arguments, causing the server's tool-execution loop to fail or produce wrong results. The failure is silent — no error is returned.

Suggested fix: Mirror the streaming.go guard: after accumulating tool-call deltas, continue to the next chunk if choice.FinishReason == "", and only break on FinishReasonStop/FinishReasonLength once all pending deltas have been flushed.

Comment thread pkg/tools/mcp/stdio.go
PromptListChangedHandler: promptChanged,
}
switch {
case c.samplingWithToolsHandler != nil:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Reconnect race: sampling handler may not be registered with the SDK if configureToolsetHandlers hasn't run yet

Initialize (called on both the initial connect and every supervisor-driven reconnect) reads c.samplingWithToolsHandler and c.samplingHandler at call time to decide which SDK-level callback to register:

switch {
case c.samplingWithToolsHandler != nil:
    opts.CreateMessageWithToolsHandler = c.handleSamplingWithToolsRequest
case c.samplingHandler != nil:
    opts.CreateMessageHandler = c.handleSamplingRequest
}
// if both are nil → no sampling callback registered with the SDK client

The SDK client is constructed with opts and never updated afterwards. If a reconnect happens before the runtime's configureToolsetHandlers has called SetSamplingWithToolsHandler / SetSamplingHandler, both fields are nil at Initialize time. The resulting SDK session has no CreateMessage* handler at all, so any sampling/createMessage request from the server is rejected with an error for the lifetime of that session.

This is a regression from the previous unconditional registration pattern, where the basic sampling callback was always wired. The comment "callers must invoke … before any reconnect" documents the contract, but it is not enforced by any ordering guarantee in the reconnect path today.

Impact: An MCP server that triggers sampling during or shortly after a reconnect (e.g., a long-running agentic loop that reconnects due to a network hiccup) will receive an error from the sampling handler instead of a model response, breaking its tool-execution loop.

Suggested mitigation: Either (a) guarantee in the supervisor/reconnect path that configureToolsetHandlers is always called before Initialize completes, and add a test that verifies this ordering; or (b) fall back to registering a best-effort handler at Initialize time that reads the handler fields lazily at request time (similar to how handleSamplingWithToolsRequest already reads c.samplingWithToolsHandler under the mutex), so a late SetSamplingWithToolsHandler call is still effective even after reconnect.

@aheritier aheritier marked this pull request as draft June 22, 2026 13:04
@aheritier aheritier added the status/needs-rebase PR has merge conflicts or is out of date with main label Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent For work that has to do with the general agent loop/agentic features of the app area/mcp MCP protocol, MCP tool servers, integration area/testing Test infrastructure, CI/CD, test runners, evaluation area/tools For features/issues/fixes related to the usage of built-in and MCP tools kind/feat PR adds a new feature (maps to feat: commit prefix) status/blocked Author can't proceed and needs external help to get unblocked status/needs-rebase PR has merge conflicts or is out of date with main

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants