
tool use without auto execution #162

Open
jingyun19 wants to merge 8 commits into webmachinelearning:main from jingyun19:patch-1

@jingyun19 jingyun19 commented Nov 19, 2025

Update explainer and spec to support tool use functionalities without automatic execution.

Explainer: added an example and explained how to make tool calls
Spec: reflect IDL changes in https://chromium-review.googlesource.com/c/chromium/src/+/7092943


Added detailed explanations for tool use modes, including examples for open loop and closed loop execution.
Updated README to clarify tool-call and tool-result usage.
Added new types and enums for tool calls and responses.

@tomayac tomayac left a comment

Contributor

Tried to make the code samples more readable and correct. Maybe consider running them all through a tool like prettier, which catches typos like missing commas or parentheses.

As general feedback, could the explainer outline why developers would choose closed vs. open?

Comment thread README.md Outdated
Comment thread README.md
Comment thread README.md
Comment on lines +180 to +185
await session.append([
{role: "user", content: "What is the weather in Seattle?"},
{role: "tool-call", content: {type: "tool-call", value: {callID:" get_weather_1", name: "get_weather", arguments: {location:"Seattle"}}},
{role: "tool-result", content: {type: "tool-response", value: {callID: "get_weather_1", name: "get_weather", result: [{type:"object", value: {temperature: "55F", humidity: "67%"}}]}},
{role: "assistant", content: "The temperature in Seattle is 55F and humidity is 67%"},
]);

Contributor

Suggested change
await session.append([
  { role: "user", content: "What is the weather in Seattle?" },
  {
    role: "tool-call",
    content: {
      type: "tool-call",
      value: {
        callID: "get_weather_1",
        name: "get_weather",
        arguments: { location: "Seattle" },
      },
    },
  },
  {
    role: "tool-result",
    content: {
      type: "tool-response",
      value: {
        callID: "get_weather_1",
        name: "get_weather",
        result: [
          { type: "object", value: { temperature: "55F", humidity: "67%" } },
        ],
      },
    },
  },
  {
    role: "assistant",
    content: "The temperature in Seattle is 55F and humidity is 67%",
  },
]);
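
A mismatch between the `callID` of a tool call and that of its result is easy to miss in a hand-written history: the README lines quoted at the top of this thread use `" get_weather_1"` (with a leading space) for the call and `"get_weather_1"` for the result. A minimal sketch of a check that would flag this follows. `findUnmatchedToolResults` is a hypothetical helper, not part of the proposed API, and the message shapes are assumed from the example in this thread.

```js
// Hypothetical helper (not part of the proposed API): returns the callIDs of
// "tool-result" messages that no earlier "tool-call" message introduced.
// Message shapes are assumed from the README example under review.
function findUnmatchedToolResults(messages) {
  const seenCallIDs = new Set();
  const unmatched = [];
  for (const message of messages) {
    // Plain-text messages have a string `content`, so this is undefined.
    const callID = message.content?.value?.callID;
    if (message.role === "tool-call") {
      seenCallIDs.add(callID);
    } else if (message.role === "tool-result" && !seenCallIDs.has(callID)) {
      unmatched.push(callID);
    }
  }
  return unmatched;
}

const history = [
  { role: "user", content: "What is the weather in Seattle?" },
  {
    role: "tool-call",
    content: {
      type: "tool-call",
      // Leading space copied from the quoted README lines.
      value: { callID: " get_weather_1", name: "get_weather", arguments: { location: "Seattle" } },
    },
  },
  {
    role: "tool-result",
    content: {
      type: "tool-response",
      value: { callID: "get_weather_1", name: "get_weather", result: [] },
    },
  },
];

// The result's callID is reported because the two strings differ.
console.log(findUnmatchedToolResults(history));
```

Whether an implementation would reject such a history or silently accept it is not stated in this thread, so a check like this is only a development aid.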

Comment thread README.md
]);
```

Note that "role" and "type" now supports "tool-call" and "tool-result".

Contributor

Suggested change
Note that "role" and "type" now supports "tool-call" and "tool-result".
Note that `"role"` and `"type"` now support `"tool-call"` and `"tool-result"`.

Comment thread README.md
Comment on lines +200 to +212
sessionOptions = structuredClone(options);
sessionOptions.expectedOutputs.push(["tool-call"]);
session = await LanguageModel.create(sessionOptions);

var result = await session.prompt("What is the weather in Seattle?");
if (result.type=="tool-call") {
if (result.name == "get_weather") {
const tool_result = getWeather(result.arguments.location);
result = session.prompt([{role:"tool-result", content: {type: "tool-result", value: {callId: result.callID, name: result.name, result: [{type:"object", value: tool_result}]}}}])
}
} else{
console.log(result)
}

Contributor

Suggested change
const sessionOptions = structuredClone(options);
sessionOptions.expectedOutputs.push(["tool-call"]);
const session = await LanguageModel.create(sessionOptions);
let result = await session.prompt("What is the weather in Seattle?");
if (result.type === "tool-call") {
  if (result.name === "get_weather") {
    const tool_result = getWeather(result.arguments.location);
    // Feed the tool output back to the model and wait for its next response.
    result = await session.prompt([
      {
        role: "tool-result",
        content: {
          type: "tool-result",
          value: {
            callID: result.callID,
            name: result.name,
            result: [{ type: "object", value: tool_result }],
          },
        },
      },
    ]);
  }
} else {
  console.log(result);
}
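
The snippet above handles a single tool call. A model may ask for several in a row, so an open-loop caller usually needs a loop. The following is a minimal sketch of such a loop, runnable outside the browser because it uses a fake session. `createFakeSession`, `runOpenLoop`, the `tools` map, the `maxToolCalls` cap, and the `{ type: "text", value }` shape of a non-tool response are all assumptions for illustration; only the `tool-call` and `tool-result` shapes are taken from the example under review.

```js
// Stand-in for a real session so the sketch runs anywhere. It asks for one
// tool call, then answers in text once it receives a tool result.
function createFakeSession() {
  return {
    async prompt(input) {
      if (Array.isArray(input) && input[0]?.role === "tool-result") {
        const weather = input[0].content.value.result[0].value;
        // Assumed shape for a plain-text response.
        return { type: "text", value: `It is ${weather.temperature} in Seattle.` };
      }
      return {
        type: "tool-call",
        callID: "get_weather_1",
        name: "get_weather",
        arguments: { location: "Seattle" },
      };
    },
  };
}

// Hypothetical tool table: tool name -> function receiving parsed arguments.
const tools = new Map([
  ["get_weather", async ({ location }) => ({ temperature: "55F", humidity: "67%" })],
]);

// Open loop: the page runs each requested tool and feeds the result back
// until the model stops asking for tools or the cap is reached.
async function runOpenLoop(session, prompt, tools, maxToolCalls = 5) {
  let result = await session.prompt(prompt);
  for (let calls = 0; result.type === "tool-call"; calls++) {
    if (calls >= maxToolCalls) throw new Error("Too many tool calls");
    const tool = tools.get(result.name);
    if (!tool) throw new Error(`Unknown tool: ${result.name}`);
    const toolResult = await tool(result.arguments);
    result = await session.prompt([
      {
        role: "tool-result",
        content: {
          type: "tool-result",
          value: {
            callID: result.callID,
            name: result.name,
            result: [{ type: "object", value: toolResult }],
          },
        },
      },
    ]);
  }
  return result;
}

runOpenLoop(createFakeSession(), "What is the weather in Seattle?", tools)
  .then((result) => console.log(result.value));
```

Because the caller owns the loop, it can inspect, rewrite, or drop each tool result before the next `prompt()` call, which is the control that open loop offers over automatic execution.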

Comment thread README.md

#### Closed Loop:

To enable automatic execution, add a `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and pose a max number of tool calls invoked in a single session generation:

Contributor

Suggested change
To enable automatic execution, add a `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and pose a max number of tool calls invoked in a single session generation:
To enable automatic execution, add an `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and pose a max number of tool calls invoked in a single session generation:
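
To make the difference from open loop concrete, closed-loop behavior can be emulated on top of an open-loop session. The sketch below is not the proposed implementation: `withAutoExecution`, `createFakeSession`, the `{ name, execute }` tool shape, and the `maxToolCalls` parameter name are assumptions, and the actual fields of `toolUseConfig` are not shown in this thread.

```js
// Emulates closed-loop behavior on top of an open-loop session: every
// requested tool is run through its `execute` function without returning
// control to the caller. `maxToolCalls` is a hypothetical name for the cap.
function withAutoExecution(session, tools, maxToolCalls) {
  const byName = new Map(tools.map((tool) => [tool.name, tool]));
  return {
    async prompt(input) {
      let result = await session.prompt(input);
      let calls = 0;
      while (result.type === "tool-call") {
        if (++calls > maxToolCalls) {
          throw new Error(`Exceeded ${maxToolCalls} tool calls in one generation`);
        }
        const tool = byName.get(result.name);
        if (!tool) throw new Error(`Unknown tool: ${result.name}`);
        const value = await tool.execute(result.arguments);
        result = await session.prompt([
          {
            role: "tool-result",
            content: {
              type: "tool-result",
              value: { callID: result.callID, name: result.name, result: [{ type: "object", value }] },
            },
          },
        ]);
      }
      return result;
    },
  };
}

// Fake open-loop session: requests `remaining` tool calls, then answers.
function createFakeSession(remaining) {
  return {
    async prompt() {
      if (remaining-- > 0) {
        return {
          type: "tool-call",
          callID: `call_${remaining}`,
          name: "get_weather",
          arguments: { location: "Seattle" },
        };
      }
      return { type: "text", value: "done" }; // assumed text response shape
    },
  };
}

const tools = [
  {
    name: "get_weather",
    async execute({ location }) {
      return { temperature: "55F", humidity: "67%" };
    },
  },
];

withAutoExecution(createFakeSession(2), tools, 3)
  .prompt("What is the weather in Seattle?")
  .then((result) => console.log(result.value));
```

In this emulation the caller only ever sees the final response, and the cap bounds how many tool calls a single `prompt()` can trigger.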

Comment thread README.md
sessionOptions.expectedOutputs.push(["tool-call"]);
session = await LanguageModel.create(sessionOptions);

var result = await session.prompt("What is the weather in Seattle?");

Contributor

Suggested change
var result = await session.prompt("What is the weather in Seattle?");
let result = await session.prompt("What is the weather in Seattle?");

Comment thread README.md
Example:

```js
sessionOptions = structuredClone(options);

Contributor

Suggested change
sessionOptions = structuredClone(options);
const sessionOptions = structuredClone(options);

Comment thread README.md
```js
sessionOptions = structuredClone(options);
sessionOptions.expectedOutputs.push(["tool-call"]);
session = await LanguageModel.create(sessionOptions);

Contributor

Suggested change
session = await LanguageModel.create(sessionOptions);
const session = await LanguageModel.create(sessionOptions);

jingyun19 and others added 5 commits November 26, 2025 09:08
Co-authored-by: Thomas Steiner <tomac@google.com>
Co-authored-by: Thomas Steiner <tomac@google.com>
Added explanation about automatic execution and constraints in planner loop.
@jingyun19

jingyun19 commented Dec 1, 2025

Author

I added a new section to describe use cases where open loop is preferred. cc @tomayac

@reillyeon

Collaborator

I hadn't previously considered the context compression use case. That's interesting and motivating to enable developers to manipulate the conversation at this low level.

@nico-martin nico-martin left a comment

I think this implementation has a few weaknesses when it comes to distinguishing between a Message and a Message.Content element:

- Message: Can have a specific role (whether it comes from the user, the assistant, or a tool call); it essentially describes the sender.
- Message.Content: There can be multiple instances per Message; it describes the type of content.

I tried to make this concrete with a couple of comments.

Comment thread index.bs
enum LanguageModelMessageRole { "system", "user", "assistant", "tool-call", "tool-response" };

enum LanguageModelMessageType { "text", "image", "audio" };
enum LanguageModelMessageType { "text", "image", "audio","tool-call", "tool-response" };

In my opinion, the MessageType should describe the type, or in other words "the modality".
MessageRole: Where does the message come from (who is the sender?)
MessageType: What is the content type of the message (text, image, audio)

Is there a specific reason why I should return a message where the LanguageModelMessageRole = tool-call AND the LanguageModelMessageType = tool-call?

Author

As I recall, we created "tool-call" for LanguageModelMessageRole so that each browser can easily format the role for its model-specific implementation. E.g., an empty string, or "" or some different control tokens.

If only MessageType supported "tool-call" and we assumed the role to be "assistant", the implementation might become more complicated than declaring the role anyway and letting implementations omit it.

It's also not without precedent that separate roles are defined for tool calls and tool responses: https://huggingface.co/Trelis/openchat_3.5-function-calling-v3
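
As an illustration of the argument above, here is a sketch of how an implementation might map each role to a model-specific marker, including a model that renders the "tool-call" role as nothing. The marker strings and both "models" are made up for this example and are not taken from any real model or browser. The role names follow the enum quoted in this thread.

```js
// Made-up role markers for two imaginary models. Real models define their own
// control tokens; nothing here is taken from an actual implementation.
const MODEL_A_MARKERS = {
  system: "<sys>",
  user: "<user>",
  assistant: "<assistant>",
  "tool-call": "<call>",
  "tool-response": "<response>",
};

// Imaginary model B has no dedicated tool-call marker, so the implementation
// renders that role as an empty string.
const MODEL_B_MARKERS = {
  system: "[SYS]",
  user: "[USER]",
  assistant: "[ASSISTANT]",
  "tool-call": "",
  "tool-response": "[TOOL]",
};

// Renders one message by prefixing its body with the model's role marker.
function renderMessage(message, markers) {
  if (!Object.hasOwn(markers, message.role)) {
    throw new Error(`Unsupported role: ${message.role}`);
  }
  const body =
    typeof message.content === "string"
      ? message.content
      : JSON.stringify(message.content.value);
  return `${markers[message.role]}${body}`;
}

const toolCall = {
  role: "tool-call",
  content: { type: "tool-call", value: { name: "get_weather" } },
};

console.log(renderMessage(toolCall, MODEL_A_MARKERS));
console.log(renderMessage(toolCall, MODEL_B_MARKERS));
```

With a dedicated role, the formatter is a table lookup. Without it, the implementation would have to inspect the content of every "assistant" message to decide which marker applies.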

Comment thread index.bs
// The definitions of `LanguageModelToolCall` and `LanguageModelToolResponse` values
enum LanguageModelToolResultType { "text", "image", "audio", "object" };

dictionary LanguageModelToolResultContent {

In the end, the result of a tool call is nothing other than a new message in the conversation. But now the message comes not from the assistant or the user, but from a tool. So I think we should align this with LanguageModelMessageContent.

Comment thread README.md
```js
await session.append([
{role: "user", content: "What is the weather in Seattle?"},
{role: "tool-call", content: {type: "tool-call", value: {callID:" get_weather_1", name: "get_weather", arguments: {location:"Seattle"}}},

From a technical point of view, tool-call is not a message in the conversation. assistant would be the role, and the content could then include tool-call tokens.
So in my opinion the assistant message should always return the actual generated tokens, but for convenience (and cross-model compatibility) also the parsed tool calls (by the way, the same applies to thinking):

{
  role: "assistant",
  content: "...",
  toolCalls: [
    {
      callId: "get_weather_1",
      name: "get_weather",
      arguments: {
        location: "Seattle"
      }
    }
  ]
}

Author

It's not clear what the client can do with the actual tokens if they have the parsed tool calls anyway. Also, the actual tokens and the parsers for function calling are browser- and model-specific, so I'm not sure all browsers would want to expose them to clients.

(How we expose thinking is a different topic that is worth its own discussion.)

Comment thread README.md
await session.append([
{role: "user", content: "What is the weather in Seattle?"},
{role: "tool-call", content: {type: "tool-call", value: {callID:" get_weather_1", name: "get_weather", arguments: {location:"Seattle"}}},
{role: "tool-result", content: {type: "tool-response", value: {callID: "get_weather_1", name: "get_weather", result: [{type:"object", value: {temperature: "55F", humidity: "67%"}}]}},

As mentioned above, role: "tool-result" + content.type: "tool-response" seems redundant.
With the structure below, we have a role tool-result and then, just like all the other messages, a content array where each element can have a different type, depending on what the tool wants to return.

{
  role: "tool-result",
  content:  [
    {
      callId: "get_weather_1",
      type: "object",
      value: [
        {
          type: "object",
          value: {
            temperature: "55F",
            humidity: "67%"
          }
        }
      ]
    }
  ]
}
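
To compare the two designs side by side, the following sketch converts messages from the shape used in the README example into the shape proposed in these review comments. `toProposedShape` is a hypothetical helper written for this comparison. The empty `content` string is an assumption: the README shape carries no raw generated tokens, so there is nothing to put there.

```js
// Hypothetical converter from the README's message shape to the shape
// proposed in this review. Other messages pass through unchanged.
function toProposedShape(message) {
  if (message.role === "tool-call") {
    const { callID, name, arguments: args } = message.content.value;
    return {
      role: "assistant",
      // Assumption: raw generated tokens are unavailable in the README shape.
      content: "",
      toolCalls: [{ callId: callID, name, arguments: args }],
    };
  }
  if (message.role === "tool-result") {
    const { callID, result } = message.content.value;
    return {
      role: "tool-result",
      content: [{ callId: callID, type: "object", value: result }],
    };
  }
  return message;
}

const readmeShape = [
  { role: "user", content: "What is the weather in Seattle?" },
  {
    role: "tool-call",
    content: {
      type: "tool-call",
      value: { callID: "get_weather_1", name: "get_weather", arguments: { location: "Seattle" } },
    },
  },
  {
    role: "tool-result",
    content: {
      type: "tool-response",
      value: {
        callID: "get_weather_1",
        name: "get_weather",
        result: [{ type: "object", value: { temperature: "55F", humidity: "67%" } }],
      },
    },
  },
];

console.log(JSON.stringify(readmeShape.map(toProposedShape), null, 2));
```

The conversion drops the tool `name` from the result message, since the proposed shape identifies the call by `callId` alone.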

Comment thread README.md

#### Do I need auto execution?

In general, automatic execution is suitable for use cases where the model quality is good enough via prompt tuning. That can mean either that you can tolerate certain mistakes the model makes when making tool calls, or that the task is simple enough for the model to handle (e.g., just a few distinct tools, short and clean tool output, a short context window, etc.).

Not sure if I get this right. For the user, both the closed and the open loop are executed automatically. The only difference is that in an open loop, the developer has to execute the tools and start the next generation, while in the closed loop the loop runs without any extra steps.
Also, if I don't want "automatic execution" as a developer, I could always intercept in the execute function. I would even argue that for the whole LLM conversation it is better to intercept a tool execution inside the execute function, because that lets you return a reason why the tool was not executed, instead of letting the model generate the tool call without ever learning why it was not executed.
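
A minimal sketch of that interception idea: wrap a tool so that a check runs before `execute`, and return the refusal reason as the tool's output. `withGuard`, the `isAllowed` callback, its `{ allowed, reason }` verdict, and the `{ error }` return shape are all hypothetical; only the per-tool `execute` function comes from the explainer text.

```js
// Wraps a tool so a (hypothetical) `isAllowed` check runs before execution.
function withGuard(tool, isAllowed) {
  return {
    ...tool,
    async execute(args) {
      const verdict = await isAllowed(tool.name, args);
      if (!verdict.allowed) {
        // Returned as the tool output, so the model can see why the call
        // was declined instead of receiving nothing.
        return { error: `Tool call declined: ${verdict.reason}` };
      }
      return tool.execute(args);
    },
  };
}

const getWeather = {
  name: "get_weather",
  async execute({ location }) {
    return { temperature: "55F", humidity: "67%" };
  },
};

// Example policy: only allow lookups for Seattle.
const guardedWeather = withGuard(getWeather, (name, args) =>
  args.location === "Seattle"
    ? { allowed: true }
    : { allowed: false, reason: `unsupported location "${args.location}"` }
);

guardedWeather.execute({ location: "Atlantis" }).then((output) => console.log(output));
```

This keeps the loop closed while still giving the developer a veto, at the cost of not being able to rewrite the conversation history the way an open loop allows.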

5 participants