Tool use without auto execution #162
Added detailed explanations for tool use modes, including examples for open loop and closed loop execution.
Updated README to clarify tool-call and tool-result usage.
Added new types and enums for tool calls and responses.
tomayac
left a comment
Tried to make the code samples more readable and correct. Maybe consider running them all through a tool like prettier, which catches typos like missing commas or parentheses.
As general feedback, could the explainer outline why developers would choose closed vs. open?
| await session.append([ | ||
| {role: "user", content: "What is the weather in Seattle?"}, | ||
| {role: "tool-call", content: {type: "tool-call", value: {callID:" get_weather_1", name: "get_weather", arguments: {location:"Seattle"}}}, | ||
| {role: "tool-result", content: {type: "tool-response", value: {callID: "get_weather_1", name: "get_weather", result: [{type:"object", value: {temperature: "55F", humidity: "67%"}}]}}, | ||
| {role: "assistant", content: "The temperature in Seattle is 55F and humidity is 67%"}, | ||
| ]); |
There was a problem hiding this comment.
| await session.append([ | |
| { role: "user", content: "What is the weather in Seattle?" }, | |
| { | |
| role: "tool-call", | |
| content: { | |
| type: "tool-call", | |
| value: { | |
| callID: "get_weather_1", | |
| name: "get_weather", | |
| arguments: { location: "Seattle" }, | |
| }, | |
| }, | |
| }, | |
| { | |
| role: "tool-result", | |
| content: { | |
| type: "tool-response", | |
| value: { | |
| callID: "get_weather_1", | |
| name: "get_weather", | |
| result: [ | |
| { type: "object", value: { temperature: "55F", humidity: "67%" } }, | |
| ], | |
| }, | |
| }, | |
| }, | |
| { | |
| role: "assistant", | |
| content: "The temperature in Seattle is 55F and humidity is 67%", | |
| }, | |
| ]); |
| ]); | ||
| ``` | ||
|
|
||
| Note that "role" and "type" now supports "tool-call" and "tool-result". |
There was a problem hiding this comment.
| Note that "role" and "type" now supports "tool-call" and "tool-result". | |
| Note that `"role"` and `"type"` now support `"tool-call"` and `"tool-result"`. |
| sessionOptions = structuredClone(options); | ||
| sessionOptions.expectedOutputs.push(["tool-call"]); | ||
| session = await LanguageModel.create(sessionOptions); | ||
|
|
||
| var result = await session.prompt("What is the weather in Seattle?"); | ||
| if (result.type=="tool-call") { | ||
| if (result.name == "get_weather") { | ||
| const tool_result = getWeather(result.arguments.location); | ||
| result = session.prompt([{role:"tool-result", content: {type: "tool-result", value: {callId: result.callID, name: result.name, result: [{type:"object", value: tool_result}]}}}]) | ||
| } | ||
| } else{ | ||
| console.log(result) | ||
| } |
There was a problem hiding this comment.
| sessionOptions = structuredClone(options); | |
| sessionOptions.expectedOutputs.push(["tool-call"]); | |
| session = await LanguageModel.create(sessionOptions); | |
| var result = await session.prompt("What is the weather in Seattle?"); | |
| if (result.type == "tool-call") { | |
| if (result.name == "get_weather") { | |
| const tool_result = getWeather(result.arguments.location); | |
| result = await session.prompt([ | |
| { | |
| role: "tool-result", | |
| content: { | |
| type: "tool-result", | |
| value: { | |
| callID: result.callID, | |
| name: result.name, | |
| result: [{ type: "object", value: tool_result }], | |
| }, | |
| }, | |
| }, | |
| ]); | |
| } | |
| } else { | |
| console.log(result); | |
| } |
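For illustration, the open-loop flow above can be generalized into a small driver that keeps answering tool calls until the model returns a plain response. This is only a sketch under stated assumptions: `createFakeSession`, `toolRegistry`, `isToolCall`, and `runOpenLoop` are hypothetical names, the fake session stands in for a real `LanguageModel` session so the snippet can run anywhere, and the result shape (`type`, `callID`, `name`, `arguments`) is copied from the sample in this PR rather than from a finalized API.

```js
// Hypothetical stand-in for a LanguageModel session. It asks for one tool
// call, then answers using the tool result that is sent back to it.
function createFakeSession() {
  let turn = 0;
  return {
    async prompt(input) {
      turn += 1;
      if (turn === 1) {
        return {
          type: "tool-call",
          callID: "get_weather_1",
          name: "get_weather",
          arguments: { location: "Seattle" },
        };
      }
      const toolValue = input[0].content.value.result[0].value;
      return `The temperature in Seattle is ${toolValue.temperature} and humidity is ${toolValue.humidity}`;
    },
  };
}

// Hypothetical tool implementations, keyed by tool name.
const toolRegistry = {
  get_weather: ({ location }) => ({ temperature: "55F", humidity: "67%" }),
};

function isToolCall(result) {
  return result !== null && typeof result === "object" && result.type === "tool-call";
}

// Open loop: the developer executes each tool call and starts the next
// generation, up to maxCalls tool calls per user prompt.
async function runOpenLoop(session, text, maxCalls = 3) {
  let result = await session.prompt(text);
  for (let calls = 0; isToolCall(result); calls++) {
    if (calls >= maxCalls) throw new Error("Too many tool calls");
    const tool = toolRegistry[result.name];
    if (!tool) throw new Error(`Unknown tool: ${result.name}`);
    const toolResult = tool(result.arguments);
    result = await session.prompt([
      {
        role: "tool-result",
        content: {
          type: "tool-result",
          value: {
            callID: result.callID,
            name: result.name,
            result: [{ type: "object", value: toolResult }],
          },
        },
      },
    ]);
  }
  return result;
}

runOpenLoop(createFakeSession(), "What is the weather in Seattle?").then(console.log);
```

The explicit `maxCalls` cap is the developer-side counterpart of the limit that `toolUseConfig` would impose in closed-loop mode.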
|
|
||
| #### Closed Loop: | ||
|
|
||
| To enable automatic execution, add a `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and pose a max number of tool calls invoked in a single session generation: |
There was a problem hiding this comment.
| To enable automatic execution, add a `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and pose a max number of tool calls invoked in a single session generation: | |
| To enable automatic execution, add an `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and to set a maximum number of tool calls that can be invoked in a single session generation: |
| sessionOptions.expectedOutputs.push(["tool-call"]); | ||
| session = await LanguageModel.create(sessionOptions); | ||
|
|
||
| var result = await session.prompt("What is the weather in Seattle?"); |
There was a problem hiding this comment.
| var result = await session.prompt("What is the weather in Seattle?"); | |
| let result = await session.prompt("What is the weather in Seattle?"); |
| Example: | ||
|
|
||
| ```js | ||
| sessionOptions = structuredClone(options); |
There was a problem hiding this comment.
| sessionOptions = structuredClone(options); | |
| const sessionOptions = structuredClone(options); |
| ```js | ||
| sessionOptions = structuredClone(options); | ||
| sessionOptions.expectedOutputs.push(["tool-call"]); | ||
| session = await LanguageModel.create(sessionOptions); |
There was a problem hiding this comment.
| session = await LanguageModel.create(sessionOptions); | |
| const session = await LanguageModel.create(sessionOptions); |
Co-authored-by: Thomas Steiner <tomac@google.com>
Added explanation about automatic execution and constraints in planner loop.
I added a new section to describe use cases where open loop is preferred. cc @tomayac
I hadn't previously considered the context compression use case. That's interesting and motivating to enable developers to manipulate the conversation at this low level.
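As a rough illustration of the context compression use case, in open-loop mode the developer can trim a tool's output before it is appended to the conversation. The sketch below assumes the `tool-result` message shape from the samples in this PR; `compressToolOutput` and `toToolResultMessage` are hypothetical helper names, not part of the proposed API.

```js
// Keep only the fields the model needs, so less context is consumed.
function compressToolOutput(toolOutput, keepKeys) {
  return Object.fromEntries(
    Object.entries(toolOutput).filter(([key]) => keepKeys.includes(key))
  );
}

// Wrap a (possibly compressed) tool output in the message shape used by
// the samples in this PR. The shape is an assumption, not a final API.
function toToolResultMessage(call, toolOutput) {
  return {
    role: "tool-result",
    content: {
      type: "tool-result",
      value: {
        callID: call.callID,
        name: call.name,
        result: [{ type: "object", value: toolOutput }],
      },
    },
  };
}

const verboseOutput = {
  temperature: "55F",
  humidity: "67%",
  rawStationData: "x".repeat(10000), // large payload the model does not need
};
const compactMessage = toToolResultMessage(
  { callID: "get_weather_1", name: "get_weather" },
  compressToolOutput(verboseOutput, ["temperature", "humidity"])
);
console.log(JSON.stringify(compactMessage));
```

With automatic execution, the equivalent trimming would have to happen inside each tool's `execute` function.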
nico-martin
left a comment
I think this implementation has a few weaknesses when it comes to distinguishing between a Message and a Message.Content element:
Message: Can have a specific role (whether it comes from the user, the assistant, or a tool call); it essentially describes the sender.
Message.Content: There can be multiple instances per Message; it describes the type of content.
I tried to make this concrete with a couple of comments.
| enum LanguageModelMessageRole { "system", "user", "assistant", "tool-call", "tool-response" }; | ||
|
|
||
| enum LanguageModelMessageType { "text", "image", "audio" }; | ||
| enum LanguageModelMessageType { "text", "image", "audio","tool-call", "tool-response" }; |
There was a problem hiding this comment.
In my opinion, the MessageType should describe the type, or in other words "the modality".
MessageRole: Where does the message come from (who is the sender?)
MessageType: What is the content type of the message (text, image, audio)
Is there a specific reason why I should return a message where the LanguageModelMessageRole = tool-call AND the LanguageModelMessageType = tool-call?
There was a problem hiding this comment.
As I recall, we added "tool-call" to LanguageModelMessageRole so that each browser can easily format the role for its model-specific implementation, e.g., as an empty string, or "" or some different control tokens.
If only MessageType supported "tool-call" and we assumed the role to be "assistant", the implementation might become more complicated than if the role were declared explicitly and implementations were free to omit it.
There is also precedent for defining separate roles for tool calls and tool responses: https://huggingface.co/Trelis/openchat_3.5-function-calling-v3
| // The definitions of `LanguageModelToolCall` and `LanguageModelToolResponse` values | ||
| enum LanguageModelToolResultType { "text", "image", "audio", "object" }; | ||
|
|
||
| dictionary LanguageModelToolResultContent { |
There was a problem hiding this comment.
In the end, the result of a tool call is nothing other than a new message in the conversation. The only difference is that the message comes from a tool rather than from the assistant or the user. So I think we should align this with LanguageModelMessageContent.
| ```js | ||
| await session.append([ | ||
| {role: "user", content: "What is the weather in Seattle?"}, | ||
| {role: "tool-call", content: {type: "tool-call", value: {callID:" get_weather_1", name: "get_weather", arguments: {location:"Seattle"}}}, |
There was a problem hiding this comment.
From a technical point of view, tool-call is not a message in the conversation. assistant would be the role, and the content could then include tool-call tokens.
So in my opinion the assistant message should always return the actual generated tokens, but for convenience (and cross-model compatibility) also the parsed tool calls (by the way, the same applies to thinking):
{
  role: "assistant",
  content: "...",
  toolCalls: [
    {
      callId: "get_weather_1",
      name: "get_weather",
      arguments: {
        location: "Seattle"
      }
    }
  ]
}
There was a problem hiding this comment.
It's not clear what the client can do with the actual tokens if they have the parsed tool calls anyway. Also, the actual tokens and parsers for function calling are browser- and model-specific, so I'm not sure all browsers would want to expose them to clients.
(How we expose thinking is a different topic that deserves its own discussion.)
| await session.append([ | ||
| {role: "user", content: "What is the weather in Seattle?"}, | ||
| {role: "tool-call", content: {type: "tool-call", value: {callID:" get_weather_1", name: "get_weather", arguments: {location:"Seattle"}}}, | ||
| {role: "tool-result", content: {type: "tool-response", value: {callID: "get_weather_1", name: "get_weather", result: [{type:"object", value: {temperature: "55F", humidity: "67%"}}]}}, |
There was a problem hiding this comment.
As mentioned above, role: "tool-result" + content.type: "tool-response" seems unnecessary.
With this structure we have a role tool-result and then, just like all the other messages, a content array where each element can have a different type, depending on what the tool wants to return.
{
  role: "tool-result",
  content: [
    {
      callId: "get_weather_1",
      type: "object",
      value: [
        {
          type: "object",
          value: {
            temperature: "55F",
            humidity: "67%"
          }
        }
      ]
    }
  ]
}
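To make the two proposed shapes concrete, here is a minimal sketch of how a developer might turn an assistant message carrying parsed `toolCalls` into a single `tool-result` message with one content entry per call. It is only an illustration of the idea in this comment: `weatherTools` and `buildToolResultMessage` are hypothetical names, and the content entries use a simplified variant of the structure above in which `value` holds the tool output directly.

```js
// Hypothetical tool implementations, keyed by tool name.
const weatherTools = {
  get_weather: ({ location }) => ({ temperature: "55F", humidity: "67%" }),
};

// Build one "tool-result" message from the parsed tool calls of an
// assistant message. Each content entry carries its own type, like the
// content entries of any other message.
function buildToolResultMessage(assistantMessage, registry) {
  const content = (assistantMessage.toolCalls ?? []).map((call) => {
    const tool = registry[call.name];
    if (!tool) {
      // Report the problem to the model as text instead of throwing.
      return { callId: call.callId, type: "text", value: `Unknown tool: ${call.name}` };
    }
    return { callId: call.callId, type: "object", value: tool(call.arguments) };
  });
  return { role: "tool-result", content };
}

const assistantMessage = {
  role: "assistant",
  content: "...",
  toolCalls: [
    { callId: "get_weather_1", name: "get_weather", arguments: { location: "Seattle" } },
    { callId: "get_time_1", name: "get_time", arguments: {} },
  ],
};
console.log(JSON.stringify(buildToolResultMessage(assistantMessage, weatherTools), null, 2));
```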
|
|
||
| #### Do I need auto execution? | ||
|
|
||
| In general, automatic execution is suitable for use cases where the model quality is good enough via prompt tuning. That can mean either that you can tolerate certain mistakes the model makes when making tool calls, or that the task is simple enough for the model to handle (e.g., just a few distinct tools, short and clean tool output, short context window, etc.). |
There was a problem hiding this comment.
I'm not sure I understand this correctly. From the user's perspective, both the closed and the open loop are executed automatically. The only difference is that in an open loop the developer has to execute the tools and start the next generation, while in the closed loop the loop runs without any extra steps.
Also, if I don't want "automatic execution" as a developer, I could always intercept in the execute function. I would even argue that for the whole LLM conversation it is better to intercept a tool execution inside the execute function, because that allows you to return a reason why the tool was not executed, instead of letting the model generate the tool call without ever learning why it was not executed.
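A minimal sketch of that interception idea, assuming a tool's `execute` function receives the parsed arguments and that whatever it returns is passed back to the model as the tool result. `withGuard`, `guardedGetWeather`, and the shape of the declined result are hypothetical and only meant to illustrate the point.

```js
// Wrap a tool's execute function so that disallowed calls return a reason
// to the model instead of silently not running.
function withGuard(execute, isAllowed) {
  return async (args) => {
    if (!isAllowed(args)) {
      return {
        status: "not-executed",
        reason: `Tool call declined for arguments ${JSON.stringify(args)}`,
      };
    }
    return execute(args);
  };
}

// Hypothetical weather tool that is only allowed to look up Seattle.
const guardedGetWeather = withGuard(
  async ({ location }) => ({ temperature: "55F", humidity: "67%" }),
  ({ location }) => location === "Seattle"
);

guardedGetWeather({ location: "Paris" }).then((result) => console.log(result));
```

Because the declined result goes back into the conversation, the model can explain the refusal or try a different approach.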
Update explainer and spec to support tool use functionalities without automatic execution.
Explainer: added an example and explained how to make tool calls
Spec: reflect IDL changes in https://chromium-review.googlesource.com/c/chromium/src/+/7092943