SDK version: mistralai==2.4.9
Python: 3.12
Summary
The usage object returned by chat.complete / chat.complete_async deserializes into mistralai.client.models.usageinfo.UsageInfo, which does not declare a prompt_tokens_details field. Because the model sets extra="allow", the prompt_tokens_details object the API returns (e.g. for prompt caching) lands in __pydantic_extra__ as a raw dict instead of being parsed into the existing PromptTokensDetails model. As a result, defensive attribute access such as getattr(usage.prompt_tokens_details, "cached_tokens", None) silently returns None, even though the value is present under the dict key.
Notably, the repo already ships a typed PromptTokensDetails model and a second usage model (usageinfo_dollar_defs.py) that does type prompt_tokens_details, but that is not the model the chat endpoint returns.
Reproduction
import asyncio
from mistralai.client import Mistral

async def main():
    cli = Mistral(api_key="...")
    msgs = [{"role": "system", "content": "x " * 3000},
            {"role": "user", "content": "Reply 1."}]
    # Warm the prompt cache with a stable key, then read it back
    await cli.chat.complete_async(model="mistral-small-latest", messages=msgs,
                                  max_tokens=8, temperature=0, prompt_cache_key="repro")
    r = await cli.chat.complete_async(model="mistral-small-latest", messages=msgs,
                                      max_tokens=8, temperature=0, prompt_cache_key="repro")
    u = r.usage
    print(type(u).__module__, type(u).__name__)  # ...usageinfo UsageInfo
    print(type(u.prompt_tokens_details).__name__)  # dict <-- expected PromptTokensDetails
    print(getattr(u.prompt_tokens_details, "cached_tokens", None))  # None <-- looks like no cache!
    print(u.prompt_tokens_details["cached_tokens"])  # 20096 <-- value is actually there

asyncio.run(main())
Raw HTTP response (confirms the API sends it correctly)
"usage": {
"prompt_tokens": 20120,
"total_tokens": 20122,
"completion_tokens": 2,
"prompt_tokens_details": { "cached_tokens": 20096 }
}
Expected
usage.prompt_tokens_details should be a typed PromptTokensDetails instance, so usage.prompt_tokens_details.cached_tokens works.
Actual
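usage.prompt_tokens_details is a dict, so getattr(usage.prompt_tokens_details, "cached_tokens", None) returns None (and plain .cached_tokens raises AttributeError). The None silently reads as "no caching", which is easy to misdiagnose as prompt caching being broken. The data is present; only the typing/DX is wrong.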
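Until the field is typed, a caller-side workaround is possible. The following is a minimal sketch, not part of the SDK: read_cached_tokens is a hypothetical helper name, and the SimpleNamespace objects are stand-ins for the real usage object. It reads cached_tokens whether prompt_tokens_details arrives as a raw dict (current behavior) or as a typed object (expected behavior).

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_cached_tokens(usage: Any) -> Optional[int]:
    """Return cached_tokens from a usage object, or None if it is absent.

    Hypothetical helper: handles prompt_tokens_details as either a raw
    dict or an object exposing a cached_tokens attribute.
    """
    details = getattr(usage, "prompt_tokens_details", None)
    if details is None:
        return None
    if isinstance(details, dict):
        # Current behavior: the value lives under a dict key.
        return details.get("cached_tokens")
    # Expected behavior: a typed model with a cached_tokens attribute.
    return getattr(details, "cached_tokens", None)


# Stand-ins for the two shapes (assumptions, not real SDK objects).
dict_style = SimpleNamespace(prompt_tokens_details={"cached_tokens": 20096})
typed_style = SimpleNamespace(
    prompt_tokens_details=SimpleNamespace(cached_tokens=20096)
)
missing = SimpleNamespace()

print(read_cached_tokens(dict_style))
print(read_cached_tokens(typed_style))
print(read_cached_tokens(missing))
```

Because the helper accepts both shapes, it should keep working unchanged if a later SDK release starts returning the typed model.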
Impact
Anyone tracking prompt-cache savings (cached tokens billed at 10%) by reading cached_tokens through getattr-style attribute access gets None and may wrongly conclude caching isn't working.
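To make the stakes concrete, here is a rough sketch of the savings arithmetic using the usage numbers from the raw response in this report. The 10% rate is taken from the figure quoted above and is treated as an assumption rather than authoritative pricing. The variable names are illustrative only.

```python
import json

# Usage payload copied from the raw HTTP response in this report.
raw = json.loads(
    '{"usage": {"prompt_tokens": 20120, "total_tokens": 20122,'
    ' "completion_tokens": 2,'
    ' "prompt_tokens_details": {"cached_tokens": 20096}}}'
)
usage = raw["usage"]

CACHED_RATE = 0.10  # assumption: cached tokens billed at 10%, as quoted above

prompt_tokens = usage["prompt_tokens"]
cached_tokens = usage["prompt_tokens_details"]["cached_tokens"]
uncached_tokens = prompt_tokens - cached_tokens

# Share of the prompt served from cache.
hit_ratio = cached_tokens / prompt_tokens

# Prompt tokens expressed in full-price equivalents.
effective_prompt_tokens = uncached_tokens + CACHED_RATE * cached_tokens

print(f"uncached prompt tokens: {uncached_tokens}")
print(f"cache hit ratio: {hit_ratio:.4f}")
print(f"effective billed prompt tokens: {effective_prompt_tokens:.1f}")
```

If cached_tokens is read as None and treated as zero, the same request would be accounted as 20120 full-price prompt tokens, which is the misdiagnosis described above.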
Suggested fix
Add prompt_tokens_details: Optional[PromptTokensDetails] (and likely audio_tokens / num_cached_tokens) to the chat-completions UsageInfo schema so codegen emits the typed field.
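The effect of the suggested fix can be sketched with plain Pydantic v2 models. The classes below are simplified, hypothetical stand-ins (UsageUntyped, UsageTyped, and a reduced PromptTokensDetails), not the SDK's generated code. They only illustrate how declaring the field changes what extra="allow" would otherwise store as a raw dict.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class PromptTokensDetails(BaseModel):
    # Reduced stand-in for the SDK's model (assumption).
    cached_tokens: Optional[int] = None


class UsageUntyped(BaseModel):
    # Mirrors the reported behavior: field not declared, extras allowed.
    model_config = ConfigDict(extra="allow")
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0


class UsageTyped(UsageUntyped):
    # Sketch of the fix: declare the field so it is parsed into a model.
    prompt_tokens_details: Optional[PromptTokensDetails] = None


payload = {
    "prompt_tokens": 20120,
    "total_tokens": 20122,
    "completion_tokens": 2,
    "prompt_tokens_details": {"cached_tokens": 20096},
}

untyped = UsageUntyped.model_validate(payload)
typed = UsageTyped.model_validate(payload)

# Undeclared field: kept as a raw dict among the model's extras.
untyped_details = untyped.model_extra["prompt_tokens_details"]
print(type(untyped_details).__name__)
print(getattr(untyped_details, "cached_tokens", None))

# Declared field: parsed into the typed model, attribute access works.
print(type(typed.prompt_tokens_details).__name__)
print(typed.prompt_tokens_details.cached_tokens)
```

In the SDK itself the change would presumably be made in the API schema so that codegen emits the field, as suggested above. The sketch only shows the resulting runtime difference.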