Chat completions UsageInfo does not type prompt_tokens_details (returned as untyped dict) #572

@sebkuepers

SDK version: mistralai==2.4.9
Python: 3.12

Summary

The usage object returned by chat.complete / chat.complete_async deserializes into mistralai.client.models.usageinfo.UsageInfo, which does not declare a prompt_tokens_details field. Because the model sets extra="allow", the prompt_tokens_details object the API returns (e.g. for prompt caching) lands in __pydantic_extra__ as a raw dict instead of being parsed into the existing PromptTokensDetails model. As a result, direct attribute access on it (details.cached_tokens) raises AttributeError, and defensive access such as getattr(details, "cached_tokens", None) silently returns None.

Notably the repo already ships a typed PromptTokensDetails model and a second usage model (usageinfo_dollar_defs.py) that does type prompt_tokens_details — but that is not the model the chat endpoint returns.
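The mechanism can be reproduced without the SDK or an API key. The following is a minimal sketch using plain pydantic v2. UsageSketch is a hypothetical stand-in for the generated UsageInfo, not the SDK's actual class, and it assumes only what is stated above (extra="allow" and no declared prompt_tokens_details field). The payload is the usage object from the raw HTTP response in this report.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class UsageSketch(BaseModel):
    """Hypothetical stand-in for UsageInfo: extras allowed, no typed details field."""

    model_config = ConfigDict(extra="allow")

    prompt_tokens: Optional[int] = 0
    completion_tokens: Optional[int] = 0
    total_tokens: Optional[int] = 0


payload = {
    "prompt_tokens": 20120,
    "total_tokens": 20122,
    "completion_tokens": 2,
    "prompt_tokens_details": {"cached_tokens": 20096},
}

usage = UsageSketch.model_validate(payload)

# The undeclared key is kept, but only as an unvalidated extra.
details = usage.prompt_tokens_details
print(type(details).__name__)                         # dict
print("prompt_tokens_details" in usage.model_extra)   # True
print(getattr(details, "cached_tokens", None))        # None
print(details["cached_tokens"])                       # 20096
```

If this matches what the generated model does, it would explain why the value is reachable by key but not by attribute.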

Reproduction

import asyncio
from mistralai.client import Mistral

async def main():
    cli = Mistral(api_key="...")
    msgs = [{"role": "system", "content": "x " * 3000},
            {"role": "user", "content": "Reply 1."}]
    # Warm the prompt cache with a stable key, then read it back
    await cli.chat.complete_async(model="mistral-small-latest", messages=msgs,
                                  max_tokens=8, temperature=0, prompt_cache_key="repro")
    r = await cli.chat.complete_async(model="mistral-small-latest", messages=msgs,
                                      max_tokens=8, temperature=0, prompt_cache_key="repro")
    u = r.usage
    print(type(u).__module__, type(u).__name__)          # ...usageinfo UsageInfo
    print(type(u.prompt_tokens_details).__name__)        # dict   <-- expected PromptTokensDetails
    print(getattr(u.prompt_tokens_details, "cached_tokens", None))  # None  <-- looks like no cache!
    print(u.prompt_tokens_details["cached_tokens"])      # 20096 <-- value is actually there

asyncio.run(main())

Raw HTTP response (confirms the API sends it correctly)

"usage": {
  "prompt_tokens": 20120,
  "total_tokens": 20122,
  "completion_tokens": 2,
  "prompt_tokens_details": { "cached_tokens": 20096 }
}

Expected

usage.prompt_tokens_details should be a typed PromptTokensDetails instance, so usage.prompt_tokens_details.cached_tokens works.

Actual

usage.prompt_tokens_details is a dict. Direct attribute access raises AttributeError, and getattr(usage.prompt_tokens_details, "cached_tokens", None) returns None, which silently reads as "no caching" and is easy to misdiagnose as prompt caching being broken. The data is present; only the typing/DX is wrong.

Impact

Anyone tracking prompt-cache savings (cached tokens billed at 10%) by reading cached_tokens through defensive attribute access (getattr with a default) gets None and may wrongly conclude caching isn't working.
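Until the field is typed, a caller-side workaround is possible. This is only a sketch: read_cached_tokens is a hypothetical helper, not part of the SDK, and it simply accepts either shape (the raw dict returned today, or an object with a cached_tokens attribute) so that metrics code keeps working before and after a fix.

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_cached_tokens(details: Any) -> Optional[int]:
    """Return cached_tokens from a dict, an attribute-style object, or None."""
    if details is None:
        return None
    if isinstance(details, dict):
        # Current behaviour described in this issue: a raw dict.
        return details.get("cached_tokens")
    # A typed model (or anything else exposing the attribute).
    return getattr(details, "cached_tokens", None)


# Raw dict, as the SDK returns it today.
print(read_cached_tokens({"cached_tokens": 20096}))               # 20096
# Attribute-style object, standing in for a typed model.
print(read_cached_tokens(SimpleNamespace(cached_tokens=20096)))   # 20096
# Response without any details.
print(read_cached_tokens(None))                                   # None
```

With the reproduction above, this would be called as read_cached_tokens(getattr(r.usage, "prompt_tokens_details", None)).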

Suggested fix

Add prompt_tokens_details: Optional[PromptTokensDetails] (and likely audio_tokens / num_cached_tokens) to the chat-completions UsageInfo schema so codegen emits the typed field.
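To illustrate the intended result, here is a rough sketch in plain pydantic v2 of what a typed field changes. Both classes are hypothetical stand-ins written for this example, not the generated SDK models, and only cached_tokens is declared because that is the only detail field confirmed by the raw response above.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class PromptTokensDetailsSketch(BaseModel):
    """Hypothetical stand-in for PromptTokensDetails."""

    model_config = ConfigDict(extra="allow")

    cached_tokens: Optional[int] = None


class TypedUsageSketch(BaseModel):
    """Hypothetical UsageInfo with the details field declared."""

    model_config = ConfigDict(extra="allow")

    prompt_tokens: Optional[int] = 0
    completion_tokens: Optional[int] = 0
    total_tokens: Optional[int] = 0
    prompt_tokens_details: Optional[PromptTokensDetailsSketch] = None


typed = TypedUsageSketch.model_validate(
    {
        "prompt_tokens": 20120,
        "total_tokens": 20122,
        "completion_tokens": 2,
        "prompt_tokens_details": {"cached_tokens": 20096},
    }
)

# The nested dict is now validated into the typed model.
print(type(typed.prompt_tokens_details).__name__)   # PromptTokensDetailsSketch
print(typed.prompt_tokens_details.cached_tokens)    # 20096

# Responses that omit the key still validate; the field is simply None.
bare = TypedUsageSketch.model_validate({"prompt_tokens": 5, "total_tokens": 7})
print(bare.prompt_tokens_details)                   # None
```

Keeping the field Optional with a None default means responses without caching details continue to parse unchanged.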
