Skip to content

fix: forward image and PDF attachments for DMR models#3197

Open
Sayt-0 wants to merge 1 commit into
mainfrom
fix/2739-dmr-attachment-caps
Open

fix: forward image and PDF attachments for DMR models#3197
Sayt-0 wants to merge 1 commit into
mainfrom
fix/2739-dmr-attachment-caps

Conversation

@Sayt-0

@Sayt-0 Sayt-0 commented Jun 22, 2026

Copy link
Copy Markdown
Member

Summary

DMR-hosted models are absent from the models.dev catalog, so modelinfo.LoadCaps
always missed for dmr/<model> IDs and image/PDF attachments were silently
dropped to a text-only strategy. This adds a per-model capability opt-in so DMR
vision/PDF models can forward attachments, and forwards PDFs as a native file
content part on the Chat Completions path.

Fixes #2739.

Issue expectations mapped to the implementation

Issue element Implementation Status
Vision/PDF attachments always degrade to text-only Declared capabilities restore forwarding done
Root cause: models.dev has no dmr provider, so the lookup always misses DMR path no longer depends on the lookup; capabilities are injected explicitly done
Possible fix: local allowlist of known DMR models Not chosen (brittle, needs ongoing upkeep) rejected
Possible fix: provider-level opt-in (capability hint in model config) Chosen: provider_opts.supports_images / supports_pdf done
Possible fix: query a DMR capability API Not chosen (no stable capability API exposed; adds latency and a CLI dependency) rejected

How it works

models:
  qwen-vl:
    provider: dmr
    model: ai/qwen2.5-vl
    provider_opts:
      supports_images: true
      supports_pdf: false

provider_opts is parsed and validated at client creation, stored as
Client.attachmentCaps, and injected via oaistream.ConvertMessagesWithCaps.
The models.dev lookup is skipped on the DMR path because it can never resolve a
dmr/<model> ID.

Changes

Area Change
pkg/model/provider/dmr/configure.go parseBoolOpt, plus parse and validate supports_images / supports_pdf
pkg/model/provider/dmr/client.go Client.attachmentCaps set in NewClient, injected in convertMessages
pkg/model/provider/oaistream/messages.go ConvertMessagesWithCaps; internals thread modelinfo.ModelCapabilities, resolving caps once per batch
pkg/model/provider/oaistream/attachments.go Removed the unused store-variant convertDocument; PDFs forwarded as a file content part instead of dropped
agent-schema.json, examples/dmr.yaml Documented the keys; added a qwen_vision example
tests DMR parse/conversion/end-to-end and oaistream caps/PDF tests

Behavior

Case Before After
DMR vision model, image attachment dropped (text-only) forwarded when supports_images: true
DMR model, PDF attachment dropped forwarded as file part when supports_pdf: true
DMR model, no declared capability dropped dropped (unchanged, conservative default)
Catalogued providers (openai, anthropic, ...) models.dev lookup unchanged

Testing

  • pkg/model/provider/dmr, pkg/model/provider/oaistream, pkg/modelinfo, pkg/attachment, pkg/model/provider/openai: pass.
  • pkg/config example parsing (TestParseExamples) including the new dmr.yaml entry: pass.
  • golangci-lint and the repository linter: clean.

Note for reviewers

The PDF forwarding change also affects the OpenAI legacy Chat Completions path:
a model that models.dev reports as PDF-capable now receives PDFs as a file
content part rather than having them dropped (newer OpenAI models already handle
PDFs via the Responses API). The behavior is gated by resolved capabilities, so
nothing changes for models without PDF support. If restricting this strictly to
DMR seems more appropriate, it can be changed.

DMR-hosted models are not in the models.dev catalog, so the capability
lookup in modelinfo.LoadCaps always missed for "dmr/<model>" IDs and the
conversion path fell back to a text-only strategy, silently dropping image
and PDF attachments (issue #2739).

Let each DMR model declare its multimodal support explicitly via
provider_opts.supports_images and provider_opts.supports_pdf. The DMR client
parses and validates these at creation time, stores them as the model's
attachment capabilities, and injects them into message conversion through the
new oaistream.ConvertMessagesWithCaps, bypassing the models.dev lookup that
can never resolve a DMR model.

While here, the oaistream Chat Completions path now forwards PDFs as a file
content part (base64 data URI plus filename) instead of dropping them. This is
gated by the resolved capabilities, so it only triggers for models declared
PDF-capable: DMR via provider_opts, or catalog models that models.dev reports
as PDF-capable on the legacy Chat Completions path.

The conservative text-only default is unchanged: a model with no declared
capability still drops image and PDF attachments.

Fixes #2739.
@Sayt-0 Sayt-0 requested a review from a team as a code owner June 22, 2026 12:31
@docker-agent

Copy link
Copy Markdown

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

@aheritier aheritier added area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) kind/fix PR fixes a bug (maps to fix: commit prefix) labels Jun 22, 2026
@Sayt-0 Sayt-0 requested a review from docker-agent June 22, 2026 13:31

@docker-agent docker-agent left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

One MEDIUM severity finding in the new PDF attachment forwarding path. The fix for DMR attachment capabilities is well-structured overall — capabilities are parsed cleanly, threaded correctly through Client.attachmentCaps, and ConvertMessagesWithCaps resolves caps once per batch as intended. The image forwarding path is correct. The concern is in how PDF content is encoded for the Chat Completions file content part.

// upstream and never reach this branch.
return []openai.ChatCompletionContentPartUnionParam{
openai.FileContentPart(openai.ChatCompletionContentPartFileFileParam{
FileData: openai.String(dataURI(doc.MimeType, doc.Source.InlineData)),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] PDF file_data passes a data URI; OpenAI Chat Completions expects plain base64

The new PDF branch builds the file_data value with dataURI(...), producing a data:application/pdf;base64,<encoded> string:

openai.FileContentPart(openai.ChatCompletionContentPartFileFileParam{
    FileData: openai.String(dataURI(doc.MimeType, doc.Source.InlineData)),
    Filename: openai.String(doc.Name),
})

The OpenAI Chat Completions file content part spec expects file_data to be a plain base64-encoded string (no data: URI prefix), identical to how the Responses API input_file.file_data is specified. The data:mimetype;base64,... format is only valid for the url field of image_url content parts (used correctly in the image branch above).

Sending a data URI prefix in file_data will likely result in a 400/422 from any endpoint that follows the OpenAI Chat Completions spec strictly — including DMR backends that proxy to an OpenAI-compatible runtime. The code comment claims "Chat Completions accepts PDFs as a file content part carrying a base64 data URI", but this is not supported by the OpenAI API reference.

Suggested fix: strip the data URI prefix and pass only the raw base64 payload:

FileData: openai.String(base64.StdEncoding.EncodeToString(doc.Source.InlineData)),

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissing this finding as a false positive: the data URI is the required format, and the suggested fix would introduce a 400.

The Chat Completions file content part expects file_data to be a base64 data URI (data:application/pdf;base64,<encoded>), which is exactly what dataURI(...) produces here. A bare base64 string (the suggested fix) is what OpenAI rejects.

Evidence Finding
OpenAI file-inputs / pdf-files guide Every Python and JS example, for both Chat Completions and Responses, sets file_data: "data:application/pdf;base64,..."
OpenAI 400 for bare base64 Expected a base64-encoded data URL with an application/pdf MIME type (e.g. 'data:application/pdf;base64,...'), but got an invalid base64-encoded value.
Community threads (1144263, 1377908) Confirmed fix is to prepend data:application/pdf;base64,; bare base64 does not work
In-repo precedent pkg/model/provider/openai/attachments.go:65 already sends file_data as a data URI, asserted at attachments_test.go:55 (data:application/pdf;base64,JVBERg==)

The image branch in the same function uses the identical data URI convention, which is unambiguously required for image_url. The suggested base64.StdEncoding.EncodeToString(...) (no prefix) is precisely the input that triggers the 400 above, so applying it would have introduced the regression it claims to prevent.

No change made.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong — thank you for the detailed correction and for taking the time to document the evidence so thoroughly.

I misread the OpenAI Chat Completions spec: I mistakenly believed file_data required plain base64 (like some other APIs), when it actually requires the full data URI with MIME type prefix, exactly as your dataURI(...) helper produces. The 400 error message you quoted ("Expected a base64-encoded data URL with an application/pdf MIME type") makes this unambiguous, and the in-repo precedent at pkg/model/provider/openai/attachments.go:65 confirms the existing code already follows this pattern correctly. My suggested fix would indeed have caused the exact regression I claimed to prevent.

I apologize for the noise on this thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DMR local models: vision/PDF attachments always degrade to text-only

3 participants