fix: forward image and PDF attachments for DMR models by Sayt-0 · Pull Request #3197 · docker/docker-agent

Sayt-0 · 2026-06-22T12:31:37Z

Summary

DMR-hosted models are absent from the models.dev catalog, so modelinfo.LoadCaps
always missed for dmr/<model> IDs and image/PDF attachments were silently
dropped to a text-only strategy. This adds a per-model capability opt-in so DMR
vision/PDF models can forward attachments, and forwards PDFs as a native file
content part on the Chat Completions path.

Fixes #2739.

Issue expectations mapped to the implementation

Issue element	Implementation	Status
Vision/PDF attachments always degrade to text-only	Declared capabilities restore forwarding	done
Root cause: models.dev has no `dmr` provider, so the lookup always misses	DMR path no longer depends on the lookup; capabilities are injected explicitly	done
Possible fix: local allowlist of known DMR models	Not chosen (brittle, needs ongoing upkeep)	rejected
Possible fix: provider-level opt-in (capability hint in model config)	Chosen: `provider_opts.supports_images` / `supports_pdf`	done
Possible fix: query a DMR capability API	Not chosen (no stable capability API exposed; adds latency and a CLI dependency)	rejected

How it works

models:
  qwen-vl:
    provider: dmr
    model: ai/qwen2.5-vl
    provider_opts:
      supports_images: true
      supports_pdf: false

provider_opts is parsed and validated at client creation, stored as
Client.attachmentCaps, and injected via oaistream.ConvertMessagesWithCaps.
The models.dev lookup is skipped on the DMR path because it can never resolve a
dmr/<model> ID.

Changes

Area	Change
`pkg/model/provider/dmr/configure.go`	`parseBoolOpt`, plus parse and validate `supports_images` / `supports_pdf`
`pkg/model/provider/dmr/client.go`	`Client.attachmentCaps` set in `NewClient`, injected in `convertMessages`
`pkg/model/provider/oaistream/messages.go`	`ConvertMessagesWithCaps`; internals thread `modelinfo.ModelCapabilities`, resolving caps once per batch
`pkg/model/provider/oaistream/attachments.go`	Removed the unused store-variant `convertDocument`; PDFs forwarded as a `file` content part instead of dropped
`agent-schema.json`, `examples/dmr.yaml`	Documented the keys; added a `qwen_vision` example
tests	DMR parse/conversion/end-to-end and oaistream caps/PDF tests

Behavior

Case	Before	After
DMR vision model, image attachment	dropped (text-only)	forwarded when `supports_images: true`
DMR model, PDF attachment	dropped	forwarded as file part when `supports_pdf: true`
DMR model, no declared capability	dropped	dropped (unchanged, conservative default)
Catalogued providers (openai, anthropic, ...)	models.dev lookup	unchanged

Testing

pkg/model/provider/dmr, pkg/model/provider/oaistream, pkg/modelinfo, pkg/attachment, pkg/model/provider/openai: pass.
pkg/config example parsing (TestParseExamples) including the new dmr.yaml entry: pass.
golangci-lint and the repository linter: clean.

Note for reviewers

The PDF forwarding change also affects the OpenAI legacy Chat Completions path:
a model that models.dev reports as PDF-capable now receives PDFs as a file
content part rather than having them dropped (newer OpenAI models already handle
PDFs via the Responses API). The behavior is gated by resolved capabilities, so
nothing changes for models without PDF support. If restricting this strictly to
DMR seems more appropriate, it can be changed.

DMR-hosted models are not in the models.dev catalog, so the capability lookup in modelinfo.LoadCaps always missed for "dmr/<model>" IDs and the conversion path fell back to a text-only strategy, silently dropping image and PDF attachments (issue #2739). Let each DMR model declare its multimodal support explicitly via provider_opts.supports_images and provider_opts.supports_pdf. The DMR client parses and validates these at creation time, stores them as the model's attachment capabilities, and injects them into message conversion through the new oaistream.ConvertMessagesWithCaps, bypassing the models.dev lookup that can never resolve a DMR model. While here, the oaistream Chat Completions path now forwards PDFs as a file content part (base64 data URI plus filename) instead of dropping them. This is gated by the resolved capabilities, so it only triggers for models declared PDF-capable: DMR via provider_opts, or catalog models that models.dev reports as PDF-capable on the legacy Chat Completions path. The conservative text-only default is unchanged: a model with no declared capability still drops image and PDF attachments. Fixes #2739.

docker-agent · 2026-06-22T13:02:40Z

❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

docker-agent

Assessment: 🟡 NEEDS ATTENTION

One MEDIUM severity finding in the new PDF attachment forwarding path. The fix for DMR attachment capabilities is well-structured overall — capabilities are parsed cleanly, threaded correctly through Client.attachmentCaps, and ConvertMessagesWithCaps resolves caps once per batch as intended. The image forwarding path is correct. The concern is in how PDF content is encoded for the Chat Completions file content part.

docker-agent · 2026-06-22T13:36:26Z

+			// upstream and never reach this branch.
+			return []openai.ChatCompletionContentPartUnionParam{
+				openai.FileContentPart(openai.ChatCompletionContentPartFileFileParam{
+					FileData: openai.String(dataURI(doc.MimeType, doc.Source.InlineData)),


[MEDIUM] PDF file_data passes a data URI; OpenAI Chat Completions expects plain base64

The new PDF branch builds the file_data value with dataURI(...), producing a data:application/pdf;base64,<encoded> string:

openai.FileContentPart(openai.ChatCompletionContentPartFileFileParam{ FileData: openai.String(dataURI(doc.MimeType, doc.Source.InlineData)), Filename: openai.String(doc.Name), })

The OpenAI Chat Completions file content part spec expects file_data to be a plain base64-encoded string (no data: URI prefix), identical to how the Responses API input_file.file_data is specified. The data:mimetype;base64,... format is only valid for the url field of image_url content parts (used correctly in the image branch above).

Sending a data URI prefix in file_data will likely result in a 400/422 from any endpoint that follows the OpenAI Chat Completions spec strictly — including DMR backends that proxy to an OpenAI-compatible runtime. The code comment claims "Chat Completions accepts PDFs as a file content part carrying a base64 data URI", but this is not supported by the OpenAI API reference.

Suggested fix: strip the data URI prefix and pass only the raw base64 payload:

FileData: openai.String(base64.StdEncoding.EncodeToString(doc.Source.InlineData)),

Dismissing this finding as a false positive: the data URI is the required format, and the suggested fix would introduce a 400.

The Chat Completions file content part expects file_data to be a base64 data URI (data:application/pdf;base64,<encoded>), which is exactly what dataURI(...) produces here. A bare base64 string (the suggested fix) is what OpenAI rejects.

Evidence Finding

OpenAI file-inputs / pdf-files guide Every Python and JS example, for both Chat Completions and Responses, sets file_data: "data:application/pdf;base64,..."

OpenAI 400 for bare base64 Expected a base64-encoded data URL with an application/pdf MIME type (e.g. 'data:application/pdf;base64,...'), but got an invalid base64-encoded value.

Community threads (1144263, 1377908) Confirmed fix is to prepend data:application/pdf;base64,; bare base64 does not work

In-repo precedent pkg/model/provider/openai/attachments.go:65 already sends file_data as a data URI, asserted at attachments_test.go:55 (data:application/pdf;base64,JVBERg==)

The image branch in the same function uses the identical data URI convention, which is unambiguously required for image_url. The suggested base64.StdEncoding.EncodeToString(...) (no prefix) is precisely the input that triggers the 400 above, so applying it would have introduced the regression it claims to prevent.

No change made.

I was wrong — thank you for the detailed correction and for taking the time to document the evidence so thoroughly.

I misread the OpenAI Chat Completions spec: I mistakenly believed file_data required plain base64 (like some other APIs), when it actually requires the full data URI with MIME type prefix, exactly as your dataURI(...) helper produces. The 400 error message you quoted ("Expected a base64-encoded data URL with an application/pdf MIME type") makes this unambiguous, and the in-repo precedent at pkg/model/provider/openai/attachments.go:65 confirms the existing code already follows this pattern correctly. My suggested fix would indeed have caused the exact regression I claimed to prevent.

I apologize for the noise on this thread.

Sayt-0 requested a review from a team as a code owner June 22, 2026 12:31

aheritier added area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) kind/fix PR fixes a bug (maps to fix: commit prefix) labels Jun 22, 2026

Sayt-0 requested a review from docker-agent June 22, 2026 13:31

docker-agent reviewed Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: forward image and PDF attachments for DMR models#3197

fix: forward image and PDF attachments for DMR models#3197
Sayt-0 wants to merge 1 commit into
mainfrom
fix/2739-dmr-attachment-caps

Sayt-0 commented Jun 22, 2026

Uh oh!

docker-agent commented Jun 22, 2026

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent Jun 22, 2026

Uh oh!

Sayt-0 Jun 22, 2026

Uh oh!

docker-agent Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Evidence	Finding
OpenAI file-inputs / pdf-files guide	Every Python and JS example, for both Chat Completions and Responses, sets `file_data: "data:application/pdf;base64,..."`
OpenAI 400 for bare base64	`Expected a base64-encoded data URL with an application/pdf MIME type (e.g. 'data:application/pdf;base64,...'), but got an invalid base64-encoded value.`
Community threads (1144263, 1377908)	Confirmed fix is to prepend `data:application/pdf;base64,`; bare base64 does not work
In-repo precedent	`pkg/model/provider/openai/attachments.go:65` already sends `file_data` as a data URI, asserted at `attachments_test.go:55` (`data:application/pdf;base64,JVBERg==`)

Conversation

Sayt-0 commented Jun 22, 2026

Summary

Issue expectations mapped to the implementation

How it works

Changes

Behavior

Testing

Note for reviewers

Uh oh!

docker-agent commented Jun 22, 2026

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟡 NEEDS ATTENTION

Uh oh!

docker-agent Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Sayt-0 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants