[WIP] Update diffusers-cli for agentic use
#13966
Open
DN6 wants to merge 32 commits into main from diffuser-cli-for-agent
Commits (32):
- e84a3ef update (DN6)
- 59be753 update (DN6)
- 4194c39 update (DN6)
- d8eb952 update (DN6)
- 95f33c7 update (DN6)
- accfa06 update (DN6)
- 4d4d9e8 update (DN6)
- f97aef8 update (DN6)
- 3774951 update (DN6)
- add747b update (DN6)
- 934b557 update (DN6)
- 0ae1eb0 update (DN6)
- dcfd09c update (DN6)
- 2221383 Merge remote-tracking branch 'origin' into diffuser-cli-for-agent (DN6)
- 9515c55 update (DN6)
- 404be8a update (DN6)
- f3fa589 update (DN6)
- 633461d update (DN6)
- 268bae9 update (DN6)
- fa7a0a2 update (DN6)
- 55e1c14 update (DN6)
- 6ba7a3f update (DN6)
- 6f02aed update (DN6)
- 889f646 update (DN6)
- ab70d69 pdate (DN6)
- af8cbf4 update (DN6)
- b50dae1 update (DN6)
- 1d6f5b3 update (DN6)
- 46849ae Merge branch 'main' into diffuser-cli-for-agent (sayakpaul)
- 0439192 update (DN6)
- bf8fe64 update (DN6)
- 8354f6e update (DN6)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| --- | ||
| name: diffusers-cli | ||
| description: > | ||
| Use when the user wants to run a diffusers pipeline from a terminal (one-off | ||
| generation, batch jobs, smoke-testing a new model), submit jobs to HF Jobs | ||
| hardware via `--remote`, introspect a pipeline's input schema before | ||
| calling it, or attach a LoRA at inference time. Prefer this over writing | ||
| ad-hoc Python scripts for generation tasks. | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| `diffusers-cli` is the shipped CLI in `src/diffusers/commands/`. Subcommands relevant to agentic use: | ||
|
|
||
| | Command | Purpose | | ||
| | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | `generate` | Run any `DiffusionPipeline` or `ModularPipeline`. Forwards `--pipeline-kwargs` verbatim, saves output by sniffing its runtime type, optionally runs on HF Jobs via `--remote`. | | ||
| | `describe` | Print the input schema for a pipeline repo (kwarg names, types, defaults, descriptions). **No weights downloaded** — only the small index file. | | ||
| | `custom_blocks` | Package a local `ModularPipelineBlocks` subclass for the Hub. | | ||
| | `env` | Print versions of diffusers + torch + transformers + accelerate + safetensors + CUDA + GPU info. Use when investigating environment issues, dtype/precision support, or building bug reports. | | ||
|
|
||
| ## When to read which file | ||
|
|
||
| Most agentic work goes through `generate`. Read the matching reference file before constructing a command: | ||
|
|
||
| - **[`generate.md`](generate.md)** — full reference for `diffusers-cli generate`. Covers `--pipeline-kwargs` | ||
| semantics and the shell-quoting gotcha, LoRA via `--lora`, optimization flags (`--dtype`, `--cpu-offload`, | ||
| `--attention-backend`, `--vae-tiling/slicing`), output handling and `--push-to` bucket uploads, the full | ||
| `--remote` HF Jobs flow (image, container command, log streaming, timing payload, artifact download), and | ||
| context parallel (`--context-parallel`) for both local-torchrun and `--remote` paths. | ||
|
|
||
| The other commands are small enough that `diffusers-cli <command> --help` is the canonical reference: | ||
|
|
||
| ```bash | ||
| diffusers-cli describe --help | ||
| diffusers-cli custom_blocks --help | ||
| diffusers-cli env --help | ||
| ``` | ||
|
|
||
| ## When NOT to use this skill | ||
|
|
||
| - Multi-stage workflows where you need intermediate tensor manipulation between pipelines → write Python. | ||
| - Training or fine-tuning → CLI only covers inference. | ||
| - Anything requiring custom `device_map`, `quantization_config`, or other low-level loader knobs not exposed by | ||
| the CLI flags → write Python. | ||
|
|
||
| ## Verifying the CLI is installed | ||
|
|
||
| The console entry point is registered in `pyproject.toml` (`diffusers-cli = | ||
| "diffusers.commands.diffusers_cli:main"`). If `diffusers-cli` is not on PATH after `pip install -e .`, reinstall | ||
| with `pip install -e . --force-reinstall --no-deps` and check `which diffusers-cli`. If the installed binary is | ||
| missing recent features (e.g. you see `unrecognized arguments: --lora`), reinstall. | ||
|
|
||
| ## Output formats | ||
|
|
||
| `--format {auto, human, agent, json}` (top-level flag, must appear before the subcommand): | ||
|
|
||
| - **`human`** — plain-text indented output for terminals (default when not running under an agent harness). No ANSI color. | ||
| - **`agent`** — TSV tables and `key=value` lines. Auto-selected when an agent env var is present | ||
| (`CLAUDECODE`, `CLAUDE_CODE`, `CODEX_SANDBOX`, `CURSOR_AI`, `AIDER_AI_CONTEXT`, `GH_COPILOT_AGENT`, | ||
| `AI_AGENT`). Token-cheap for LLM agents to read. | ||
| - **`json`** — compact JSON. Use for programmatic parsing (scripts, services) where type fidelity and nested | ||
| structures matter. | ||
|
|
||
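The auto-detection described in the `agent` bullet can be sketched as follows. This is a minimal illustration, not the CLI's actual code: the env var names are copied from the list above, while the `resolve_format` helper and the rule that an empty value counts as unset are assumptions made for the example.

```python
import os

# Names copied from the bullet above; the detection logic is an assumption.
AGENT_ENV_VARS = (
    "CLAUDECODE",
    "CLAUDE_CODE",
    "CODEX_SANDBOX",
    "CURSOR_AI",
    "AIDER_AI_CONTEXT",
    "GH_COPILOT_AGENT",
    "AI_AGENT",
)


def resolve_format(requested="auto", env=None):
    """Map a requested --format value to a concrete output format."""
    env = os.environ if env is None else env
    if requested != "auto":
        return requested  # an explicit choice always wins
    # Assumption: an empty string is treated the same as an unset variable.
    if any(env.get(name) for name in AGENT_ENV_VARS):
        return "agent"
    return "human"


print(resolve_format("auto", {"CLAUDECODE": "1"}))
print(resolve_format("auto", {}))
```

Passing the environment as a plain dict keeps the helper easy to exercise without touching the real process environment.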
| `stdout` carries data; `stderr` carries hints/warnings/progress — parseable output is never polluted. | ||
|
|
||
| Rule of thumb: `--format json` for scripts that will `json.loads()` the output, otherwise leave it on | ||
| auto-detect (`agent` for LLMs, `human` for terminals). | ||
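A script consuming the CLI could keep the two streams apart roughly like this. The child process below is a stand-in written for this example (it is not `diffusers-cli`), so the snippet runs anywhere. With the real CLI the argument list would start with something like `["diffusers-cli", "--format", "json", ...]`.

```python
import json
import subprocess
import sys

# Stand-in child process (NOT diffusers-cli): progress goes to stderr, data to stdout.
child = (
    "import sys, json; "
    "print('loading...', file=sys.stderr); "
    "print(json.dumps({'ok': True}))"
)

proc = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True,  # capture stdout and stderr separately
    text=True,
    check=True,
)

payload = json.loads(proc.stdout)  # only stdout is parsed
print(payload)
print(proc.stderr.strip(), file=sys.stderr)  # forward hints without mixing them into the data
```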
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,175 @@ | ||
| # `diffusers-cli generate` — reference | ||
|
|
||
| Full surface for `diffusers-cli generate`. Use this file as the source of truth when constructing a `generate` | ||
| invocation. The top-level [`SKILL.md`](SKILL.md) covers when to use the CLI; this file covers how. | ||
|
|
||
| ## The describe → generate flow | ||
|
|
||
| For any model you haven't called before, run `describe` first to learn its input contract, then `generate` with | ||
| the right `--pipeline-kwargs`: | ||
|
|
||
| ```bash | ||
| # 1. Discover what kwargs the pipeline takes (no weight download) | ||
| diffusers-cli --format json describe --model black-forest-labs/FLUX.2-klein-9B | ||
|
|
||
| # 2. Run it | ||
| diffusers-cli generate \ | ||
| --model black-forest-labs/FLUX.2-klein-9B \ | ||
| --pipeline-kwargs '{"prompt": "Make the cats fur grey", "image": "https://blobcdn.same.energy/a/d0/58/d058b51c2329b0ea4057e9f12cd9a1da36347e34"}' \ | ||
| --dtype bf16 | ||
| ``` | ||
|
|
||
| `diffusers-cli --format json describe` emits a `{task, model, pipeline_class, inputs[]}` payload where each | ||
| input is `{name, type_hint, default, required, description}`. | ||
|
|
||
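One way to use the describe payload is to check a kwargs dict against it before calling `generate`. The sketch below assumes only the field names listed above. The `check_kwargs` helper and the sample payload (model id, pipeline class, inputs) are hypothetical and made up for illustration.

```python
import json


def check_kwargs(describe_payload, kwargs):
    """Return (missing_required, unknown) kwarg names for a describe payload."""
    inputs = describe_payload["inputs"]
    known = {item["name"] for item in inputs}
    required = {item["name"] for item in inputs if item["required"]}
    missing = sorted(required - set(kwargs))
    unknown = sorted(set(kwargs) - known)
    return missing, unknown


# Hypothetical payload shaped like the description above; the values are invented.
sample = json.loads("""
{"task": "text-to-image", "model": "some-org/some-model", "pipeline_class": "SomePipeline",
 "inputs": [
   {"name": "prompt", "type_hint": "str", "default": null, "required": true,
    "description": "Text prompt"},
   {"name": "num_inference_steps", "type_hint": "int", "default": 50, "required": false,
    "description": "Number of denoising steps"}
 ]}
""")

# "stregth" is a deliberate typo to show how unknown keys surface.
missing, unknown = check_kwargs(sample, {"num_inference_steps": 4, "stregth": 0.6})
print("missing:", missing)
print("unknown:", unknown)
```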
| ## Standard vs modular detection | ||
|
|
||
| `generate` auto-detects which kind of pipeline it's calling: | ||
|
|
||
| 1. If `model_index.json` exists on the repo → `DiffusionPipeline.from_pretrained` path. | ||
| 2. Otherwise → `ModularPipeline.from_pretrained` path. | ||
|
|
||
| You don't need to tell it which. Modular repos must pass `--trust-remote-code` if they ship custom block code. | ||
|
|
||
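For a local directory, the detection rule above reduces to a file-existence check. The sketch below is an assumption about the shape of that logic rather than the CLI's code, and `detect_pipeline_kind` is a hypothetical name. Hub repos would need a download attempt instead of a path check.

```python
import tempfile
from pathlib import Path


def detect_pipeline_kind(model_dir):
    """Classify a local pipeline directory as 'standard' or 'modular'."""
    if (Path(model_dir) / "model_index.json").exists():
        return "standard"  # would go through DiffusionPipeline.from_pretrained
    return "modular"  # would go through ModularPipeline.from_pretrained


with tempfile.TemporaryDirectory() as tmp:
    kind_before = detect_pipeline_kind(tmp)  # empty directory
    (Path(tmp) / "model_index.json").write_text("{}")
    kind_after = detect_pipeline_kind(tmp)  # index file now present

print(kind_before, kind_after)
```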
| ## `--pipeline-kwargs` semantics | ||
|
|
||
| A JSON object passed straight through to `pipeline(**kwargs)`. String values at known image-input keys (`image`, | ||
| `mask_image`, `control_image`, `ip_adapter_image`, `image_2`) are auto-loaded as PIL images, so you can pass URLs | ||
| or local paths directly: | ||
|
|
||
| ```bash | ||
| diffusers-cli generate \ | ||
| --model black-forest-labs/FLUX.2-klein-9B \ | ||
| --pipeline-kwargs '{"image": "https://example.com/cat.png", "prompt": "make the fur grey", "strength": 0.6}' | ||
| ``` | ||
|
|
||
| **Shell-quoting gotcha**: the JSON must stay on one line. `\` line continuations work between arguments but not | ||
| inside the single-quoted JSON, where a literal newline lands as a raw control char inside the string and breaks | ||
| `json.loads`. | ||
|
|
||
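When the command is assembled from Python, the quoting problem can be sidestepped by serializing with `json.dumps` (single-line by default) and quoting with `shlex`. The snippet also reproduces the failure mode described above. The kwargs are example values only.

```python
import json
import shlex

kwargs = {"prompt": "make the fur grey", "strength": 0.6}

# json.dumps emits one line by default, so no raw newline can end up in the argument.
arg = json.dumps(kwargs)
command = shlex.join(
    ["diffusers-cli", "generate",
     "--model", "black-forest-labs/FLUX.2-klein-9B",
     "--pipeline-kwargs", arg]
)
print(command)

# Failure mode: a real newline character inside a JSON string is rejected.
broken = '{"prompt": "make the\nfur grey"}'
try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print("broken JSON parsed:", parsed_ok)
```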
| ## LoRA adapters (`--lora`) | ||
|
|
||
| Attach a LoRA after the pipeline loads via a JSON spec: | ||
|
|
||
| ```bash | ||
| diffusers-cli generate \ | ||
| --model black-forest-labs/FLUX.2-klein-9B \ | ||
| --pipeline-kwargs '{"prompt": "a tiny grey cat"}' \ | ||
| --lora '{"lora_id": "alvdansen/littletinies", "lora_scale": 0.8}' | ||
| ``` | ||
|
|
||
| Calls `pipeline.load_lora_weights(<lora_id>, adapter_name="default")` and, if `lora_scale` is present, | ||
| `pipeline.set_adapters(["default"], adapter_weights=[<scale>])`. Errors clearly if the pipeline doesn't support | ||
| LoRA or `lora_id` is missing. | ||
|
|
||
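The call sequence described above can be sketched as a small helper. Only the two pipeline calls and their arguments come from the text. The `apply_lora` name, the use of `SystemExit` for errors, and the `RecordingPipeline` stand-in (which records calls instead of loading weights) are assumptions for illustration.

```python
import json


def apply_lora(pipeline, lora_spec):
    """Attach a LoRA described by a JSON spec string to an already-loaded pipeline."""
    spec = json.loads(lora_spec)
    if "lora_id" not in spec:
        raise SystemExit("--lora requires a 'lora_id' key")  # error type is an assumption
    if not hasattr(pipeline, "load_lora_weights"):
        raise SystemExit(f"{type(pipeline).__name__} does not support LoRA")
    pipeline.load_lora_weights(spec["lora_id"], adapter_name="default")
    if "lora_scale" in spec:
        pipeline.set_adapters(["default"], adapter_weights=[spec["lora_scale"]])


class RecordingPipeline:
    """Stand-in for a real pipeline: records calls, loads nothing."""

    def __init__(self):
        self.calls = []

    def load_lora_weights(self, lora_id, adapter_name=None):
        self.calls.append(("load_lora_weights", lora_id, adapter_name))

    def set_adapters(self, names, adapter_weights=None):
        self.calls.append(("set_adapters", names, adapter_weights))


pipe = RecordingPipeline()
apply_lora(pipe, '{"lora_id": "alvdansen/littletinies", "lora_scale": 0.8}')
for call in pipe.calls:
    print(call)
```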
| ## Optimization flags | ||
|
|
||
| - `--dtype {auto, bf16, fp16, fp32, …}` — pipeline weight dtype. `bf16` is the right default for modern DiTs on | ||
| A100/H100. | ||
| - `--cpu-offload {model, group}` — `model` uses `enable_model_cpu_offload`, `group` uses | ||
| `enable_group_offload(offload_type="leaf_level", use_stream=True)`. Use `group` to fit a 9B+ model on a single A100. | ||
| - `--attention-backend {default, flash_hub, flash_varlen_hub, flash_4_hub, sage_hub}` — hub-hosted kernels, | ||
| auto-downloaded on first use. Failures (kernel not available, CUDA arch mismatch, network) raise a clear | ||
| `SystemExit` listing the alternatives instead of silently reverting to the default. | ||
| - `--vae-tiling` / `--vae-slicing` — lower peak VAE decode VRAM. | ||
| - `--context-parallel` — Ulysses-style context parallelism on a DiT. See [Context parallel](#context-parallel) below. | ||
|
|
||
| `disable_mmap=True` is always passed to `from_pretrained` — sequential reads are faster than mmap page-faults on | ||
| most filesystems. | ||
|
|
||
| ## Output handling | ||
|
|
||
| `generate` sniffs the pipeline return type and saves accordingly: | ||
|
|
||
| - `PIL.Image` / list of them → `outputs/generate-<i>.png` | ||
| - Frame sequence (≥2 PILs or ndarrays) → `outputs/generate-0.mp4` (uses `--fps`, default 8) | ||
| - Numpy audio array → `outputs/generate-0.wav` (uses `--sampling-rate`) | ||
| - Anything else → JSON dump | ||
|
|
||
| Override the destination with `--output <path>` (file or directory). | ||
|
|
||
| Use `--push-to <user>/<bucket>` to upload outputs to an HF bucket after saving. The bucket is created if it | ||
| doesn't exist; objects land under `<run_id>/<filename>`. | ||
|
|
||
| ## Remote execution (`--remote`) | ||
|
|
||
| Add `--remote` to submit the same call as a Hugging Face Job: | ||
|
|
||
| ```bash | ||
| diffusers-cli generate \ | ||
| --model black-forest-labs/FLUX.2-klein-9B \ | ||
| --pipeline-kwargs '{"prompt": "Make the cats fur grey", "image": "https://blobcdn.same.energy/a/d0/58/d058b51c2329b0ea4057e9f12cd9a1da36347e34"}' \ | ||
| --remote --flavor a100-large \ | ||
| --dtype bf16 \ | ||
| --cpu-offload group | ||
| ``` | ||
|
|
||
| What happens: | ||
|
|
||
| 1. Your HF token is picked up (from `--token` or your login). | ||
| 2. A bucket (`<user>/jobs-artifacts` by default) is created if it doesn't exist. | ||
| 3. The job runs in a PyTorch container that already has torch + CUDA preinstalled. Only the small Python | ||
| deps (`diffusers`, `accelerate`, `transformers`, `safetensors`) are installed at container start — about | ||
| 50 MB instead of 3 GB. | ||
| 4. Container logs stream to your terminal. When the job finishes, the CLI downloads every file the job | ||
| uploaded to the bucket under its `run_id` prefix into `./outputs/`. | ||
| 5. A timing breakdown (`queued_seconds`, `run_seconds`, `total_seconds`) is printed and included in the JSON | ||
| payload. | ||
|
|
||
| Flags: | ||
|
|
||
| - `--flavor <name>` — HF Jobs hardware (e.g. `a10g-small`, `a100-large`, `4xa100-large`). | ||
| - `--timeout <duration>` — max wallclock (e.g. `30m`, `2h`). Defaults to `10m`. | ||
| - `--dependencies <pkg>` — extra pip deps (repeatable). | ||
| - `--namespace <name>` — run under a different account. | ||
| - `--no-wait` — submit, return job id, don't stream logs. | ||
| - `--push-to <bucket>` — override the artifact bucket id. | ||
|
|
||
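A script that submits remote jobs might assemble the argument list as below. The flags are the ones listed above. The `remote_generate_argv` helper is hypothetical, and `sentencepiece` is only an example of an extra dependency. Nothing is executed here, the list is just printed.

```python
import json
import shlex


def remote_generate_argv(model, pipeline_kwargs, flavor, timeout=None,
                         dependencies=(), no_wait=False):
    """Build an argv list for a `generate --remote` call (hypothetical helper)."""
    argv = [
        "diffusers-cli", "generate",
        "--model", model,
        "--pipeline-kwargs", json.dumps(pipeline_kwargs),
        "--remote", "--flavor", flavor,
    ]
    if timeout is not None:
        argv += ["--timeout", timeout]
    for dep in dependencies:
        argv += ["--dependencies", dep]  # the flag is repeatable, one package per use
    if no_wait:
        argv.append("--no-wait")
    return argv


argv = remote_generate_argv(
    "black-forest-labs/FLUX.2-klein-9B",
    {"prompt": "a tiny grey cat"},
    flavor="a100-large",
    timeout="30m",
    dependencies=["sentencepiece"],
)
print(shlex.join(argv))
```

Passing the list to `subprocess.run` directly, rather than a joined string, avoids a second round of shell quoting.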
| ## Context parallel | ||
|
|
||
| `--context-parallel` enables Ulysses CP on a DiT-based pipeline. **Locally** the user must launch via torchrun: | ||
|
|
||
| ```bash | ||
| torchrun --nproc-per-node=2 -m diffusers.commands.diffusers_cli generate \ | ||
| --model black-forest-labs/FLUX.2-klein-9B \ | ||
| --pipeline-kwargs '{"prompt": "Make the cats fur grey"}' \ | ||
| --dtype bf16 \ | ||
| --context-parallel | ||
| ``` | ||
|
|
||
| **Remotely** the CLI handles the torchrun wrapping — just pass `--context-parallel` to a `--remote` invocation on | ||
| a multi-GPU flavor: | ||
|
|
||
| ```bash | ||
| diffusers-cli generate \ | ||
| --model black-forest-labs/FLUX.2-klein-9B \ | ||
| --pipeline-kwargs '{"prompt": "Make the cats fur grey", "image": "https://blobcdn.same.energy/a/d0/58/d058b51c2329b0ea4057e9f12cd9a1da36347e34"}' \ | ||
| --remote --flavor 4xa100-large \ | ||
| --dtype bf16 \ | ||
| --context-parallel | ||
| ``` | ||
|
|
||
| Inside the container, CP swaps the entrypoint to `torchrun --nproc-per-node=gpu -m | ||
| diffusers.commands.diffusers_cli`, initializes a hybrid process group (`cpu:gloo,cuda:nccl` — NCCL for the | ||
| attention all-to-all, Gloo for `ulysses_anything`'s per-rank size coordination), pins each rank to | ||
| `cuda:{LOCAL_RANK}`, and gates output saving/printing to rank 0 only. | ||
|
|
||
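The rank pinning and rank-0 gating can be illustrated without torch, using the `RANK` and `LOCAL_RANK` environment variables that torchrun sets for each worker. This sketch covers only that part. It does not initialize a process group, and the helper names are hypothetical.

```python
import os


def rank_info(env=None):
    """Return (global_rank, device_string) from torchrun-style env vars."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))  # defaults cover the single-process case
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return rank, f"cuda:{local_rank}"


def should_save_outputs(env=None):
    """Only rank 0 saves and prints, so outputs are not written once per rank."""
    return rank_info(env)[0] == 0


worker_env = {"RANK": "3", "LOCAL_RANK": "1"}
print(rank_info(worker_env), should_save_outputs(worker_env))
print(rank_info({}), should_save_outputs({}))
```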
| **Memory note**: CP shards the sequence, **not the weights**. Every rank still holds the full transformer. Wins | ||
| are wall-clock attention speedup and headroom for very long sequences, not "fit a model that doesn't fit." For | ||
| weight sharding you'd want TP or FSDP — not exposed in the CLI yet. | ||
|
|
||
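A back-of-the-envelope model makes the memory note concrete. It is deliberately simplified: weights are counted once per rank, activations are assumed to shrink linearly with the number of ranks, and the 8 GB activation figure is an invented placeholder, not a measurement.

```python
def per_rank_memory_gb(num_params, bytes_per_param, activation_gb, world_size):
    """Rough per-rank memory under context parallelism (simplified model)."""
    weights_gb = num_params * bytes_per_param / 1e9  # replicated on every rank
    sharded_activations_gb = activation_gb / world_size  # sequence dimension is split
    return weights_gb + sharded_activations_gb


# 9e9 parameters in bf16 (2 bytes each) are 18 GB of weights on every rank.
for world_size in (1, 2, 4):
    total = per_rank_memory_gb(9e9, 2, 8.0, world_size)
    print(f"ranks={world_size}: about {total:.1f} GB per rank")
```

Under these assumptions the weight term never drops below 18 GB, however many ranks are added. Only the activation term shrinks.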
| CP is DiT-only. UNet pipelines raise a clear error directing you to a DiT pipeline (FLUX, SD3, HunyuanDiT, | ||
| AuraFlow, …). | ||
|
|
||
| ## Output mode (`--format`) | ||
|
|
||
| The CLI auto-detects when it is running under an AI coding agent (for example Claude Code, Cursor, Aider, GH | ||
| Copilot Agent) via the `CLAUDECODE`, `CLAUDE_CODE`, `CODEX_SANDBOX`, `CURSOR_AI`, `AIDER_AI_CONTEXT`, | ||
| `GH_COPILOT_AGENT`, and `AI_AGENT` env vars, and switches output to **agent mode** automatically: TSV tables, | ||
| `key=value` results, compact JSON dicts, no progress bars. | ||
|
|
||
| Override explicitly with `--format {auto, human, agent, json}` placed **before** the subcommand: | ||
|
|
||
| ```bash | ||
| diffusers-cli --format json generate --model <id> --pipeline-kwargs '...' | ||
| ``` | ||
|
|
||
| The legacy `--json` flag on `generate` still works as a shortcut for `--format json`. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| # Copyright 2026 The HuggingFace Team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| """Shared helpers used by multiple ``diffusers-cli`` subcommands. | ||
|
|
||
| Anything imported by more than one command file lives here so command modules stay standalone — no cross-command | ||
| imports between e.g. ``describe`` and ``generate``. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from argparse import Namespace | ||
| from pathlib import Path | ||
|
|
||
|
|
||
| def try_fetch_config(args: Namespace, filename: str) -> str | None: | ||
| """Resolve ``filename`` for ``args.model`` (local path or Hub repo). Return None if absent. | ||
|
|
||
| Used by ``generate`` (to detect modular vs standard pipelines) and ``describe`` (to read the pipeline class for | ||
| schema introspection) — no weights are downloaded, only the small index file. | ||
| """ | ||
| local = Path(args.model) | ||
| if local.exists(): | ||
| candidate = local / filename | ||
| return str(candidate) if candidate.exists() else None | ||
|
|
||
| from huggingface_hub import hf_hub_download | ||
| from huggingface_hub.utils import EntryNotFoundError, HfHubHTTPError, RepositoryNotFoundError | ||
|
|
||
| try: | ||
| return hf_hub_download(args.model, filename, revision=args.revision, token=args.token) | ||
| except (EntryNotFoundError, HfHubHTTPError, RepositoryNotFoundError): | ||
| return None | ||