Add VibeVoice 7B support and decoder LoRA merging #14
Support the VibeVoice 7B model alongside the 1.5B:

- Add the vibevoice_7b model-manager package, reusing the shared Qwen2.5 tokenizer bundle.
- Handle the 7B config.json, tolerating its upstream "acostic_vae_dim" key and reading top-level tie_word_embeddings.
- Load a separate lm_head.weight when word embeddings are untied and bind the decoder logits head to it.

Add optional PEFT LoRA merging for the decoder:

- Merge lora_A/lora_B into the targeted linear weights at load time via a tensor-source decorator, so it composes with the weight-type quantization options at no per-step cost.
- Expose it through the vibevoice.lora / vibevoice.lora_scale load options and document it in the README and docs/tts.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
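For readers unfamiliar with load-time LoRA merging, the idea can be sketched as below. This is a minimal illustration in plain Python, not the code in this PR: the function name `merge_lora` is hypothetical, the shapes follow the usual PEFT layout (lora_A is rank x in, lora_B is out x rank), and `scale` is assumed to be a plain multiplier on the low-rank delta. How vibevoice.lora_scale relates to PEFT's lora_alpha / r is not stated here and is left out.

```python
def merge_lora(weight, lora_a, lora_b, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) as a new nested list.

    weight: out x in, lora_a: rank x in, lora_b: out x rank (assumed layout).
    Hypothetical helper for illustration only.
    """
    out_dim, in_dim = len(weight), len(weight[0])
    rank = len(lora_a)
    # Reject inputs whose shapes do not line up with the weight matrix.
    if len(lora_b) != out_dim or any(len(row) != rank for row in lora_b):
        raise ValueError("lora_b must have shape out x rank")
    if any(len(row) != in_dim for row in lora_a):
        raise ValueError("lora_a must have shape rank x in")

    merged = []
    for i in range(out_dim):
        row = []
        for j in range(in_dim):
            # Low-rank delta entry: sum over the rank dimension.
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            row.append(weight[i][j] + scale * delta)
        merged.append(row)
    return merged


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 2.0]]      # rank 1, in 2
    b = [[1.0], [3.0]]    # out 2, rank 1
    print(merge_lora(base, a, b, scale=0.5))
```

Because the merge happens once, before any quantization of the resulting weight, inference then runs on a single dense matrix and pays no extra matmul per step, which matches the "no per-step cost" point above.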
@justinjohn0306 Thank you for your PR! Could you add extra logs to confirm that LoRA loading is working with the 7B model, and share the run log? Reviewed by Codex:
vibevoice.lora was only read from SessionOptions, so the documented --load-option path silently no-op'd. Consume it at load time via a shared apply_vibevoice_finetune_options helper (still handles --session-option too), guarding against passing it through both.

Extend the overlay to apply all four trained components (mirroring infer.py's apply_lora): the language-model LoRA is delta-merged into the decoder linears, and the fine-tuned diffusion head and acoustic/semantic connectors replace their base tensors. Connector/head .bin files are read by a new pure-C++ Torch pickle reader (torch_bin), parity-checked against torch.load.

Add torch_bin_parity and vibevoice_finetune_overlay_check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
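The "guard against passing it through both" behavior can be illustrated with a small sketch. This is not the apply_vibevoice_finetune_options helper itself (which lives in the C++ codebase); `resolve_finetune_option` and the dict-based option maps are hypothetical stand-ins, assuming each option source behaves like a key/value map.

```python
def resolve_finetune_option(load_options, session_options, key):
    """Return the value of `key` from whichever option map sets it.

    Hypothetical helper for illustration only. Raises ValueError when both
    maps set the key, so a conflicting value is never silently dropped.
    Returns None when neither map sets it.
    """
    in_load = key in load_options
    in_session = key in session_options
    if in_load and in_session:
        raise ValueError(
            f"{key} was passed as both a load option and a session option"
        )
    if in_load:
        return load_options[key]
    if in_session:
        return session_options[key]
    return None


if __name__ == "__main__":
    # Only the load-option path sets the key, so its value is used.
    print(resolve_finetune_option({"vibevoice.lora": "adapter_dir"}, {}, "vibevoice.lora"))
```

Failing loudly on a double specification avoids the earlier failure mode, where one path was quietly ignored and the run looked successful without the LoRA applied.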
LoRA loading confirmed on VibeVoice-7B +
Thanks! I will do a final test tomorrow and then merge it.
@justinjohn0306 Code merged! Thanks a lot!
@justinjohn0306 I made some changes to improve LoRA performance. Just pushed acde132.