This repository contains the demo page for DirectAudioEdit, a training-free and inversion-free method for text-guided audio editing.
Text-guided audio editing aims to modify the acoustic content specified by the text prompt while preserving the source components that are irrelevant to the edit. Existing training-free methods typically rely on inversion-based editing. Inversion-free editing is appealing because it reduces computational overhead and reconstruction errors, but it remains largely unexplored for audio editing. The key challenge is to construct a source-to-target editing trajectory through diffusion denoising dynamics. In this paper, we introduce DirectAudioEdit, the first attempt to develop a training-free and inversion-free method for audio editing.
Visit niutrans.github.io/DirectAudioEdit to listen to audio editing samples and compare DirectAudioEdit with baseline methods (DDIM-inv, DDPM-inv, SDEdit).
@article{directaudioedit2026,
title={DirectAudioEdit: Inversion-Free Text-Guided Audio Editing via Diffusion Prediction Contrast},
author={Ge, Zhengkun and Liu, Xiaoqian and Zhang, Haoran and Ge, Yuan and Zhang, Junxiang and Yu, Zhengtao and Zhu, Jingbo and Xiao, Tong},
journal={arXiv preprint arXiv:2606.07356},
year={2026}
}
This demo page is for research purposes.