Training-Free Multi-Step Inference for Target Speaker Extraction
This paper proposes a training-free, multi-step inference method for target speaker extraction. The method iteratively refines the speech estimate using a frozen pretrained model, and it introduces joint metric optimization to balance performance across intrusive and non-intrusive evaluation criteria.
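To make the idea of iterative refinement with a frozen model concrete, the following is a minimal sketch under stated assumptions, not the paper's actual procedure. The candidate construction (blending the mixture with the previous estimate at a few ratios), the weighted-sum form of the joint score, the stopping rule, and all names (`multi_step_extract`, `joint_score`, `ratios`, `weights`) are hypothetical choices made for illustration. Each scoring function is assumed to return "higher is better"; an intrusive metric would close over a reference signal, while a non-intrusive one would need only the estimate.

```python
import numpy as np


def joint_score(estimate, score_fns, weights):
    """Weighted sum of metric scores, higher is better.

    The weighted-sum form is an assumption; the source only says that
    intrusive and non-intrusive criteria are balanced jointly.
    """
    return float(sum(w * fn(estimate) for fn, w in zip(score_fns, weights)))


def multi_step_extract(model, mixture, enrollment, score_fns, weights,
                       num_steps=3, ratios=(0.0, 0.5, 1.0)):
    """Refine an estimate by repeatedly calling a frozen extraction model.

    `model(signal, enrollment)` is a hypothetical callable wrapping a
    pretrained extractor. It is only called, never updated, so the whole
    procedure is training-free.
    """
    # Step 1: ordinary single-pass extraction.
    estimate = model(mixture, enrollment)
    best = joint_score(estimate, score_fns, weights)

    for _ in range(num_steps - 1):
        # Assumed candidate construction: blend the mixture with the
        # previous estimate and pass each blend through the frozen model.
        candidates = [
            model(r * mixture + (1.0 - r) * estimate, enrollment)
            for r in ratios
        ]
        scores = [joint_score(c, score_fns, weights) for c in candidates]
        top = int(np.argmax(scores))

        # Assumed stopping rule: quit when no candidate improves the score.
        if scores[top] <= best:
            break
        estimate, best = candidates[top], scores[top]

    return estimate, best
```

In this sketch the balance between evaluation criteria is controlled entirely by `weights`; setting one weight to zero reduces the selection to single-metric optimization.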