Acoustic and Semantic Modeling of Emotion in Spoken Language
This thesis advances the understanding and synthesis of emotion in spoken language by proposing methods that jointly model acoustic and semantic information through pre-training strategies, hierarchical architectures for conversational context, and a textless speech-to-speech framework for controllable emotion style transfer.