Disentangling Speech Representations with Mutual Information Estimators for Expressive Synthesis

 

Abstract

Disentangled speech representations allow precise control over individual speech attributes, such as content, speaker identity, and style, enabling more flexible and natural voice synthesis engines. This study advances speech synthesis by developing disentangled speech representation algorithms. Techniques grounded in information theory, such as recently proposed regularized variational mutual information estimators supplemented with a gradient reversal layer, were integrated to refine the representation of independent speech attributes. Using the Expresso dataset within the FastSpeech 2 framework, this work demonstrates significant improvements in the controllability and quality of synthetic speech. Objective metrics, including cosine similarity matrices, perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI), were evaluated and complemented by a subjective assessment of speech quality. The results show that the proposed methods outperform existing approaches, as evidenced by superior A/B testing outcomes, improved inter-cluster distance metrics, and higher PESQ and STOI scores, indicating gains in intelligibility, naturalness, and overall speech quality.
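
As a rough illustration of the cosine similarity matrices mentioned among the objective metrics, the sketch below computes pairwise cosine similarities between embedding vectors. The function name and the toy vectors are hypothetical and are not taken from the paper; how the embeddings are extracted in the actual system is not described on this page.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the matrix S with S[i][j] = cos(e_i, e_j).

    `embeddings` is a list of equal-length lists of floats.
    """
    # Euclidean norm of every embedding vector.
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    if any(n == 0.0 for n in norms):
        raise ValueError("cosine similarity is undefined for zero vectors")
    # Dot product of each pair, divided by the product of their norms.
    return [
        [sum(a * b for a, b in zip(ei, ej)) / (ni * nj)
         for ej, nj in zip(embeddings, norms)]
        for ei, ni in zip(embeddings, norms)
    ]


# Hypothetical toy embeddings, for illustration only (not paper data).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = cosine_similarity_matrix(toy)
```

Each diagonal entry is 1 because every vector is compared with itself, and orthogonal vectors give an entry of 0.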

 

Testing Results


 

Use Case 1: Multispeaker Audio Synthesis Results by Model

[Audio samples. Columns (speakers): sp0, sp1, sp2, sp3. Rows (models): CC, CC&GRL, CLUB, GRL, INFO, MINE, WC.]

 

 

Use Cases 2-4: Multispeaker Audio Synthesis Results by Model

[Audio samples. Columns: Use Case 2, Use Case 3, Use Case 4. Rows (styles): Confused, Default, Enunc, Happy, Laugh, Sad, Whisper.]

 

Citing:

T. Kassiotis and Y. Pantazis: Disentangling Speech Representations with Mutual Information Estimators for Expressive Synthesis. 33rd European Signal Processing Conference (EUSIPCO 2025), Palermo, Italy, September 8-12, 2025.

Get In Touch

Foundation for Research and Technology - Hellas
Ν. Plastira 100, Vassilika Vouton GR-700 13, Heraklion, Crete

+30 2810 391825

pantazis@iacm.forth.gr