Welcome to the STOMA project

Text-to-Speech (TTS) synthesis plays a crucial role in human-machine communication by transforming text into natural-sounding speech. Recent TTS systems based on neural network models have made significant strides in speech quality, naturalness, and expressiveness. However, these models have millions of trainable parameters and demand substantial computational resources and high-quality recordings for development and deployment. As a result, current state-of-the-art neural models are not suitable for real-time on-device speech generation, and synthesis is instead performed on the server side.

Speech is an efficient communication medium, but its intelligibility suffers when listeners have reduced hearing capacity or when the environment is noisy. Speech synthesis can enhance intelligibility, but doing so often degrades sound quality. The ongoing research challenge is to create neural-based models that offer adaptability, naturalness, and intelligibility in speech synthesis.

STOMA aims to address this challenge. Its specific objectives include reducing the number of neural network parameters, enhancing system robustness through signal decomposition, accelerating real-time speech synthesis on mobile devices, adapting speaking style for intelligibility in noisy environments, building speech databases for improved intelligibility, measuring the performance of hybrid TTS synthesis systems using objective and subjective metrics, and ultimately demonstrating and disseminating the research outcomes.

STOMA Research

STOMA has received funding from the Hellenic Foundation for Research and Innovation (Project 4753).

Get In Touch

Foundation for Research and Technology - Hellas
N. Plastira 100, Vassilika Vouton, GR-700 13, Heraklion, Crete

+30 2810 391825

pantazis@iacm.forth.gr