STOMA is a multi-speaker Greek read-speech corpus designed to support modern Text-to-Speech (TTS) research for an under-resourced language. It contains approximately 24 hours of professionally recorded speech from six native speakers (3 male, 3 female), captured in controlled studio conditions and distributed as high-quality WAV audio.

What’s inside

The corpus includes linguistically rich text material (B2/C1/C2 level content from the Center for the Greek Language Text Bank, plus phonetically balanced sentences from the Greek Harvard corpus), with standardized processing and quality control.

  • Format: WAV (mono, 44.1 kHz, 16-bit PCM)
  • Use cases: TTS training, transfer learning, prosody/phonetics studies
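
As a minimal sketch, a downloaded file can be checked against the format listed above using Python's standard `wave` module. The function names `check_wav_format` and `duration_seconds` are illustrative helpers, not part of any STOMA tooling, and the expected values are taken directly from the format line on this page.

```python
import wave

# Format stated on this page: mono, 44.1 kHz, 16-bit PCM.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_RATE = 44100
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16 bits


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the stated format.

    An empty list means the file header matches. Hypothetical helper,
    not part of the corpus distribution.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels: got {wav.getnchannels()}")
        if wav.getframerate() != EXPECTED_SAMPLE_RATE:
            problems.append(f"sample rate: got {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width: got {wav.getsampwidth()} bytes")
    return problems


def duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `check_wav_format` on each file before training, and summing `duration_seconds` over a speaker's files, gives a quick sanity check on a download. The actual directory layout and file names of the corpus are not described here, so any path passed in is up to the user.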

Project page

Official database landing page:

 

Note for maintainers: the download links above assume a folder /downloads/ under the STOMA web root. If your ZIPs live elsewhere, just change the href targets accordingly.

Get In Touch

Foundation for Research and Technology - Hellas
Ν. Plastira 100, Vassilika Vouton GR-700 13, Heraklion, Crete

+30 2810 391825

pantazis@iacm.forth.gr