STOMA Database

STOMA is a multi-speaker Greek read-speech corpus designed to support modern Text-to-Speech (TTS) research for an under-resourced language. It contains approximately 23 hours of speech from six native speakers (three male, three female), professionally recorded under controlled studio conditions and distributed as high-quality WAV audio.

What’s inside

The corpus uses linguistically rich text material: B2, C1 and C2 level texts from the Center for the Greek Language Text Bank, plus phonetically balanced sentences from the Greek Harvard corpus. All material has gone through standardized processing and quality control.

Format: WAV (mono, 44.1 kHz, 16-bit PCM)
Use cases: TTS training, transfer learning, prosody/phonetics studies
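Before training, it can help to confirm that downloaded files actually match the stated format. The sketch below reads only the WAV headers with Python's standard-library `wave` module; `check_wav_format` is a hypothetical helper, not part of any STOMA tooling. For scale: at these settings one second of audio takes 44,100 × 2 = 88,200 bytes, so roughly 23 hours comes to about 7.3 GB of raw PCM before ZIP compression.

```python
import wave

# Target format as stated on this page: mono, 44.1 kHz, 16-bit PCM.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 44_100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_wav_format(path):
    """Return a list of header mismatches; an empty list means the file matches.

    Hypothetical helper. The stdlib `wave` module only reads PCM WAV,
    so a non-PCM file raises wave.Error instead of returning a list.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav.getnchannels()}")
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"rate={wav.getframerate()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sampwidth={wav.getsampwidth()}")
    return problems
```

For example, iterate over every `*.wav` under the folder where you unpacked the corpus (with `pathlib.Path.rglob`) and print any file whose result is non-empty.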

Download

The corpus is distributed as ZIP bundles hosted on the STOMA server. You can download either the full corpus or individual per-speaker subsets.
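After downloading a bundle, a quick completeness check is to total the audio duration inside the ZIP without unpacking it. This is a minimal sketch assuming the bundle stores its audio as `.wav` members (the internal folder layout of the STOMA ZIPs is not described on this page); `zip_wav_duration_seconds` is a hypothetical helper name.

```python
import wave
import zipfile


def zip_wav_duration_seconds(zip_path):
    """Sum the duration of every .wav member in a ZIP bundle, in seconds.

    Hypothetical helper. Durations come from each WAV header
    (frame count / sample rate), so no audio samples are decoded
    and nothing is written to disk.
    """
    total = 0.0
    with zipfile.ZipFile(zip_path) as bundle:
        for info in bundle.infolist():
            # Skip directories and non-audio members (e.g. text files).
            if info.is_dir() or not info.filename.lower().endswith(".wav"):
                continue
            with bundle.open(info) as member, wave.open(member, "rb") as wav:
                total += wav.getnframes() / wav.getframerate()
    return total
```

Dividing the result by 3600 gives hours, which for a full-corpus bundle can be compared against the approximately 23 hours stated above; a per-speaker bundle should hold only a fraction of that.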

Project page

Official database landing page:

https://stoma.iacm.forth.gr/database.html
