The project investigates the generation of prosodic structure for a text-to-speech synthesis system. Generating accurate prosody is one of the most crucial developments needed to bring speech synthesis to a level of pleasant fluency. Within the project, prosody generation is treated as a natural language processing problem rather than a speech technology problem: it is defined as the prediction of prosodic markers (accents and breaks) by means of automatic analysis of written text. The project is less concerned with how these markers are to be interpreted in terms of appropriate melodic, durational and other prosodic features when the text is converted into speech.
The central question is whether prosody generation can be accurately performed by (a) robust automatic analysis of texts using techniques from information retrieval and natural language processing, and (b) advanced machine learning systems and meta-learning systems such as combiners and boosting ensembles. The target language is Dutch.
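To make the task definition concrete, the following is a minimal sketch of prosodic marker prediction framed as per-token classification with a simple combiner. It is an illustration under stated assumptions, not the project's method: the function-word list, the three base predictors (hand-written rules standing in for trained classifiers), the break heuristic and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny Dutch function-word list (illustrative only).
FUNCTION_WORDS = {"de", "het", "een", "en", "van", "in", "op", "is", "dat", "te"}
PUNCTUATION = {",", ".", ";", ":", "?", "!"}


def extract_features(tokens, i):
    """Shallow textual features for the token at position i."""
    word = tokens[i]
    following = tokens[i + 1] if i + 1 < len(tokens) else None
    return {
        "is_function_word": word.lower() in FUNCTION_WORDS,
        "length": len(word),
        "before_punctuation": following in PUNCTUATION,
    }


# Stand-ins for trained base classifiers of the accent decision.
# In a real system each would be a learned model; here they are fixed rules.
def accent_by_word_class(f):
    return not f["is_function_word"]


def accent_by_length(f):
    return f["length"] >= 3


def accent_by_position(f):
    return f["before_punctuation"]


ACCENT_PREDICTORS = [accent_by_word_class, accent_by_length, accent_by_position]


def majority_vote(predictors, f):
    """Combiner: return the label chosen by most base predictors."""
    votes = Counter(p(f) for p in predictors)
    return votes.most_common(1)[0][0]


def predict_markers(tokens):
    """Return (word, accent, break_after) for every non-punctuation token."""
    result = []
    for i, word in enumerate(tokens):
        if word in PUNCTUATION:
            continue
        f = extract_features(tokens, i)
        accent = majority_vote(ACCENT_PREDICTORS, f)
        # Assumed heuristic: place a break wherever punctuation follows.
        break_after = f["before_punctuation"]
        result.append((word, accent, break_after))
    return result


if __name__ == "__main__":
    sentence = ["De", "kat", "slaapt", ",", "en", "de", "hond", "blaft", "."]
    for word, accent, break_after in predict_markers(sentence):
        print(word, "ACCENT" if accent else "-", "BREAK" if break_after else "-")
```

The sketch keeps the two parts of the central question separate: extract_features corresponds to the text analysis in (a), and majority_vote is the simplest possible instance of a combiner in (b). A boosting ensemble or a learned combiner would replace the unweighted vote with weights estimated from annotated data.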