PhD defense: Raheel Qader, Pronunciation and disfluency modeling for expressive speech synthesis, 31 March 2017.

In this thesis, we address the problem of expressivity in Text-To-Speech (TTS) by incorporating two phenomena that strongly affect speech: pronunciation variants and speech disfluencies.

In the first part of this thesis, we present a pronunciation variant generation method that adapts standard, i.e., dictionary-based, pronunciations to a spontaneous style. Its strength lies in exploiting a wide range of linguistic, articulatory and acoustic features, and in using a probabilistic machine learning framework, namely conditional random fields (CRFs) and language models.

The second part of the thesis explores a new approach to the automatic generation of speech disfluencies for TTS. The advantage of this approach is that it can generate several types of disfluencies: pauses, repetitions and revisions. To this end, we formalize the problem as a theoretical process in which transformation functions are iteratively composed to insert disfluencies. We present a first implementation of this process using CRFs and language models.
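The disfluency-insertion process described above can be pictured as function composition over a token sequence. The following Python sketch is purely illustrative and is not the thesis's implementation. The transformation names (`insert_pause`, `repeat`, `revise`), the `<pause>` token and the hand-picked insertion positions are all hypothetical. In the thesis, the decision of where and what to insert is made with CRFs and language models, which this sketch does not model. The `revise` step is also simplified to inserting an abandoned fragment before the corrected words.

```python
from functools import reduce
from typing import Callable, List

Tokens = List[str]
Transform = Callable[[Tokens], Tokens]

PAUSE = "<pause>"  # hypothetical marker for a silent pause


def insert_pause(pos: int) -> Transform:
    """Hypothetical: insert a pause token before tokens[pos]."""
    def f(tokens: Tokens) -> Tokens:
        return tokens[:pos] + [PAUSE] + tokens[pos:]
    return f


def repeat(pos: int, length: int = 1) -> Transform:
    """Hypothetical: repeat tokens[pos:pos+length] once, right after themselves."""
    def f(tokens: Tokens) -> Tokens:
        span = tokens[pos:pos + length]
        return tokens[:pos + length] + span + tokens[pos + length:]
    return f


def revise(pos: int, reparandum: Tokens) -> Transform:
    """Hypothetical, simplified revision: insert an abandoned fragment
    before tokens[pos], so the speaker appears to correct themselves."""
    def f(tokens: Tokens) -> Tokens:
        return tokens[:pos] + reparandum + tokens[pos:]
    return f


def compose(transforms: List[Transform]) -> Transform:
    """Apply transforms left to right; each one sees the previous output."""
    return lambda tokens: reduce(lambda acc, t: t(acc), transforms, tokens)


# Usage: positions are chosen by hand here, not predicted by a model.
sentence = "I want to book a flight to Paris".split()
pipeline = compose([revise(6, ["to", "London"]), repeat(0), insert_pause(2)])
result = pipeline(sentence)
print(" ".join(result))
```

Each transformation receives the output of the previous one, so the positions passed to later steps index into the already-modified sequence. This is what makes the composition iterative rather than a set of independent edits on the original sentence.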