PhD defense: Raheel Qader, Pronunciation and disfluency modeling for expressive speech synthesis, 31st March 2017.


In this thesis, we address the problem of expressivity in Text-To-Speech (TTS) by incorporating two phenomena that strongly affect speech: pronunciation variants and speech disfluencies. In the first part of this thesis, we present a pronunciation variant generation method that adapts standard (i.e., dictionary-based) pronunciations to a spontaneous style. Its strength lies in exploiting a wide range of linguistic, articulatory and acoustic features, and in using a probabilistic machine learning framework, namely conditional random fields (CRFs) and language models. The second part of the thesis explores a new approach to the automatic generation of speech disfluencies for TTS. The proposed approach has the advantage of generating several types of disfluencies: pauses, repetitions and revisions. To achieve this, we formalize the problem as a theoretical process in which transformation functions are iteratively composed to insert disfluencies. We present a first implementation of this process using CRFs and language models.
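The abstract describes disfluency insertion as the iterative composition of transformation functions. The Python sketch below is a toy that illustrates only that compositional idea. It is not the thesis implementation.

Several elements are assumptions made for illustration:
- the transform names `pause`, `repetition` and `revision`;
- their parameters;
- the `<pause>` token;
- the fixed insertion positions.

In the thesis, the decision of where and what to insert is made by CRFs and language models. This sketch omits that step and takes the positions as given.

```python
from functools import reduce
from typing import Callable, List

Tokens = List[str]
Transform = Callable[[Tokens], Tokens]


def pause(pos: int, marker: str = "<pause>") -> Transform:
    """Hypothetical transform: insert a pause marker before token `pos`."""
    def apply(tokens: Tokens) -> Tokens:
        return tokens[:pos] + [marker] + tokens[pos:]
    return apply


def repetition(pos: int, length: int = 1) -> Transform:
    """Hypothetical transform: repeat the `length` tokens starting at `pos`."""
    def apply(tokens: Tokens) -> Tokens:
        chunk = tokens[pos:pos + length]
        return tokens[:pos] + chunk + tokens[pos:]
    return apply


def revision(pos: int, reparandum: Tokens) -> Transform:
    """Hypothetical transform: insert an abandoned fragment before token `pos`,
    so that the original tokens that follow act as the correction."""
    def apply(tokens: Tokens) -> Tokens:
        return tokens[:pos] + list(reparandum) + tokens[pos:]
    return apply


def compose(transforms: List[Transform]) -> Transform:
    """Apply transforms left to right; each one sees the previous one's output.
    Input lists are never mutated, since every transform builds a new list."""
    return lambda tokens: reduce(lambda acc, f: f(acc), transforms, tokens)


if __name__ == "__main__":
    sentence = "I want the blue car".split()
    pipeline = compose([revision(2, ["the", "red"]), repetition(0), pause(2)])
    print(" ".join(pipeline(sentence)))
```

Running the script prints `I I <pause> want the red the blue car`. Positions refer to the sequence produced by the preceding transform, so the order of composition matters.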
