An Overview
Definition
The production of speech by machines, by way of the automatic phonetization of the sentences to utter
Text-To-Speech
Text Processing
Text Normalization
Pronunciation
Timing and Intonation
Speech Generation
Segmental Concatenation
Waveform Synthesis
Functional Diagram
[Diagram labels: TTS Synthesizer, Natural Language Processing, Text, Morphosyntactic Analysis, Letter-to-Sound, Prosody Generation]
NLP Module
Preprocessor
Morphological Analyzer
Contextual Analyzer
Syntactic and Prosodic Parser
Morphosyntactic Analyzer
Letter-to-Sound Module
Text Preprocessing
Challenges
Text Preprocessing
Tokenizer
Classifier
Determines the most likely class for a given token
Example: "1956" is a year in "January 1956" but a quantity in "1956 potatoes"
Expansion Module
Methods for expanding numbers and classes that can be handled algorithmically
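The expansion step can be illustrated with a minimal Python sketch. The class names `year` and `cardinal`, the function names, and the supported ranges are assumptions made for illustration, not part of the original slides.

```python
# Hypothetical expansion module: spells out a numeric token according to
# the class assigned by the classifier. Ranges are deliberately small.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def expand_cardinal(n: int) -> str:
    """Spell out a cardinal number in the range 0..9999."""
    if not 0 <= n < 10000:
        raise ValueError("this sketch only handles 0..9999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + expand_cardinal(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + expand_cardinal(rest) if rest else ""
    return ONES[thousands] + " thousand" + tail


def expand_year(n: int) -> str:
    """Read a year in the range 1100..1999 as two two-digit groups."""
    if not 1100 <= n <= 1999:
        raise ValueError("this sketch only handles years 1100..1999")
    high, low = divmod(n, 100)
    if low == 0:
        return expand_cardinal(high) + " hundred"
    if low < 10:
        return expand_cardinal(high) + " oh " + ONES[low]
    return expand_cardinal(high) + " " + expand_cardinal(low)


def expand(token: str, token_class: str) -> str:
    """Dispatch on the token class; unknown classes pass through unchanged."""
    if token_class == "year":
        return expand_year(int(token))
    if token_class == "cardinal":
        return expand_cardinal(int(token))
    return token


print(expand("1956", "year"))
print(expand("1956", "cardinal"))
```

The same token is read differently depending on its class, which is why classification has to precede expansion.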
Text Preprocessing
$$p(t \mid o) = \frac{p(o \mid t)\, p(t)}{p(o)}$$
p(o): the probability of the observed text
p(t): the prior probability of observing the tag t in the text
p(o|t): a trigram letter language model for predicting observations of a particular tag t
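A toy version of this classifier can be sketched in Python. Since p(o) is the same for every tag, the sketch picks the tag that maximizes log p(o|t) + log p(t). The training strings, the tag names, the add-one smoothing, and the alphabet size are all assumptions made for illustration.

```python
import math
from collections import Counter


class TrigramLetterModel:
    """Letter trigram model for one tag, with add-one smoothing (an assumption)."""

    def __init__(self, examples, alphabet_size=128):
        self.trigrams = Counter()
        self.contexts = Counter()
        self.alphabet_size = alphabet_size
        for text in examples:
            padded = "^^" + text + "$"  # start and end padding symbols
            for i in range(2, len(padded)):
                self.trigrams[padded[i - 2:i + 1]] += 1
                self.contexts[padded[i - 2:i]] += 1

    def log_prob(self, text):
        """Return log p(o|t) for the observed string."""
        padded = "^^" + text + "$"
        total = 0.0
        for i in range(2, len(padded)):
            numerator = self.trigrams[padded[i - 2:i + 1]] + 1
            denominator = self.contexts[padded[i - 2:i]] + self.alphabet_size
            total += math.log(numerator / denominator)
        return total


def classify(observation, models, priors):
    """Return the tag t maximizing log p(o|t) + log p(t)."""
    scores = {
        tag: model.log_prob(observation) + math.log(priors[tag])
        for tag, model in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training data: tokens observed together with their context.
models = {
    "year": TrigramLetterModel(
        ["January 1956", "March 1984", "in 1999", "May 1901"]),
    "quantity": TrigramLetterModel(
        ["1956 potatoes", "300 apples", "45 books", "12 eggs"]),
}
priors = {"year": 0.5, "quantity": 0.5}

print(classify("January 1956", models, priors))
print(classify("1956 potatoes", models, priors))
```

Both test strings also appear in the toy training data, so this only demonstrates the mechanics. A usable model would need a far larger labelled corpus.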
Morphological Analysis
Function Words
Determiners, Pronouns, Prepositions, Conjunctions
Form the skeleton of the sentence
Stored in the lexicon, along with their pronunciation
Content Words
Inflection + Compounding
Used for pronunciation and stressing
Synthesis
Input
Sequence of phonemes
Prosodic Information
Output
Digital Speech
Synthesis Strategies
Synthesis by Rule
Cognitive approach of the phonation mechanism
Speech is produced by mathematical rules that formally describe the influence of phonemes on one another
Synthesis by Concatenation
Limited knowledge of the data to be handled
Elementary speech units are stored in a database and then concatenated and processed to produce the speech signal
Synthesis by Rule
Functional Diagram
[Diagram labels: Phone Names, Prosody, DSP Module, Speech Science, Parametric Speech Corpus, Speech Corpus, Rule Database, Rule Matching, Rule Finding, Signal Synthesis, Speech]
Synthesis by Rule
Preparation
Synthesis
Rules are matched to phonetic input
Production of parametric signal
Synthesis of speech signal by re-implementing analysis model
Synthesis by Rule
Segmental Quality
Rule Efficiency
Corpus Quality: choice of utterances and recording quality
Intrinsic Errors: accuracy of the model describing high-quality speech
Even simple analysis-resynthesis may produce problems!
Synthesis by Rule
Formant Synthesizers
Speech is a dynamic evolution of up to 60 parameters
Formant and antiformant frequencies and bandwidths
Glottal waveforms
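To make the idea of formant frequencies and bandwidths as control parameters concrete, here is a minimal Python sketch of a cascade of second-order digital resonators driven by an impulse train. The formant values, the impulse-train source (standing in for a glottal waveform), and all function names are illustrative assumptions. A real formant synthesizer varies many more parameters over time.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] with unity DC gain."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter the signal through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voiced source: one unit impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    length = round(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(length)]


def synthesize_vowel(formants, f0_hz=100.0, duration_s=0.05, sample_rate=16000):
    """Pass the source through the resonators in cascade."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz; not taken from the slides.
samples = synthesize_vowel([(700.0, 80.0), (1200.0, 90.0), (2500.0, 120.0)])
print(len(samples))
```

Here the parameters are held constant for the whole utterance. Rule-based synthesis would update them from frame to frame according to the matched rules.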
Synthesis by Concatenation
Functional Diagram
[Diagram labels: Speech Science, Speech Corpus, DSP Module, Parametric Segment DB, Speech Segment DB, Segment Info, Signal Processing, Synthesis Segment DB]
Synthesis by Concatenation
Compile and record utterances
Segment signal and extract speech units
Store segment waveforms (along with context) and extended information in database
Extract parameters and create parametric segment database
Useful for data compaction
Easier prosody matching and modification
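The segment database and the concatenation step might be sketched as follows in Python. The `Segment` fields, the `SegmentDatabase` class, and the short linear cross-fade at each join are assumptions chosen for illustration. The slides do not specify how joins are smoothed.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One stored speech unit with its context and extra information."""
    name: str
    samples: list
    left_context: str = ""
    right_context: str = ""
    info: dict = field(default_factory=dict)


class SegmentDatabase:
    """Hypothetical in-memory store keyed by segment name."""

    def __init__(self):
        self._segments = {}

    def add(self, segment):
        self._segments[segment.name] = segment

    def get(self, name):
        return self._segments[name]


def concatenate(segments, overlap=4):
    """Join segments, cross-fading `overlap` samples at each boundary."""
    out = []
    for segment in segments:
        samples = list(segment.samples)
        n = min(overlap, len(out), len(samples))
        start = len(out) - n
        for i in range(n):
            weight = (i + 1) / (n + 1)  # ramps toward the incoming segment
            out[start + i] = (1.0 - weight) * out[start + i] + weight * samples[i]
        out.extend(samples[n:])
    return out


db = SegmentDatabase()
db.add(Segment("a-b", [1.0] * 10, left_context="a", right_context="b"))
db.add(Segment("b-c", [1.0] * 10, left_context="b", right_context="c"))
speech = concatenate([db.get("a-b"), db.get("b-c")])
print(len(speech))
```

Each join shortens the output by `overlap` samples, so duration targets would need to account for it.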
Synthesis by Concatenation
Concatenating Segments
Pitch Modification
Relative shifting of localized signals
Spacing reflects pitch period
Good results for modification factors in the range [0.6, 1.5]
Duration
Localized signals are added or deleted from output
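The pitch and duration operations described above can be sketched as an overlap-add procedure in Python. The description resembles pitch-synchronous overlap-add, but that identification is an assumption, as are the constant pitch period, the Hann window, and the function names. No amplitude normalization is applied when the spacing changes.

```python
import math


def hann_window(length):
    """Symmetric Hann window of the given length (length > 1)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def extract_localized_signals(signal, period):
    """Cut two-period windowed frames centred on each pitch mark.

    Assumes a constant pitch period with marks at multiples of `period`.
    """
    window = hann_window(2 * period + 1)
    grains = []
    for mark in range(period, len(signal) - period, period):
        frame = signal[mark - period:mark + period + 1]
        grains.append([s * w for s, w in zip(frame, window)])
    return grains


def modify(signal, period, pitch_factor=1.0, duration_factor=1.0):
    """Re-space localized signals to change pitch; repeat or drop them
    to change duration."""
    if pitch_factor <= 0 or duration_factor <= 0:
        raise ValueError("factors must be positive")
    grains = extract_localized_signals(signal, period)
    if not grains:
        raise ValueError("signal too short for the given period")
    new_period = max(1, round(period / pitch_factor))
    out = [0.0] * round(len(signal) * duration_factor)
    for centre in range(new_period, len(out), new_period):
        source_time = centre / duration_factor
        index = min(max(round(source_time / period) - 1, 0), len(grains) - 1)
        for j, value in enumerate(grains[index]):
            pos = centre - period + j
            if 0 <= pos < len(out):
                out[pos] += value
    return out


tone = [math.sin(2.0 * math.pi * i / 80) for i in range(800)]
higher = modify(tone, 80, pitch_factor=1.25)    # closer spacing, same length
longer = modify(tone, 80, duration_factor=1.5)  # localized signals repeated
print(len(higher), len(longer))
```

With both factors at 1.0 the overlapping windows sum to one, so the interior of the signal is reconstructed unchanged.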