ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS
April 9-11, 2015, Yekaterinburg
Normalization of Non-Standard Words
with Finite State Transducers
for Russian Speech Synthesis
Artem Lukanin
Text Preprocessing for Speech Synthesis
• is usually a very complex task
• Text normalization is one of the steps in text preprocessing [1]
• sentence segmentation
• tokenization
• normalization of non-standard words (NSWs)
• numbers, abbreviations, and acronyms
• special characters such as %, $, #, №, etc.
Normalization of Non-Standard Words
• NSWs must be expanded into full standard words (SWs) to be pronounced correctly
• This is even more complex in inflected languages such as Russian
• an ordinal number can be converted into 36 different word forms (6
cases × 2 numbers × 3 genders)
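• digit position changes the output standard word (the bracketed digits of 11111 yield the word shown)
• 1111[1] → первый “first”
• 111[11] → одиннадцатый “eleventh”
• 11[1]11 → сто “one hundred”
• 1[1]111 → тысяча “thousand”
• [11]111 → одиннадцать тысяч “eleven thousand”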
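A minimal Python sketch of the examples above: the same digit group maps to a different word depending on where it sits in 11111. The lookup table and the function name `read_group` are illustrative assumptions, not part of Normatex; only the five digit-word pairs come from the slide.

```python
# Hypothetical lookup: (digit group, number of digits that follow it) -> word.
# Only the five pairs shown on the slide are covered.
POSITIONAL_WORDS = {
    ("1", 0): "первый",              # units position, ordinal
    ("11", 0): "одиннадцатый",       # last two digits, ordinal
    ("1", 2): "сто",                 # hundreds position
    ("1", 3): "тысяча",              # thousands position
    ("11", 3): "одиннадцать тысяч",  # thousands group
}


def read_group(number: str, start: int, end: int) -> str:
    """Return the word for number[start:end], taking its position into account."""
    group = number[start:end]
    position = len(number) - end  # how many digits follow the group
    return POSITIONAL_WORDS[(group, position)]


if __name__ == "__main__":
    n = "11111"
    for start, end in [(4, 5), (3, 5), (2, 3), (1, 2), (0, 2)]:
        print(n[start:end], (start, end), "->", read_group(n, start, end))
```

The identical substring "1" yields three different words here, which is why a plain string replacement is not enough for number normalization.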
Existing Russian Normalization Systems
• As a part of proprietary Text-to-Speech (TTS) systems
• Google Translate, https://translate.google.ru/
• VitalVoice, http://cards.voicefabric.ru/
• Windows SAPI voices, etc.
• As a part of open-source TTS systems
• Festival [2]
• only digit-by-digit number normalization for the Russian voice
Normatex
• is, to the author's knowledge, the first open-source normalization
system for Russian: github.com/avlukanin/normatex
• If the input texts are normalized beforehand, the quality of the
synthesized speech of existing TTS systems can be improved
• 118 finite state transducers (FSTs) convert cardinal and ordinal
numbers into the corresponding numerals; they handle
ranges, times, dates, telephone numbers, postal codes, etc.
• 33 FSTs normalize graphic abbreviations and acronyms
Test Parallel Corpus
• 66 original texts from the official site of South Ural State University,
susu.ac.ru, containing 38,439 tokens (broad segmentation units [3]):
• 14,661 word tokens
• 333 acronyms and 98 initials; 379 graphic abbreviations
• 977 number tokens (2,511 digits)
• 66 manually preprocessed texts, in which all numbers, abbreviations and
acronyms were expanded into full words or replaced with pronounceable
combinations of letters
Finite State Transducers
• are developed in the form of graphs in Unitex 3.1beta
• Before applying FSTs to a text, it is preprocessed:
• The text is split into sentences
• The text is tokenized
• Every token is assigned all possible grammatical forms
• Number FSTs are applied first to deal with numbers and unit-of-measure
abbreviations
• Abbreviation FSTs and acronym FSTs are applied sequentially after that
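The ordering described above can be sketched as a chain of stages in Python. This is only an illustration under stated assumptions: the three toy stages are plain string replacements standing in for the real Unitex graphs, and all function names are hypothetical.

```python
from typing import Callable, List

Stage = Callable[[str], str]


def number_stage(text: str) -> str:
    # Toy stand-in for the number FSTs: a number together with its unit abbreviation.
    return text.replace("5 км", "пять километров")


def abbreviation_stage(text: str) -> str:
    # Toy stand-in for the graphic-abbreviation FSTs.
    return text.replace("т.е.", "то есть")


def acronym_stage(text: str) -> str:
    # Toy stand-in for the acronym FSTs (letter-by-letter fallback).
    return text.replace("ВПП", "ВэПэПэ")


# Numbers first, then abbreviations, then acronyms, as on the slide.
PIPELINE: List[Stage] = [number_stage, abbreviation_stage, acronym_stage]


def normalize(text: str, stages: List[Stage]) -> str:
    """Apply each stage to the output of the previous one."""
    for stage in stages:
        text = stage(text)
    return text


if __name__ == "__main__":
    print(normalize("ВПП, т.е. полоса длиной 5 км", PIPELINE))
```

Running the number stage first lets it consume unit abbreviations such as км before the general abbreviation stage sees them.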
Cardinal Numbers
• agree with nouns in case, but the numerals один “one” and два “two”
agree in gender as well
• all the constituent words of a compound numeral agree with the
corresponding noun: двадцати одного and двадцати одной (“twenty-
one” in gen. m. and f.)
• одни (“one” in plural) agrees only with pluralia tantum, e.g. одни
ножницы “one pair of scissors”, одни брюки “one pair of pants” [4]
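A small Python sketch of the agreement rules above, limited to "twenty-one" and "twenty-two" in the nominative and genitive. The tables and the function name `twenty_something` are illustrative assumptions; Normatex encodes such rules as Unitex graphs, not as Python dictionaries.

```python
# Forms of один "one": agrees with the noun in case and gender.
ONE = {
    ("nom", "m"): "один", ("nom", "f"): "одна", ("nom", "n"): "одно",
    ("gen", "m"): "одного", ("gen", "f"): "одной", ("gen", "n"): "одного",
}
# Forms of два "two": gender is distinguished only in the nominative.
TWO = {
    ("nom", "m"): "два", ("nom", "f"): "две", ("nom", "n"): "два",
    ("gen", "m"): "двух", ("gen", "f"): "двух", ("gen", "n"): "двух",
}
# двадцать "twenty" agrees in case only.
TWENTY = {"nom": "двадцать", "gen": "двадцати"}


def twenty_something(unit: int, case: str, gender: str) -> str:
    """Build 21 or 22 so that every constituent word agrees with the noun."""
    table = {1: ONE, 2: TWO}[unit]
    return f"{TWENTY[case]} {table[(case, gender)]}"


if __name__ == "__main__":
    print(twenty_something(1, "gen", "m"))  # form used with a masculine noun
    print(twenty_something(1, "gen", "f"))  # form used with a feminine noun
```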
5-9ncard (graph): 5 → пять, 6 → шесть, 7 → семь, 8 → восемь, 9 → девять
2x-9xncard (graph): 2 → двадцать, 3 → тридцать, 4 → сорок, 5 → пятьдесят, 6 → шестьдесят, 7 → семьдесят, 8 → восемьдесят, 9 → девяносто
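The two digit-to-word graphs above (5-9ncard and 2x-9xncard) behave like lookup tables. A minimal Python sketch, assuming that a tens digit 2-9 followed by a units digit 5-9 is read as the two words joined by a space; the function name `expand_cardinal` is hypothetical, and 10-19 and other digits are deliberately left out.

```python
from typing import Optional

# Contents of the 5-9ncard graph: units digit -> nominative cardinal.
UNITS_5_9 = {"5": "пять", "6": "шесть", "7": "семь", "8": "восемь", "9": "девять"}

# Contents of the 2x-9xncard graph: tens digit -> nominative cardinal.
TENS_2X_9X = {
    "2": "двадцать", "3": "тридцать", "4": "сорок", "5": "пятьдесят",
    "6": "шестьдесят", "7": "семьдесят", "8": "восемьдесят", "9": "девяносто",
}


def expand_cardinal(token: str) -> Optional[str]:
    """Expand "5".."9" or a two-digit number ending in 5-9; return None otherwise."""
    if len(token) == 1 and token in UNITS_5_9:
        return UNITS_5_9[token]
    if len(token) == 2 and token[0] in TENS_2X_9X and token[1] in UNITS_5_9:
        return f"{TENS_2X_9X[token[0]]} {UNITS_5_9[token[1]]}"
    return None  # not covered by these two tables


if __name__ == "__main__":
    for token in ["9", "47", "13"]:
        print(token, "->", expand_cardinal(token))
```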
NUM-5-9-ncard (graph): subgraphs 5-9ncard, 2x-9xncard, 10-19ncard, NUMxx-ncard; other boxes: пробел “space”, 0
units (graph): subgraphs NUM-1-ncard, NUM-2-ncard, NUM-3-4-ncard, NUM-5-9-ncard, units-1, units-2-4; other boxes: <N:g>, <A:g>[ ], " ", из, *; truncated labels: -2-9-gcard, -1m-gcard
Ordinal Numbers
• Simple ordinal numerals agree with nouns in gender, case and number
• In compound ordinal numerals only the last constituent word agrees with
the noun [5]: две тысячи четырнадцатом (“two thousand fourteenth” in
prepositional masculine)
• Complex ordinal numbers ending in -00, -000, -000000, or -000000000 are
written as one word, without spaces: “153000” is converted into
стопятидесятитрёхтысячный “one hundred and fifty-three thousandth”
in nominative masculine
Ordinal Numbers
• Only the last constituent words -сотый “hundredth”, -тысячный
“thousandth”, -миллионный “millionth”, -миллиардный “billionth”
agree with the nouns
• The words preceding the last word are used in the genitive plural (the
exceptions are сто “one hundred” and девяносто “ninety”, which are
used in the nominative case) [6]
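The compounding rule for complex ordinals can be sketched in Python for the -тысячный “thousandth” case. This is an illustration under stated assumptions, not the Normatex graphs: the function name `thousandth` is hypothetical, and the tables cover only the hundreds digit 1, tens 2-9 and units 2-9 (teens, a units digit of 1, and 200-900 are left out and raise an error).

```python
# Genitive forms of the preceding words; сто and девяносто stay nominative.
UNITS_GEN = {0: "", 2: "двух", 3: "трёх", 4: "четырёх", 5: "пяти",
             6: "шести", 7: "семи", 8: "восьми", 9: "девяти"}
TENS_GEN = {0: "", 2: "двадцати", 3: "тридцати", 4: "сорока", 5: "пятидесяти",
            6: "шестидесяти", 7: "семидесяти", 8: "восьмидесяти", 9: "девяносто"}
HUNDREDS = {0: "", 1: "сто"}


def thousandth(number: int) -> str:
    """Return the nominative masculine ordinal for a supported multiple of 1000."""
    count, rest = divmod(number, 1000)
    if rest != 0 or not 2 <= count <= 199:
        raise ValueError(f"unsupported number: {number}")
    hundreds, tens, units = count // 100, count // 10 % 10, count % 10
    try:
        prefix = HUNDREDS[hundreds] + TENS_GEN[tens] + UNITS_GEN[units]
    except KeyError:
        raise ValueError(f"unsupported number: {number}") from None
    # Only the last constituent, -тысячный, would inflect to agree with the noun.
    return prefix + "тысячный"


if __name__ == "__main__":
    print(thousandth(153000))
```

Only the final constituent would need the full set of case, gender and number endings; the prefix stays fixed.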
Acronyms
• Most acronyms should be expanded into full words before speech
synthesis: a letter-by-letter pronunciation is hard for listeners to
follow, and many acronyms are too rare for listeners to know which
phrase they stand for
ФГБОУ ВПО «ЮУрГУ» (НИУ) → Федеральное государственное
бюджетное образовательное учреждение высшего
профессионального образования «Южно-Уральский государственный
университет» (Научно-исследовательский университет)
“Federal State Budgetary Educational Institution of Higher Professional
Education ‘South Ural State University’ (Research University)”
Acronyms
• The head of an acronym is a noun, so the expanded phrase can have
12 possible forms (six cases × two numbers) in Russian
• There are rules for all six cases in Normatex
• Acronyms can be ambiguous in different corpora
• For all ambiguous or unknown acronyms Normatex substitutes each
letter with the name of that letter: ВПП → ВэПэПэ
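The letter-by-letter fallback can be sketched in Python as follows. Only the pair ВПП → ВэПэПэ comes from the slide; the rest of the letter-name table and the function name `spell_out` are assumptions for illustration, and the names actually used by Normatex may differ.

```python
# Assumed letter names, capitalised as in the slide's example (В -> Вэ, П -> Пэ).
LETTER_NAMES = {
    "А": "А", "Б": "Бэ", "В": "Вэ", "Г": "Гэ", "Д": "Дэ", "Е": "Е",
    "Ж": "Жэ", "З": "Зэ", "И": "И", "К": "Ка", "Л": "Эль", "М": "Эм",
    "Н": "Эн", "О": "О", "П": "Пэ", "Р": "Эр", "С": "Эс", "Т": "Тэ",
    "У": "У", "Ф": "Эф", "Х": "Ха", "Ц": "Цэ", "Ч": "Че", "Ш": "Ша",
    "Щ": "Ща", "Э": "Э", "Ю": "Ю", "Я": "Я",
}


def spell_out(acronym: str) -> str:
    """Replace each known letter with its name; leave other characters unchanged."""
    return "".join(LETTER_NAMES.get(letter, letter) for letter in acronym)


if __name__ == "__main__":
    print(spell_out("ВПП"))
```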
Graphic Abbreviations
• Single interpretation: и т.д. “etc.” → и так далее , т.е. “i.e.” → то есть
• The interpretation depends on the context: и др. “et al.” → и другие
“and others”, и других “and others”, и другим “and others”, и другое
“and other”
• Ambiguous: г. → год “year”, город “city”, грамм “gram” (every noun
can have 12 word forms); г can also be a plain letter, as in
Аудитория: 339-г, 339-д “Room 339-g, 339-d”
• FSTs must include sufficient left and right context and must be
applied in a definite order
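A toy Python sketch of context-dependent expansion of г., using ordered regular expressions in place of FSTs. The two rules, their order and the function name `expand_g` are assumptions for illustration; they produce nominative forms only and ignore the case agreement that the real graphs must handle.

```python
import re
from typing import List, Tuple

# Ordered rules: (pattern with left/right context, replacement).
RULES: List[Tuple[str, str]] = [
    # Left context: a four-digit number before "г." suggests "year".
    (r"\b(\d{4}) г\.", r"\1 год"),
    # Right context: a capitalised word after "г." suggests "city".
    (r"\bг\. (?=[А-ЯЁ])", "город "),
]


def expand_g(text: str) -> str:
    """Apply the rules in their fixed order; unmatched text is left as is."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(expand_g("2015 г. был удачным"))
    print(expand_g("г. Челябинск"))
    print(expand_g("Аудитория: 339-г, 339-д"))
```

The room number 339-г is left untouched because neither context matches, which is the behaviour the slide asks for.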
Results
Token type              Tokens   Correct   Errors   Recall    Precision
Numbers                    977       920       53   94.17%    94.55%
Acronyms and initials      431       355       40   82.37%    89.87%
Graphic abbreviations      379       232        4   61.21%    98.05%
Total                     1787      1507       97   84.33%    93.95%
The work is still in progress
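The figures in the table can be recomputed with a few lines of Python, assuming recall = Correct / Tokens and precision = Correct / (Correct + Errors). The slide does not state these formulas; they are inferred because they reproduce the reported percentages, with one exception: for graphic abbreviations this precision formula gives about 98.31% rather than the reported 98.05%.

```python
from typing import Dict, Tuple

# (tokens, correct, errors) per token type, copied from the table.
ROWS: Dict[str, Tuple[int, int, int]] = {
    "Numbers": (977, 920, 53),
    "Acronyms and initials": (431, 355, 40),
    "Graphic abbreviations": (379, 232, 4),
    "Total": (1787, 1507, 97),
}


def recall(tokens: int, correct: int) -> float:
    """Share of all tokens of this type that were normalized correctly, in percent."""
    return 100.0 * correct / tokens


def precision(correct: int, errors: int) -> float:
    """Share of attempted normalizations that were correct, in percent."""
    return 100.0 * correct / (correct + errors)


if __name__ == "__main__":
    for name, (tokens, correct, errors) in ROWS.items():
        print(f"{name}: recall {recall(tokens, correct):.2f}%, "
              f"precision {precision(correct, errors):.2f}%")
```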
References
1. Reichel, U.D., Pfitzinger, H.R.: Text preprocessing for speech synthesis
(2006)
2. The Festival Speech Synthesis System,
http://www.cstr.ed.ac.uk/projects/festival/
3. Dutoit, T.: An introduction to text-to-speech synthesis (Vol. 3). Springer
Science & Business Media (1997)
4. Russian Grammar [Русская грамматика]. Vol. 1. Nauka, Moscow (1980)
5. Rosental, D.E., Golub, I.B., Telenkova, M.A.: The Modern Russian Language
[Современный русский язык]. Airis-Press, Moscow (1997)
6. Rosental, D.E., Djandjakova, E.V., Kabanova, N.P.: Reference Book on
Orthography, Pronunciation, Literary Editing [Справочник по
правописанию, произношению, литературному редактированию].
CheRo, Moscow (1998)
Normatex—Russian text normalization
github.com/avlukanin/normatex
Artem Lukanin
• about.me/alukanin
• @avlukanin
• artyom.lukanin@gmail.com
Slides: artyom.ice-lc.com/slides/normatex
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINESBIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
BIG DEVELOPMENTS IN LESOTHO(DAMS & MINES
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
History of Morena Moshoeshoe birth death
History of Morena Moshoeshoe birth deathHistory of Morena Moshoeshoe birth death
History of Morena Moshoeshoe birth death
 
ECOLOGY OF FISHES.pptx full presentation
ECOLOGY OF FISHES.pptx full presentationECOLOGY OF FISHES.pptx full presentation
ECOLOGY OF FISHES.pptx full presentation
 
Ready Set Go Children Sermon about Mark 16:15-20
Ready Set Go Children Sermon about Mark 16:15-20Ready Set Go Children Sermon about Mark 16:15-20
Ready Set Go Children Sermon about Mark 16:15-20
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
"I hear you": Moving beyond empathy in UXR
"I hear you": Moving beyond empathy in UXR"I hear you": Moving beyond empathy in UXR
"I hear you": Moving beyond empathy in UXR
 
Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...
Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...
Abortion Pills Fahaheel ௹+918133066128💬@ Safe and Effective Mifepristion and ...
 
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORNLITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
LITTLE ABOUT LESOTHO FROM THE TIME MOSHOESHOE THE FIRST WAS BORN
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptxBEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
BEAUTIFUL PLACES TO VISIT IN LESOTHO.pptx
 
Call Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. Mumbai
Call Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. MumbaiCall Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. Mumbai
Call Girls Near The Byke Suraj Plaza Mumbai »¡¡ 07506202331¡¡« R.K. Mumbai
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 

Artem Lukanin - Normalization of Non-Standard Words with Finite State Transducers for Russian Speech Synthesis

  • 1. ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS, April 9-11, 2015, Yekaterinburg
      Normalization of Non-Standard Words with Finite State Transducers for Russian Speech Synthesis
      Artem Lukanin
  • 2. Text Preprocessing for Speech Synthesis
      • Text preprocessing is usually a very complex task
      • Text normalization is one of the steps in text preprocessing [1]:
          • sentence segmentation
          • tokenization
          • normalization of non-standard words (NSWs):
              • numbers, abbreviations, and acronyms
              • special characters such as %, $, #, №, etc.
  • 3. Normalization of Non-Standard Words
      • NSWs must be expanded into full standard words (SWs) to be pronounced correctly
      • This is even more complex in inflective languages such as Russian:
          • an ordinal number can be converted into 36 different word forms (6 cases * 2 numbers * 3 genders)
          • the position of a digit changes the output standard word:
              • 1111 1 → первый “first”
              • 111 11 → одиннадцатый “eleventh”
              • 11 1 11 → сто “one hundred”
              • 1 1 111 → тысяча “thousand”
              • 11 111 → одиннадцать тысяч “eleven thousand”
  • 4. Existing Russian Normalization Systems
      • As a part of proprietary Text-to-Speech (TTS) systems:
          • Google Translate, https://translate.google.ru/
          • VitalVoice, http://cards.voicefabric.ru/
          • Windows SAPI voices, etc.
      • As a part of open-source TTS systems:
          • Festival [2]: only digit-by-digit number normalization for the Russian voice
  • 5. Normatex
      • Normatex is, to the author's knowledge, the first open-source Russian normalization system, github.com/avlukanin/normatex
      • If input texts are normalized beforehand, the quality of the speech synthesized by existing TTS systems can be improved
      • 118 finite state transducers (FSTs) convert cardinal and ordinal numbers into the corresponding numerals; they can process different ranges, times, dates, telephone numbers, postal codes, etc.
      • 33 FSTs normalize graphic abbreviations and acronyms
  • 6. Test Parallel Corpus
      • 66 original texts from the official site of South Ural State University, susu.ac.ru, containing 38,439 tokens (broad segmentation units [3]):
          • 14,661 word tokens
          • 333 acronyms and 98 initials; 379 graphic abbreviations
          • 977 number tokens (2,511 digits)
      • 66 manually preprocessed texts, in which all numbers, abbreviations and acronyms were expanded into full words or replaced with pronounceable combinations of letters
  • 7. Finite State Transducers
      • The FSTs are developed in the form of graphs in Unitex 3.1beta
      • Before FSTs are applied to a text, it is preprocessed:
          • The text is split into sentences
          • The text is tokenized
          • Every token is assigned all its possible grammatical forms
      • Number FSTs are applied first to deal with numbers and measure unit abbreviations
      • Abbreviation FSTs and acronym FSTs are applied sequentially after that
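The ordering described on this slide can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the regular-expression rules below are hypothetical stand-ins for the Unitex graphs, the rule tables and function names are invented for the example, and tokenization with dictionary lookup is omitted.

```python
import re

# Hypothetical rewrite rules standing in for the real FST graphs.
# Each stage is a list of (compiled pattern, replacement) pairs.
NUMBER_RULES = [(re.compile(r"\b5 кг\b"), "пять килограммов")]
ABBREVIATION_RULES = [
    (re.compile(r"\bт\.е\."), "то есть"),
    (re.compile(r"\bи т\.д\."), "и так далее"),
]
ACRONYM_RULES = [(re.compile(r"\bВПП\b"), "ВэПэПэ")]

# Numbers (with measure units) first, then abbreviations, then acronyms.
STAGES = [NUMBER_RULES, ABBREVIATION_RULES, ACRONYM_RULES]


def split_sentences(text):
    """Naive splitter: break after . ! ? when an uppercase letter follows."""
    return re.split(r"(?<=[.!?])\s+(?=[А-ЯЁA-Z])", text)


def apply_rules(sentence, rules):
    """Apply every rule of one stage to a sentence."""
    for pattern, replacement in rules:
        sentence = pattern.sub(replacement, sentence)
    return sentence


def normalize(text):
    """Run all stages, in the fixed order, on each sentence."""
    result = []
    for sentence in split_sentences(text):
        for rules in STAGES:
            sentence = apply_rules(sentence, rules)
        result.append(sentence)
    return " ".join(result)


print(normalize("Вес груза 5 кг, т.е. немного. Полоса ВПП закрыта."))
```

Running the number stage first lets it consume the unit abbreviation кг together with the digit, so the later abbreviation stage never sees it.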
  • 8. Cardinal Numbers
      • Cardinal numerals agree with nouns in case, but the numerals один “one” and два “two” agree in gender as well
      • All the constituent words of a compound numeral agree with the corresponding noun: двадцати одного and двадцати одной (“twenty-one” in gen. m. and f.)
      • одни (“one” in plural) agrees only with pluralia tantum, e.g. одни ножницы “one pair of scissors”, одни брюки “one pair of pants” [4]
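A minimal sketch of this agreement for the single numeral 21 follows. The paradigm table is hand-written for the example (nominative and genitive only) and the function name is hypothetical; Normatex encodes such rules as FST graphs rather than Python dictionaries.

```python
# Hand-written mini paradigm: двадцать declines for case only,
# один declines for case and gender.
TWENTY = {"nom": "двадцать", "gen": "двадцати"}
ONE = {
    ("nom", "m"): "один",
    ("nom", "f"): "одна",
    ("nom", "n"): "одно",
    ("gen", "m"): "одного",
    ("gen", "f"): "одной",
    ("gen", "n"): "одного",
}


def twenty_one(case, gender):
    """Return 'twenty-one' with every constituent word inflected."""
    return f"{TWENTY[case]} {ONE[(case, gender)]}"


print(twenty_one("gen", "m"))
print(twenty_one("gen", "f"))
```

Both words change between nominative and genitive, while only the last one reacts to gender, which is the behaviour the slide describes.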
  • 13. Ordinal Numbers
      • Simple ordinal numerals agree with nouns in gender, case and number
      • In compound ordinal numerals only the last constituent word agrees with the noun [5]: две тысячи четырнадцатом (“two thousand fourteenth” in prepositional masculine)
      • Complex ordinal numbers, ending in -00, -000, -000000, -000000000, are written without spaces: “153000” is converted into стопятидесятитрёхтысячный “one hundred and fifty-three thousandth” in nominative masculine
  • 14. Ordinal Numbers
      • Only the last constituent words -сотый “hundredth”, -тысячный “thousandth”, -миллионный “millionth”, -миллиардный “billionth” agree with the nouns
      • The words preceding the last word are used in genitive plural (the exceptions are сто “one hundred” and девяносто “ninety”, which are used in the nominative case) [6]
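The rule for complex ordinals can be illustrated with a small Python sketch for numbers ending in -000. The tables and the function name are assumptions made for the example; teens and multipliers ending in 1 are deliberately left out, and only a few case and gender forms of -тысячный are listed.

```python
# Prefix forms of the cardinal components: genitive, except сто and
# девяносто, which stay in the nominative as stated on the slide.
HUNDREDS = {1: "сто", 2: "двухсот", 3: "трёхсот", 4: "четырёхсот",
            5: "пятисот", 6: "шестисот", 7: "семисот",
            8: "восьмисот", 9: "девятисот"}
TENS = {2: "двадцати", 3: "тридцати", 4: "сорока", 5: "пятидесяти",
        6: "шестидесяти", 7: "семидесяти", 8: "восьмидесяти",
        9: "девяносто"}
UNITS = {2: "двух", 3: "трёх", 4: "четырёх", 5: "пяти",
         6: "шести", 7: "семи", 8: "восьми", 9: "девяти"}

# Only the last component agrees with the noun (a few forms shown).
THOUSANDTH = {("nom", "m"): "тысячный", ("nom", "f"): "тысячная",
              ("gen", "m"): "тысячного", ("prep", "m"): "тысячном"}


def complex_ordinal_thousands(n, case="nom", gender="m"):
    """Build a one-word ordinal for n = k * 1000, 2 <= k <= 999."""
    if n % 1000 or not 2000 <= n <= 999000:
        raise ValueError("expected a multiple of 1000 from 2000 to 999000")
    k = n // 1000
    h, t, u = k // 100, (k // 10) % 10, k % 10
    if t == 1 or u == 1:
        raise ValueError("teens and multipliers ending in 1 are not covered")
    prefix = HUNDREDS.get(h, "") + TENS.get(t, "") + UNITS.get(u, "")
    return prefix + THOUSANDTH[(case, gender)]


print(complex_ordinal_thousands(153000))
```

For 153000 the prefix is сто + пятидесяти + трёх, and only -тысячный carries the case and gender of the noun.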
  • 15. Acronyms
      • Most acronyms should be converted into full words before speech synthesis, because a letter-by-letter pronunciation is difficult for listeners to comprehend and because many acronyms are too rare for everyone to know which phrase they stand for
      • ФГБОУ ВПО «ЮУрГУ» (НИУ) → Федеральное государственное бюджетное образовательное учреждение высшего профессионального образования «Южно-Уральский государственный университет» (Научно-исследовательский университет), i.e. Federal State Budgetary Educational Institution of Higher Professional Education “South Ural State University” (Scientific Research University)
  • 16. Acronyms
      • The main component of an acronym is a noun, which is why the converted phrase can have 12 possible forms in Russian (six cases and two numbers)
      • Normatex has rules for all six cases
      • Acronyms can be ambiguous across different corpora
      • For all ambiguous or unknown acronyms Normatex substitutes each letter with its alphabet name: ВПП → ВэПэПэ
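The fallback for ambiguous or unknown acronyms can be sketched as a dictionary lookup followed by letter-by-letter spelling. The letter-name table below is partial and its spellings are an assumption (only В → Вэ and П → Пэ are confirmed by the slide); the expansion dictionary and the function name are hypothetical, and case inflection of the expanded phrase is ignored.

```python
# Assumed alphabet names for a handful of letters; not Normatex's table.
LETTER_NAMES = {"А": "А", "Б": "Бэ", "В": "Вэ", "Г": "Гэ", "Д": "Дэ",
                "К": "Ка", "М": "Эм", "Н": "Эн", "П": "Пэ", "Р": "Эр",
                "С": "Эс", "Т": "Тэ", "У": "У"}

# Hypothetical list of unambiguous acronyms with nominative expansions.
EXPANSIONS = {"ЮУрГУ": "Южно-Уральский государственный университет"}


def normalize_acronym(token):
    """Expand a known acronym; otherwise spell it by letter names."""
    if token in EXPANSIONS:
        return EXPANSIONS[token]
    # Letters missing from the table are passed through unchanged.
    return "".join(LETTER_NAMES.get(ch, ch) for ch in token)


print(normalize_acronym("ВПП"))
print(normalize_acronym("ЮУрГУ"))
```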
  • 17. Graphic Abbreviations
      • Single interpretation: и т.д. “etc.” → и так далее, т.е. “i.e.” → то есть
      • The interpretation depends on the context: и др. “et al.” → и другие “and others”, и других “and others”, и другим “and others”, и другое “and other”
      • Ambiguous: г. → год “year”, город “city”, грамм “gram” (every noun can have 12 word forms); compare Аудитория: 339-г, 339-д “Room 339-g, 339-d”
      • FSTs should include sufficient left and right context and should be applied in a definite order
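How left and right context can disambiguate г. is sketched below with two regular expressions. These patterns are illustrative assumptions, not the Normatex graphs: a four-digit number on the left selects год, a capitalized word on the right selects город, and anything else is left untouched. Case inflection of the expanded noun is ignored.

```python
import re

# Left context: four digits before " г." suggest a year.
YEAR = re.compile(r"(?<=\d{4}) г\.")
# Right context: "г." followed by a capitalized word suggests a city.
CITY = re.compile(r"\bг\. ?(?=[А-ЯЁ])")


def expand_g(text):
    """Expand г. using context; the year rule runs before the city rule."""
    text = YEAR.sub(" год", text)
    text = CITY.sub("город ", text)
    return text


print(expand_g("2015 г."))
print(expand_g("г. Челябинск"))
print(expand_g("Аудитория: 339-г, 339-д"))
```

The room label 339-г is not followed by a period, so neither rule fires and it is left as is.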
  • 18. Results
      Token type               Tokens  Correct  Errors  Recall  Precision
      Numbers                     977      920      53  94.17%     94.55%
      Acronyms and initials       431      355      40  82.37%     89.87%
      Graphic abbreviations       379      232       4  61.21%     98.05%
      Total                      1787     1507      97  84.33%     93.95%
      • The work is still in progress
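The slide does not define its metrics. The sketch below assumes recall = correct / tokens and precision = correct / (correct + errors), which reproduces the reported figures for numbers, acronyms and the total. For graphic abbreviations the same precision formula gives 98.31% rather than the 98.05% shown on the slide, so either the definition or that cell may differ.

```python
# Counts copied from the results table: (tokens, correct, errors).
ROWS = {
    "Numbers": (977, 920, 53),
    "Acronyms and initials": (431, 355, 40),
    "Graphic abbreviations": (379, 232, 4),
}


def recall(tokens, correct):
    """Share of all tokens of a type that were normalized correctly (%)."""
    return 100 * correct / tokens


def precision(correct, errors):
    """Share of attempted normalizations that were correct (%)."""
    return 100 * correct / (correct + errors)


# Column sums give the Total row of the table.
total = tuple(sum(column) for column in zip(*ROWS.values()))

for name, (tokens, correct, errors) in {**ROWS, "Total": total}.items():
    print(name, round(recall(tokens, correct), 2),
          round(precision(correct, errors), 2))
```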
  • 19. References
      1. Reichel, U.D., Pfitzinger, H.R.: Text preprocessing for speech synthesis (2006)
      2. The Festival Speech Synthesis System, http://www.cstr.ed.ac.uk/projects/festival/
      3. Dutoit, T.: An introduction to text-to-speech synthesis (Vol. 3). Springer Science & Business Media (1997)
      4. Russian Grammar [Русская грамматика]. Vol. 1. Nauka, Moscow (1980)
  • 20. References
      5. Rosental, D.E., Golub, I.B., Telenkova, M.A.: The Modern Russian Language [Современный русский язык]. Airis-Press, Moscow (1997)
      6. Rosental, D.E., Djandjakova, E.V., Kabanova, N.P.: Reference Book on Orthography, Pronunciation, Literary Editing [Справочник по правописанию, произношению, литературному редактированию]. CheRo, Moscow (1998)
  • 21. Normatex: Russian text normalization
      • github.com/avlukanin/normatex
      • Artem Lukanin: about.me/alukanin, @avlukanin, artyom.lukanin@gmail.com
      • Slides: artyom.ice-lc.com/slides/normatex