Bali Hana Tan140803
Published in Economy & Finance , Technology
Transcript

  • 1. PRONUNCIATION DICTIONARIES. Dr. Bali RANAIVO-MALANÇON, Unit Terjemahan Melalui Komputer, Universiti Sains Malaysia
  • 2. Definition: What is a pronunciation dictionary?
    • A pronunciation dictionary (or phonetic dictionary) is a list of words followed by their phonetic transcriptions.
    • Phonetic transcriptions
      • Canonical pronunciation + variant pronunciations
      • Phonological rules to generate variant pronunciations
  • 3. Some basic linguistic knowledge
    • Notation
      • <> orthographic representation, e.g. <buah>
      • ‘’ character representation, e.g. ‘b’, ‘u’, ‘a’, ‘h’
      • // phonemic representation, e.g. /buah/
      • [] phonetic representation, e.g. [buwah]
    • PHONOLOGY (or phonemics): studies distinctive sound units, the patterns they form, and the rules that regulate their use
      • Phonemes, e.g. /r/
    • PHONETICS: studies the inventory and structure of the sounds of language
      • Phones (allophones), e.g. [r], [R], [ʁ]
  • 4. Examples of pronunciation dictionaries
    • Verbmobil
      • "Ubernachtungen Qy:b6n'axtUN@n
      • "Ubernachtungskosten Qy:b6n'axtUNs#k"Ost@n
      • "Ubernachtungsm"oglichk Qy:b6n'axtUNs#m"2:klICk
    • CMUdict (Carnegie Mellon Pronouncing Dictionary)
    • PHONOLEX
      • "Ubernachtungsgeldes CL:nom OR:sb TP:ptra Qy:b6naxtUNsgEld@s *
      • "Ubernachtungskosten OR:vm TP:manu Qy:b6n'axtUNs#k"Ost@n y:b6naxtUNskOst@n 1 VM MAUS y:b6naxtUNskOsn 1 VM MAUS *
  • 5. Applications: Why do we need pronunciation dictionaries?
    • Speech technologies – to help phonetic labeling
      • Automatic Speech Recognition (ASR): Tan Tien Pieng
      • Text-To-Speech (TTS): Nur Hana Samsudin
    • Pronunciations can be added to the Malay dictionary
  • 6. Simplified Speech Recognition Architecture (Jurafsky D., Martin J. H. (2000) Speech and Language Processing, Prentice-Hall, Inc.)
    • [Figure] Speech waveform → Feature extraction → Spectral feature vectors → Phone likelihood estimation (Gaussians / neural network) → Phone likelihoods P(o|q) → Decoding (Viterbi / A*), using the pronunciation dictionary and the N-gram → Words (“i need a”)
    • Example phone likelihoods per frame: ay 0.70, aa 0.22; ay 0.80, aa 0.12; ay 0.80, aa 0.12; n 0.50, en 0.20; …
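The decoding step in the figure can be illustrated with a minimal Viterbi sketch. It is only an approximation under stated assumptions: transitions are uniform (a frame either stays in the current phone or advances to the next), there is no N-gram, unseen phones get a small floor probability, and the function name `viterbi_word_score` and the two candidate phone sequences are hypothetical. The frame likelihoods are the example values from the figure.

```python
import math

FLOOR = 1e-10  # assumed floor probability for phones missing from a frame


def viterbi_word_score(phones, frame_likelihoods):
    """Best log-likelihood of aligning all frames to a left-to-right phone sequence."""
    neg_inf = float("-inf")
    # best[i] = best log score of any path that is in phone i at the current frame
    best = [neg_inf] * len(phones)
    best[0] = math.log(frame_likelihoods[0].get(phones[0], FLOOR))
    for frame in frame_likelihoods[1:]:
        new = [neg_inf] * len(phones)
        for i, phone in enumerate(phones):
            stay = best[i]
            advance = best[i - 1] if i > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:
                new[i] = prev + math.log(frame.get(phone, FLOOR))
        best = new
    return best[-1]  # the path must end in the last phone


# Per-frame phone likelihoods P(o|q), taken from the figure.
frames = [
    {"ay": 0.70, "aa": 0.22},
    {"ay": 0.80, "aa": 0.12},
    {"ay": 0.80, "aa": 0.12},
    {"n": 0.50, "en": 0.20},
]

# Hypothetical candidate phone sequences that a pronunciation dictionary could supply.
candidates = {"ay n": ["ay", "n"], "aa en": ["aa", "en"]}
scores = {name: viterbi_word_score(seq, frames) for name, seq in candidates.items()}
best_candidate = max(scores, key=scores.get)
```

A real recognizer would combine these acoustic scores with transition probabilities and the N-gram score; the sketch only shows how the dictionary constrains which phone sequences are scored.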
  • 7. MBROLA – Malay Diphone Database
    • Diphones
      • Speech units that begin in the middle of a phone and end in the middle of the following one.
      • Concatenative synthesis
      • Minimize concatenation problems
      • Require an affordable amount of memory
    • MBROLA (Multi Band Resynthesis OverLap Add)
      • Speech synthesizer based on the concatenation of diphones
      • Faculté Polytechnique de Mons, Belgium, 1996
      • Synthesizers for many languages, e.g. Indonesian, British and American English, Arabic
      • Synthesizers and diphone databases: free for non-commercial applications, available online
      • As MBROLA provides all facilities (programs, guidelines, assistance, etc.) to build a synthesizer, we can focus our research on preparing the diphone data needed to build the Malay synthesizer
  • 8. Building diphone database
    • Pronunciation dictionary: … saya, [saja] …
    • List of diphones found in the dictionary: [aj], [ja], [sa], …
    • List of phones: [a], [j], [s], …
    • Combine two phones → list of diphones: [aa], [aj], [as], [ja], [jj], [js], [sa], [sj], [ss], …
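The two ways of listing diphones on this slide can be sketched as follows. This is a minimal illustration, assuming the dictionary is available as a mapping from words to phone lists; the function names are hypothetical, and word-boundary (silence) diphones are not handled.

```python
from itertools import product


def phones_of(lexicon):
    """Sorted inventory of phones used anywhere in the dictionary."""
    return sorted({p for phones in lexicon.values() for p in phones})


def all_diphones(phones):
    """Every ordered combination of two phones."""
    return [a + b for a, b in product(phones, repeat=2)]


def attested_diphones(lexicon):
    """Only the phone pairs that actually occur next to each other in some word."""
    return sorted(
        {a + b for phones in lexicon.values() for a, b in zip(phones, phones[1:])}
    )


# One entry from the slide: <saya> is pronounced [saja].
lexicon = {"saya": ["s", "a", "j", "a"]}

inventory = phones_of(lexicon)          # ['a', 'j', 's']
combined = all_diphones(inventory)      # ['aa', 'aj', 'as', 'ja', 'jj', 'js', 'sa', 'sj', 'ss']
attested = attested_diphones(lexicon)   # ['aj', 'ja', 'sa']
```

Comparing the two lists shows which combinations still lack a recording source in the dictionary.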
  • 9. Resources: What do we have today to build the Malay pronunciation dictionary?
    • Linguistic resources
      • List of Malay words (60,000 words or tokens)
      • List of Malay abbreviations and their expansions
      • List of Malay proper names
      • Malay corpus: novels, academic
      • Phonological rules (Dr Tajul)
    • UTMK’s future research on speech; applications of the Malay pronunciation dictionary
    • From readings (books, reports, etc.)
      • Knowledge about pronunciation dictionaries: applications, needs, techniques, algorithms, implementation
    • Programs, techniques, algorithms
      • Grapheme-to-phoneme converter
      • Statistical techniques
  • 10. Building the pronunciation dictionary
    • Define phoneme inventories and use machine-readable phonetic alphabets (ASCII-IPA alphabets, e.g. SAMPA, TIMIT, etc.)
      • IPA   SAMPA   TIMIT   Example
      • ʃ     S       sh      she
      • ʤ     dZ      jh      joke
      • ŋ     N       ng      sing
    • Define phonological rules in a form adapted to computation
      • Etymology information
        • Arabic <maaf> [maʕaf]
        • Malay <gunaan> [gunaʔan]
      • Morphological analysis
        • <pakai> [pakaɪ]
        • <diketuai> [dikətuwaji]
      • Rewriting rules – ordered rules
      • Two-level morphology – without rule ordering
      • Implementation using finite-state transducers
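The idea of ordered rewriting rules can be sketched with plain regular expressions. This is not a finite-state transducer implementation and not the authors' rule set: the rules below are hypothetical, chosen only to reproduce the transcriptions shown on the slides, and the `+` morpheme boundary is assumed to come from a prior morphological analysis.

```python
import re

# Hypothetical ordered rules; each is applied once, in this order.
RULES = [
    (r"a\+i", "aji"),  # glide insertion across a morpheme boundary: ketua+i -> ketuaji
    (r"\+", ""),       # drop the remaining morpheme boundaries
    (r"ua", "uwa"),    # glide insertion inside a vowel sequence: buah -> buwah
    (r"ai$", "aɪ"),    # word-final <ai> inside one morpheme is a diphthong
    (r"e", "ə"),       # simplifying assumption: every <e> is a schwa
]


def to_phones(segmented_word: str) -> str:
    """Apply the ordered rewrite rules to a morphologically segmented word."""
    form = segmented_word
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


diketuai = to_phones("di+ketua+i")
pakai = to_phones("pakai")
buah = to_phones("buah")
```

The order matters: if the boundary-deletion rule ran before the first rule, <diketuai> would reach the word-final diphthong rule and come out with [aɪ] instead of [aji]. The schwa rule is deliberately crude; homographs such as <semak> show that <e> cannot be resolved from spelling alone.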
  • 11. Building …
    • Differentiate homographs
      • semak_Noun [səmaʔ]
      • semak_Verb [semaʔ]
    • Pronunciation of
      • proper names
      • abbreviations, e.g. Proton
      • numbers, e.g. Boeing 747
      • some characters, e.g. ‘@’ and ‘.’ in ranaivo@cs.usm.my
    • Grapheme-to-phoneme converter
    • Expert checking
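One simple way to differentiate homographs is to key the dictionary on the word together with its lexical category. The sketch below assumes that a part-of-speech tag is available from an earlier processing step; the names `HOMOGRAPHS`, `pronounce`, and `variants` are hypothetical, and the two transcriptions are the ones given on the slide.

```python
# Hypothetical lexicon keyed by (word, lexical category).
HOMOGRAPHS = {
    ("semak", "Noun"): "səmaʔ",
    ("semak", "Verb"): "semaʔ",
}


def pronounce(word, lexcat):
    """Return the transcription for a tagged word, or None if it is not listed."""
    return HOMOGRAPHS.get((word, lexcat))


def variants(word):
    """All transcriptions of a spelling, for use when no tag is available."""
    return sorted(pht for (w, _), pht in HOMOGRAPHS.items() if w == word)


noun_form = pronounce("semak", "Noun")
verb_form = pronounce("semak", "Verb")
```

When the tagger is unsure, `variants("semak")` returns both candidates so that a later step, or an expert, can choose.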
  • 12. Conclusion
    • Structure of the Malay pronunciation dictionary: word, lexcat, etym, pht, nbph
      • lexcat = lexical category
      • etym = etymology: {MAL(ay), IND(onesian), ENG(lish), AR(a)B, OTH(er)}
      • pht = phonetic transcription, using one ASCII-IPA alphabet (not defined yet)
      • nbph = number of phones
    • Set of phonological rules to derive variant pronunciations
    • THANK YOU
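The entry structure proposed in the conclusion (word, lexcat, etym, pht, nbph) could be represented as a small record type. This is a sketch under assumptions: the class names are hypothetical, the transcription is stored as a tuple of IPA phones because the ASCII-IPA alphabet is not defined yet, and the lexical category labels in the sample entries are illustrative guesses rather than values from the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Etym(Enum):
    MAL = "Malay"
    IND = "Indonesian"
    ENG = "English"
    ARB = "Arabic"
    OTH = "Other"


@dataclass(frozen=True)
class Entry:
    word: str      # orthographic form
    lexcat: str    # lexical category
    etym: Etym     # etymology code
    pht: tuple     # phonetic transcription, one phone per element

    @property
    def nbph(self) -> int:
        """Number of phones, derived from the transcription."""
        return len(self.pht)


# Sample entries; the transcriptions come from the slides, the lexcat labels are assumed.
saya = Entry("saya", "Pronoun", Etym.MAL, ("s", "a", "j", "a"))
maaf = Entry("maaf", "Noun", Etym.ARB, ("m", "a", "ʕ", "a", "f"))
```

Deriving nbph from pht instead of storing it avoids the two fields drifting apart when a transcription is corrected.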