Lec 15, 16, 17: NLP and Machine Translation

Important lectures on artificial intelligence
BY
Shah Khalid
CS department University of Peshawar.

Published in: Technology

    1. Lecture 15: Natural Language Processing, NLU Techniques
       Prepared by Shah Khalid, Department of Computer Science, University of Peshawar
       بسم الله الرحمن الرحيم (In the name of God, the Most Gracious, the Most Merciful)
    2. Typical NLP system (diagram): natural language input → parsing → internal representation (used for inference/retrieval) → generation → natural language output
    3. Recap
       • NLP
         • Components of NLP: NLU, NLG
         • Branches of NLP: MT, question answering
       • NLU
    4. NLU problems
       • Ambiguity
       • Incompleteness
       • Imprecision
       • Inaccuracy
    5. "There were two men blocking my escape. One held a hammer; one had nothing in his hands. I knew that I could not hit both of them. I hit a man with the hammer." (Did I use the hammer, or did I hit the man who held it?)
       Problem: solution
       • Ambiguity: put the idea in context.
       • Imprecision: relate the idea to a familiar situation.
       • Incompleteness: complete the idea based on our expectations of likely events.
       • Inaccuracy: infer the intended meaning by recognizing familiar patterns.
    6. How can a machine understand these differences?
       • Get the cat with the gloves. ("with the gloves" may describe the cat, or the thing used to get it.)
    7. Agenda
       • NLU techniques
         • Syntax analysis
         • Semantic analysis
         • Morphology
         • Pragmatics
         • Discourse analysis
    8. NLU techniques
       • To understand natural language, a program may use one or more of these methods of analyzing text:
         • Syntax analysis
         • Semantic analysis
         • Morphology
         • Pragmatics
         • Discourse analysis
    9. Syntax analysis
       • Separates a sentence into its component parts in order to analyze its form.
       • The study of the structure of a sentence.
       • Requires a parsing technique.
    10. Parsing
       • Using rules, we can determine whether a sentence is legal and obtain its structure.
       • "The large cat eats the small rat" consists of:
         • Noun phrase: The large cat
         • Verb phrase: eats the small rat
       • The verb phrase in turn consists of:
         • Verb: eats
         • Noun phrase: the small rat
    11. Syntax analysis: 7 traditional POS categories
       • N    noun          chair, bandwidth, yard
       • V    verb          study, hit
       • ADJ  adjective     old, tall, ridiculous
       • ADV  adverb        unfortunately, slowly
       • P    preposition   of, by, to
       • PRO  pronoun       I, me, mine
       • DET  determiner    the, a, that, those
    12. Parsing: the main structure rules
       1. S → NP VP
       2. NP → (Det) N
       3. VP → V | VP PP | VP NP
       4. PP → P NP
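The structure rules above can be checked mechanically with a chart parser. The sketch below is a CKY-style recognizer (CKY plus a unary-closure step) in Python, under stated assumptions: the lexicon is a hypothetical toy built from the slide examples, and the category Nom with the rules Nom → Adj Nom and Nom → N is an added assumption so that adjectives such as "large" can be handled, since the slide's four rules do not mention adjectives. It only answers whether a sentence is legal; it does not build the tree.

```python
# Hypothetical toy lexicon: word -> set of POS tags (taken from slide examples).
LEXICON = {
    "the": {"Det"}, "large": {"Adj"}, "small": {"Adj"},
    "cat": {"N"}, "rat": {"N"}, "children": {"N"}, "toy": {"N"}, "box": {"N"},
    "eats": {"V"}, "put": {"V"}, "in": {"P"},
}

# Binary rules A -> B C, following the slide: S -> NP VP, NP -> Det N,
# VP -> VP PP | VP NP, PP -> P NP.
# ASSUMPTION: Nom -> Adj Nom is added so that "large cat" can parse.
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "Nom"),
    ("Nom", "Adj", "Nom"),
    ("VP", "VP", "PP"),
    ("VP", "VP", "NP"),
    ("PP", "P", "NP"),
]

# Unary rules A -> B: Nom -> N, NP -> Nom (NP without a determiner), VP -> V.
UNARY = [("Nom", "N"), ("NP", "Nom"), ("VP", "V")]


def close_unary(cats):
    """Add every category reachable from `cats` through unary rules."""
    cats = set(cats)
    changed = True
    while changed:
        changed = False
        for parent, child in UNARY:
            if child in cats and parent not in cats:
                cats.add(parent)
                changed = True
    return cats


def is_sentence(sentence):
    """Return True if the grammar derives the whole sentence from S."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = close_unary(LEXICON.get(word, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # every split point
                for parent, left, right in BINARY:
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].add(parent)
            chart[i][j] = close_unary(chart[i][j])
    return "S" in chart[0][n]


print(is_sentence("The large cat eats the small rat"))      # True
print(is_sentence("The children put the toy in the box"))  # True
print(is_sentence("cat the eats"))                          # False
```

Because the chart is filled bottom-up by span length, the left-recursive rules VP → VP PP and VP → VP NP cause no infinite loop, which a naive top-down parser would hit.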
    13. Parse tree
       • This structure can be represented as a tree:
         [sentence [noun phrase [article The] [adjective large] [noun cat]] [verb phrase [verb eats] [noun phrase [article the] [adjective small] [noun rat]]]]
    14. Example parse tree: "The children put the toy in the box"
         [S [NP [Det The] [N children]] [VP [V put] [NP [Det the] [N toy]] [PP [P in] [NP [Det the] [N box]]]]]
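A parse tree is naturally stored as a nested data structure. This minimal Python sketch writes the slide-14 tree for "The children put the toy in the box" as nested tuples (label first, then children) and recovers both the sentence and a labelled-bracket rendering. The tuple encoding and the helper names `leaves` and `bracketed` are assumptions for illustration; the VP is kept flat (V NP PP) for readability, whereas rule 3 of slide 12 would nest it as [VP [VP put the toy] [PP in the box]].

```python
# The slide-14 parse tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings (the words).
TREE = (
    "S",
    ("NP", ("Det", "The"), ("N", "children")),
    ("VP",
     ("V", "put"),
     ("NP", ("Det", "the"), ("N", "toy")),
     ("PP", ("P", "in"), ("NP", ("Det", "the"), ("N", "box")))),
)


def leaves(tree):
    """Return the words at the leaves, left to right (the tree's yield)."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def bracketed(tree):
    """Render the tree in labelled-bracket notation, e.g. [NP [Det the] [N toy]]."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(bracketed(c) for c in children) + "]"


print(" ".join(leaves(TREE)))  # The children put the toy in the box
print(bracketed(TREE))
```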
    15. Parsing exercise: draw the tree diagram for each of the following.
       1. repaired the telephone
       2. the success of the program
       3. a film about pollution
       4. move towards the window
       5. the glass suddenly broke
    16. Semantic analysis
       • Interprets a sentence according to meaning rather than form.
       • The study of the relationships between symbols and their meanings.
         • e.g. "The chair sat on the boy." (syntactically well-formed, but semantically odd)
       • Semantic analysis is divided into two steps:
         • Lexical semantics: the study of the meaning of individual words.
         • Global semantics: how the meanings of individual words are combined into the meanings of sentences.
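One common way to make "The chair sat on the boy" fail semantic analysis while it still passes syntax analysis is to attach semantic features to nouns and selectional restrictions to verbs. The Python sketch below assumes a hypothetical feature table and hypothetical restrictions (the subject of sat or eats must be animate); it illustrates the idea and is not a technique the slides prescribe.

```python
# Hypothetical semantic features for a few nouns.
FEATURES = {
    "boy": {"animate", "human"},
    "cat": {"animate"},
    "chair": {"inanimate", "furniture"},
}

# Hypothetical selectional restrictions: feature the verb's subject must have.
SUBJECT_RESTRICTION = {"sat": "animate", "eats": "animate"}


def semantically_ok(subject, verb):
    """Check that the subject noun satisfies the verb's selectional restriction."""
    required = SUBJECT_RESTRICTION.get(verb)
    if required is None:          # no restriction recorded for this verb
        return True
    return required in FEATURES.get(subject, set())


print(semantically_ok("boy", "sat"))    # True:  "The boy sat on the chair."
print(semantically_ok("chair", "sat"))  # False: "The chair sat on the boy."
```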
    17. Morphology
       • Concerns how words are constructed out of basic meaningful units called morphemes, e.g.
         • friendly from friend
         • disease from ease
         • sit, sits, sat, sitting
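The morphology examples can be reproduced by a small affix-stripping analyzer. This is a minimal Python sketch assuming a hypothetical stem list, one prefix (dis-), three suffixes (-ing, -ly, -s), a rule that undoes consonant doubling before a suffix (sitting → sit), and an exception table for irregular forms such as sat. Real analyzers use much larger lexicons and more careful spelling rules.

```python
STEMS = {"friend", "ease", "sit"}          # hypothetical stem lexicon
IRREGULAR = {"sat": ["sit", "PAST"]}       # irregular forms listed whole
PREFIXES = ["dis"]
SUFFIXES = ["ing", "ly", "s"]              # tried in this order


def analyze(word):
    """Split `word` into morphemes using the toy tables above."""
    word = word.lower()
    if word in IRREGULAR:
        return list(IRREGULAR[word])
    if word in STEMS:
        return [word]
    for prefix in PREFIXES:
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in STEMS:
            return [prefix, rest]
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Undo consonant doubling before the suffix: "sitt" -> "sit".
        if stem not in STEMS and len(stem) > 1 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        if stem in STEMS:
            return [stem, suffix]
    return [word]  # no analysis found


for w in ["friendly", "disease", "sits", "sat", "sitting"]:
    print(w, "->", analyze(w))
```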
    18. Pragmatics
       • What is the actual (intended) meaning of the sentence?
       • In which sense is the sentence asked?
       • e.g. "Why didn't the company give the profit last month?"
    19. Discourse analysis
       • The meaning of an individual sentence may depend on the sentences that precede it, and may influence the meaning of the sentences that follow it.
    20. NLG
       • The field that studies ways of making it easier for you to understand what the computer is telling you is called natural language generation (NLG).
       • Three components:
         • The program must decide when to say something.
         • The program must decide what to say.
         • The program must decide how to say it.
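The three NLG decisions (when, what, how) can be seen in a toy report generator. The sketch below is hypothetical throughout: the input is two monthly profit figures, the "when" decision is an assumed 5% change threshold, the "what" decision picks the direction and size of the change, and the "how" decision is a fixed sentence template. It assumes `last_month` is non-zero.

```python
def generate_report(company, last_month, this_month, threshold=0.05):
    """Toy NLG: decide when to speak, what to say, and how to say it."""
    change = (this_month - last_month) / last_month
    # WHEN: stay silent unless the change is large enough to be worth reporting.
    if abs(change) < threshold:
        return None
    # WHAT: select the content, i.e. the direction and size of the change.
    direction = "rose" if change > 0 else "fell"
    percent = round(abs(change) * 100)
    # HOW: realise the content with a fixed sentence template.
    return f"{company}'s profit {direction} by {percent}% last month."


print(generate_report("Acme", 100, 120))  # Acme's profit rose by 20% last month.
print(generate_report("Acme", 100, 102))  # None (change too small to mention)
```

Real generators replace each step with richer components (content planning, sentence planning, surface realisation), but the division of labour is the same.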
    21. Lecture 16: Machine Translation (MT)
       Prepared by Shah Khalid
       بسم الله الرحمن الرحيم (In the name of God, the Most Gracious, the Most Merciful)
    22. Machine translation (MT)
       • History of MT
       • Reasons to use MT
       • Types of MT
         • Bilingual MT
         • Multilingual MT
       • Advantages of MT
       • Causes of failure of MT
       • MT strategies
       • Problems with MT
       • Translation steps
         • Analysis
         • Transfer
         • Generation
    23. Machine translation (MT)
       • At the time of these slides, Google offered translation between the following languages: Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, Filipino, Finnish, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian, Vietnamese.
    24. Machine translation (MT)
    25. Machine translation (MT): “BBC found similar support”!
    26. Machine Translation
    27. Translation
       • Conversion of text from one language to another language.
       • The best translation: S.T ≈ T.T (the source text and the target text convey the same meaning).
       • M.T: the area of AI in which the source text is converted into its target text with the help of a computer.
       • History of machine translation
    28. Types of MT
       • Bilingual MT: SL → TL
       • Multilingual MT: SL → TL(s)
    29. Advantages of MT
       • MT is preferable to technical translators for the following reasons:
       • One can communicate with foreigners.
       • It serves banking, commercial, and educational purposes.
       • It reduces dependence on technical translators, whose work costs both time and money.
       • It plays a good role in bringing together people from different areas.
    30. Advantages of MT (continued)
       • We can get the benefits of other satellites.
       • It also makes up for the shortage of technical translators.
       • It helps students translate many books into their own mother tongue.
       • It can be used at any time.
    31. Causes of failure
       • Two dictionaries need to be present at the time of translation.
       • Translation of polysemous words, e.g. fair (just; light in colour; a fair or exhibition).
       • Sometimes there is no equivalent word in the target language, e.g. milkman.
       • Sometimes a single word falls into different categories (parts of speech), e.g.
         • Ali is joking. (verb)
         • Ali made a joke. (noun)
    32. Causes of failure (continued)
       • The number of words in the ST and the TT is not always the same.
       • There is no one-to-one correspondence between words, e.g. "He is walking."
    33. Translation divergences
       • English: John swam across the river quickly
       • Spanish: Juan cruzó rápidamente el río nadando
         Gloss: John crossed fast the river swimming
       • Arabic: اسرع جون عبور النهر سباحة
         Gloss: sped john crossing the-river swimming
       • Chinese: 约翰 快速 地 游 过 这 条 河
         Gloss: John quickly (DE) swam cross the (Quantifier) river
       • Russian: Джон быстро переплыл реку
         Gloss: John quickly cross-swam river
    34. Multilingual challenges
       • Orthographic variations
         • Ambiguous spelling (Arabic short vowels are usually not written): كتب الاولاد اشعارا vs. fully vocalized كَتَبَ الأوْلادُ اشعَاراً ("the boys wrote poems")
         • Ambiguous word boundaries
       • Lexical ambiguity
         • Bank → بنك (financial) vs. ضفة (river)
         • Eat → essen (human) vs. fressen (animal)
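One simple way to choose between the two Arabic translations of bank is to compare the sentence with cue words for each sense, in the spirit of the simplified Lesk algorithm. The Python sketch below is an illustration, not the slides' method: the cue-word lists are hypothetical, and ties fall back to the first listed sense (بنك).

```python
# Hypothetical cue words for the two senses of English "bank".
SENSES = {
    "بنك": {"money", "account", "loan", "deposit", "cash"},   # financial institution
    "ضفة": {"river", "water", "shore", "fishing", "boat"},    # side of a river
}


def translate_bank(sentence):
    """Choose the Arabic word for 'bank' whose cue words best overlap the sentence."""
    context = set(sentence.lower().replace(".", "").replace(",", "").split())
    scores = {arabic: len(cues & context) for arabic, cues in SENSES.items()}
    return max(scores, key=scores.get)   # ties go to the first sense listed


print(translate_bank("I opened an account at the bank."))  # بنك
print(translate_bank("We sat on the bank of the river."))   # ضفة
```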
    35. Multilingual challenges: morphological variations
       • write → writt-en; كتب → م-كت-و-ب
       • kill → kill-ed; قتل → م-قت-و-ل
       • do → do-ne; فعل → م-فع-و-ل
       • Tokenization (conjunction, article, noun, plural):
         • And the cars → and the car-s
         • و ال سيار ات → w Al SyAr At
         • Et les voitures → et le-s voiture-s
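Both morphological phenomena on this slide can be sketched in a few lines of Python. `apply_pattern` fills a template such as m12w3 (the maf'ūl pattern behind مكتوب, مقتول, مفعول) with a three-consonant root; `segment` greedily strips the proclitics w ("and") and Al ("the") and the plural suffix At, using the Buckwalter-style transliteration the slide itself uses. Both functions and their rule lists are hypothetical; greedy stripping will wrongly split words whose stem itself begins with w or Al.

```python
def apply_pattern(root, pattern="m12w3"):
    """Root-and-pattern morphology: replace digits 1-3 in `pattern` with the
    root's first, second and third consonant (default: the maf'ul pattern)."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


PROCLITICS = ["w", "Al"]   # w = "and", Al = "the" (checked in this order)
PLURAL_SUFFIXES = ["At"]   # feminine sound plural


def segment(word):
    """Greedily split off known proclitics, then a known plural suffix."""
    parts = []
    for clitic in PROCLITICS:
        if word.startswith(clitic):
            parts.append(clitic)
            word = word[len(clitic):]
    for suffix in PLURAL_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return parts + [word[: -len(suffix)], suffix]
    return parts + [word]


print(apply_pattern("ktb"), apply_pattern("qtl"), apply_pattern("fEl"))  # mktwb mqtwl mfEwl
print(apply_pattern("كتب", "م12و3"))                                   # مكتوب
print(segment("wAlSyArAt"))                                          # ['w', 'Al', 'SyAr', 'At']
```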
    36. Needs of MT
       • There are few technical translators (TT).
       • Technical translators spend a lot of time in training.
       • The cost of technical translators increases day by day.
       • A technical translator may be unavailable (temporarily).
    37. Various MT strategies
       • Direct strategy
       • Transfer approach
       • Interlingua strategy
    38. Direct strategy
       • Word-by-word translation.
       • e.g. "out of sight, out of mind" → "invisible idiots"
       • "The spirit is willing but the flesh is weak" → "The vodka is good, but the meat is rotten."
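The direct strategy can be reproduced with a one-entry-per-word dictionary. This Python sketch uses a hypothetical English→Spanish dictionary covering the sentence from slide 33; it keeps English word order and never looks at context, which is why its output differs from the fluent Spanish "Juan cruzó rápidamente el río nadando" (it even misses the obligatory contraction de el → del).

```python
# Hypothetical one-entry-per-word English -> Spanish dictionary.
DICTIONARY = {
    "john": "Juan", "swam": "nadó", "across": "a través de",
    "the": "el", "river": "río", "quickly": "rápidamente",
}


def direct_translate(sentence):
    """Direct strategy: replace each word by its dictionary entry, keeping word order.
    Unknown words are passed through unchanged."""
    return " ".join(DICTIONARY.get(w, w) for w in sentence.lower().split())


print(direct_translate("John swam across the river quickly"))
# Juan nadó a través de el río rápidamente
```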
    39. Transfer approach
       • First, the source text is analyzed.
       • It is converted into a logical form of the SL.
       • This is converted into the logical form of the TL.
       • The target text is generated.
       • e.g. onboard.
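The transfer steps can be sketched for a single sentence pattern. In this hypothetical Python example, analysis turns English "the (adj) noun verb the (adj) noun" into a logical form (a dict), transfer maps it to a Spanish logical form (lexical choice plus noun gender), and generation places the article and the adjective according to Spanish rules for these particular adjectives, which follow the noun. The lexicons, function names, and the single covered pattern are assumptions for illustration only.

```python
# Hypothetical English analysis lexicon: surface verb -> lemma.
EN_VERBS = {"eats": "eat"}

# Hypothetical English -> Spanish transfer lexicon. Nouns carry gender,
# adjectives list one form per gender, the verb is 3rd person singular present.
NOUN_ES = {"cat": ("gato", "m"), "rat": ("rata", "f")}
ADJ_ES = {"large": {"m": "grande", "f": "grande"},
          "small": {"m": "pequeño", "f": "pequeña"}}
VERB_ES = {"eat": "come"}
ART_ES = {"m": "el", "f": "la"}       # definite article by gender


def analyze_sl(sentence):
    """Analysis: 'the (adj) noun VERB the (adj) noun' -> English logical form."""
    words = sentence.lower().split()
    v = next(i for i, w in enumerate(words) if w in EN_VERBS)

    def noun_phrase(ws):
        # ws is ["the", noun] or ["the", adj, noun]
        return {"noun": ws[-1], "adj": ws[1] if len(ws) == 3 else None}

    return {"pred": EN_VERBS[words[v]],
            "agent": noun_phrase(words[:v]),
            "patient": noun_phrase(words[v + 1:])}


def transfer(lf):
    """Transfer: English logical form -> Spanish logical form."""
    def noun_phrase(np):
        noun, gender = NOUN_ES[np["noun"]]
        adj = ADJ_ES[np["adj"]][gender] if np["adj"] else None
        return {"noun": noun, "gender": gender, "adj": adj}

    return {"pred": VERB_ES[lf["pred"]],
            "agent": noun_phrase(lf["agent"]),
            "patient": noun_phrase(lf["patient"])}


def generate(lf):
    """Generation: article + noun (+ adjective after the noun)."""
    def noun_phrase(np):
        words = [ART_ES[np["gender"]], np["noun"]]
        if np["adj"]:
            words.append(np["adj"])
        return " ".join(words)

    return " ".join([noun_phrase(lf["agent"]), lf["pred"], noun_phrase(lf["patient"])])


print(generate(transfer(analyze_sl("The large cat eats the small rat"))))
# el gato grande come la rata pequeña
```

Unlike the word-by-word approach, the adjective moves after the noun and agrees in gender (pequeña), because generation works from the transferred structure rather than from the English word order.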
    40. Interlingua strategy
       • Good for multilingual MT.
       • The text is converted into an intermediate representation.
       • From the intermediate representation, the target text is obtained.
       • SL → intermediate representation → TL
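The claim that interlingua is good for multilingual MT can be made concrete by counting components. The sketch assumes one analyzer and one generator per language in both designs, and that a transfer system translating between every pair of n languages needs one transfer module per ordered language pair, n(n-1) in total, while an interlingua system needs none.

```python
def modules_needed(n_languages):
    """Count components for all-pairs MT among n languages.

    Transfer: one analyzer and one generator per language, plus one transfer
    module per ordered language pair. Interlingua: analyzers and generators only.
    """
    transfer = 2 * n_languages + n_languages * (n_languages - 1)
    interlingua = 2 * n_languages
    return transfer, interlingua


for n in (2, 5, 10):
    print(n, modules_needed(n))
# 2 (6, 4)
# 5 (30, 10)
# 10 (110, 20)
```

The transfer count grows quadratically and the interlingua count linearly, so the gap widens as languages are added.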
    41. MT (diagram): analysis of the source (Arabic) by syntactic parsing and semantic analysis; transfer rules or an interlingua in the middle; sentence planning and text generation for the target (English)
    42. Lecture 17: Dictionary, by Shah Khalid
    43. Dictionary
       • Dictionaries: their need and purpose
       • Types of dictionaries
         • Monolingual dictionary
         • Bilingual dictionary
         • Multilingual dictionary
       • Design of multilingual and monolingual dictionaries
       • Where dictionaries are used
       • Difference between computer-based and manual dictionaries
    44. Dictionary: dictionaries are books that list the words of a language, usually with their meanings.
    45. To make dictionaries easier to use, the words are organized in alphabetical order. So to use a dictionary easily, you must know how to alphabetize words.
    46. Since there are so many words in a dictionary, guide words are used to help you locate a word quickly. What are guide words?
    47. Guide words are found at the top of each page. They tell you the first and last word found on that page.
    48. How do guide words help you find a word quickly? Instead of looking at each word on a page, which could take forever, you just look at the guide words and then use what you know about alphabetizing words to decide if your word would be found on that page.
    49. Let’s see what that means. Let’s pretend we are looking up the word science. First we would turn to the S section.
    50. Then we would use the guide words and what we know about alphabetizing words to decide the correct page in the S section.
    51. We would look at the guide words at the top of each page and decide which pair our word would fall between in alphabetical order.
    52. Let’s do that for the word science. Which one of these pages would contain the word science?
    53. The page with the guide words stamp and summer, or the page with the guide words sandwich and seventy? Answer: science falls between sandwich and seventy.
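The guide-word search is a binary search over pages. The Python sketch below uses the standard-library `bisect` module on the two pages from the exercise; the page list is a hypothetical stand-in for a whole dictionary, and plain string comparison stands in for alphabetical order (adequate for lowercase English words).

```python
from bisect import bisect_right

# Hypothetical dictionary pages, each described by its guide words (first, last),
# sorted alphabetically as in a printed dictionary.
PAGES = [("sandwich", "seventy"), ("stamp", "summer")]


def find_page(word, pages):
    """Return the index of the page whose guide words enclose `word`, or None."""
    firsts = [first for first, _ in pages]
    i = bisect_right(firsts, word) - 1      # last page whose first guide word <= word
    if i >= 0 and pages[i][0] <= word <= pages[i][1]:
        return i
    return None


print(PAGES[find_page("science", PAGES)])   # ('sandwich', 'seventy')
print(find_page("sun", PAGES))              # None: "sun" comes after "summer"
```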
    54. An introduction to mechanical dictionaries: what do they mean?
    55. Monolingual dictionary
       • Words and their meanings are stored in the same language, e.g.
         • Urdu into Urdu
         • English into English
    56. Bilingual dictionary
       • Words are stored in one language and their meanings are stored in another language.
    57. Multilingual dictionary
       • Words are stored in one language and their meanings are stored in more than one language.
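The three dictionary types differ only in which languages the meanings are stored in, so one data structure can hold all three. In this hypothetical Python sketch each headword maps to its meanings keyed by language code (en, ur, ar, fr); the entries and function names are illustrative, not taken from a real dictionary.

```python
# Hypothetical entries: headword -> {language code: meaning}.
MONOLINGUAL = {"flag": {"en": "a piece of cloth with a pattern or symbol of a country"}}
BILINGUAL = {"book": {"ur": "کتاب"}}
MULTILINGUAL = {"book": {"ur": "کتاب", "ar": "كتاب", "fr": "livre"}}


def lookup(dictionary, word, lang):
    """Return the meaning of `word` stored in language `lang`, or None."""
    return dictionary.get(word.lower(), {}).get(lang)


def meaning_languages(dictionary):
    """Languages in which meanings are stored: one for mono/bilingual, several for multilingual."""
    return {lang for meanings in dictionary.values() for lang in meanings}


print(lookup(MULTILINGUAL, "book", "fr"))        # livre
print(sorted(meaning_languages(MULTILINGUAL)))   # ['ar', 'fr', 'ur']
```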
    58. flag (flag)
       noun A piece of cloth with a pattern or symbol of a country, an organization, etc.
       verb To stop, or to signal. We flagged down the police officer.
       The word being defined is followed by the pronunciation in parentheses.
    59. The first word of each sense (noun, verb) tells the word’s part of speech.
    60. The next section is the actual definition of the word.
    61. Finally, you might see a sentence showing how the word is used (We flagged down the police officer.), especially if that use is not the most common one for the word.
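The parts of a dictionary entry (headword, pronunciation, part of speech, definition, optional example sentence) map naturally onto a record type. This Python sketch uses `dataclasses` with hypothetical class names `Sense` and `Entry`, and rebuilds the flag entry from the slides in the same layout.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    part_of_speech: str
    definition: str
    example: str = ""          # optional usage sentence


@dataclass
class Entry:
    headword: str
    pronunciation: str
    senses: list = field(default_factory=list)   # list of Sense

    def render(self):
        """Lay the entry out as on the slide: headword (pronunciation), then senses."""
        lines = [f"{self.headword} ({self.pronunciation})"]
        for s in self.senses:
            text = f"{s.part_of_speech} {s.definition}"
            if s.example:
                text += f" {s.example}"
            lines.append(text)
        return "\n".join(lines)


flag = Entry("flag", "flag", [
    Sense("noun", "A piece of cloth with a pattern or symbol of a country, an organization, etc."),
    Sense("verb", "To stop, or to signal.", "We flagged down the police officer."),
])
print(flag.render())
```

A computer-based dictionary can then search, sort, or filter entries by any of these fields, something a printed dictionary only supports through alphabetical order and guide words.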
    62. Let’s see what you’ve learned!
       mischief (miss-chif)
       noun Playful behavior that may cause annoyance or harm to others. My brother is always up to some sort of mischief.
    63. book (buk)
       noun A set of pages that are bound together.
       verb To arrange for something ahead of time. We’ve booked a vacation in Mexico.
    64. Now you know the definition of definition!
    65. Thank you very much! Shah Khalid
