 Overview Basic knowledge Demonstration
LOGOUSER : Men are all alike.ELIZA : In what way?USER : They’re always bugging us about something or other.ELIZA : Can you...
 A sub-field of Artificial Intelligent, since 1960s … Concerned with the interactions between computers and  human langu...
 Natural language unit?    Natural language understanding    Natural language generation Data?    Speech processing  ...
 Task of generating natural language from a machine  representation May be viewed as the opposite of natural language  u...
 An advanced subtopic of NLP deals with reading  comprehension More complex than NLG Many commercial interest in this f...
 Logic is too clear, the lost of flexibility cause  difficulties in NLP Examples :   Time flies like an arrow  Can be u...
 Logic is too clear, the lost of flexibility become  difficulties in NLP Examples :   Time flies like an arrow  Can be ...
 Logic is too clear, the lost of flexibility become  difficulties in NLP Examples :   Time flies like an arrow  Can be ...
 Logic is too clear, the lost of flexibility become  difficulties in NLP Examples :   Time flies like an arrow  Can be ...
 Logic is too clear, the lost of flexibility become  difficulties in NLP Examples:   Time flies like an arrow  Can be u...
 Logic is too clear, the lost of flexibility become  difficulties in NLP Examples :   Time flies like an arrow  Can be ...
 Logic is too clear, the lost of flexibility become  difficulties in NLP Examples :   Time flies like an arrow  Can be ...
 Words combination and division Stress placing on words The properties of subjects   We gave the monkeys the bananas b...
 Involves reasoning about the world Embedded a social system of people interacting   persuading, insulting and amusing ...
 Automatic Summarization
 Information Extraction
 Grammar Testing
 ePi Group:   Automatic Vietnamese processing system   www.baomoi.com      Collecting news from all Vietnamese e-newsp...
 Morphological analysis :   Individual words are analyzed into their     components Syntactic analysis   Linear sequence...
MorphologySyntaxSemanticPragmaticDiscourse
MorphologySyntaxSemanticPragmaticDiscourse
 Morphemes: smallest meaningful unit spoken units of language.   Stem: book, cat, car, …   Affixes : un-, -s, -es, ..  ...
 Word Classes   Parts of speech: noun, verb, adjectives,    etc.                                               Morpholog...
 Vietnamese?   Ăn = ăn                                  Morphology   Uống = uống   Xe = xe                       Synta...
 Why parse words?                                          Morphology   To identify a word’s part-of-speech   To identi...
 Ambiguity   I want her to go to the cinema with me                                             Morphology  To - infinit...
 How to implement?   Regular expression   Finite State Transducers (FST)   Finite State Accepter (FSA)      Morphology...
 Relate terms:   Stem, stemming   Morphology   Part of speech                     Syntax   N-gram                     ...
MorphologySyntaxSemanticPragmaticDiscourse
MorphologySYNTAX   Syntax         Semantic         Pragmatic         Discourse
 Linear sequence of words are transformed into  structures that show how the words relate to  each other.                ...
MorphologySyntaxSemanticPragmaticDiscourse
 Syntax   Actual structure of a sentence                                        Morphology                              ...
 A grammar define syntactically legal sentences    I ate an apple     (syntactic legal)    I ate apple        (not synt...
 Ambiguities                Morphology                Syntax                Semantic                Pragmatic            ...
MorphologySyntaxSemanticPragmaticDiscourse
Morphology           SyntaxSEMANTIC   Semantic           Pragmatic           Discourse
 What could this mean…   Representations of linguistic inputs that capture    the meanings of those inputs For us it me...
MorphologySyntaxSemanticPragmaticDiscourse
 Requirements   Verifiability   Ambiguity                     Morphology   Canonical Form   Inference        Syntax  ...
MorphologySyntaxSemanticPragmaticDiscourse
 Pragmatics: concerns how sentences are used in different situations and how use                                         ...
Morphology                                           Syntax ‘He’, ‘it’, ‘his’ can be inferred from                       ...
MorphologySyntaxSemanticPragmaticDiscourse
MorphologySyntaxSemanticPragmaticDiscourse
MorphologySyntaxSemanticPragmaticDiscourse
MorphologySyntaxSemanticPragmaticDiscourse
MorphologySyntaxSemanticPragmaticDiscourse
 Wordnet Mindnet Stanford Tagger Stanford Parser ……..
 Machine translation Search engine Information extraction Chat bot
 Can we use previously translated text to learn how to translate new texts?   Yes! But, it’s not so easy   Two paradigm...
 Simplest: Translation Memory   If S new= S source in corpus, output aligned S target Compositional EBMT   If fragment...
 Requires an Interlingua - language-neutral Knowledge  Representation (KR) Philosophical debate: Is there an interlingua...
 Intelingua-based MT Rule-based MT
 Each approach has its own strength   Rapidly adaptable: statistical, example-based   Good grammar: rule-based (grammar...
 Google Yahoo Alta-vista Answer.com
 Spider - a browser-like program that downloads web pages. Crawler – a program that automatically follows all of the    ...
   Spider - a browser-like program that downloads web pages.   Crawler – a program that automatically follows all of the...
 Idea is to ‘extract’ particular types of information from  arbitrary text or transcribed speech Examples:   Names enti...
 Often use a set of simple templates or frames with slots to be filled in from input text. Ignore everything else.   Hus...
 Named Entity recognition (NE)   Finds and classifies names, places etc. Co-reference Resolution (CO)   Identifies ide...
 AIML = Artificial Intelligent Mark-up Language Alice
 A.L.I.C.E. (Artificial Linguistic Internet Computer Entity)   an award-winning free natural language artificial    inte...
 NLP’s course , Husni Al-Muhtaseb Lexical descriptions for Vietnamese language  processing . en.wikipedia.org www.xuly...
Natural language processing 2
Published in: Technology
  1. Overview, basic knowledge, demonstration
  2. USER: Men are all alike.
     ELIZA: In what way?
     USER: They’re always bugging us about something or other.
     ELIZA: Can you think of a specific example?
     USER: Well, my boyfriend made me come here.
     ELIZA: Your boyfriend made you come here?
     USER: He says I’m depressed much of the time.
     ELIZA: I am sorry to hear you are depressed.
     USER: It’s true, I am unhappy.
     ELIZA: Do you think coming here will help you not to be unhappy?
     USER: I need some help; that much seems certain.
     ELIZA: What would it mean to you if you got some help?
     USER: Perhaps I could learn to get along with my mother.
     ELIZA: Tell me more about your family.
     USER: My mother takes care of me.
     ELIZA: Who else in your family takes care of you?
     USER: My father.
     ELIZA: Your father?
     USER: You are like my father in some ways.
     ELIZA: What resemblance do you see?
  3. A subfield of Artificial Intelligence, active since the 1960s.
      • Concerned with the interactions between computers and human languages, with one ultimate goal: computers that can “understand” humans.
      • Many real-world applications.
  4. Natural language unit?
      • Natural language understanding
      • Natural language generation
     Data?
      • Speech processing
      • Text processing
     Natural language text understanding!
  5. Natural language generation (NLG): the task of generating natural language from a machine representation. It may be viewed as the opposite of natural language understanding. Applications:
      • Joke generation
      • Textual summaries of databases
      • Enhancing accessibility
  6. Natural language understanding (NLU): an advanced subtopic of NLP that deals with reading comprehension. It is more complex than NLG. There is considerable commercial interest in this field:
      • News gathering
      • Data mining
      • Voice activation
      • Large-scale content analysis
  7. Logic is too rigid: its lack of flexibility causes difficulties in NLP. Examples:
      • “Time flies like an arrow” can be understood in 7 ways!
      • “I never said she stole my money!” changes meaning with the stressed word (slides 7 to 13 show one reading each):
          • I: Someone else said it, but I didn’t.
          • never: I simply didn’t ever say it.
          • said: I might have implied it in some way, but I never explicitly said it.
          • she: I said someone took it; I didn’t say it was she.
          • stole: I just said she probably borrowed it.
          • my: I said she stole someone else’s money.
          • money: I said she stole something, but not my money.
  14. Further sources of difficulty:
      • Word combination and division
      • Stress placed on words
      • The properties of subjects:
          • We gave the monkeys the bananas because they were hungry.
          • We gave the monkeys the bananas because they were over-ripe.
      • Specifying which word an adjective applies to:
          • A pretty little girls’ school
  15. Language involves reasoning about the world. It is embedded in a social system of people interacting:
      • persuading, insulting and amusing them
      • changing over time
     Homonyms add further ambiguity.
  16. Automatic Summarization
  17. Information Extraction
  18. Grammar Testing
  19. ePi Group:
      • Automatic Vietnamese processing system
      • www.baomoi.com
      • Collecting news from all Vietnamese e-newspapers
     EVTrans (Softex Co Ltd.)
     Cyclop
     VnKim
  20. Levels of analysis:
      • Morphological analysis: individual words are analyzed into their components.
      • Syntactic analysis: linear sequences of words are transformed into structures that show how the words relate to each other.
      • Semantic analysis: a transformation is made from the input text to an internal representation that reflects the meaning.
      • Pragmatic analysis: what was said is reinterpreted as what was actually meant.
      • Discourse analysis: references between sentences are resolved.
  23. Morphemes: the smallest meaningful spoken units of language.
      • Stem: book, cat, car, …
      • Affixes: un-, -s, -es, …
      • Clitics: ’ve, ’m
     Morphological parsing: parsing a word into stem and affixes, and identifying the parts and their relationships.
  24. Word classes
      • Parts of speech: noun, verb, adjective, etc.
      • Word class dictates how a word combines with morphemes to form new words.
     Examples:
      • Books = book + s
      • Unladylike = un + lady + like
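The decomposition on slides 23 and 24 can be sketched as a small affix stripper. This is only an illustration under stated assumptions: the lexicon, the affix lists and the function name `parse_word` are invented here to cover the slide examples, and real morphological parsers typically rely on finite-state transducers and much larger lexicons.

```python
# Toy lexicon and affix lists (assumptions chosen to cover the slide examples).
STEMS = {"book", "cat", "car", "lady"}
PREFIXES = ["un"]
SUFFIXES = ["like", "es", "s"]


def parse_word(word):
    """Return the morphemes of `word` as a list, or None if no analysis is found."""
    word = word.lower()
    if word in STEMS:
        return [word]
    # Try stripping one prefix, then analyse what is left.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = parse_word(word[len(prefix):])
            if rest is not None:
                return [prefix + "-"] + rest
    # Try stripping one suffix, then analyse what is left.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = parse_word(word[:-len(suffix)])
            if rest is not None:
                return rest + ["-" + suffix]
    return None


if __name__ == "__main__":
    print(parse_word("Books"))
    print(parse_word("Unladylike"))
```

A word with no analysis in the toy lexicon returns `None`, which is where a real system would fall back to guessing or to a larger dictionary.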
  25. Vietnamese?
      • Ăn = ăn (eat)
      • Uống = uống (drink)
      • Xe = xe (vehicle)
     There is no “Xes” in Vietnamese! The problem is text tokenization instead.
  26. Why parse words?
      • To identify a word’s part of speech
      • To identify a word’s stem (IR)
     … then?
      • Spell-checking
      • To predict next words
      • To predict the word’s accent
  27. Ambiguity
      • I want her to go to the cinema with me.
          • “to” as infinitive marker?
          • “to” as preposition?
      • Con ngựa đá đá con ngựa đá.
          • đá = đá? (“đá” can mean both “stone” and “to kick”)
  28. How to implement?
      • Regular expressions
      • Finite State Transducers (FST)
      • Finite State Acceptors (FSA)
     Example patterns:
      • *.exe
      • ir??man
      • \b[0-9]+ *(Mb|[Mm]egabytes?)\b
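The example patterns on slide 28 can be tried out with Python's standard library. In this sketch the first two patterns are read as shell-style wildcards (an assumption, since the slide does not say which notation it uses), and the third is treated as a regular expression with its `\b` word-boundary markers restored. The helper name `find_sizes` is made up for illustration.

```python
import fnmatch
import re

# Regular expression from the slide: a number, optional spaces, then a
# megabyte unit, delimited by word boundaries.
MEGABYTES = re.compile(r"\b[0-9]+ *(Mb|[Mm]egabytes?)\b")


def find_sizes(text):
    """Return every substring of `text` that looks like a size in megabytes."""
    return [match.group(0) for match in MEGABYTES.finditer(text)]


if __name__ == "__main__":
    # Wildcards: '*' matches any run of characters, '?' exactly one character.
    print(fnmatch.fnmatchcase("setup.exe", "*.exe"))
    print(fnmatch.fnmatchcase("ironman", "ir??man"))
    print(find_sizes("The disk has 500 Mb free and 2 megabytes used."))
```

The word boundaries matter: without them the pattern would also fire inside longer tokens such as "16Mbps".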
  29. Related terms:
      • Stem, stemming
      • Part of speech
      • N-gram
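As a rough illustration of the N-gram term, the sketch below extracts word n-grams and uses bigram counts to guess the next word. The function names and the training sentences are assumptions made for the example; a practical model would need smoothing and far more data.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_bigrams(sentences):
    """Count, for each word, which words follow it in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for first, second in ngrams(tokens, 2):
            counts[first][second] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    model = train_bigrams(["I ate an apple", "I ate a building", "I ate an orange"])
    print(predict_next(model, "ate"))
```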
  31. SYNTAX
  32. Linear sequences of words are transformed into structures that show how the words relate to each other, in order to determine the grammatical structure.
      • I am a boy = [Subject] [Verb] [Article] [Noun]
  34. Syntax: the actual structure of a sentence.
     Grammar: the rule set used in the analysis.
  35. A grammar defines the syntactically legal sentences:
      • I ate an apple (syntactically legal)
      • I ate apple (not syntactically legal)
      • I ate a building (syntactically legal, but?)
     Being syntactically legal does not mean that a sentence is meaningful!
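The three sentences on slide 35 can be checked against a deliberately tiny grammar. The lexicon, the tag names and the single allowed pattern below are assumptions made for this sketch; a real grammar would use phrase-structure rules and a parser rather than one flat tag sequence.

```python
# Toy lexicon mapping words to part-of-speech tags (assumed for illustration).
LEXICON = {
    "i": "PRON",
    "ate": "VERB",
    "a": "DET",
    "an": "DET",
    "apple": "NOUN",
    "building": "NOUN",
}

# The only sentence shape this toy grammar accepts.
GRAMMAR = {("PRON", "VERB", "DET", "NOUN")}


def is_legal(sentence):
    """Return True if the sentence's tag sequence is allowed by the toy grammar."""
    tags = tuple(LEXICON.get(word) for word in sentence.lower().split())
    return tags in GRAMMAR


if __name__ == "__main__":
    for sentence in ["I ate an apple", "I ate apple", "I ate a building"]:
        print(sentence, "->", is_legal(sentence))
```

Note that "I ate a building" is accepted: the check only looks at structure, so judging whether the sentence makes sense is left to semantic analysis.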
  36. Ambiguities
  38. SEMANTIC
  39. What could this mean…
      • Representations of linguistic inputs that capture the meanings of those inputs
     For us it means:
      • Representations that permit or facilitate semantic processing
      • Permit us to reason about their truth (relationship to some world)
      • Permit us to answer questions based on their content
      • Permit us to perform inference (answer questions and determine the truth of things we don’t actually know)
  41. Requirements
      • Verifiability
      • Ambiguity
      • Canonical form
      • Inference
      • Expressiveness
  43. Pragmatics: concerns how sentences are used in different situations and how that use affects the interpretation of the sentence.
     Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
  44. ‘He’, ‘it’ and ‘his’ can be inferred from the previous sentence. That is discourse.
  50. WordNet, MindNet, Stanford Tagger, Stanford Parser, …
  51. Applications:
      • Machine translation
      • Search engine
      • Information extraction
      • Chat bot
  52. Can we use previously translated text to learn how to translate new texts?
      • Yes! But it’s not so easy.
      • Two paradigms: statistical MT and example-based MT (EBMT)
     Requirements:
      • A large aligned parallel corpus of translated sentences: {S source, S target} pairs
      • A bilingual dictionary for intra-sentence alignment
      • Generalization patterns (names, numbers, dates, …)
  53. Simplest: translation memory
      • If S new = S source in the corpus, output the aligned S target.
     Compositional EBMT
      • If a fragment of S new matches a fragment of S source, output the corresponding fragment of the aligned S target.
      • Prefer maximal-length fragments.
      • Maximize grammatical compositionality:
          • via a target-language grammar
          • or via an N-gram statistical language model
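The two strategies on slide 53 can be sketched as follows: an exact translation-memory lookup first, then a greedy left-to-right pass that prefers the longest matching fragment. The function name `translate`, the dictionary layout and the English-to-French toy entries in the usage example are assumptions for illustration; the grammar or N-gram rescoring step from the slide is left out.

```python
def translate(sentence, memory, fragments):
    """Translate with an exact memory hit, else by longest-fragment matching.

    `memory` maps whole source sentences to target sentences.
    `fragments` maps source word sequences to aligned target fragments.
    """
    # Simplest case: the whole sentence is already in the translation memory.
    if sentence in memory:
        return memory[sentence]

    words = sentence.split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate fragment starting at position i first.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in fragments:
                output.append(fragments[chunk])
                i = j
                break
        else:
            # No fragment matched: pass the word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    memory = {"good morning": "bonjour"}
    fragments = {"thank you": "merci", "my friend": "mon ami"}
    print(translate("good morning", memory, fragments))
    print(translate("thank you my friend", memory, fragments))
```

Greedy longest-match is the simplest way to honour "prefer maximal-length fragments"; it does not guarantee the best overall segmentation.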
  54. Requires an interlingua: a language-neutral Knowledge Representation (KR).
      • Philosophical debate: is there an interlingua?
          • FOL is not totally language-neutral (predicates and functions are expressed in a language).
          • Other near-interlinguas exist (Conceptual Dependency).
      • Requires a fully disambiguating parser
          • Domain model of legal objects, actions, relations
      • Requires an NL generator (KR -> text)
      • Applicable only to well-defined technical domains
      • Produces high-quality MT in those domains
  55. Interlingua-based MT; rule-based MT
  56. Each approach has its own strength:
      • Rapidly adaptable: statistical, example-based
      • Good grammar: rule-based (grammar)
      • High precision in a narrow domain: interlingua
  57. Google, Yahoo, AltaVista, Answer.com
  58. Search engine components:
      • Spider: a browser-like program that downloads web pages.
      • Crawler: a program that automatically follows all of the links on each web page.
      • Indexer: a program that analyzes web pages downloaded by the spider and the crawler.
      • Database: storage for downloaded and processed pages.
      • Results engine: extracts search results from the database.
      • Web server: a server that is responsible for interaction between the user and the other search engine components.
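The indexer and results engine from slide 58 can be approximated with an in-memory inverted index. This is a minimal sketch under assumptions: the pages are passed in as a plain dictionary instead of being downloaded by a spider, the page keys are hypothetical, and ranking is omitted.

```python
from collections import defaultdict


def build_index(pages):
    """Indexer: map each lower-cased word to the set of page keys containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Results engine: return the sorted page keys that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical page keys and contents standing in for downloaded pages.
    pages = {
        "page-a": "natural language processing",
        "page-b": "language generation",
        "page-c": "speech processing",
    }
    index = build_index(pages)
    print(search(index, "language processing"))
```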
  60. The idea is to “extract” particular types of information from arbitrary text or transcribed speech. Examples:
      • Named entities: people, places, organizations
      • Telephone numbers
      • Dates
     Many uses:
      • Question answering systems, gisting of news or mail, …
      • Job ads, financial information, terrorist attacks
  61. Often uses a set of simple templates or frames with slots to be filled in from the input text; everything else is ignored.
      • Husni’s number is 966-3-860-2624.
      • The inventor of the first plane was Abbas ibnu Fernas.
      • The British King died in March of 1932.
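Template filling as described on slide 61 can be sketched with regular expressions whose named groups act as slots. The two templates, their slot names and the function `fill_templates` are assumptions written to fit the slide's example sentences; they are not taken from any particular extraction system.

```python
import re

# Each template is a pattern whose named groups are the slots to fill.
TEMPLATES = {
    "phone": re.compile(
        r"(?P<person>[A-Z][a-z]+)[\u2019']s number is (?P<number>[0-9]+(?:-[0-9]+)+)"
    ),
    "death": re.compile(
        r"(?P<person>The(?: [A-Z][a-z]+)+) died in "
        r"(?P<month>[A-Z][a-z]+) of (?P<year>[0-9]{4})"
    ),
}


def fill_templates(text):
    """Return one filled frame per template match; unmatched text is ignored."""
    frames = []
    for name, pattern in TEMPLATES.items():
        for match in pattern.finditer(text):
            frame = {"template": name}
            frame.update(match.groupdict())
            frames.append(frame)
    return frames


if __name__ == "__main__":
    print(fill_templates("Husni\u2019s number is 966-3-860-2624."))
    print(fill_templates("The British King died in March of 1932."))
```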
  62. Information extraction tasks:
      • Named Entity recognition (NE): finds and classifies names, places, etc.
      • Co-reference Resolution (CO): identifies identity relations between entities in texts.
      • Template Element construction (TE): adds descriptive information to NE results (using CO).
      • Template Relation construction (TR): finds relations between TE entities.
      • Scenario Template production (ST): fits TE and TR results into specified event scenarios.
  63. AIML = Artificial Intelligence Markup Language; Alice
  64. A.L.I.C.E. (Artificial Linguistic Internet Computer Entity): an award-winning free natural language artificial intelligence chat robot.
      • Rule-based
      • Human-like answers without a complicated “brain”
      • Multi-language
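The rule-based idea behind chat bots such as ELIZA and A.L.I.C.E. can be sketched with a few pattern and response pairs. The rules below are assumptions written in Python to reproduce some turns of the ELIZA dialogue on slide 2; they are not AIML syntax and not the rule set of either program.

```python
import re

# Each rule pairs a pattern with a response template; "{0}" is replaced by
# the first captured group when there is one.
RULES = [
    (re.compile(r"\bI need ([^;,.!?]+)", re.IGNORECASE),
     "What would it mean to you if you got {0}?"),
    (re.compile(r"\bmy (?:mother|father)\b", re.IGNORECASE),
     "Tell me more about your family."),
    (re.compile(r"\ball alike\b", re.IGNORECASE),
     "In what way?"),
]

# Fallback used when no rule matches.
DEFAULT = "Can you think of a specific example?"


def respond(utterance):
    """Return the response of the first matching rule, or the default reply."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return DEFAULT


if __name__ == "__main__":
    print(respond("Men are all alike."))
    print(respond("I need some help; that much seems certain."))
```

The replies look human-like even though the program only matches surface patterns, which is the point slide 64 makes about answering without a complicated “brain”.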
  65. References:
      • NLP course, Husni Al-Muhtaseb
      • Lexical descriptions for Vietnamese language processing
      • en.wikipedia.org
      • www.xulyngonngu.com