Natural language processing 2
© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Natural language processing 2 Presentation Transcript

  • 1. Overview: Basic knowledge; Demonstration
  • 2. A conversation with ELIZA:
    USER: Men are all alike.
    ELIZA: In what way?
    USER: They’re always bugging us about something or other.
    ELIZA: Can you think of a specific example?
    USER: Well, my boyfriend made me come here.
    ELIZA: Your boyfriend made you come here?
    USER: He says I’m depressed much of the time.
    ELIZA: I am sorry to hear you are depressed.
    USER: It’s true, I am unhappy.
    ELIZA: Do you think coming here will help you not to be unhappy?
    USER: I need some help; that much seems certain.
    ELIZA: What would it mean to you if you got some help?
    USER: Perhaps I could learn to get along with my mother.
    ELIZA: Tell me more about your family.
    USER: My mother takes care of me.
    ELIZA: Who else in your family takes care of you?
    USER: My father.
    ELIZA: Your father?
    USER: You are like my father in some ways.
    ELIZA: What resemblance do you see?
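ELIZA produces replies like these with keyword patterns and pronoun "reflection" (my → your, me → you) rather than any real understanding. The Python sketch below illustrates that mechanism under stated assumptions: the rules, the `REFLECTIONS` table and the function names are hypothetical, and Weizenbaum's original script was much larger and ranked its keywords.

```python
import re

# Hypothetical reflection table: swaps first- and second-person words
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

# Hypothetical (pattern, reply template) rules, tried in order
RULES = [
    (re.compile(r".*\bmy (.+) made me (.+)", re.I), "Your {0} made you {1}?"),
    (re.compile(r".*\bI'?m (depressed|sad|unhappy)\b", re.I),
     "I am sorry to hear you are {0}."),
    (re.compile(r".*\b(mother|father|family)\b", re.I),
     "Tell me more about your family."),
    (re.compile(r".*\balike\b", re.I), "In what way?"),
]

def reflect(fragment: str) -> str:
    """Swap person in a captured fragment so it can be echoed back."""
    words = fragment.lower().rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)

def respond(utterance: str) -> str:
    """Return the reply of the first matching rule, with captured groups reflected."""
    for pattern, template in RULES:
        match = pattern.match(utterance)
        if match:
            # str.format ignores extra arguments, so rules without slots still work
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # fallback when no rule fires
```

With these rules, `respond("Well, my boyfriend made me come here.")` returns `"Your boyfriend made you come here?"`, mirroring the transcript; anything outside the rule list falls through to the generic prompt.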
  • 3. A sub-field of Artificial Intelligence, since the 1960s… Concerned with the interactions between computers and human languages, with one ultimate goal: computers that can “understand” human language. Many real-world applications.
  • 4. Natural language units? Natural language understanding; natural language generation. Data? Speech processing; text processing. Our focus: natural language text understanding!
  • 5. Natural language generation (NLG): the task of generating natural language from a machine representation. It may be viewed as the opposite of natural language understanding. Applications: joke generation; textual summaries of databases; enhancing accessibility.
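One simple way to produce "textual summaries of databases" is template-based generation: choose words for the values in a row, then slot them into a fixed sentence frame. This is a minimal sketch, assuming a hypothetical weather table with `city`, `temp_c` and `rain_mm` columns and arbitrary temperature thresholds; real NLG systems add content selection, aggregation and more varied wording.

```python
def describe_temperature(temp_c: float) -> str:
    """Lexical choice: map a number onto a word (thresholds are arbitrary)."""
    if temp_c >= 30:
        return "hot"
    if temp_c >= 20:
        return "warm"
    return "cool"

def summarize(record: dict) -> str:
    """Realize one database row as an English sentence from a fixed template."""
    rain = record["rain_mm"]
    rain_text = "with no rain" if rain == 0 else f"with {rain} mm of rain"
    return (f"{record['city']} was {describe_temperature(record['temp_c'])} "
            f"today ({record['temp_c']} °C), {rain_text}.")

rows = [  # hypothetical table rows
    {"city": "Hanoi", "temp_c": 31, "rain_mm": 0},
    {"city": "Da Lat", "temp_c": 18, "rain_mm": 12},
]
print(" ".join(summarize(r) for r in rows))
# Hanoi was hot today (31 °C), with no rain. Da Lat was cool today (18 °C), with 12 mm of rain.
```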
  • 6. Natural language understanding (NLU): an advanced subtopic of NLP that deals with reading comprehension. More complex than NLG. There is much commercial interest in this field: news gathering; data mining; voice activation; large-scale content analysis.
  • 7-13. Logic is too clear-cut; the loss of flexibility causes difficulties in NLP. Examples:
    - “Time flies like an arrow” can be understood in 7 ways!
    - “I never said she stole my money!” means different things depending on which word is stressed:
      - Stress on “I”: someone else said it, but I didn’t.
      - Stress on “never”: I simply didn’t ever say it.
      - Stress on “said”: I might have implied it in some way, but I never explicitly said it.
      - Stress on “she”: I said someone took it; I didn’t say it was she.
      - Stress on “stole”: I just said she probably borrowed it.
      - Stress on “my”: I said she stole someone else’s money.
      - Stress on “money”: I said she stole something, but not my money.
  • 14. Word combination and division; stress placement on words; the properties of subjects: “We gave the monkeys the bananas because they were hungry” vs. “We gave the monkeys the bananas because they were over-ripe” (“they” refers to the monkeys in the first sentence and to the bananas in the second); specifying which word an adjective applies to: “A pretty little girls’ school”.
  • 15. Involves reasoning about the world. Embedded in a social system of people interacting: persuading, insulting and amusing one another; changing over time. Homonymy.
  • 16.  Automatic Summarization
  • 17.  Information Extraction
  • 18.  Grammar Testing
  • 19. ePi Group: automatic Vietnamese processing system; www.baomoi.com: collects news from all Vietnamese e-newspapers; EVTrans (Softex Co. Ltd.); Cyclop; VnKim.
  • 20. Levels of analysis:
    - Morphological analysis: individual words are analyzed into their components.
    - Syntactic analysis: linear sequences of words are transformed into structures that show how the words relate to each other.
    - Semantic analysis: the input text is transformed into an internal representation that reflects its meaning.
    - Pragmatic analysis: what was said is reinterpreted as what was actually meant.
    - Discourse analysis: references between sentences are resolved.
  • 23. Morphemes: the smallest meaningful units of language. Stems: book, cat, car, …; affixes: un-, -s, -es, …; clitics: ’ve, ’m. Morphological parsing: parsing a word into its stem and affixes and identifying the parts and their relationships.
  • 24. Word classes (parts of speech): noun, verb, adjective, etc. A word’s class dictates how it combines with morphemes to form new words. Examples: books = book + -s; unladylike = un- + lady + -like.
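Morphological parsing as described on slides 23-24 can be sketched as recursive affix stripping against a stem list. This is a minimal sketch assuming a tiny hypothetical lexicon; it ignores spelling changes (e.g. lady + -s → ladies), which real systems usually handle with finite-state transducers.

```python
from typing import List, Optional

# Tiny hypothetical lexicon; a real parser would use a full stem list
STEMS = {"book", "cat", "car", "lady", "box"}
PREFIXES = ["un"]
SUFFIXES = ["like", "es", "s"]  # longer suffixes first

def parse(word: str) -> Optional[List[str]]:
    """Split a word into [prefix-, stem, -suffix ...], or None if no analysis."""
    word = word.lower()
    if word in STEMS:
        return [word]
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = parse(word[len(prefix):])
            if rest:
                return [prefix + "-"] + rest
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            rest = parse(word[:-len(suffix)])
            if rest:
                return rest + ["-" + suffix]
    return None

print(parse("books"))       # ['book', '-s']
print(parse("unladylike"))  # ['un-', 'lady', '-like']
```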
  • 25. Vietnamese? ăn (eat) = ăn; uống (drink) = uống; xe (vehicle) = xe. There is no “xes” in Vietnamese: words do not inflect. The main problem is text tokenization (word segmentation).
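Because Vietnamese words do not inflect but can span several space-separated syllables, tokenization means grouping syllables into words. A common baseline is forward maximum matching: at each position, take the longest syllable sequence found in a dictionary. The sketch below assumes a hypothetical eight-entry lexicon and a three-syllable limit.

```python
import unicodedata
from typing import List

def nfc(text: str) -> str:
    """Normalize to NFC so precomposed and combining diacritics compare equal."""
    return unicodedata.normalize("NFC", text)

# Hypothetical mini-lexicon of Vietnamese words (entries may span syllables)
LEXICON = {nfc(w) for w in ["tôi", "là", "sinh", "viên", "sinh viên",
                            "học", "học sinh", "sinh học"]}
MAX_SYLLABLES = 3

def segment(sentence: str) -> List[str]:
    """Forward maximum matching over space-separated syllables."""
    syllables = nfc(sentence).lower().split()
    words, i = [], 0
    while i < len(syllables):
        # Try the longest span first; a single syllable is always accepted
        for n in range(min(MAX_SYLLABLES, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in LEXICON:
                words.append(candidate)
                i += n
                break
    return words
```

“Tôi là sinh viên” (“I am a student”) comes out as tôi | là | sinh viên. The greedy strategy fails on the classic “học sinh học sinh học” (“students study biology”, intended as học sinh | học | sinh học): it returns học sinh | học sinh | học, which is why practical segmenters add statistical models.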
  • 26. Why parse words? To identify a word’s part of speech; to identify a word’s stem (for information retrieval, IR). …then? Spell checking; predicting the next word; predicting a word’s accent.
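Predicting the next word, one of the uses listed above, is usually done with N-gram statistics. A minimal bigram sketch, assuming whitespace tokenization and a toy corpus adapted from the ELIZA dialogue; a real model would also need smoothing for word pairs never seen in training.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

def train_bigrams(corpus: List[str]) -> Dict[str, Counter]:
    """Count how often each word follows each other word (<s> marks sentence start)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = [  # toy sentences adapted from the ELIZA dialogue
    "my mother takes care of me",
    "my father takes care of me",
    "you are like my father in some ways",
]
model = train_bigrams(corpus)
print(predict_next(model, "takes"))  # care
print(predict_next(model, "my"))     # father
```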
  • 27. Ambiguity: “I want her to go to the cinema with me”: is “to” an infinitive marker or a preposition? Vietnamese: “Con ngựa đá đá con ngựa đá” (“The stone horse kicks the stone horse”): is đá (stone) the same word as đá (to kick)?
  • 28. How to implement? Regular expressions; finite-state transducers (FST); finite-state acceptors (FSA). Pattern examples: *.exe; ir??man; \b[0-9]+ *(Mb|[Mm]egabytes?)\b
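The patterns on this slide can be tried directly. The first two behave like shell wildcards, which Python's `fnmatch` module implements; the third is read here as a regular expression with `\b` word boundaries, assuming the backslashes were lost in extraction. A minimal sketch:

```python
import fnmatch
import re
from typing import List

def glob_match(name: str, pattern: str) -> bool:
    """Wildcard match (case-sensitive): * = any string, ? = exactly one character."""
    return fnmatch.fnmatchcase(name, pattern)

# A number, optional spaces, then a size unit, bounded by word boundaries
SIZE = re.compile(r"\b[0-9]+ *(Mb|[Mm]egabytes?)\b")

def find_sizes(text: str) -> List[str]:
    """Return every full match of SIZE (group 0, not just the captured unit)."""
    return [m.group(0) for m in SIZE.finditer(text)]

print(glob_match("setup.exe", "*.exe"))  # True
print(glob_match("ironman", "ir??man"))  # True
print(find_sizes("Disk: 512 Mb free, 2 megabytes used, 10 Mbps link"))
# ['512 Mb', '2 megabytes']
```

“10 Mbps” is not matched: the closing `\b` requires the unit to end at a word boundary, and “Mb” is followed by more letters.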
  • 29. Related terms: stem, stemming; part of speech; N-gram.
  • 31. Syntax
  • 32. Linear sequences of words are transformed into structures that show how the words relate to each other, to determine the grammatical structure. Example: I am a boy = [Subject] [Verb] [Article] [Noun]
  • 34. Syntax: the actual structure of a sentence. Grammar: the rule set used in the analysis.
  • 35. A grammar defines the syntactically legal sentences: “I ate an apple” (syntactically legal); “I ate apple” (not syntactically legal); “I ate a building” (syntactically legal, but meaningful?). Being syntactically legal doesn’t mean that a sentence is meaningful!
  • 36. Ambiguities
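Whether a sentence is legal under a grammar, and how many structures it has, can be computed with the CKY algorithm over a grammar in Chomsky normal form. The sketch below counts parses; the lexicon and rules are a hypothetical toy grammar (not from the slides), so it finds three readings of “Time flies like an arrow” rather than seven.

```python
from collections import defaultdict

# Toy lexicon (word -> categories) and binary rules (A -> B C); both are assumptions
LEXICON = {
    "time": {"N", "NP", "V"}, "flies": {"N", "NP", "V"},
    "like": {"V", "P"}, "an": {"Det"}, "arrow": {"N"},
    "i": {"NP"}, "ate": {"V"}, "apple": {"N"},
}
RULES = [
    ("S", "NP", "VP"), ("S", "VP", "PP"),  # S -> VP PP gives an imperative reading
    ("NP", "Det", "N"), ("NP", "N", "N"),
    ("VP", "V", "NP"), ("VP", "V", "PP"),
    ("PP", "P", "NP"),
]

def count_parses(sentence: str, start: str = "S") -> int:
    """CKY chart filling that counts, rather than builds, the parse trees."""
    words = sentence.lower().split()
    n = len(words)
    # table[i][j][A] = number of ways category A can span words[i:j]
    table = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON.get(w, ()):
            table[i][i + 1][cat] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # every split point
                left, right = table[i][k], table[k][j]
                for parent, b, c in RULES:
                    if left[b] and right[c]:
                        table[i][j][parent] += left[b] * right[c]
    return table[0][n][start]

print(count_parses("I ate an apple"))            # 1
print(count_parses("I ate apple"))               # 0
print(count_parses("Time flies like an arrow"))  # 3
```

The three readings are: time passes quickly; “time flies” (a kind of fly) are fond of an arrow; and an order to time the flies the way an arrow would. A count of 0 means the sentence is not legal under this grammar.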
  • 38. Semantics
  • 39. What could this mean… Representations of linguistic inputs that capture the meanings of those inputs. For us, it means representations that permit or facilitate semantic processing: they permit us to reason about their truth (their relationship to some world), to answer questions based on their content, and to perform inference (answer questions and determine the truth of things we don’t actually know).
  • 41. Requirements (for meaning representations): verifiability; ambiguity; canonical form; inference; expressiveness.
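Two of these requirements, verifiability and inference, can be made concrete with a toy knowledge base: store meanings as predicate-argument facts, check a query against them, and derive facts that were never stated. This is a minimal sketch assuming hypothetical facts and a single hand-written rule, not a full first-order logic prover.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]

# Hypothetical ground facts in a predicate(arg1, arg2) style
FACTS: Set[Fact] = {("parent", "Ann", "Bob"), ("parent", "Bob", "Cid")}

def infer(facts: Set[Fact]) -> Set[Fact]:
    """Forward-chain one hypothetical rule until nothing new is derived:
    parent(x, y) and parent(y, z) => grandparent(x, z)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        parents = [f for f in derived if f[0] == "parent"]
        for _, x, y in parents:
            for _, y2, z in parents:
                new = ("grandparent", x, z)
                if y == y2 and new not in derived:
                    derived.add(new)
                    changed = True
    return derived

def holds(query: Fact, facts: Set[Fact] = FACTS) -> bool:
    """Verifiability: is the query stated or derivable from the facts?"""
    return query in infer(facts)

print(holds(("parent", "Ann", "Bob")))       # True (stated)
print(holds(("grandparent", "Ann", "Cid")))  # True (inferred, never stated)
```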
  • 43. Pragmatics: concerns how sentences are used in different situations and how use affects the interpretation of the sentence. Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
  • 44. ‘He’, ‘it’ and ‘his’ can be inferred from the previous sentence: that’s discourse.
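Resolving such pronouns can be approximated, very crudely, by choosing the most recent previous mention that agrees with the pronoun. This sketch assumes the mentions have already been extracted and tagged with a gender feature by hand; the feature table is hypothetical, and real resolvers also use number, syntax and salience.

```python
from typing import List, Optional, Tuple

# Hypothetical agreement features for a few English pronouns
PRONOUN_GENDER = {"he": "male", "him": "male", "his": "male",
                  "she": "female", "her": "female",
                  "it": "neuter", "its": "neuter"}

def resolve(pronoun: str, mentions: List[Tuple[str, str]]) -> Optional[str]:
    """Pick the most recent earlier mention whose gender agrees with the pronoun."""
    wanted = PRONOUN_GENDER.get(pronoun.lower())
    if wanted is None:
        return None  # pronoun not covered by the table
    for name, gender in reversed(mentions):
        if gender == wanted:
            return name
    return None

# Mentions from a previous sentence such as "John sold Mary the car." (hand-annotated)
mentions = [("John", "male"), ("Mary", "female"), ("the car", "neuter")]
print(resolve("He", mentions))  # John
print(resolve("it", mentions))  # the car
```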
  • 50. Tools: WordNet, MindNet, Stanford Tagger, Stanford Parser, ……
  • 51. Applications: machine translation; search engines; information extraction; chat bots.
  • 52. Can we use previously translated text to learn how to translate new texts? Yes! But it’s not so easy. Two paradigms: statistical MT and example-based MT (EBMT). Requirements: a large aligned parallel corpus of translated sentences {S_source ↔ S_target}; a bilingual dictionary for intra-S alignment; generalization patterns (names, numbers, dates…).
  • 53. Simplest: translation memory. If S_new = S_source in the corpus, output the aligned S_target. Compositional EBMT: if a fragment of S_new matches a fragment of S_source, output the corresponding fragment of the aligned S_target; prefer maximal-length fragments; maximize grammatical compositionality, either via a target-language grammar or via an N-gram statistical language model.
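The two strategies on this slide can be sketched together: try an exact translation-memory hit first, then cover the input greedily with the longest matching aligned fragments. The tiny English-French memory and fragment table below are hypothetical, and the sketch skips the final compositionality check with a target-language grammar or N-gram model.

```python
# Hypothetical aligned data (English -> French)
TRANSLATION_MEMORY = {
    "good morning": "bonjour",
    "i see the red house": "je vois la maison rouge",
}
FRAGMENTS = {
    "i see": "je vois",
    "the red house": "la maison rouge",
    "the red car": "la voiture rouge",
    "the house": "la maison",
}

def translate(sentence: str) -> str:
    words = sentence.lower().rstrip(".!?").split()
    key = " ".join(words)
    # 1. Translation memory: exact sentence match
    if key in TRANSLATION_MEMORY:
        return TRANSLATION_MEMORY[key]
    # 2. Compositional EBMT: greedily cover the input with maximal-length fragments
    output, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):
            fragment = " ".join(words[i:j])
            if fragment in FRAGMENTS:
                output.append(FRAGMENTS[fragment])
                i = j
                break
        else:
            output.append(words[i])  # unknown word: copy it through unchanged
            i += 1
    return " ".join(output)

print(translate("Good morning!"))       # bonjour
print(translate("I see the red car."))  # je vois la voiture rouge
```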
  • 54. Interlingua-based MT requires an interlingua: a language-neutral knowledge representation (KR). Philosophical debate: is there an interlingua? FOL is not totally language-neutral (its predicates and functions are expressed in a language); other near-interlinguas exist (Conceptual Dependency). It requires a fully disambiguating parser and a domain model of legal objects, actions and relations, as well as an NL generator (KR → text). It is applicable only to well-defined technical domains, where it produces high-quality MT.
  • 55. Interlingua-based MT; rule-based MT
  • 56. Each approach has its own strengths: rapid adaptability: statistical and example-based MT; good grammar: rule-based MT; high precision in a narrow domain: interlingua-based MT.
  • 57. Google, Yahoo, AltaVista, Answers.com
  • 58. Search engine components:
    - Spider: a browser-like program that downloads web pages.
    - Crawler: a program that automatically follows all of the links on each web page.
    - Indexer: a program that analyzes web pages downloaded by the spider and the crawler.
    - Database: storage for downloaded and processed pages.
    - Results engine: extracts search results from the database.
    - Web server: a server responsible for interaction between the user and the other search engine components.
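The indexer, database and results engine described on slide 58 can be sketched with an inverted index that maps each term to the pages containing it. This minimal version assumes the spider and crawler have already fetched the pages (the `example.com` URLs and texts are hypothetical) and answers queries with a boolean AND; real engines also rank the results.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Indexer: map each term to the set of page URLs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Results engine: pages containing every query term, sorted by URL."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))

pages = {  # hypothetical pages already downloaded by the spider/crawler
    "example.com/nlp": "Natural language processing with Python",
    "example.com/mt": "Statistical machine translation and language models",
    "example.com/ir": "Search engines index web pages",
}
index = build_index(pages)
print(search(index, "language"))             # ['example.com/mt', 'example.com/nlp']
print(search(index, "machine translation"))  # ['example.com/mt']
```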
  • 60. The idea is to “extract” particular types of information from arbitrary text or transcribed speech. Examples: named entities (people, places, organizations); telephone numbers; dates. Many uses: question answering systems, filtering of news or mail…; job ads, financial information, terrorist attacks.
  • 61. Often uses a set of simple templates or frames with slots to be filled in from the input text, ignoring everything else. Examples: “Husni’s number is 966-3-860-2624.” “The inventor of the first plane was Abbas ibnu Fernas.” “The British King died in March of 1932.”
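Template filling can be sketched with one regular expression per template, whose named groups are the slots; text that matches no template is ignored. The two templates below are hypothetical patterns written for the example sentences on this slide, not a general-purpose extractor.

```python
import re
from typing import Dict, List

# Hypothetical templates: each named group is a slot to fill
TEMPLATES = {
    "phone": re.compile(r"(?P<person>[A-Z]\w+)['’]s number is (?P<number>[\d-]+\d)"),
    "death": re.compile(r"The (?P<person>[A-Z][\w ]+?) died in "
                        r"(?P<month>[A-Z][a-z]+) of (?P<year>\d{4})"),
}

def extract(text: str) -> List[Dict[str, str]]:
    """Fill every template whose pattern matches; everything else is ignored."""
    filled = []
    for name, pattern in TEMPLATES.items():
        for m in pattern.finditer(text):
            filled.append({"template": name, **m.groupdict()})
    return filled

text = ("Husni's number is 966-3-860-2624. "
        "The inventor of the first plane was Abbas ibnu Fernas. "
        "The British King died in March of 1932.")
for frame in extract(text):
    print(frame)
```

The sentence about the inventor fills no slot, because neither template describes it.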
  • 62. IE subtasks:
    - Named Entity recognition (NE): finds and classifies names, places, etc.
    - Co-reference Resolution (CO): identifies identity relations between entities in texts.
    - Template Element construction (TE): adds descriptive information to NE results (using CO).
    - Template Relation construction (TR): finds relations between TE entities.
    - Scenario Template production (ST): fits TE and TR results into specified event scenarios.
  • 63. AIML = Artificial Intelligence Markup Language; Alice
  • 64. A.L.I.C.E. (Artificial Linguistic Internet Computer Entity): an award-winning free natural language artificial intelligence chat robot. Rule-based; human-like answers without a complicated “brain”; multi-language.
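AIML stores a bot's knowledge as `<category>` elements, each pairing an input `<pattern>` (where `*` is a wildcard) with a reply `<template>` that can echo the wildcard through `<star/>`. The sketch below loads two hand-written categories with Python's standard XML parser and answers by trying them in order. That ordering, the title-casing of the echoed text and the fallback reply are simplifying assumptions; real AIML interpreters pick the best match using their own priority rules.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical categories in AIML syntax
AIML_SOURCE = """
<aiml version="1.0.1">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there!</template>
  </category>
  <category>
    <pattern>MY NAME IS *</pattern>
    <template>Nice to meet you, <star/>.</template>
  </category>
</aiml>
"""

def load_categories(source: str) -> list:
    """Turn each <category> into (compiled regex, template element)."""
    categories = []
    for cat in ET.fromstring(source.strip()).iter("category"):
        pattern = cat.find("pattern").text.strip()
        # '*' matches one or more words; other words must match exactly
        regex = "^" + " ".join(
            r"(.+)" if tok == "*" else re.escape(tok) for tok in pattern.split()
        ) + "$"
        categories.append((re.compile(regex), cat.find("template")))
    return categories

def normalize(text: str) -> str:
    """AIML-style input normalization: punctuation removed, uppercased."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())

def reply(categories: list, text: str) -> str:
    user = normalize(text)
    for regex, template in categories:
        m = regex.match(user)
        if m:
            # Rebuild the template text, replacing <star/> with the wildcard match
            parts = [template.text or ""]
            for child in template:
                if child.tag == "star":
                    parts.append(m.group(1).title())
                parts.append(child.tail or "")
            return "".join(parts).strip()
    return "I have no answer for that."

CATEGORIES = load_categories(AIML_SOURCE)
print(reply(CATEGORIES, "My name is Alice."))  # Nice to meet you, Alice.
```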
  • 65. References: NLP course, Husni Al-Muhtaseb; Lexical descriptions for Vietnamese language processing; en.wikipedia.org; www.xulyngonngu.com