Natural language processing 2

    Presentation Transcript

    • Overview, basic knowledge, demonstration
    • USER: Men are all alike.
      ELIZA: In what way?
      USER: They’re always bugging us about something or other.
      ELIZA: Can you think of a specific example?
      USER: Well, my boyfriend made me come here.
      ELIZA: Your boyfriend made you come here?
      USER: He says I’m depressed much of the time.
      ELIZA: I am sorry to hear you are depressed.
      USER: It’s true, I am unhappy.
      ELIZA: Do you think coming here will help you not to be unhappy?
      USER: I need some help; that much seems certain.
      ELIZA: What would it mean to you if you got some help?
      USER: Perhaps I could learn to get along with my mother.
      ELIZA: Tell me more about your family.
      USER: My mother takes care of me.
      ELIZA: Who else in your family takes care of you?
      USER: My father.
      ELIZA: Your father?
      USER: You are like my father in some ways.
      ELIZA: What resemblance do you see?
    • A sub-field of Artificial Intelligence, active since the 1960s.
      Concerned with the interactions between computers and human languages, with one ultimate goal: computers can “understand” human language.
      Many applications in the real world.
    • Natural language unit?
        Natural language understanding
        Natural language generation
      Data?
        Speech processing
        Text processing
      Natural language text understanding!
    • The task of generating natural language from a machine representation.
      May be viewed as the opposite of natural language understanding.
      Applications:
        Joke generation
        Textual summaries of databases
        Enhancing accessibility
    • An advanced subtopic of NLP that deals with reading comprehension.
      More complex than NLG.
      Much commercial interest in this field:
        News-gathering
        Data mining
        Voice activation
        Large-scale content analysis
    • Logic is too clear-cut; the loss of flexibility causes difficulties in NLP.
      Examples:
        Time flies like an arrow. Can be understood in 7 ways!
        I never said she stole my money! The meaning changes with the stressed word:
          Stress on “I”: Someone else said it, but I didn't.
          Stress on “never”: I simply didn't ever say it.
          Stress on “said”: I might have implied it in some way, but I never explicitly said it.
          Stress on “she”: I said someone took it; I didn't say it was she.
          Stress on “stole”: I just said she probably borrowed it.
          Stress on “my”: I said she stole someone else's money.
          Stress on “money”: I said she stole something, but not my money.
    • Word combination and division.
      Stress placed on words.
      The properties of subjects:
        We gave the monkeys the bananas because they were hungry.
        We gave the monkeys the bananas because they were over-ripe.
      Specifying which word an adjective applies to:
        A pretty little girls' school.
    • Involves reasoning about the world.
      Embedded in a social system of people interacting:
        persuading, insulting and amusing them
        changing over time
      Homonymy.
    •  Automatic Summarization
    •  Information Extraction
    •  Grammar Testing
    • ePi Group:
        Automatic Vietnamese processing system
        www.baomoi.com
        Collecting news from all Vietnamese e-newspapers
      EVTrans – Softex Co Ltd.
      Cyclop
      VnKim
    • Morphological analysis: individual words are analyzed into their components.
      Syntactic analysis: linear sequences of words are transformed into structures that show how the words relate to each other.
      Semantic analysis: a transformation is made from the input text to an internal representation that reflects the meaning.
      Pragmatic analysis: reinterpreting what was said as what was actually meant.
      Discourse analysis: resolving references between sentences.
    • Morphology, Syntax, Semantics, Pragmatics, Discourse
    • Morphemes: the smallest meaningful units of language.
        Stem: book, cat, car, …
        Affixes: un-, -s, -es, …
        Clitics: ’ve, ’m
      Morphological parsing: parsing a word into stem and affixes and identifying the parts and their relationships.
    • Word classes
        Parts of speech: noun, verb, adjective, etc.
        Word class dictates how a word combines with morphemes to form new words.
      Examples:
        Books = book + s
        Unladylike = un + lady + like
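The two examples above can be reproduced with a very small affix-stripping parser. This is only a sketch: the stem lexicon `STEMS` and the affix lists are hypothetical stand-ins chosen to cover the slide's examples, and a real morphological parser would more likely be built on a finite-state transducer.

```python
# Hypothetical toy lexicon and affix lists, chosen only to cover the slide examples.
STEMS = {"book", "cat", "car", "lady"}
PREFIXES = ("", "un")
SUFFIXES = ("", "like", "es", "s")


def parse_word(word, stems=STEMS):
    """Split a word into [prefix, stem, suffix], omitting empty parts.

    Tries every prefix/suffix combination and accepts the first one whose
    remaining middle part is a known stem. Unknown words are returned whole.
    """
    w = word.lower()
    for prefix in PREFIXES:
        if not w.startswith(prefix):
            continue
        rest = w[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in stems:
                return [part for part in (prefix, stem, suffix) if part]
    return [w]


print(parse_word("Books"))       # ['book', 's']
print(parse_word("Unladylike"))  # ['un', 'lady', 'like']
```

A word with no known stem, such as the Vietnamese "xe", simply comes back unsplit, which matches the observation that such words take no affixes.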
    • Vietnamese?
        Ăn = ăn
        Uống = uống
        Xe = xe
      No ‘Xes’ in Vietnamese!
      The problem is text tokenization.
    • Why parse words?
        To identify a word’s part of speech
        To identify a word’s stem (IR)
      … then?
        Spell-checking
        To predict next words
        To predict the word’s accent
    • Ambiguity
        I want her to go to the cinema with me.
          To - infinitive?
          To - preposition?
        Con ngựa đá đá con ngựa đá. (“The stone horse kicks the stone horse.”)
          đá = đá?
    • How to implement?
        Regular expressions
        Finite State Transducers (FST)
        Finite State Acceptors (FSA)
      Examples:
        *.exe
        ir??man
        \b[0-9]+ *(Mb|[Mm]egabytes?)\b
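The three patterns on this slide can be tried out directly. The sketch below makes two assumptions: the first two patterns are read as shell-style wildcards (handled with Python's `fnmatch` module), and the third is read as a regular expression with `\b` word boundaries at both ends, on the guess that the backslashes were lost when the slide was transcribed. The function names are illustrative only.

```python
import re
from fnmatch import fnmatchcase

# Regular expression for sizes such as "512 Mb" or "3 megabytes".
# The \b word boundaries are an assumption about the original slide.
SIZE_PATTERN = re.compile(r"\b[0-9]+ *(Mb|[Mm]egabytes?)\b")


def find_sizes(text):
    """Return every size expression found in the text."""
    return [match.group(0) for match in SIZE_PATTERN.finditer(text)]


def matches_wildcard(name, pattern):
    """Case-sensitive shell-style match: * is any run, ? is one character."""
    return fnmatchcase(name, pattern)


print(find_sizes("A 512 Mb card with 3 megabytes free"))  # ['512 Mb', '3 megabytes']
print(matches_wildcard("setup.exe", "*.exe"))             # True
print(matches_wildcard("ironman", "ir??man"))             # True
```

The closing word boundary is what keeps "512 Mbps" from being reported as a size.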
    • Related terms:
        Stem, stemming
        Part of speech
        N-gram
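As a rough illustration of the N-gram term, and of the earlier goal of predicting next words, the sketch below counts bigrams in a token list and proposes the most frequent follower of a word. The function names and the tiny training sentence are made up for the example; real language models add smoothing and far more data.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_bigrams(tokens):
    """Count, for each word, how often each other word follows it."""
    model = defaultdict(Counter)
    for first, second in ngrams(tokens, 2):
        model[first][second] += 1
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None


tokens = "the cat sat on the cat mat".split()
model = train_bigrams(tokens)
print(ngrams(["time", "flies", "like"], 2))  # [('time', 'flies'), ('flies', 'like')]
print(predict_next(model, "the"))            # cat
```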
    • SYNTAX
    • Linear sequences of words are transformed into structures that show how the words relate to each other.
      Determine grammatical structure.
      I am a boy = [Subject] [Verb] [Determiner] [Noun]
    • Syntax
        The actual structure of a sentence
      Grammar
        The rule set used in the analysis
    • A grammar defines syntactically legal sentences:
        I ate an apple (syntactically legal)
        I ate apple (not syntactically legal)
        I ate a building (syntactically legal, but?)
      Being syntactically legal doesn’t mean that a sentence is meaningful!
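The three sentences above can be checked against a toy grammar. The lexicon and the three rules below (S -> NP VP, NP -> Pronoun | Det Noun, VP -> Verb NP) are assumptions made up for this sketch, not the grammar used in the deck, and the recognizer only works for grammars without left recursion.

```python
# Hypothetical toy lexicon: word -> part-of-speech tag.
LEXICON = {
    "i": "Pronoun", "ate": "Verb",
    "a": "Det", "an": "Det",
    "apple": "Noun", "building": "Noun",
}

# Hypothetical toy grammar: non-terminal -> list of alternative productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
}


def match(symbol, tags, start):
    """Return the set of positions where `symbol` can end if it starts at `start`."""
    if symbol not in GRAMMAR:  # terminal: a part-of-speech tag
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        for part in production:
            positions = {e for p in positions for e in match(part, tags, p)}
        ends |= positions
    return ends


def is_legal(sentence):
    """True if the whole sentence can be derived from S."""
    tags = [LEXICON.get(word) for word in sentence.lower().split()]
    return None not in tags and len(tags) in match("S", tags, 0)


print(is_legal("I ate an apple"))    # True
print(is_legal("I ate apple"))       # False
print(is_legal("I ate a building"))  # True
```

The grammar accepts "I ate a building" without complaint, which is exactly the point of the slide: rejecting it takes semantic knowledge, not syntax.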
    • Ambiguities
    • SEMANTICS
    • What could this mean…
        Representations of linguistic inputs that capture the meanings of those inputs
      For us it means:
        Representations that permit or facilitate semantic processing
        Permit us to reason about their truth (relationship to some world)
        Permit us to answer questions based on their content
        Permit us to perform inference (answer questions and determine the truth of things we don’t actually know)
    • Requirements
        Verifiability
        Ambiguity
        Canonical form
        Inference
        Expressiveness
    • Pragmatics: concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
      Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
    • ‘He’, ‘it’, ‘his’ can be inferred from the previous sentence.
      It’s discourse.
    • WordNet, MindNet, Stanford Tagger, Stanford Parser, …
    • Machine translation, search engine, information extraction, chat bot
    • Can we use previously translated text to learn how to translate new texts?
        Yes! But it’s not so easy.
        Two paradigms: statistical MT and example-based MT (EBMT).
      Requirements:
        A large aligned parallel corpus of translated sentences {S_source, S_target}
        A bilingual dictionary for intra-S alignment
        Generalization patterns (names, numbers, dates…)
    • Simplest: Translation Memory
        If S_new = S_source in the corpus, output the aligned S_target.
      Compositional EBMT
        If a fragment of S_new matches a fragment of S_source, output the corresponding fragment of the aligned S_target.
        Prefer maximal-length fragments.
        Maximize grammatical compositionality:
          via a target-language grammar,
          or via an N-gram statistical language model.
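The translation-memory lookup and the "prefer maximal-length fragments" rule can be sketched as follows. The aligned fragment table `MEMORY` is a hypothetical toy example (English to Vietnamese), and the greedy left-to-right matching is one simple way to prefer long fragments; it does not attempt the grammatical-compositionality step.

```python
# Hypothetical aligned fragments: source token tuple -> target text.
MEMORY = {
    ("hello",): "xin chào",
    ("thank", "you"): "cảm ơn",
    ("very", "much"): "rất nhiều",
}


def translate(sentence, memory=MEMORY):
    """Translate by exact lookup, else by greedy longest-fragment matching."""
    tokens = tuple(sentence.lower().split())
    # Translation memory: the whole sentence is already in the corpus.
    if tokens in memory:
        return memory[tokens]
    # Compositional EBMT: cover the sentence with the longest known fragments.
    output, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # try the longest fragment first
            if tokens[i:j] in memory:
                output.append(memory[tokens[i:j]])
                i = j
                break
        else:
            output.append("<" + tokens[i] + ">")  # mark untranslated words
            i += 1
    return " ".join(output)


print(translate("Hello"))
print(translate("thank you very much"))
```

Words that match no fragment are passed through in angle brackets so that the gaps in the memory stay visible.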
    • Requires an interlingua: a language-neutral Knowledge Representation (KR).
      Philosophical debate: is there an interlingua?
        FOL is not totally language neutral (predicates and functions are expressed in a language).
        Other near-interlinguas exist (Conceptual Dependency).
      Requires a fully disambiguating parser.
        Domain model of legal objects, actions, relations.
      Requires an NL generator (KR -> text).
      Applicable only to well-defined technical domains.
      Produces high-quality MT in those domains.
    • Interlingua-based MT
      Rule-based MT
    • Each approach has its own strengths:
        Rapidly adaptable: statistical, example-based
        Good grammar: rule-based (grammar)
        High precision in a narrow domain: interlingua
    • Google, Yahoo, AltaVista, Answer.com
    • Spider: a browser-like program that downloads web pages.
      Crawler: a program that automatically follows all of the links on each web page.
      Indexer: a program that analyzes web pages downloaded by the spider and the crawler.
      Database: storage for downloaded and processed pages.
      Results engine: extracts search results from the database.
      Web server: a server that is responsible for interaction between the user and the other search engine components.
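The indexer, database and results-engine roles described above can be illustrated with an in-memory inverted index. This is a minimal sketch under stated assumptions: the pages are a hypothetical dictionary standing in for what a spider and crawler would download, tokenization is a plain whitespace split, and queries are treated as "all words must appear".

```python
from collections import defaultdict


def build_index(pages):
    """Indexer: map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Results engine: return the pages containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


# Hypothetical downloaded pages (URL -> page text).
pages = {
    "a.example/nlp": "natural language processing overview",
    "a.example/mt": "machine translation of natural language",
    "a.example/ir": "search engine indexing",
}
index = build_index(pages)
print(search(index, "natural language"))  # ['a.example/mt', 'a.example/nlp']
```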
    • The idea is to ‘extract’ particular types of information from arbitrary text or transcribed speech.
      Examples:
        Named entities: people, places, organizations
        Telephone numbers
        Dates
      Many uses:
        Question answering systems, gisting of news or mail…
        Job ads, financial information, terrorist attacks
    • Often uses a set of simple templates or frames with slots to be filled in from the input text. Everything else is ignored.
        Husni’s number is 966-3-860-2624.
        The inventor of the First plane was Abbas ibnu Fernas.
        The British King died in March of 1932.
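Slot filling for two of the example sentences can be sketched with regular expressions. The template fields (`phone`, `month`, `year`) and the exact patterns are assumptions made for this illustration; the phone pattern only covers the digit grouping shown on the slide, and real systems use far richer patterns or trained models.

```python
import re

# Phone numbers in the 3-1-3-4 digit grouping used in the slide example.
PHONE = re.compile(r"\b\d{3}-\d-\d{3}-\d{4}\b")

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Dates such as "March 1932" or "March of 1932".
DATE = re.compile(r"\b(" + MONTHS + r")(?: of)? (\d{4})\b")


def fill_template(text):
    """Fill the slots we can find in the text and ignore everything else."""
    template = {"phone": None, "month": None, "year": None}
    phone = PHONE.search(text)
    if phone:
        template["phone"] = phone.group(0)
    date = DATE.search(text)
    if date:
        template["month"] = date.group(1)
        template["year"] = int(date.group(2))
    return template


print(fill_template("Husni's number is 966-3-860-2624."))
print(fill_template("The British King died in March of 1932."))
```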
    • Named Entity recognition (NE): finds and classifies names, places, etc.
      Co-reference Resolution (CO): identifies identity relations between entities in texts.
      Template Element construction (TE): adds descriptive information to NE results (using CO).
      Template Relation construction (TR): finds relations between TE entities.
      Scenario Template production (ST): fits TE and TR results into specified event scenarios.
    • AIML = Artificial Intelligence Markup Language
      Alice
    • A.L.I.C.E. (Artificial Linguistic Internet Computer Entity)
        An award-winning free natural language artificial intelligence chat robot.
      Rule-based.
      Human-like answers without a complicated “brain”.
      Multi-language.
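The rule-based idea behind such chat bots, a list of patterns paired with answer templates and no deeper "brain", can be sketched in a few lines. The rules below are hypothetical, written to echo a few turns of the ELIZA dialogue at the start of the deck; they are Python regular expressions, not actual AIML, and they are not the rule set of ELIZA or A.L.I.C.E.

```python
import re

# Swap first-person words for second-person ones when echoing the user.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# Hypothetical rules: (pattern, answer template). The first match wins.
RULES = [
    (re.compile(r".*\bmy (.+) made me (.+)", re.I), "Your {0} made you {1}?"),
    (re.compile(r".*\bI(?:['’]m| am) (.+)", re.I), "I am sorry to hear you are {0}."),
    (re.compile(r".*\bmy (mother|father)\b.*", re.I), "Tell me more about your family."),
    (re.compile(r".*\balways\b.*", re.I), "Can you think of a specific example?"),
]


def reflect(fragment):
    """Rewrite a captured fragment from the user's point of view to the bot's."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(utterance):
    """Answer with the template of the first matching rule."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        found = pattern.match(text)
        if found:
            return template.format(*(reflect(g) for g in found.groups()))
    return "Please go on."


print(respond("Well, my boyfriend made me come here."))
print(respond("My mother takes care of me."))
```

Rule order matters: the more specific "made me" rule is listed first so that it wins over the general ones.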
    • NLP course, Husni Al-Muhtaseb
      Lexical descriptions for Vietnamese language processing
      en.wikipedia.org
      www.xulyngonngu.com