Natural language processing 2

  • 1. Overview; Basic knowledge; Demonstration
  • 2. USER: Men are all alike.
      ELIZA: In what way?
      USER: They’re always bugging us about something or other.
      ELIZA: Can you think of a specific example?
      USER: Well, my boyfriend made me come here.
      ELIZA: Your boyfriend made you come here?
      USER: He says I’m depressed much of the time.
      ELIZA: I am sorry to hear you are depressed.
      USER: It’s true, I am unhappy.
      ELIZA: Do you think coming here will help you not to be unhappy?
      USER: I need some help; that much seems certain.
      ELIZA: What would it mean to you if you got some help?
      USER: Perhaps I could learn to get along with my mother.
      ELIZA: Tell me more about your family.
      USER: My mother takes care of me.
      ELIZA: Who else in your family takes care of you?
      USER: My father.
      ELIZA: Your father?
      USER: You are like my father in some ways.
      ELIZA: What resemblance do you see?
  • 3. A sub-field of artificial intelligence, since the 1960s… Concerned with the interactions between computers and human languages, with one ultimate goal: computers that can “understand” human language. Many real-world applications.
  • 4. Which natural language task? Natural language understanding or natural language generation. Which data? Speech processing or text processing. Here: natural language text understanding!
  • 5. Natural language generation (NLG): the task of generating natural language from a machine representation. It may be viewed as the opposite of natural language understanding. Applications: joke generation; textual summaries of databases; enhancing accessibility.
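Template-based generation is one simple way to produce "textual summaries of databases". The sketch below assumes a hypothetical table of sales records with `region` and `sales` fields; it chooses the wording from the data and fills a sentence template. Real NLG systems add content selection, sentence planning and surface realization, none of which this sketch models.

```python
from typing import Dict, List


def summarize_sales(rows: List[Dict[str, object]]) -> str:
    """Generate a short English summary from records with 'region' and 'sales' keys."""
    if not rows:
        return "No sales were recorded."
    total = sum(int(r["sales"]) for r in rows)
    best = max(rows, key=lambda r: int(r["sales"]))
    # Choose the singular or plural wording based on the data.
    noun = "region" if len(rows) == 1 else "regions"
    return (
        f"Sales across {len(rows)} {noun} totalled {total} units. "
        f"The strongest region was {best['region']} with {best['sales']} units."
    )


rows = [{"region": "North", "sales": 120}, {"region": "South", "sales": 340}]
print(summarize_sales(rows))
# Sales across 2 regions totalled 460 units. The strongest region was South with 340 units.
```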
  • 6. Natural language understanding (NLU): an advanced subtopic of NLP that deals with reading comprehension. More complex than NLG. Much commercial interest in this field: news gathering; data mining; voice activation; large-scale content analysis.
  • 7. Logic is clear-cut, but natural language is flexible, and that gap causes difficulties in NLP. Examples: "Time flies like an arrow" can be understood in 7 ways!!! "I never said she stole my money!" changes meaning depending on which word is stressed. Stress on "I": someone else said it, but I didn't.
  • 8. Stress on "never": I simply didn't ever say it.
  • 9. Stress on "said": I might have implied it in some way, but I never explicitly said it.
  • 10. Stress on "she": I said someone took it; I didn't say it was she.
  • 11. Stress on "stole": I just said she probably borrowed it.
  • 12. Stress on "my": I said she stole someone else's money.
  • 13. Stress on "money": I said she stole something, but not my money.
  • 14. Further difficulties: how words are combined and divided; where stress is placed on words; the properties of subjects ("We gave the monkeys the bananas because they were hungry" vs. "We gave the monkeys the bananas because they were over-ripe": "they" refers to the monkeys in the first sentence and to the bananas in the second); which word an adjective applies to ("A pretty little girls' school").
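The adjective-scope ambiguity in "a pretty little girls' school" can be made concrete by enumerating every way of grouping the four words into a binary tree. This is a plain recursive enumeration, not a parser; a grammar would still have to decide which groupings are plausible (e.g. "(pretty little)" read as "rather little").

```python
from typing import List


def bracketings(words: List[str]) -> List[str]:
    """Return every binary bracketing (grouping) of a word sequence."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for split in range(1, len(words)):
        # Group the left part and the right part independently, then combine.
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"({left} {right})")
    return results


for b in bracketings(["pretty", "little", "girls", "school"]):
    print(b)
# (pretty (little (girls school)))
# (pretty ((little girls) school))
# ((pretty little) (girls school))
# ((pretty (little girls)) school)
# (((pretty little) girls) school)
```

Four words already give five groupings (the count grows with the Catalan numbers), which is why parsers need grammatical and semantic constraints to prune them.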
  • 15. Understanding involves reasoning about the world. Language is embedded in a social system of people interacting (persuading, insulting and amusing them), and it changes over time. Homonyms.
  • 16.  Automatic Summarization
  • 17.  Information Extraction
  • 18.  Grammar Testing
  • 19. ePi Group: automatic Vietnamese processing system; www.baomoi.com: collects news from all Vietnamese e-newspapers; EVTrans (Softex Co. Ltd.); Cyclop; VnKim
  • 20. Morphological analysis: individual words are analyzed into their components.
      Syntactic analysis: a linear sequence of words is transformed into structures that show how the words relate to each other.
      Semantic analysis: the input text is transformed into an internal representation that reflects its meaning.
      Pragmatic analysis: what was said is reinterpreted to determine what was actually meant.
      Discourse analysis: references between sentences are resolved.
  • 23. Morphemes: the smallest meaningful units of language. Stems: book, cat, car, …; affixes: un-, -s, -es, …; clitics: ’ve, ’m. Morphological parsing: parsing a word into its stem and affixes and identifying the parts and their relationships.
  • 24. Word classes: parts of speech (noun, verb, adjective, etc.). A word's class dictates how it combines with morphemes to form new words. Examples: books = book + s; unladylike = un + lady + like.
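A minimal affix-stripping sketch of morphological parsing, assuming a tiny hypothetical lexicon of stems and affixes. It recovers the slide's examples but ignores spelling rules, so a word like "ladies" (lady + s with y → ie) is not parsed; such rules are what finite-state transducers are typically used for.

```python
from typing import List, Optional

# Hypothetical toy lexicon; real systems add spelling rules (e.g. y -> ie),
# usually compiled into a finite-state transducer.
STEMS = {"book", "cat", "car", "lady"}
PREFIXES = ["un"]
SUFFIXES = ["es", "s", "like"]


def parse_word(word: str) -> Optional[List[str]]:
    """Split a word into prefixes + stem + suffixes using the toy lexicon, or return None."""
    if word in STEMS:
        return [word]
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = parse_word(word[len(prefix):])
            if rest is not None:
                return [prefix] + rest
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            rest = parse_word(word[: -len(suffix)])
            if rest is not None:
                return rest + [suffix]
    return None


print(parse_word("books"))       # ['book', 's']
print(parse_word("unladylike"))  # ['un', 'lady', 'like']
```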
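  • 25. Vietnamese? ăn ("eat") = ăn, uống ("drink") = uống, xe ("vehicle") = xe: Vietnamese words do not inflect, so there is no "xes" in Vietnamese! The main problem is text tokenization (word segmentation), because spaces separate syllables rather than words.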
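One common baseline for Vietnamese word segmentation is forward maximum matching: at each position, greedily take the longest run of syllables found in a dictionary. The sketch assumes a tiny hypothetical lexicon; practical segmenters use large dictionaries together with statistical models.

```python
from typing import List, Set

# Hypothetical mini-lexicon; multi-syllable words are stored with spaces.
LEXICON: Set[str] = {"tôi", "là", "sinh viên", "học sinh", "sinh học", "học", "sinh", "viên"}


def max_match(sentence: str, lexicon: Set[str] = LEXICON, max_len: int = 4) -> List[str]:
    """Forward maximum matching over space-separated syllables.

    At each position take the longest run of syllables (up to max_len) that is
    in the lexicon; fall back to a single syllable when nothing longer matches.
    """
    syllables = sentence.lower().split()
    words: List[str] = []
    i = 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


print(max_match("Tôi là sinh viên"))  # ['tôi', 'là', 'sinh viên']  ("I am a student")
```

Greedy matching is not enough on its own: `max_match("học sinh học sinh học")` returns `['học sinh', 'học sinh', 'học']`, while the intended reading is "học sinh | học | sinh học" ("students | study | biology").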
  • 26. Why parse words? To identify a word's part of speech; to identify a word's stem (for information retrieval, IR). … then? Spell-checking; predicting the next word; predicting a word's accent.
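For spell-checking, one common approach (assumed here for illustration, not taken from the slides) is to suggest the dictionary word with the smallest Levenshtein edit distance, where each insertion, deletion or substitution costs 1. The dictionary in the usage line is hypothetical.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: Iterable[str]) -> str:
    """Closest dictionary word; ties are broken alphabetically."""
    return min(sorted(dictionary), key=lambda w: edit_distance(word, w))


print(edit_distance("kitten", "sitting"))       # 3
print(suggest("bok", ["book", "cat", "car"]))   # book
```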
  • 27. Ambiguity: in "I want her to go to the cinema with me", is "to" an infinitive marker or a preposition? In "Con ngựa đá đá con ngựa đá." ("The stone horse kicks the stone horse."), is đá = đá? The word "đá" means both "stone" and "kick".
  • 28. How to implement? Regular expressions; finite-state transducers (FSTs); finite-state acceptors (FSAs). Examples: *.exe and ir??man (wildcard patterns); \b[0-9]+ *(Mb|[Mm]egabytes?)\b (a regular expression matching sizes such as "512 Mb" or "2 megabytes").
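Python's standard library can illustrate both kinds of pattern on this slide: `fnmatch.fnmatchcase` handles wildcard patterns such as `*.exe` and `ir??man`, and `re` handles the regular expression. The sketch assumes the size pattern uses `\b` word-boundary anchors; the sample text is made up.

```python
import fnmatch
import re

# Regular expression for sizes such as "512 Mb" or "2 megabytes".
# \b is a word boundary; " *" allows optional spaces before the unit.
MEGABYTES = re.compile(r"\b[0-9]+ *(Mb|[Mm]egabytes?)\b")


def find_sizes(text: str) -> list:
    """Return every substring of `text` matched by the megabyte pattern."""
    return [m.group(0) for m in MEGABYTES.finditer(text)]


def matches_wildcard(name: str, pattern: str) -> bool:
    """Shell-style wildcard match: '*' = any run of characters, '?' = exactly one."""
    return fnmatch.fnmatchcase(name, pattern)


print(matches_wildcard("setup.exe", "*.exe"))   # True
print(matches_wildcard("ironman", "ir??man"))   # True
print(find_sizes("A 512 Mb file, 2 megabytes free, 3 Megabyte left, 512Mbps link"))
# ['512 Mb', '2 megabytes', '3 Megabyte']
```

The trailing `\b` is what rejects "512Mbps": after "Mb" comes another letter, so there is no word boundary.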
  • 29. Related terms: stem, stemming; part of speech; N-gram
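Predicting the next word with N-grams can be sketched with bigram counts (N = 2): for a given word, predict the word that most often followed it in the training text. The three-sentence corpus is a made-up assumption; real models need large corpora and smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigrams(sentences: List[str]) -> Dict[str, Counter]:
    """Count how often each word follows each other word; <s> marks a sentence start."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Most frequent follower of `word`, i.e. the argmax of P(next | word) by counts."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams(["I want to eat", "I want to sleep", "I need to eat"])
print(predict_next(model, "want"))  # to
print(predict_next(model, "to"))    # eat
```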
  • 31. Syntax
  • 32. Syntactic analysis: a linear sequence of words is transformed into structures that show how the words relate to each other, i.e. the grammatical structure is determined. "I am a boy" = [Subject] [Verb] [Determiner] [Noun]
  • 34. Syntax: the actual structure of a sentence. Grammar: the rule set used in the analysis.
  • 35. A grammar defines the syntactically legal sentences: "I ate an apple" (syntactically legal); "I ate apple" (not syntactically legal); "I ate a building" (syntactically legal, but?). Syntactically legal doesn't mean meaningful!
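The three sentences above can be checked against a toy grammar with the CYK recognition algorithm. The grammar is a hypothetical one in Chomsky normal form (S → NP VP, VP → V NP, NP → Det N, plus a small lexicon), just large enough for these examples. It rejects "I ate apple" because no rule builds an NP from a bare noun, and it accepts "I ate a building", since syntax alone says nothing about meaning.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar in Chomsky normal form.
LEXICAL: Dict[str, Set[str]] = {
    "i": {"NP"}, "ate": {"V"}, "a": {"Det"}, "an": {"Det"},
    "apple": {"N"}, "building": {"N"},
}
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"},
}


def cyk_accepts(sentence: str, start: str = "S") -> bool:
    """CYK recognition: table[i][j] holds the categories that span words i..j."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    table: List[List[Set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(LEXICAL.get(w, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split into words i..k and k+1..j
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


for s in ["I ate an apple", "I ate apple", "I ate a building"]:
    print(s, "->", cyk_accepts(s))
```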
  • 36. Ambiguities
  • 38. Semantics
  • 39. What could meaning representation mean? Representations of linguistic inputs that capture the meanings of those inputs. For us it means representations that permit or facilitate semantic processing: they permit us to reason about their truth (their relationship to some world); to answer questions based on their content; and to perform inference (answer questions and determine the truth of things we don't actually know).
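As a minimal illustration of inference over a meaning representation, the sketch stores facts as (predicate, argument) pairs and applies hypothetical one-place rules of the form P(x) ⇒ Q(x) until nothing new can be derived (forward chaining). It establishes Mortal(Socrates) although that fact was never stated directly.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str]  # (predicate, argument), e.g. ("Human", "Socrates")

# Hypothetical rules of the form P(x) => Q(x), written as (P, Q).
RULES: List[Tuple[str, str]] = [("Man", "Human"), ("Human", "Mortal")]


def forward_chain(facts: Set[Fact], rules: List[Tuple[str, str]] = RULES) -> Set[Fact]:
    """Apply the rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, argument in list(known):
                if predicate == premise and (conclusion, argument) not in known:
                    known.add((conclusion, argument))
                    changed = True
    return known


kb = forward_chain({("Man", "Socrates")})
print(("Mortal", "Socrates") in kb)  # True: inferred, never stated
print(("Mortal", "Zeus") in kb)      # False: nothing supports it
```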
  • 41. Requirements for meaning representations: verifiability; unambiguous representations (handling ambiguity); canonical form; inference; expressiveness
  • 43. Pragmatics: concerns how sentences are used in different situations and how that use affects the interpretation of the sentence. Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
  • 44. Words such as "he", "it" and "his" can be resolved by looking at previous sentences: that's discourse.
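A naive way to resolve such pronouns, assumed here for illustration rather than taken from the slides, is to pick the most recent preceding mention that agrees with the pronoun in gender and number. The feature table and mention list are hypothetical; real coreference systems also use syntax, salience and world knowledge.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical agreement features (gender, number); None means "any gender".
PRONOUNS: Dict[str, Tuple[Optional[str], str]] = {
    "he": ("male", "sg"), "his": ("male", "sg"),
    "she": ("female", "sg"), "her": ("female", "sg"),
    "it": ("neuter", "sg"), "they": (None, "pl"),
}


def resolve(pronoun: str, mentions: List[Tuple[str, str, str]]) -> Optional[str]:
    """Return the most recent earlier mention that agrees in gender and number.

    `mentions` lists (name, gender, number) in the order they appeared.
    """
    gender, number = PRONOUNS[pronoun.lower()]
    for name, g, n in reversed(mentions):
        if n == number and (gender is None or g == gender):
            return name
    return None


mentions = [("John", "male", "sg"), ("Mary", "female", "sg"), ("the car", "neuter", "sg")]
print(resolve("he", mentions))   # John
print(resolve("it", mentions))   # the car
```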
  • 50. WordNet, MindNet, Stanford Tagger, Stanford Parser, …
  • 51. Machine translation; search engines; information extraction; chatbots
  • 52. Can we use previously translated text to learn how to translate new texts? Yes! But it's not so easy. Two paradigms: statistical MT and example-based MT (EBMT). Requirements: a large, aligned parallel corpus of translated sentence pairs {S_source, S_target}; a bilingual dictionary for intra-sentence alignment; generalization patterns (names, numbers, dates…).
  • 53. Simplest: translation memory. If S_new = S_source in the corpus, output the aligned S_target. Compositional EBMT: if a fragment of S_new matches a fragment of S_source, output the corresponding fragment of the aligned S_target; prefer maximal-length fragments; maximize grammatical compositionality, either via a target-language grammar or via an N-gram statistical language model.
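The two strategies above can be sketched as exact lookup in a translation memory followed by greedy composition from the longest matching fragments. The English→Vietnamese memory and fragment table are hypothetical and assumed to be already aligned. The sketch simply concatenates fragments in source order, which happens to work for these subject-verb-object examples; it does not check grammaticality with a target grammar or an N-gram language model.

```python
from typing import Dict, Optional

# Hypothetical, pre-aligned data (English -> Vietnamese).
TRANSLATION_MEMORY: Dict[str, str] = {"i eat rice": "tôi ăn cơm"}
FRAGMENTS: Dict[str, str] = {
    "i": "tôi", "eat": "ăn", "drink": "uống", "rice": "cơm",
    "water": "nước", "drink water": "uống nước",
}


def translate(sentence: str) -> Optional[str]:
    """Translation memory first, then greedy longest-fragment composition."""
    key = " ".join(sentence.lower().split())
    if key in TRANSLATION_MEMORY:             # exact match: reuse the stored translation
        return TRANSLATION_MEMORY[key]
    words, output, i = key.split(), [], 0
    while i < len(words):
        for j in range(len(words), i, -1):    # try the longest fragment first
            fragment = " ".join(words[i:j])
            if fragment in FRAGMENTS:
                output.append(FRAGMENTS[fragment])
                i = j
                break
        else:
            return None                       # some word is not covered by any fragment
    return " ".join(output)


print(translate("I eat rice"))     # tôi ăn cơm  (translation memory hit)
print(translate("I drink water"))  # tôi uống nước  (composed from fragments)
```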
  • 54. Requires an interlingua: a language-neutral knowledge representation (KR). Philosophical debate: is there an interlingua? First-order logic (FOL) is not totally language-neutral (its predicates and functions are expressed in a language); there are other near-interlinguas (Conceptual Dependency). Requires a fully disambiguating parser and a domain model of legal objects, actions and relations. Requires an NL generator (KR → text). Applicable only to well-defined technical domains, where it produces high-quality MT.
  • 55. Interlingua-based MT; rule-based MT
  • 56. Each approach has its own strengths: rapid adaptability (statistical, example-based); good grammar (rule-based); high precision in a narrow domain (interlingua).
  • 57. Google, Yahoo, AltaVista, Answers.com
  • 58. Spider: a browser-like program that downloads web pages.
      Crawler: a program that automatically follows all of the links on each web page.
      Indexer: a program that analyzes the web pages downloaded by the spider and the crawler.
      Database: storage for downloaded and processed pages.
      Results engine: extracts search results from the database.
      Web server: a server responsible for the interaction between the user and the other search engine components.
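The indexer and results engine can be sketched as an inverted index (term → set of pages) queried with boolean AND. The spider and crawler are replaced by a hypothetical in-memory dictionary of page texts, and there is no ranking, so this only illustrates the core data structure.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Indexer: map each term to the set of page URLs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Results engine: URLs containing every query term (boolean AND), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


pages = {  # hypothetical pages standing in for what the spider downloaded
    "page1": "Natural language processing with Python",
    "page2": "Language models predict the next word",
    "page3": "Search engines index web pages",
}
index = build_index(pages)
print(search(index, "language"))          # ['page1', 'page2']
print(search(index, "natural language"))  # ['page1']
```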
  • 60. The idea is to "extract" particular types of information from arbitrary text or transcribed speech. Examples: named entities (people, places, organizations); telephone numbers; dates. Many uses: question-answering systems, filtering of news or mail…; job ads, financial information, terrorist attacks.
  • 61. Information extraction often uses a set of simple templates or frames with slots to be filled in from the input text, ignoring everything else. Examples: "Husni’s number is 966-3-860-2624." "The inventor of the first plane was Abbas ibnu Fernas." "The British King died in March of 1932."
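The template idea can be sketched with regular expressions that fill named slots and ignore the rest of the text. The two patterns are hypothetical and tailored to the first and third example sentences; they would miss most real-world phrasings.

```python
import re
from typing import Dict

# Hypothetical slot patterns tailored to the example sentences.
PHONE = re.compile(r"(?P<name>[A-Z]\w+)[’']s number is (?P<phone>\d+(?:-\d+)+)")
DATE = re.compile(
    r"(?P<month>January|February|March|April|May|June|July|August|"
    r"September|October|November|December) of (?P<year>\d{4})"
)


def extract(text: str) -> Dict[str, str]:
    """Fill whichever template slots the patterns find; everything else is ignored."""
    slots: Dict[str, str] = {}
    phone = PHONE.search(text)
    if phone:
        slots["person"] = phone.group("name")
        slots["phone"] = phone.group("phone")
    date = DATE.search(text)
    if date:
        slots["month"] = date.group("month")
        slots["year"] = date.group("year")
    return slots


print(extract("Husni’s number is 966-3-860-2624."))
# {'person': 'Husni', 'phone': '966-3-860-2624'}
print(extract("The British King died in March of 1932."))
# {'month': 'March', 'year': '1932'}
```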
  • 62. Named entity recognition (NE): finds and classifies names, places, etc.
      Coreference resolution (CO): identifies identity relations between entities in texts.
      Template element construction (TE): adds descriptive information to NE results (using CO).
      Template relation construction (TR): finds relations between TE entities.
      Scenario template production (ST): fits TE and TR results into specified event scenarios.
  • 63. AIML = Artificial Intelligence Markup Language; A.L.I.C.E.
  • 64. A.L.I.C.E. (Artificial Linguistic Internet Computer Entity): an award-winning free natural language artificial intelligence chat robot. Rule-based; gives human-like answers without a complicated "brain"; multi-language.
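The AIML idea of categories, where an input pattern with `*` wildcards maps to a response template, can be sketched in Python. This is not AIML itself, which is an XML format run by an interpreter; the categories are hypothetical and echo the ELIZA exchange from the opening slide, and the input normalization only roughly imitates AIML's (upper-casing, punctuation removal).

```python
import re
from typing import List, Tuple

# Hypothetical AIML-style categories: (pattern with '*' wildcards, response template).
CATEGORIES: List[Tuple[str, str]] = [
    ("MY * MADE ME COME HERE", "Your {0} made you come here?"),
    ("I AM *", "Why do you say you are {0}?"),
    ("HELLO", "Hi there!"),
    ("*", "Can you think of a specific example?"),
]


def normalize(text: str) -> str:
    """Upper-case the input and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())


def respond(text: str) -> str:
    said = normalize(text)
    for pattern, template in CATEGORIES:  # first matching category wins
        # Turn the wildcard pattern into a regex in which '*' captures one or more words.
        regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
        m = re.match(regex, said)
        if m:
            return template.format(*(g.lower() for g in m.groups()))
    return "I have no answer for that."


print(respond("My boyfriend made me come here."))  # Your boyfriend made you come here?
print(respond("I am unhappy"))                     # Why do you say you are unhappy?
```

Ordering the catch-all `*` category last mirrors the rule-based "human-like answer without a complicated brain": specific patterns are tried first, and a generic prompt keeps the conversation going otherwise.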
  • 65. References: NLP course, Husni Al-Muhtaseb; Lexical descriptions for Vietnamese language processing; en.wikipedia.org; www.xulyngonngu.com