 

OpenNLP demo


This presentation was prepared on Ubuntu, so some formatting may be affected when it is opened in Windows.

Published in: Technology


  1. Samatha, Gagan, Sunil
  2. What is NLP?
     - NLP provides a means of analyzing text.
     - The goal of NLP is to make computers analyze and understand the languages that humans use naturally.
     - It enables interaction between computers and humans.
  3. Why Natural Language Processing?
     - kJfmmfj mmmvvv nnnffn333
     - Uj iheale eleee mnster vensi credur
     - Baboi oi cestnitze
     - Computers "see" English text the same way you saw the lines above!
     - People have no trouble understanding language.
     - Computers have:
       - No common-sense knowledge
       - No reasoning capacity
  4. Natural Language Processing
     Pipeline: raw (unstructured) text -> part-of-speech tagging -> named entity recognition -> deep syntactic parsing -> annotated (structured) text
     Example sentence: Secretion of TNF was abolished by BHA in PMA-stimulated U937 cells.
     POS tags: Secretion/NN of/IN TNF/NN was/VBZ abolished/VBN by/IN BHA/NN in/IN PMA-stimulated/JJ U937/NN cells/NNS ./.
     Phrase labels in the parse tree: S, NP, VP, PP
     Source: personalpages.manchester.ac.uk/staff/Sophia.Ananiadou/DTCII.ppt
  5. Uses of NLP
     - Text-based applications
     - Dialogue-based applications
     - Information extraction: extract useful information, e.g. from resumes
     - Automatic summarization: condense 1 book into 1 page
  6. What is OpenNLP?
     OpenNLP is an open-source, Java-based set of NLP tools that perform:
     - sentence detection
     - tokenization
     - POS tagging
     - parsing
     - named-entity detection
     using the OpenNLP package. [1]
     [1] http://opennlp.sourceforge.net/
  7. Use of OpenNLP in our university project
     - It can be used for "searching" names by means of named entity recognition.
  8. OpenNLP is used for:
     - Sentence splitting
     - Tokenization
     - Part-of-speech tagging
     - Named entity recognition
     - Chunking
     - Treebank parsing
  9. Sentence splitting
     Rule: sentence boundary = period + space(s) + capital letter
     Input: Unusually, the gender of crocodiles is determined by temperature. If the eggs are incubated at over 33°C, then the egg hatches into a male or 'bull' crocodile. At lower temperatures only female or 'cow' crocodiles develop.
     Output:
     - Unusually, the gender of crocodiles is determined by temperature.
     - If the eggs are incubated at over 33°C, then the egg hatches into a male or 'bull' crocodile.
     - At lower temperatures only female or 'cow' crocodiles develop.
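The boundary rule on this slide (period + space(s) + capital letter) can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not how OpenNLP's model-based sentence detector works; the name `split_sentences` is hypothetical.

```python
import re

# Boundary rule from the slide: period + space(s) + capital letter.
# The lookbehind keeps the period with its sentence; the lookahead
# leaves the capital letter at the start of the next sentence.
_BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")

def split_sentences(text):
    """Split text wherever a period is followed by whitespace and a capital."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]

text = ("Unusually, the gender of crocodiles is determined by temperature. "
        "If the eggs are incubated at over 33°C, then the egg hatches into a "
        "male or 'bull' crocodile. At lower temperatures only female or 'cow' "
        "crocodiles develop.")

for sentence in split_sentences(text):
    print(sentence)
```

A rule this simple also splits after abbreviations such as "Dr. Smith", which is one reason a trained model is normally preferred.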
  10. sentDetect(s, language = "en", model = NULL)
      - s: A character vector with texts from which sentences should be detected.
      - language: A character string giving the language of s. This argument is only used if model is NULL, to select a default model.
      - model: A model. If model is NULL, a default model for sentence detection is loaded from the corresponding openNLP models language package.
      http://opennlp.sourceforge.net/
  11. Tokenization
      - Converts a sentence into a sequence of tokens.
      - Divides the text into its smallest units (usually words), splitting punctuation off into separate tokens.
      - Rule:
        - Use spaces as the boundaries
        - Add spaces before and after special characters
      tokenize(s, language = "en", model = NULL)
      http://opennlp.sourceforge.net/
  12. Tokenization
      Input: "A Saudi Arabian woman can get a divorce if her husband doesn't give her coffee."
      Output: "A Saudi Arabian woman can get a divorce if her husband does n't give her coffee ."
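The two tokenization rules (pad special characters with spaces, then split on spaces) can be sketched as below. The function name `simple_tokenize` is hypothetical, and the separate handling of "n't" is an assumption added only so that the sketch reproduces the "does n't" split shown on the slide; OpenNLP's own tokenizer is model-based.

```python
import re

def simple_tokenize(sentence):
    """Whitespace tokenizer that first pads special characters with spaces."""
    # Assumption: split the contraction "n't" off its verb ("does n't").
    padded = re.sub(r"n't\b", " n't", sentence)
    # Add a space before and after each special character. Apostrophes are
    # left alone so that "n't" stays in one piece.
    padded = re.sub(r"([^\w\s'])", r" \1 ", padded)
    # Use runs of whitespace as the token boundaries.
    return padded.split()

tokens = simple_tokenize(
    "A Saudi Arabian woman can get a divorce if her husband "
    "doesn't give her coffee."
)
print(" ".join(tokens))
```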
  13. Part-of-speech tagging
      Assigns a part-of-speech tag to each token in a sentence.
      Input: Most lipstick is partially made of fish scales
      Output: Most/JJS lipstick/NN is/VBZ partially/RB made/VBN of/IN fish/NN scales/NNS
      tagPOS(sentence, language = "en", model = NULL, tagdict = NULL)
      http://opennlp.sourceforge.net/
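Tagged output in the word/TAG form shown on this slide is easy to post-process. The sketch below turns such a string into (word, tag) pairs and filters the nouns; `parse_tagged` is a hypothetical helper and does not run a tagger itself.

```python
def parse_tagged(tagged):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in tagged.split():
        # rpartition splits on the last slash, so a word that itself
        # contains "/" keeps everything before the final tag.
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs

tagged = "Most/JJS lipstick/NN is/VBZ partially/RB made/VBN of/IN fish/NN scales/NNS"
pairs = parse_tagged(tagged)

# All Penn Treebank noun tags start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in pairs if tag.startswith("NN")]
print(nouns)
```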
  14. Part of speech tags [1]
      - CC - Coordinating conjunction
      - CD - Cardinal number
      - DT - Determiner
      - EX - Existential there
      - FW - Foreign word
      - IN - Preposition or subordinating conjunction
      - JJ - Adjective
      - JJR - Adjective, comparative
      - JJS - Adjective, superlative
      - NN - Noun, singular or mass
      - NNS - Noun, plural
      - NNP - Proper noun, singular
      - NNPS - Proper noun, plural
      - PDT - Predeterminer
      - PRP - Personal pronoun
      - RB - Adverb
      - RBR - Adverb, comparative
      - RBS - Adverb, superlative
      - RP - Particle
      - SYM - Symbol
      - TO - to
      - UH - Interjection
      - VB - Verb, base form
      - VBD - Verb, past tense
      - VBG - Verb, gerund or present participle
      - VBN - Verb, past participle
      - VBP - Verb, non-3rd person singular present
      - VBZ - Verb, 3rd person singular present
      - WDT - Wh-determiner
      - WP - Wh-pronoun
      - WRB - Wh-adverb
      Phrase labels:
      - NP - Noun phrase
      - PP - Prepositional phrase
      - VP - Verb phrase
      [1] http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
  15. Named-Entity Recognition
      - Named entity recognition classifies tokens in text into predefined categories such as date, location, person, and time.
      - The name finder can find up to seven different types of entities: date, location, money, organization, percentage, person, and time.
  16. Named-Entity Recognition
      Input: Diana Hayden was in Philadelphia city on 3rd october
      Output: <namefind/person> Diana Hayden </namefind/person> was in <namefind/location> Philadelphia </namefind/location> city on <namefind/date> 3rd october </namefind/date>
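The name finder output above marks each entity with a `<namefind/TYPE>` ... `</namefind/TYPE>` pair. A minimal sketch for pulling the entities back out of such a string, assuming the markup looks exactly as on the slide (`extract_entities` is a hypothetical helper):

```python
import re

# Matches <namefind/TYPE> text </namefind/TYPE>; the backreference \1
# requires the closing marker to name the same entity type.
_ENTITY = re.compile(r"<namefind/(\w+)>\s*(.*?)\s*</namefind/\1>")

def extract_entities(marked_up):
    """Return (text, type) pairs for every marked-up entity."""
    return [(m.group(2), m.group(1)) for m in _ENTITY.finditer(marked_up)]

marked_up = ("<namefind/person> Diana Hayden </namefind/person> was in "
             "<namefind/location> Philadelphia </namefind/location> city on "
             "<namefind/date> 3rd october </namefind/date>")

for text, entity_type in extract_entities(marked_up):
    print(entity_type, "->", text)
```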
  17. Chunking (shallow parsing)
      A chunker (shallow parser) segments a sentence into meaningful phrases.
      [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September] .
      Source: personalpages.manchester.ac.uk/staff/Sophia.Ananiadou/DTCII.ppt
  18. Treebank parser
      It tags tokens and groups phrases into a tree.
      Input: A hospital bed is a parked taxi with the meter running
      Output: (TOP (S (NP (DT A) (NN hospital) (NN bed)) (VP (VBZ is) (NP (NP (DT a) (VBN parked) (NN taxi)) (PP (IN with) (NP (DT the) (NN meter) (VBG running)))))))
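The bracketed output is a nested structure that a small recursive reader can load. The sketch below parses it into (label, children) tuples and recovers the original words; `parse_tree` and `leaves` are hypothetical helpers, and the code assumes well-formed input.

```python
def parse_tree(text):
    """Parse a bracketed Treebank string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested phrase or tag
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()

def leaves(node):
    """Collect the words at the leaves, left to right."""
    _, children = node
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words

tree = parse_tree(
    "(TOP (S (NP (DT A) (NN hospital) (NN bed)) (VP (VBZ is) "
    "(NP (NP (DT a) (VBN parked) (NN taxi)) (PP (IN with) "
    "(NP (DT the) (NN meter) (VBG running)))))))"
)
print(" ".join(leaves(tree)))
```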
  19. [Figure: Visualization of Treebank Parser. Parse tree with phrase nodes S, NP, VP, PP over the tags DT NN NN VBZ DT VBN NN IN DT NN VBG for the sentence "a hospital bed is a parked taxi with the meter running"]
