4. Natural Language Processing
Pipeline: raw (unstructured) text -> part-of-speech tagging -> named entity recognition -> deep syntactic parsing -> annotated (structured) text
Example sentence: Secretion of TNF was abolished by BHA in PMA-stimulated U937 cells.
Tagged tokens: Secretion/NN of/IN TNF/NN was/VBZ abolished/VBN by/IN BHA/NN in/IN PMA-stimulated/JJ U937/NN cells/NNS ./.
Phrase labels in the parse tree: NP, PP, VP, S
Source: personalpages.manchester.ac.uk/staff/Sophia.Ananiadou/ DTCII .ppt
9. Sentence splitting
Rule: sentence boundary = period + space(s) + capital letter
Input: Unusually, the gender of crocodiles is determined by temperature. If the eggs are incubated at over 33°C, then the egg hatches into a male or 'bull' crocodile. At lower temperatures only female or 'cow' crocodiles develop.
Output (one sentence per line):
Unusually, the gender of crocodiles is determined by temperature.
If the eggs are incubated at over 33°C, then the egg hatches into a male or 'bull' crocodile.
At lower temperatures only female or 'cow' crocodiles develop.
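The boundary rule on this slide can be sketched in a few lines of plain Python. This is an illustrative regular-expression splitter, not the openNLP sentence detector; the names `BOUNDARY` and `split_sentences` are hypothetical.

```python
import re

# Rule from the slide: period + space(s) + capital letter.
# The lookbehind keeps the period with its sentence; the lookahead
# leaves the capital letter at the start of the next sentence.
BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_sentences(text):
    """Split text wherever the boundary rule matches."""
    return [s for s in BOUNDARY.split(text.strip()) if s]


text = (
    "Unusually, the gender of crocodiles is determined by temperature. "
    "If the eggs are incubated at over 33°C, then the egg hatches into a "
    "male or 'bull' crocodile. At lower temperatures only female or 'cow' "
    "crocodiles develop."
)

sentences = split_sentences(text)
for sentence in sentences:
    print(sentence)
```

The rule is deliberately naive: it also splits after abbreviations such as "Dr." when a capitalised word follows, which is one reason trained sentence-detection models are used in practice.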
10. sentDetect(s, language = "en", model = NULL)
s: A character vector with texts from which sentences should be detected.
language: A character string giving the language of s. This argument is only used if model is NULL for selecting a default model.
model: A model. If model is NULL then a default model for sentence detection is loaded from the corresponding openNLP models language package.
http://opennlp.sourceforge.net/
12. Tokenization
Input: "A Saudi Arabian woman can get a divorce if her husband doesn't give her coffee."
Output: "A Saudi Arabian woman can get a divorce if her husband does n't give her coffee ."
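The example above separates the clitic "n't" from its verb and detaches the final period. A minimal regular-expression sketch of that behaviour follows; it is an assumption-laden illustration covering only this convention, not the tokenizer shipped with openNLP, and `TOKEN` and `tokenize` are hypothetical names.

```python
import re

# Alternatives are tried in order, so the clitic rules come before the
# generic word rule.
TOKEN = re.compile(
    r"""
    n't              # negative clitic, split off from its verb
  | \w+(?=n't)       # verb stem that directly precedes the clitic
  | \w+              # ordinary word
  | [^\w\s]          # any single punctuation character
    """,
    re.VERBOSE,
)


def tokenize(sentence):
    """Return the list of tokens found in the sentence."""
    return TOKEN.findall(sentence)


sentence = ("A Saudi Arabian woman can get a divorce if her husband "
            "doesn't give her coffee.")
tokens = tokenize(sentence)
print(" ".join(tokens))
```

Joining the tokens with single spaces reproduces the tokenized form shown on the slide.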
13. Part-of-speech tagging
Assign a part-of-speech tag to each token in a sentence.
Input: Most lipstick is partially made of fish scales
Output: Most/JJS lipstick/NN is/VBZ partially/RB made/VBN of/IN fish/NN scales/NNS
tagPOS(sentence, language = "en", model = NULL, tagdict = NULL)
http://opennlp.sourceforge.net/
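Tagged text in the token/TAG form shown on this slide is easy to post-process. The sketch below assumes that form as input (it does not call any tagger) and uses a hypothetical helper, `parse_tagged`, to recover (token, tag) pairs and then filter the nouns.

```python
def parse_tagged(tagged):
    """Turn 'token/TAG token/TAG ...' into a list of (token, tag) pairs."""
    pairs = []
    for item in tagged.split():
        # rpartition splits on the last slash, so tokens that themselves
        # contain a slash keep it.
        token, _, tag = item.rpartition("/")
        pairs.append((token, tag))
    return pairs


tagged = ("Most/JJS lipstick/NN is/VBZ partially/RB made/VBN "
          "of/IN fish/NN scales/NNS")
pairs = parse_tagged(tagged)

# All noun tags in the tag set on the next slide start with "NN".
nouns = [token for token, tag in pairs if tag.startswith("NN")]
print(nouns)
```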
14. Part of speech tags (1)
CC - Coordinating conjunction
CD - Cardinal number
DT - Determiner
EX - Existential there
FW - Foreign word
IN - Preposition or subordinating conjunction
JJ - Adjective
JJR - Adjective, comparative
JJS - Adjective, superlative
NN - Noun, singular or mass
NNS - Noun, plural
NNP - Proper noun, singular
NNPS - Proper noun, plural
PDT - Predeterminer
PRP - Personal pronoun
RB - Adverb
RBR - Adverb, comparative
RBS - Adverb, superlative
RP - Particle
SYM - Symbol
TO - to
UH - Interjection
VB - Verb, base form
VBD - Verb, past tense
VBG - Verb, gerund or present participle
VBN - Verb, past participle
VBP - Verb, non-3rd person singular present
VBZ - Verb, 3rd person singular present
WDT - Wh-determiner
WP - Wh-pronoun
WRB - Wh-adverb
Phrase labels (used by the chunker and parser):
NP - Noun phrase
PP - Prepositional phrase
VP - Verb phrase
(1) http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
16. Named-Entity Recognition
Input: Diana Hayden was in Philadelphia city on 3rd october
Output: <namefind/person> Diana Hayden </namefind/person> was in <namefind/location> Philadelphia </namefind/location> city on <namefind/date> 3rd october </namefind/date>
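The inline markup in this example can be turned into a list of (type, text) pairs with a small regular expression. This is only a sketch for reading markup of the form shown on the slide; it performs no entity recognition itself, and `ENTITY`, `TAG`, `extract_entities` and `strip_markup` are hypothetical names.

```python
import re

# An entity is an opening <namefind/TYPE> tag, the entity text, and the
# matching closing tag; \1 forces the closing type to equal the opening one.
ENTITY = re.compile(r"<namefind/(\w+)>\s*(.*?)\s*</namefind/\1>")
# Any opening or closing namefind tag, with the whitespace around it.
TAG = re.compile(r"\s*</?namefind/\w+>\s*")


def extract_entities(marked):
    """Return (entity_type, entity_text) pairs in order of appearance."""
    return ENTITY.findall(marked)


def strip_markup(marked):
    """Remove the tags and return the plain sentence."""
    return TAG.sub(" ", marked).strip()


marked = (
    "<namefind/person> Diana Hayden </namefind/person> was in"
    "<namefind/location> Philadelphia </namefind/location> city on"
    "<namefind/date> 3rd october </namefind/date>"
)

entities = extract_entities(marked)
plain = strip_markup(marked)
print(entities)
print(plain)
```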
17. Chunking (shallow parsing)
A chunker (shallow parser) segments a sentence into meaningful phrases.
[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September] .
Source: personalpages.manchester.ac.uk/staff/Sophia.Ananiadou/ DTCII .ppt
18. Treebank parser
It tags tokens and groups phrases into a tree.
Input: A hospital bed is a parked taxi with the meter running
Output: (TOP (S (NP (DT A) (NN hospital) (NN bed)) (VP (VBZ is) (NP (NP (DT a) (VBN parked) (NN taxi)) (PP (IN with) (NP (DT the) (NN meter) (VBG running)))))))
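The bracketed output is a nested structure, and a short recursive-descent reader is enough to load it. The following sketch, with hypothetical helper names `parse_tree` and `leaves`, reads the bracket string into (label, children) tuples and recovers the original words from the leaves. It assumes well-formed input and is not the parser itself.

```python
import re


def parse_tree(text):
    """Read a bracketed parse such as '(S (NP ...) (VP ...))' into
    nested (label, children) tuples; leaf words are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words at the leaves, left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


bracketed = (
    "(TOP (S (NP (DT A ) (NN hospital ) (NN bed )) (VP (VBZ is ) "
    "(NP (NP (DT a ) (VBN parked ) (NN taxi )) (PP (IN with ) "
    "(NP (DT the ) (NN meter ) (VBG running )))))))"
)

tree = parse_tree(bracketed)
print(" ".join(leaves(tree)))
```

Reading the leaves left to right gives back the input sentence, which is a quick sanity check that the brackets were balanced.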
19. Visualization of Treebank Parser
[Figure: parse tree for "A hospital bed is a parked taxi with the meter running". Node labels: S, NP, VP, PP; tags: DT, NN, VBZ, VBN, IN, VBG.]