KM Lecture11 nlp/nif

Slides of the 11th lecture in the 2012 Knowledge and Media course concerning natural language processing

Published in: Technology
Transcript of "KM Lecture11 nlp/nif"

  1. NLP/NIF. Knowledge and Media 2012-2013, Lecture 11. Monday, December 3, 2012
  2. (no text on this slide)
  3. Overview: Natural Language Processing 101; the NLP pipeline; NLP tasks; NLP challenges; NIF (NLP Interchange Format).
  4. NLP: What is it? NLP, or text analytics, adds semantic understanding of: named entities (people, companies, locations, etc.); pattern-based entities (email addresses, phone numbers); concepts (abstractions of entities); facts and relationships; concrete and abstract attributes (e.g., 5 years, expensive); subjectivity in the form of opinions, sentiments and emotions. Slide inspiration: HTTP://WWW.SLIDESHARE.NET/SETHGRIMES/TEXT-ANALYTICS-OVERVIEW-2011
  5. 80% of the information relevant to businesses is in ‘unstructured’ textual form: web pages, news and blog articles, forum postings, other social media; email and messages; surveys, feedback forms, warranty claims; scientific literature, books, legal documents, patents; ... Slide inspiration: HTTP://WWW.SLIDESHARE.NET/SETHGRIMES/TEXT-ANALYTICS-OVERVIEW-2011
  6. NLP: What is it for? NLP transforms unstructured text into structured information, which may be: categorised; queried; mined for patterns, topics or themes; presented intelligently; visualised and explored. Slide inspiration: HTTP://WWW.SLIDESHARE.NET/SETHGRIMES/TEXT-ANALYTICS-OVERVIEW-2011
  7. NLP: Some history. 1950-1980: handwritten rules (Russian-English translation system, ELIZA). Since 1980: machine learning (IBM’s Watson).
  8. NLP: Tasks. Image source: HTTP://NLTK.ORG/IMAGES/DIALOGUE.PNG
  9. Morphological/Lexical Analysis: language identification; tokenisation; stemming/lemmatisation.
  10. Syntactic Analysis: text segmentation; part-of-speech (POS) tagging; chunking; shallow parsing.
  11. Semantic Analysis: named entity recognition (NER); relation finding; semantic role labelling (SRL); word-sense disambiguation (WSD); co-reference/anaphora resolution.
  12. Semantic Analysis (ctd): topic detection/segmentation; machine translation (MT); sentiment analysis/opinion mining; automatic summarisation.
  13. NLP: Approaches: rule-based; statistical; hybrid methods.
  14. Named Entity Recognition Explained
  15. NER: State-of-the-Art. Statistical methods: Conditional Random Fields (CRF). Precision: 92.15%; Recall: 92.39%; F-Measure: 92.27%.
  16. Precision: how many predictions were correct? P = TP/(TP+FP)

                            ACTUAL: Spam          ACTUAL: Not Spam
      PREDICTED: Spam       True Positive (TP)    False Positive (FP)
      PREDICTED: Not Spam   False Negative (FN)   True Negative (TN)

  17. Recall: of the total number of instances in a class, how many were found? R = TP/(TP+FN). (Same confusion matrix as slide 16.)
  18. F-Score: harmonic mean of Precision and Recall. F = 2 • P • R/(P+R). [Acc = (TP+TN)/(TP+FP+FN+TN)] (Same confusion matrix as slide 16.)
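
The measures on slides 16 to 18 can be computed directly from the four confusion-matrix counts. The following is a minimal Python sketch; the function names and the spam counts are invented for illustration and do not come from the lecture.

```python
def precision(tp, fp):
    """P = TP / (TP + FP): share of predicted positives that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """R = TP / (TP + FN): share of actual positives that were found."""
    return tp / (tp + fn)


def f_score(p, r):
    """F = 2 * P * R / (P + R): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def accuracy(tp, fp, fn, tn):
    """Acc = (TP + TN) / (TP + FP + FN + TN): share of all decisions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical spam-filter counts (assumption, not lecture data).
tp, fp, fn, tn = 8, 2, 2, 88
p = precision(tp, fp)
r = recall(tp, fn)
print(f"P={p:.2f} R={r:.2f} F={f_score(p, r):.2f} Acc={accuracy(tp, fp, fn, tn):.2f}")
```

With these counts accuracy is much higher than precision and recall because the many true negatives dominate it. As a cross-check, `f_score(0.9215, 0.9239)` gives roughly 0.9227, which agrees with the F-measure reported on slide 15.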
  19. Machine Learning 101. Training: (1) collect a set of representative training documents; (2) label each token for its entity class or other (O); (3) design feature extractors appropriate to the text and classes; (4) train a sequence classifier to predict the labels from the data. Testing: (1) receive a set of testing documents; (2) run sequence model inference to label each token; (3) appropriately output the recognised entities. Slide from: HTTP://WWW.STANFORD.EDU/CLASS/CS124/LEC/INFORMATION_EXTRACTION_AND_NAMED_ENTITY_RECOGNITION.PDF
  20. k-NN. HTTP://WWW.YOUTUBE.COM/USER/ANTALVANDENBOSCH#P/U/2/PB4QATZITLQ
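
Slide 20 only links to a video on k-NN (k-nearest neighbours). As a rough illustration of how such a learner could fill the "train" and "label" steps of slide 19, here is a minimal Python sketch. It is an assumption-laden toy: the overlap distance, the feature names and the training examples are all made up, and it classifies each token independently rather than as a sequence.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of features on which two feature dicts disagree."""
    keys = set(a) | set(b)
    return sum(1 for key in keys if a.get(key) != b.get(key))


def knn_predict(training, features, k=3):
    """Label `features` by majority vote among the k closest training examples.

    `training` is a list of (feature_dict, label) pairs; "training" a k-NN
    learner amounts to storing these examples.
    """
    nearest = sorted(training, key=lambda ex: overlap_distance(ex[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled tokens (feature names and values are assumptions).
training = [
    ({"cap": True, "pos": "NNP"}, "I-PER"),
    ({"cap": True, "pos": "NNP"}, "I-PER"),
    ({"cap": False, "pos": "IN"}, "O"),
    ({"cap": False, "pos": "NN"}, "O"),
    ({"cap": False, "pos": "DT"}, "O"),
]

print(knn_predict(training, {"cap": True, "pos": "NNP"}))
print(knn_predict(training, {"cap": False, "pos": "VB"}))
```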
  21. NER Training Data: IOB scheme (Inside, Outside, Begin). For each type of entity there is an I-XXX and a B-XXX tag; non-entities are tagged O; B-XXX is only used if two entities of the same type are next to each other. The scheme assumes that named entities are non-recursive and don’t overlap. Example:

      Meg    Whitman  CEO  of  eBay
      I-PER  I-PER    O    O   I-ORG

      Slide from: HTTP://WWW.INF.ED.AC.UK/TEACHING/COURSES/EMNLP/SLIDES/EMNLP07.PDF
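
To make the tagging convention on slide 21 concrete, the following Python sketch turns a tagged token sequence back into entities. It follows the variant described on the slide, where B-XXX only marks the start of an entity that directly follows another entity of the same type. The function name `iob_to_entities` is hypothetical.

```python
def iob_to_entities(tokens, tags):
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, etype = "O", None
        else:
            prefix, etype = tag.split("-", 1)
        # A new entity starts on an explicit B- tag or when the type changes.
        starts_new = etype is not None and (prefix == "B" or etype != current_type)
        if current_tokens and (etype is None or starts_new):
            entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if etype is not None:
            current_tokens.append(token)
            current_type = etype
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = ["Meg", "Whitman", "CEO", "of", "eBay"]
tags = ["I-PER", "I-PER", "O", "O", "I-ORG"]
print(iob_to_entities(tokens, tags))
# [('Meg Whitman', 'PER'), ('eBay', 'ORG')]
```

Two adjacent entities of the same type, for example `["Paris", "London"]` tagged `["I-LOC", "B-LOC"]`, come out as two separate LOC entities, which is exactly the case the B- tag exists for.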
  22. Features for text learning task: Is the word capitalised? Is the word at the start of a sentence? What is the part-of-speech tag? Previous and following words. Info from gazetteers. Useful features help your learner; badly chosen features may harm it. Slide based on: HTTP://WWW.INF.ED.AC.UK/TEACHING/COURSES/EMNLP/SLIDES/EMNLP07.PDF
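
The features listed on slide 22 can be expressed as a small feature extractor. This is a sketch under assumptions: the function name, the feature keys, the hand-written POS tags and the tiny gazetteer are all illustrative and not taken from the lecture.

```python
def token_features(tokens, pos_tags, index, gazetteer):
    """Return the slide-22 features for the token at `index` as a dict."""
    word = tokens[index]
    return {
        "is_capitalised": word[:1].isupper(),
        "is_sentence_start": index == 0,
        "pos": pos_tags[index],
        "prev_word": tokens[index - 1].lower() if index > 0 else "<START>",
        "next_word": tokens[index + 1].lower() if index < len(tokens) - 1 else "<END>",
        "in_gazetteer": word.lower() in gazetteer,
    }


# Hypothetical inputs: POS tags written by hand, gazetteer of known organisations.
tokens = ["Meg", "Whitman", "CEO", "of", "eBay"]
pos_tags = ["NNP", "NNP", "NN", "IN", "NNP"]
organisations = {"ebay", "ibm"}

for i in range(len(tokens)):
    print(tokens[i], token_features(tokens, pos_tags, i, organisations))
```

Note that "eBay" fails the capitalisation test but is caught by the gazetteer lookup, a small illustration of why several complementary features are usually combined.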
  23. Relation Finding Explained. Amphibia; Anura
  24. Relation Finding: State-of-the-Art. Induce relation dictionaries using slot filling (AutoSlog); example-based learning (Snowball); pattern recognition over shallow parses (LEILA).
  25. Relation Finding: pattern finding over shallow parses

      relation candidate                  direction   frequency   rating
      is a municipality and a town in                 45          +
      is a municipality and a city in                 19          +
      is a municipality in                            10          +
      is one of the five districts of                 5           -
      is the name of two provinces in                 5           -

  26. RL for domain modelling. [Figure: graph of learned relations with confidence scores. Node labels: Species, Order, Town, Type, Location, Family, Genus, Name, Class, Country, Province. Edge labels include "is a" (1.000, 0.833, 0.794, 0.750), "is a town in" (0.854, 0.759), "is a municipality in" (0.891), "on the island of" (0.500), "is in" (0.500), "is found in" (0.573, 0.566, 0.560), "occur in" (0.635, 0.333), "may refer to" (0.750, 0.482).]
  27. RL for template filling

      Date         Ship              Type           Crew   Ransom
      2005/04/10   Feisty Gas        LNG carrier    12     $315,000
      2005/06/27   Semlow            Freighter      10     $50,000
      2005/10/28   Panagia           Bulk Carrier   22     $700,000
      2005/11/05   Seabourn Spirit   Cruise ship    210    none
  28. Opinion Mining Explained
  29. Opinion Mining: State-of-the-Art. Supervised learning using features such as: opinion words and phrases; negation; part-of-speech tags; dependency parsing.
  30. Positive or negative? “I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought it. She also thought the phone was too expensive, and wanted me to return it to the shop. …” Example from: Bing Liu (2010) Sentiment Analysis and Subjectivity, in: NLP Handbook, 2nd edition, N. Indurkhya and F. J. Damerau (eds), 2010.
  31. IBM’s Watson. HTTP://WWW.YOUTUBE.COM/WATCH?V=DYWO4ZKSFXW
  32. NLP: Challenges: negation; messy text (Twitter and SMS language); domain adaptation; cross- and multi-document text analysis; resource-scarce languages.
  33. NIF: Natural Language Processing Interchange Format
  34. (no text on this slide)
  35. Look familiar?
  36. NIF: Why do we need it? Integration of NLP tools; bridge between LOD and NLP communities.
  37. NIF Claims
      1. NIF provides global interoperability. If an NLP tool incorporates a NIF parser and a NIF serializer, it is compatible with all other tools that implement NIF.
      2. NIF achieves this interoperability by using and defining a most common denominator for annotations. This means that some standard annotations are required to be used. On the other hand, NIF is flexible and allows NLP tools to add any extra annotations at will.
      3. NIF allows tool chains to be created without a large amount of up-front development work. As the output of each tool is compatible, you can quickly test whether the tools you selected actually produce what you need to solve a certain task.
      4. As NIF is based on RDF/OWL, you can choose from a broad range of tools and technologies to work with it: RDF makes data integration easy (URIs, Linked Data); OWL is based on Description Logics (types, type inheritance); availability of open data sets (access and licence); reusability of vocabularies and ontologies; diverse serializations for annotations (XML, Turtle, RDFa+XHTML); scalable tool support (databases, reasoning); data is flexible and can be queried/transformed in many ways.
  38. Structural interoperability: NIF specifies how to create an identifier for uniquely locating arbitrary substrings in a document, using either offset-based or context-hash-based URIs. String Ontology to describe strings. Structured Sentence Ontology.
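
The two identifier styles named on slide 38 can be sketched in Python as follows. This is only an illustration of the idea: the fragment layouts (`offset_<begin>_<end>` and `hash_<context length>_<string length>_<digest>`), the MD5 digest over "left context + (string) + right context", the function names and the example document URI are simplifying assumptions, and the NIF specification remains the authority on the exact URI recipes.

```python
import hashlib


def offset_uri(doc_uri, begin, end):
    """Identify a substring by its character offsets in the document."""
    return f"{doc_uri}#offset_{begin}_{end}"


def context_hash_uri(doc_uri, text, begin, end, context_length=10):
    """Identify a substring by a hash of the string and its surrounding context."""
    left = text[max(0, begin - context_length):begin]
    right = text[end:end + context_length]
    message = f"{left}({text[begin:end]}){right}"
    digest = hashlib.md5(message.encode("utf-8")).hexdigest()
    return f"{doc_uri}#hash_{context_length}_{end - begin}_{digest}"


doc = "http://example.org/doc1"  # hypothetical document URI
text = "Meg Whitman, CEO of eBay, spoke."
begin = text.index("eBay")
end = begin + len("eBay")

print(offset_uri(doc, begin, end))
print(context_hash_uri(doc, text, begin, end))
```

The design trade-off in this sketch: offset-based identifiers are short and easy to compute, but they change as soon as text is inserted earlier in the document, whereas the context-hash identifier stays the same as long as the string and its immediate context are unchanged.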
  39. Conceptual Interoperability: lemma and stem annotations are datatype properties in the Structured Sentence Ontology; POS tags use OLiA (Ontologies of Linguistic Annotation); NER tags use the Semantic Content Management System (SCMS), an EU project.
  40. Access Interoperability. Main interface: wrapper to NIF Web service. Img: HTTP://NLP2RDF.ORG/FILES/2011/09/NIF_ARCHITECTURE.PNG
  41. NLP/NIF: Wrap up. NLP history and tasks; machine learning 101; use cases: NER, relation finding and opinion mining; interoperability of NLP results with NIF.
  42. Further reading/Tools. Peter Jackson and Isabelle Moulinier (2007). Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. John Benjamins. ISBN: 9027249938. ACL Anthology: A Digital Archive of Research Papers in Computational Linguistics. Machine learning: WEKA. Natural language processing: GATE.