Learning with the Web: Spotting Named Entities on the intersection of NERD and Machine Learning

Talk "Learning with the Web: Spotting Named Entities on the intersection of NERD and Machine Learning", presented at #MSM'13 (WWW'13), Rio de Janeiro, Brazil

Microposts shared on social platforms instantaneously report facts, opinions or emotions. In these posts, entities are often used but they are continuously changing depending on what is currently trending. In such a scenario, recognising these named entities is a challenging task, for which off-the-shelf approaches are not well equipped. We propose NERD-ML, an approach that unifies the benefits of a crowd entity recognizer through Web entity extractors combined with the linguistic strengths of a machine learning classifier.

Published in: Technology, Education

  1. Learning with the Web: Spotting Named Entities on the intersection of NERD and Machine Learning
     Marieke van Erp, Giuseppe Rizzo, Raphaël Troncy
     @giusepperizzo
  2. May 13, 2013, Making Sense of Microposts (#MSM2013): NERD-ML @ MSM13
  3. Preprocessing
     ➢ Dataset is converted to CoNLL IOB format
     ➢ Applied 10-fold cross-validation
     ➢ Chunked the set of tweets into 50KB parts in order to comply with NERD file size limitations
  4. NERD extractors
     ➢ Retrieves named entities from 10 extractors (Web APIs)
     ➢ Harmonizes the classification according to the NERD Ontology v0.5 (http://nerd.eurecom.fr/ontology)
     ➢ 75 entity classes mapped to 4 MSM13 classes
     http://nerd.eurecom.fr
  5. Ritter et al. (2011)
     ➢ Off-the-shelf tool tailored to a Twitter stream, based on:
       – LabelledLDA (+CRF)
       – Textual features (POS, capitalization, suffix, etc.)
       – Freebase gazetteers (names of PER, ORG, LOC)
     ➢ 10 entity classes mapped to 4 classes
     Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Empirical Methods in Natural Language Processing (EMNLP'11) (2011)
  6. Stanford CRF
     ➢ Re-trained on the MSM13 corpora
     ➢ Parameters based on the english.conll.4class.distsim.crf.ser.gz properties file provided with the Stanford distribution
     ➢ Baseline of our approach
     Finkel, J.R., Grenager, T., Manning, C.: Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In: 43rd Annual Meeting of the Association for Computational Linguistics (ACL05) (2005)
  7. Textual features
     ➢ POS
     ➢ Capitalisation information
       – initial capital
       – all capitalized
       – proportion of token capitals
     ➢ Prefix (first three letters of the token)
     ➢ Suffix (last three letters of the token)
     ➢ Whether the token is at the beginning or at the end of the micropost
     Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Empirical Methods in Natural Language Processing (EMNLP'11) (2011)
  8. ML settings
     ➢ Run01: 7 textual features (POS, initial capital, proportion of capitals, prefix, suffix, end/start token); 0 extractors; ML = k-NN, k = 1, Euclidean distance
     ➢ Run02: 0 textual features; 12 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, Lupedia, OpenCalais, Saplo, Yahoo, Textrazor, Wikimeta, Zemanta, Stanford NER, Ritter et al.); ML = SVM, polynomial kernel, SMO
     ➢ Run03: 4 textual features (POS, initial capital, suffix, proportion of capitals); 8 extractors (AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais, Textrazor, Wikimeta, Stanford NER, Ritter et al.); ML = SVM, polynomial kernel, SMO
  9. Precision – MSM13 training, 10-fold cross-validation
  10. Recall – MSM13 training, 10-fold cross-validation
  11. F1 – MSM13 training, 10-fold cross-validation
  12. Lessons learned
     ➢ MISC class is ambiguously defined
     ➢ 8.1% of the named entities from the training data occur in the test data
     ➢ Best run is Run03: not all extractors and some textual features
     ➢ For the next challenge, what about entity linking?
  13. Thanks for your time and attention
     http://www.slideshare.net/giusepperizzo
     NERD-ML: http://github.com/giusepperizzo/nerdml
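
The textual features listed on slide 7 could be computed per token roughly as in the sketch below. This is only an illustration, not the NERD-ML code: the function name `textual_features`, the feature keys, and the reading of "proportion of token capitals" as uppercase characters divided by token length are all assumptions, and the POS tag is assumed to come from an external tagger.

```python
def textual_features(tokens, index, pos_tag):
    """Build a feature dict for tokens[index] (hypothetical helper).

    tokens  -- the tokens of one micropost, in order
    index   -- position of the token to describe
    pos_tag -- POS tag for that token, supplied by an external tagger
    """
    token = tokens[index]
    # Assumption: proportion = uppercase characters / token length.
    capitals = sum(1 for c in token if c.isupper())
    return {
        "pos": pos_tag,
        "initial_capital": token[:1].isupper(),
        "all_capitalized": token.isupper(),
        "proportion_capitals": capitals / len(token) if token else 0.0,
        "prefix": token[:3],   # first three letters of the token
        "suffix": token[-3:],  # last three letters of the token
        "is_first": index == 0,
        "is_last": index == len(tokens) - 1,
    }


if __name__ == "__main__":
    micropost = ["Obama", "visits", "NYC"]
    for i, tag in enumerate(["NNP", "VBZ", "NNP"]):
        print(micropost[i], textual_features(micropost, i, tag))
```

Dictionaries of this shape can be vectorized and combined with the per-extractor labels before being passed to a classifier, which is the kind of feature mix the Run01 to Run03 settings describe.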
