Zouaq wole2013


Published on: Web of Linked Entities Workshop - WWW 2013

Published in: Education, Technology

Transcript

  • 1. Can We Use Linked Data Semantic Annotators for the Extraction of Domain-Relevant Expressions? Prof. Michel Gagnon, École Polytechnique de Montréal, Canada; Prof. Amal Zouaq, Royal Military College of Canada, Canada; Dr. Ludovic Jean-Louis, École Polytechnique de Montréal, Canada
  • 2. Introduction. What is semantic annotation on the LOD? Candidate selection and disambiguation. Traditionally, it has focused on named entity annotation.
  • 3. Semantic Annotators. We evaluate the most widely used LOD-based semantic annotators through their REST APIs / web services, with their default configurations: Alchemy (entities + keywords), Yahoo, Zemanta, Wikimeta, DBpedia Spotlight, OpenCalais and Lupedia. (A minimal API-call sketch is given after the transcript.)
  • 4. Annotation Evaluation. Few comparative studies have been published on the performance of linked data semantic annotators, and the available studies generally focus on "traditional" named entity annotation evaluation (ORG, Person, etc.).
  • 5. NEs versus Domain Topics. Semantic annotators are starting to go beyond the traditional NEs (predefined classes such as PERSON and ORG), expanding the set of possible classes (NERD taxonomy: sport events, operating systems, political events, …). Examples: RDF (NE? domain topic?), Semantic Web (NE? domain topic?). Domain topics are the important "concepts" for the domain of interest.
  • 6. NE versus Topics Annotation
  • 7. Research Question 1. Previous studies did not draw any conclusion about the relevance of the annotated entities, but relevance might prove to be a crucial point for tasks such as information seeking or query answering. RQ1: Can we use linked data-based semantic annotators to accurately identify domain-relevant expressions?
  • 8. Research Methodology. We relied on a corpus of 8 texts taken from Wikipedia, all related to the artificial intelligence domain. Together, these texts represent 10,570 words. We obtained 2,023 expression occurrences, among which 1,151 were distinct.
  • 9. Building the Gold Standard. We asked a human evaluator to analyze each expression and make the following decisions: Does the annotated expression represent an understandable named entity or topic? Is the expression a relevant keyword according to the document? Is the expression a relevant named entity according to the document?
  • 10. Distribution of Expressions. Note that only 639 of the 1,151 detected expressions are understandable (56%); a substantial number of detected expressions therefore represent noise.
  • 11. Frequency of Detected Expressions. Most of the expressions were detected by only one (76%) or two (16%) annotators, so the semantic annotators seemed complementary. [Chart labels: 326 relevant, 117 relevant, 40 relevant, 15 relevant, 4 relevant, 4 relevant, 1 relevant.]
  • 12. Evaluation. We report standard precision, recall and F-score. Recall is "partial", since our gold standard considers only the expressions detected by at least one annotator, instead of considering all possible expressions in the corpus. (A sketch of the metric computation is given after the transcript.)
  • 13. All Expressions
  • 14. Named Entities
  • 15. Domain Topics
  • 16. A Few Highlights. The F-score is unsatisfactory for almost all annotators; the best F-scores are around 62-63 (all expressions, topics). Alchemy seems to stand out: it obtains a high recall when all expressions or only topics are considered, but its precision (59%) is not very high compared to the results of Zemanta (81%), OpenCalais (75%) and Yahoo (74%). None of the annotators was good at detecting relevant named entities, and DBpedia Spotlight annotated many expressions that are not considered understandable.
  • 17. Research Question 2. Given the complementarity of the semantic annotators, we experimented with a second research question: can we improve the overall performance of semantic annotators using voting methods and machine learning?
  • 18. Voting and Machine Learning Methods. We experimented with the following methods: simple votes; weighted votes, where the weight is the precision of the individual annotator on a training corpus; a k-nearest-neighbours classifier; a Naïve Bayes classifier; a decision tree (C4.5); and rule induction (the PART algorithm). (A weighted-vote sketch is given after the transcript.)
  • 19. Experiment Setup. We divided the set of annotated expressions into 5 balanced partitions, using one partition for testing and the remaining partitions for training. We ran 5 experiments for each of the following situations: all entities, only named entities, and only topics. (A cross-validation sketch is given after the transcript.)
  • 20. Baseline: Simple Votes. All expressions: best precision 77% (Zemanta), best recall/F-score 69%/62% (Alchemy). Best P: 81% (Zemanta, R/F 12%/21%); best R/F: 69%/63% (Alchemy, P = 59%). Threshold: number of annotators.
  • 21. Weighted Votes. Best P: 81% (Zemanta, R/F 12%/21%); best R/F: 69%/63% (Alchemy, P = 59%).
  • 22. Machine Learning. Best P: 81% (Zemanta, R/F 12%/21%); best R/F: 69%/63% (Alchemy, P = 59%).
  • 23. Highlights. On average, the weighted vote displays the best precision (0.66) among the combination methods when only topics are considered, versus 0.59 for Alchemy. Alchemy finds many relevant expressions that are not discovered by the other annotators. Machine learning methods achieve better recall and F-score on average, especially when a decision tree or rule induction (PART) is used.
  • 24. Conclusion. Can we use linked data semantic annotators for the extraction of domain-relevant expressions? Best overall performance (recall/F-score): Alchemy (59%, 69%, 63%). Best precision: Zemanta (81%). There is a need for filtering. Combination/machine learning methods: increased recall with the weighted voting scheme.
  • 25. Future Work. Limitations: the size of the gold standard and the limited set of semantic annotators. Future work: increase the size of the gold standard, explore various combinations of annotators/features, and compare semantic annotators with keyword extractors.
  • 26. Questions? Thank you for your attention! amal.zouaq@rmc.ca / http://azouaq.athabascau.ca
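
The annotators listed on slide 3 were queried through their public REST APIs with default settings. As a minimal sketch of what such a call looks like, the snippet below queries DBpedia Spotlight's public annotation endpoint; the endpoint URL, the confidence parameter and the sample text are illustrative assumptions and not necessarily the configuration used in the study.

```python
# Minimal sketch: annotate a text fragment with DBpedia Spotlight's REST API.
# The endpoint URL and parameters are assumptions (public demo service), not
# necessarily the setup used in the paper.
import requests

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # assumed public endpoint

def annotate(text, confidence=0.5):
    """Return the surface forms and DBpedia URIs that Spotlight detects in `text`."""
    response = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    resources = response.json().get("Resources", [])
    return [(r["@surfaceForm"], r["@URI"]) for r in resources]

if __name__ == "__main__":
    sample = "Artificial intelligence studies reasoning, knowledge representation and RDF."
    for surface_form, uri in annotate(sample):
        print(f"{surface_form} -> {uri}")
```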
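
Slide 12 evaluates each annotator with standard precision, recall and F-score, where recall is partial because the gold standard only contains expressions detected by at least one annotator. The sketch below shows one straightforward way to compute these set-based metrics; the set representation of expressions and the toy data are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch of the set-based metrics from slide 12.
# `detected` is the set of expressions returned by one annotator and `relevant`
# is the set of gold-standard expressions (expressions detected by at least one
# annotator and judged relevant), so recall is "partial" by design.

def precision_recall_f(detected: set, relevant: set):
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Toy example with made-up expressions:
detected = {"artificial intelligence", "rdf", "turing", "the"}
relevant = {"artificial intelligence", "rdf", "semantic web"}
print(precision_recall_f(detected, relevant))  # (0.5, 0.666..., 0.571...)
```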
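
Slides 18 and 20-21 combine the annotators with simple and weighted votes, where an annotator's weight is its precision on a training corpus and a threshold decides whether an expression is kept. The sketch below is one plausible reading of that scheme; the weights, the threshold value and the toy detections are assumptions, not the authors' implementation.

```python
# Hedged sketch of the weighted voting scheme from slides 18-21: each annotator
# votes for the expressions it detected, votes are weighted by the annotator's
# precision on a training corpus, and an expression is kept if the summed
# weight reaches a threshold. The weights and threshold below are illustrative.

ANNOTATOR_WEIGHTS = {          # assumed precisions measured on a training split
    "Zemanta": 0.81,
    "OpenCalais": 0.75,
    "Yahoo": 0.74,
    "Alchemy": 0.59,
}

def weighted_vote(detections: dict, threshold: float = 1.0) -> set:
    """detections maps annotator name -> set of detected expressions."""
    scores = {}
    for annotator, expressions in detections.items():
        weight = ANNOTATOR_WEIGHTS.get(annotator, 0.0)
        for expression in expressions:
            scores[expression] = scores.get(expression, 0.0) + weight
    return {expression for expression, score in scores.items() if score >= threshold}

# Toy example: "rdf" is detected by two annotators and passes the threshold;
# the two singly-detected expressions do not.
detections = {
    "Zemanta": {"rdf", "turing test"},
    "Alchemy": {"rdf", "knowledge representation"},
}
print(weighted_vote(detections, threshold=1.0))  # {'rdf'}
```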
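
Slides 19 and 22 train classifiers on the annotators' outputs, holding out one of five balanced partitions for testing. The sketch below mimics that setup with scikit-learn; the feature encoding (one binary flag per annotator), the DecisionTreeClassifier standing in for C4.5, and the synthetic data are all assumptions made for illustration.

```python
# Hedged sketch of the machine-learning combination from slides 19 and 22:
# each expression is a vector of binary "detected by annotator X" flags, the
# label is whether the expression is relevant in the gold standard, and a
# decision tree (standing in for C4.5) is trained on 4 of 5 partitions.
# The feature encoding and the data are illustrative assumptions.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 7))          # 7 annotators, toy detections
y = (X.sum(axis=1) >= 3).astype(int)           # toy relevance labels

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kfold.split(X, y):
    model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    p, r, f, _ = precision_recall_fscore_support(
        y[test_idx], predictions, average="binary", zero_division=0
    )
    print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```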