Web of Linked Entities Workshop - WWW 2013


Transcript

  • 1. Can we use Linked Data Semantic Annotators for the Extraction of Domain-Relevant Expressions?
    Prof. Michel Gagnon, Ecole Polytechnique de Montréal, Canada
    Prof. Amal Zouaq, Royal Military College of Canada, Canada
    Dr. Ludovic Jean-Louis, Ecole Polytechnique de Montréal, Canada
  • 2. Introduction
    What is semantic annotation on the LOD?
    Candidate selection
    Disambiguation
    Traditionally focused on named entity annotation
  • 3. Semantic Annotators
    LOD-based semantic annotators: the most used annotators, accessed via REST APIs / web services in their default configuration:
    Alchemy (entities + keywords), Yahoo, Zemanta, Wikimeta, DBpedia Spotlight, OpenCalais, Lupedia
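As an illustration of the REST access mentioned on this slide, here is a minimal sketch of querying one of these annotators, DBpedia Spotlight, over its public web service. The endpoint, parameters, and JSON field names reflect the present-day public API, not necessarily the 2013 configuration used in the study.

```python
# Minimal sketch: annotate a text with DBpedia Spotlight's public REST API.
# Endpoint and response keys are those of the current public service.
import requests

def annotate(text, confidence=0.5):
    """Return (surface form, DBpedia URI) pairs Spotlight finds in `text`."""
    resp = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    # Each annotation carries the matched surface form and its linked resource.
    return [(a["@surfaceForm"], a["@URI"])
            for a in resp.json().get("Resources", [])]

print(annotate("RDF is a cornerstone of the Semantic Web."))
```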
  • 4. Annotation Evaluation
    Few comparative studies have been published on the performance of linked data semantic annotators.
    The available studies generally focus on “traditional” named entity annotation evaluation (ORG, PERSON, etc.).
  • 5. NEs versus Domain Topics
    Semantic annotators are starting to go beyond the traditional NEs:
    Predefined classes such as PERSON, ORG
    Expansion of possible classes (NERD taxonomy: sport events, operating systems, political events, …)
    Borderline cases: RDF (NE? Domain topic?), Semantic Web (NE? Domain topic?)
    Domain topics: important “concepts” for the domain of interest
  • 6. NE versus Topics Annotation
  • 7. Research Question 1
    Previous studies did not draw any conclusion about the relevance of the annotated entities.
    But relevance might prove to be a crucial point for tasks such as information seeking or query answering.
    RQ1: Can we use linked data-based semantic annotators to accurately identify domain-relevant expressions?
  • 8. Research Methodology
    We relied on a corpus of 8 texts taken from Wikipedia, all related to the artificial intelligence domain. Together, these texts represent 10,570 words.
    We obtained 2,023 expression occurrences, among which 1,151 were distinct.
  • 9. Building the Gold Standard
    We asked a human evaluator to analyze each expression and make the following decisions:
    Does the annotated expression represent an understandable named entity or topic?
    Is the expression a relevant keyword according to the document?
    Is the expression a relevant named entity according to the document?
  • 10. Distribution of Expressions
    Note that only 639 out of 1,151 detected expressions are understandable (56%). Thus a substantial number of detected expressions represent noise.
  • 11. Frequency of Detected Expressions
    Most of the expressions were detected by only one (76%) or two (16%) annotators.
    Semantic annotators seemed complementary.
    [Chart labels: relevant expressions per group, from those detected by one annotator up to all seven: 326, 117, 40, 15, 4, 4, 1]
  • 12. Evaluation
    Standard precision, recall and F-score.
    “Partial” recall, since our gold standard considers only the expressions detected by at least one annotator, instead of considering all possible expressions in the corpus.
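For concreteness, a minimal sketch of these metrics as set operations; the function name and data layout are my own, not from the paper.

```python
# `predicted` is the set of expressions an annotator (or combination) kept;
# `gold` is the set of expressions judged relevant in the gold standard.
# Because the gold standard only contains expressions detected by at least
# one annotator, the recall computed here is "partial".
def precision_recall_f(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                 # correctly detected expressions
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0        # partial recall
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```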
  • 13. All Expressions
  • 14. Named Entities
  • 15. Domain Topics
  • 16. A Few Highlights
    F-score is unsatisfactory for almost all annotators; the best F-scores are around 62-63 (all expressions, topics).
    Alchemy seems to stand out: it obtains a high recall when all expressions or only topics are considered.
    But the precision of Alchemy (59%) is not very high compared to the results of Zemanta (81%), OpenCalais (75%) and Yahoo (74%).
    None of the annotators was good at detecting relevant named entities.
    DBpedia Spotlight annotated many expressions that are not considered understandable.
  • 17. Research Question 2
    Given the complementarity of semantic annotators, we decided to experiment with this second research question:
    Can we improve the overall performance of semantic annotators using voting methods and machine learning?
  • 18. Voting and Machine Learning Methods
    We experimented with the following methods:
    Simple votes
    Weighted votes, where the weight is the precision of individual annotators on a training corpus
    K-nearest-neighbours classifier
    Naïve Bayes classifier
    Decision tree (C4.5)
    Rule induction (PART algorithm)
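A minimal sketch of the two voting schemes from this list, assuming a data layout of my own choosing: `votes` maps each expression to the set of annotators that detected it, and `weight` holds each annotator's precision measured on a training corpus.

```python
# Simple vote: keep expressions detected by at least `threshold` annotators.
def simple_vote(votes, threshold):
    return {e for e, annotators in votes.items()
            if len(annotators) >= threshold}

# Weighted vote: each annotator votes with its training-corpus precision;
# keep expressions whose summed weights reach `threshold`.
def weighted_vote(votes, weight, threshold):
    return {e for e, annotators in votes.items()
            if sum(weight[a] for a in annotators) >= threshold}
```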
  • 19. Experiment Setup
    We divided the set of annotated expressions into 5 balanced partitions: one partition for testing, the remaining partitions for training.
    We ran 5 experiments for each of the following situations: all entities, only named entities, and only topics.
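A sketch of this setup, purely illustrative: the slides do not name the tooling, scikit-learn's `DecisionTreeClassifier` (CART) stands in here for C4.5, and the feature layout is an assumption (a 0/1 matrix `X` with one column per annotator, and `y` marking gold-standard relevance).

```python
# Illustrative 5-partition experiment with a decision tree classifier.
# X: NumPy array, one row per expression, one 0/1 column per annotator
#    ("did this annotator detect the expression?").
# y: 1 if the expression is relevant in the gold standard, else 0.
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier  # CART, standing in for C4.5

def run_folds(X, y):
    scores = []
    # 5 balanced (stratified) partitions: one for testing, four for training.
    for train, test in StratifiedKFold(n_splits=5, shuffle=True).split(X, y):
        clf = DecisionTreeClassifier().fit(X[train], y[train])
        scores.append(clf.score(X[test], y[test]))
    return scores
```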
  • 20. Baseline: Simple Votes
    Threshold: number of annotators that must agree (chart x-axis).
    All expressions: best precision 77% (Zemanta); best recall/F-score 69%/62% (Alchemy).
    Reference (best individual annotators): P 81% (Zemanta; R/F 12%-21%); R/F 69%-63% (Alchemy; P = 59%).
  • 21. Weighted Votes
    Reference (best individual annotators): P 81% (Zemanta; R/F 12%-21%); R/F 69%-63% (Alchemy; P = 59%).
  • 22. Machine Learning
    Reference (best individual annotators): P 81% (Zemanta; R/F 12%-21%); R/F 69%-63% (Alchemy; P = 59%).
  • 23. Highlights
    On average, the weighted vote displays the best precision (0.66) among combination methods if only topics are considered, versus 0.59 for Alchemy.
    Alchemy finds many relevant expressions that are not discovered by other annotators.
    Machine learning methods achieve better recall and F-score on average, especially if a decision tree or rule induction (PART) is used.
  • 24. Conclusion
    Can we use Linked Data Semantic Annotators for the Extraction of Domain-Relevant Expressions?
    Best overall performance (recall/F-score): Alchemy (P/R/F: 59%/69%/63%).
    Best precision: Zemanta (81%).
    Need for filtering!
    Combination/machine learning methods: increased recall with the weighted voting scheme.
  • 25. Future Work
    Limitations: the size of the gold standard; a limited set of semantic annotators.
    Future work: increase the size of the GS; explore various combinations of annotators/features; compare semantic annotators with keyword extractors.
  • 26. Questions?
    Thank you for your attention!
    amal.zouaq@rmc.ca
    http://azouaq.athabascau.ca