Zouaq wole2013
Upcoming SlideShare
Loading in...5

Zouaq wole2013



Web of Linked Entities Workshop - WWW 2013

Web of Linked Entities Workshop - WWW 2013



Total Views
Views on SlideShare
Embed Views



3 Embeds 206

http://eventifier.co 202
http://eventifier.com 3
http://translate.googleusercontent.com 1



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

Zouaq wole2013 Zouaq wole2013 Presentation Transcript

  • Can we use Linked DataSemantic Annotators forthe Extraction of Domain-Relevant Expressions?Prof. Michel Gagnon, Ecole Polytechnique de Montréal, CanadaProf. Amal Zouaq, Royal Military College of Canada, CanadaDr. Ludovic Jean-louis, Ecole Polytechnique deMontréal, Canada
  • IntroductionWhat is Semantic Annotation on the LOD?Candidate SelectionDisambiguationTraditionally focused on named entity annotation
  • Semantic AnnotatorsLOD-based Semantic AnnotatorsMost used annotatorsREST API / Web ServicesDefault configurationAlchemy(entities +keywords)YahooZemantaWikimetaDBPediaSpotlightOpenCalaisLupedia
  • Annotation EvaluationFew comparative studies have been publishedon the performance of linked data SemanticAnnotatorsThe available studies generally focus on“traditional” named entity annotation evaluation(ORG, Person, etc.)
  • NEs versus domain topicsSemantic Annotators start to go beyond thesetraditional NEsPredefined classes such as PERSON, ORGExpansion of possible classes (NERD Taxonomy :sport events, operating systems, political events, …) RDF (NE? Domain topic ?) Semantic Web (NE? Domain topic ?)Domain Topics: important “concepts” for thedomain of interest
  • NE versus Topics Annotation
  • Research Question 1Previous studies did not draw any conclusion aboutthe relevance of the annotated entitiesBut relevance might prove to be a crucial point fortasks such as information seeking or queryansweringRQ1: Can we use linked data-based semanticannotators to accurately identify Domain-relevantexpressions?
  • Research MethodologyWe relied on a corpus of 8 texts taken fromWikipedia, all related to the artificial intelligencedomain. Together, these texts represent 10570words.We obtained 2023 expression occurrences amongwhich 1151 were distinct.
  • Building the Gold StandardWe asked a human evaluator to analyze eachexpression and make the following decisions:Does the annotated expression represent anunderstandable named entity or topic?Is the expression a relevant keyword accordingto the document?Is the expression a relevant named entityaccording to the document?
  • Distribution of ExpressionsNote that only 639 out of 1151 detected expressionsare understandable (56%). Thus a substantialnumber of detected expressions represents noise.
  • Frequency of detected ExpressionsMost of the expressions were detected by onlyone (76%) or two (16%) annotators.Semantic annotators seemed complementary.(326 relevant)(117 relevant)(40 relevant)(15 relevant)(4 relevant)(4 relevant)(1 relevant)
  • EvaluationStandard precision, recall and F-score“Partial” recall, since our Gold Standardconsiders only the expressions detected byat least one annotator, instead of consideringall possible expressions in the corpus.
  • All Expressionsmend
  • Named Entities
  • Domain Topics
  • Few HighlightsF-score is unsatisfactory for almost all annotatorsBest F-score around 62-63 (All expressions, topics)Alchemy seems to stand outIt obtains a high value for recall when all expressionsor only topics are consideredBut the precision of Alchemy (59%) is not veryhigh, compared to the results of Zemanta(81%), OpenCalais (75%) and Yahoo (74%)None of the annotators was good at detecting relevantnamed entities.DBpedia Spotlight annotated many expressions thatare not considered understandable
  • Research Question 2Given the complementarity of SemanticAnnotators, we decided to experiment withthis second research question:Can we improve the overall performanceof semantic annotators using votingmethods and machine learning?
  • Voting and machine learningmethodsWe experimented with the followingmethods:Simple votesWeighted votes where the weight is theprecision of individual annotators on atraining corpusK-nearest-neighbours classifierNaïve Bayes classifierDecision tree (C4.5)Rule induction (PART algorithm)
  • Experiment SetupWe divided the set of annotatedexpressions into 5 balanced partitionsOne partition for testingRemaining partitions for trainingWe ran 5 experiments for each of thefollowing situations: all entities, onlynamed entities and only topics
  • Baseline: Simple VotesAll Expressions:Best prec : 77%(Zemanta)Best rec/F : 69% -62% (Alchemy)Best P: 81% (R/F12%-21%)(Zemanta)Best R/F: 69%-63%(Alchemy) P=59%Threshold: Number of annotators
  • Weighted VotesBest P: 81%(Zemanta) (R/F12%-21%)Best R/F: 69%-63%(Alchemy) P=59%
  • Machine LearningBest P: 81% (Zemanta)(R/F 12%-21%)Best R/F: 69%-63%(Alchemy) P=59%
  • HighlightsOn the average, weighted vote displays thebest precision (0.66) among combinationmethods if only topics are consideredAlchemy (0.59)Alchemy finds many relevant expressionsthat are not discovered by other annotatorsMachine learning methods achieve betterrecall and F-score on the average, especiallyif decision tree or rule induction (PART) isused
  • ConclusionCan we use Linked Data SemanticAnnotators for the Extraction of Domain-Relevant Expressions?Best overall performance (Recall/F-Score:Alchemy) (59%- 69%-63%)Best precision: Zemanta (81%)Need of filtering!Combination/machine learning methodsIncreased recall with weighted votingscheme
  • Future WorkLimitationsThe size of the gold standardA limited set of semantic annotatorsIncrease the size of the GSExplore various combinations for annotators/featuresCompare Semantic Annotators with keywordextractors
  • QUESTIONS?Thank you for your attention!amal.zouaq@rmc.cahttp://azouaq.athabascau.ca