Mapping Keywords to

Published in: Education, Technology

  1. Mapping Keywords to Linked Data Resources for Automatic Query Expansion
     Isabelle Augenstein (1), Anna Lisa Gentile (1), Barry Norton (2), Ziqi Zhang (1), Fabio Ciravegna (1)
     (1) Department of Computer Science, University of Sheffield, UK
     (2) Ontotext, UK
     {i.augenstein,a.l.gentile,z.zhang,f.ciravegna}@dcs.shef.ac.uk, barry.norton@ontotext.com
     May 26, 2013
     Augenstein, Gentile, Norton, Zhang, Ciravegna Query Expansion May 26, 2013 1 / 21
  2. Motivation
     • In order to consume Linked Data, end users need to
         • be familiar with RDF’s data model and its query language SPARQL
         • have knowledge of datasets and their contents
     • Keyword search is a means to overcome these barriers
     • Map keywords to RDF resources
         • “film” → dbpedia-owl:Film
  3. Motivation
     • Challenges for keyword search:
         • Spelling mistakes, e.g. “flim”
         • Lexical derivations, e.g. “films”
         • Synonyms, e.g. “movie”
     • How to address these challenges? → query expansion
     • What is query expansion?
         • Process of expanding a seed query with additional query terms to improve recall
         • “film”, “flim”, “films”, “movie” → dbpedia-owl:Film
  4. State of the art: How to find mappings
     • String similarity (Sindice [4])
         • “flim”, “films” → dbpedia-owl:Film
         • Domain-independent approach
         • Problem: Only does expansion for spelling mistakes and lexically similar words
     • Dictionary-based methods (WordNet [3], Wikipedia [2])
         • “movie” → dbpedia-owl:Film
         • Approach finds synonyms
         • Problem: Dictionaries contain limited, domain-specific vocabulary; WordNet has very few named entities
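The string-similarity idea on this slide can be illustrated with a minimal sketch. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; this is not the measure Sindice uses, and the `LABELS` table and the `min_ratio` cut-off are assumptions made up for the illustration.

```python
from difflib import SequenceMatcher

# Hypothetical label -> resource table; a real system would read labels from the dataset.
LABELS = {"Film": "dbpedia-owl:Film", "Person": "dbpedia-owl:Person"}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_by_string(keyword: str, labels: dict, min_ratio: float = 0.7):
    """Return the resource whose label is most similar to the keyword,
    or None if even the best label falls below min_ratio (assumed cut-off)."""
    best = max(labels, key=lambda label: similarity(keyword, label))
    return labels[best] if similarity(keyword, best) >= min_ratio else None


typo = match_by_string("flim", LABELS)      # spelling mistake
plural = match_by_string("films", LABELS)   # lexical derivation
synonym = match_by_string("movie", LABELS)  # synonym: no lexical overlap to exploit
```

In this toy setup the misspelling and the plural still reach dbpedia-owl:Film, while the synonym “movie” finds nothing, which is the limitation the slide names.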
  5. State of the art: How to rank resources
     • String similarity
         • Rank by decreasing string similarity (SearchWebDB [5])
         • Rank by decreasing tf-idf (Falcons Object Search [1])
     • Dictionary-based methods (WordNet [3], Wikipedia [2])
         • Rank by decreasing semantic similarity (ESA [2])
         • Combine frequency and confidence (PowerAqua [3])
  6. Our approach
     • How to find mappings
         • In order to have a highly adaptable, domain-independent approach, use knowledge contained within the dataset
         • Benefit from properties between resources in Linked Data, which are potentially useful for finding semantically similar keywords
     • How to rank resources
         • Follow intuition of state of the art methods
         • Combine tf-idf and confidence of our measure
  7. Method: Overview
     • Scenario: User wants to find representative Linked Data resources for keyword w
         • Example: “movie” → dbpedia-owl:Film
     • Dataset: Set of triples, each consisting of a ‘subject’, a ‘predicate’ and an ‘object’
     • Current application: Find classes and properties
  8. Method: Overview
     • Step 1: For each keyword w, learn an expanded set of keywords Ew and a ranking of Ew to find semantically similar words in the target vocabulary
         • Example: “movie” → “movie”, “film”, “films”
     • Step 2: Identify concepts through labelling properties
         • “movie” → ∅
         • “film” → dbpedia-owl:Film
         • “films” → ∅
     • Step 3: Rank resulting concepts and return concept with highest rank
         • “movie” → dbpedia-owl:Film
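The three steps on this slide can be sketched end to end as follows. All three lookup tables are hypothetical stand-ins: `EXPANSIONS` plays the role of the learned expansion Ew, `LABEL_INDEX` the role of the labelling-property lookup, and `SCORES` the role of the ranking score, none of which are given in this form in the slides.

```python
# Hypothetical outputs of the individual steps, hard-coded for illustration.
EXPANSIONS = {"movie": ["movie", "film", "films"]}   # step 1: expanded keyword set Ew
LABEL_INDEX = {"film": "dbpedia-owl:Film"}           # step 2: label -> concept
SCORES = {"dbpedia-owl:Film": 0.9}                   # step 3: assumed ranking score


def map_keyword(w: str):
    """Expand w, look up each expanded term, return the best-ranked concept."""
    expanded = EXPANSIONS.get(w, [w])
    # Terms without a matching label (here "movie" and "films") contribute nothing.
    concepts = [LABEL_INDEX[term] for term in expanded if term in LABEL_INDEX]
    if not concepts:
        return None
    return max(concepts, key=lambda concept: SCORES.get(concept, 0.0))


result = map_keyword("movie")
```

With these toy tables, “movie” has no label match of its own and reaches dbpedia-owl:Film only through the expanded term “film”.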
  9. Method: Overview
     • Set of labelling properties to identify concepts
         • rdfs:label
         • foaf:name
         • dc:title
         • skos:prefLabel
         • skos:altLabel
         • fb:type.object.name
     Table: Labelling properties
  10. Method: Candidate Identification
     • Step 1: For each keyword w, learn an expanded set of keywords Ew and a ranking of Ew
     • Step 1.1: Which properties are useful in expressing semantic similarity?
         • Get resource r for keyword using labelling properties
             “movie” → fb:Movie
         • Get all triples where r is a subject
             fb:Movie wn:containsWordSense dbpedia-owl:show
             fb:Movie dc:description “movie director”
             fb:Movie commontag:label “Film”
         • For each of the objects of the triples, find the labels and tokenise them
             “movie director” → “movie”, “director”
         • Find resources for them in the target vocabulary
             “show” → dbpedia-owl:show
             “Film” → dbpedia-owl:Film
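Step 1.1 can be sketched over the slide's own example triples. The in-memory tables (`KEYWORD_INDEX`, `RESOURCE_LABELS`, `TARGET_VOCAB`) and the simple regular-expression tokeniser are assumptions for illustration; the slides do not say how labels are tokenised or how lookups are implemented.

```python
import re

# The three example triples from the slide.
TRIPLES = [
    ("fb:Movie", "wn:containsWordSense", "dbpedia-owl:show"),
    ("fb:Movie", "dc:description", "movie director"),
    ("fb:Movie", "commontag:label", "Film"),
]
KEYWORD_INDEX = {"movie": "fb:Movie"}            # keyword -> resource r (assumed lookup)
RESOURCE_LABELS = {"dbpedia-owl:show": "show"}   # labels of objects that are resources
TARGET_VOCAB = {"show": "dbpedia-owl:show", "film": "dbpedia-owl:Film"}


def tokenise(label: str):
    """Lower-case the label and split it into alphabetic tokens (assumed tokeniser)."""
    return re.findall(r"[a-z]+", label.lower())


def find_candidates(keyword: str):
    """Return (property, token, target resource) for every object token of r's
    triples that has a resource in the target vocabulary."""
    r = KEYWORD_INDEX.get(keyword)
    found = []
    for subject, prop, obj in TRIPLES:
        if subject != r:
            continue
        label = RESOURCE_LABELS.get(obj, obj)  # literals act as their own label
        for token in tokenise(label):
            if token in TARGET_VOCAB:
                found.append((prop, token, TARGET_VOCAB[token]))
    return found


candidates = find_candidates("movie")
```

Keeping the property alongside each candidate matters for the next step, where every property is scored by how often its candidates turn out to be correct.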
  11. Method: Training
     • Step 1.2: How to learn a ranking for these properties?
         • Select a list of keywords, run step 1.1 for each of them
         • Manually choose the best resources among the candidates
             “show”, “Film” → “Film”
         • Produce a precision measure for every property used to find candidates

             $$\mathrm{prec}(p) = \frac{\sum_{w \in \vec{W}_{train}} \mathrm{hits}(p, w)}{\sum_{w \in \vec{W}_{train}} \mathrm{candidates}(p, w)}$$

         • Define threshold 0 ≤ θ ≤ 1, use prec to define an ordered (ranked) subset of properties to encode semantic relatedness
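The precision measure can be computed as a ratio of two counts per property, summed over all training keywords. In this minimal sketch the input format, a flat list of (property, is_hit) pairs pooled over the training keywords, is an assumption, and the sample observations are invented.

```python
from collections import Counter


def property_precision(observations):
    """observations: iterable of (property, is_hit) pairs, one per candidate found
    via that property for any training keyword. is_hit is True when the candidate
    was manually chosen as a correct resource."""
    hits, totals = Counter(), Counter()
    for prop, is_hit in observations:
        totals[prop] += 1
        if is_hit:
            hits[prop] += 1
    # prec(p) = total hits for p / total candidates for p
    return {prop: hits[prop] / totals[prop] for prop in totals}


# Invented training observations, for illustration only.
observations = [
    ("commontag:label", True),
    ("commontag:label", False),
    ("wn:containsWordSense", False),
    ("wn:containsWordSense", False),
]
precision = property_precision(observations)
```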
  12. Method: Test
     • Test set of keywords $\vec{W}_{test}$, apply algorithm to obtain candidates
     • Only take subset of properties instead of all properties to find candidates
     • Combine, as a numerical product, the precision of the property, p, used to find that candidate and a tf-idf score for r
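The product score can be sketched as below. The tf-idf values are taken as given inputs because the slides do not say how they are computed, and keeping the maximum score when one resource is reached through several properties is an assumption of this sketch. The two precision values come from the training results table; the tf-idf values are invented.

```python
def rank_candidates(candidates, precision):
    """candidates: list of (resource, property, tfidf) tuples.
    Score each as prec(property) * tfidf and rank resources by decreasing score."""
    best = {}
    for resource, prop, tfidf in candidates:
        score = precision.get(prop, 0.0) * tfidf
        # Assumption: a resource found via several properties keeps its best score.
        best[resource] = max(best.get(resource, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


precision = {"commontag:label": 0.133, "wn20schema:containsWordSense": 0.138}
candidates = [
    ("dbpedia-owl:show", "wn20schema:containsWordSense", 0.5),  # invented tf-idf
    ("dbpedia-owl:Film", "commontag:label", 0.8),               # invented tf-idf
]
ranked = rank_candidates(candidates, precision)
```

Here dbpedia-owl:Film ranks first even though its property has slightly lower precision, because the product also rewards the higher tf-idf score.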
  13. Evaluation: Gold standard and Metric
     • Used DBpedia ontology as the target vocabulary and Sindice cache as dataset for computing expanded set of keywords Ew
     • Gold standard from Freitas et al. [2], contains 178 keywords of which 134 have a representation in the DBpedia ontology
     • Corrected minor errors in gold standard and re-evaluated approach by Freitas et al. [2]
         • Actual figures don’t change significantly
     • Use Mean Reciprocal Rank (MRR), which measures the quality of the ranking by calculating the inverse rank of the best result
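Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the first correct result appears. In this minimal sketch, counting a query with no correct result as 0 is a common convention assumed here, and the rankings and gold answers are invented toy data.

```python
def mean_reciprocal_rank(rankings, gold):
    """rankings: query -> ranked list of resources.
    gold: query -> set of correct resources.
    A query whose ranking contains no correct resource contributes 0 (assumption)."""
    if not rankings:
        return 0.0
    total = 0.0
    for query, ranked in rankings.items():
        correct = gold.get(query, set())
        for rank, resource in enumerate(ranked, start=1):
            if resource in correct:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Invented toy data: first query correct at rank 1, second at rank 2.
rankings = {
    "q1": ["ex:A", "ex:B"],
    "q2": ["ex:C", "ex:D"],
}
gold = {"q1": {"ex:A"}, "q2": {"ex:D"}}
mrr = mean_reciprocal_rank(rankings, gold)  # (1/1 + 1/2) / 2
```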
  14. Evaluation: Approach
     • Manually create training set consisting of 40 keywords for supervised training phase
     • Result of training phase: 194 candidate properties
     • Use precision threshold of 0.045 to cut off the candidate property list. This resulted in 23 properties.
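The cut-off can be sketched as a filter followed by a sort. Whether the threshold is inclusive is not stated on the slide, so `>=` is an assumption; `ex:lowPrecision` is a made-up property added to show one being dropped, while the other two values come from the training results table.

```python
def select_properties(precision, theta):
    """Keep properties with precision >= theta (inclusive bound is an assumption),
    ordered by decreasing precision."""
    kept = [(prop, value) for prop, value in precision.items() if value >= theta]
    return sorted(kept, key=lambda item: item[1], reverse=True)


precision = {
    "dc:description": 0.0471,
    "foaf:name": 0.267,
    "ex:lowPrecision": 0.02,  # hypothetical property below the threshold
}
selected = select_properties(precision, theta=0.045)
```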
  15. Evaluation: Result of Training
     Property                            Precision
     foaf:name                           0.267
     fb-common:topic                     0.189
     rdfs:label                          0.187
     dc:subject                          0.182
     dc:title                            0.169
     sindice:label                       0.168
     rdfs:suBClassOf                     0.168
     skos:prefLabel                      0.157
     fb-type:object                      0.143
     wn20schema:derivationallyRelated    0.143
     wn20schema:containsWordSense        0.138
     commontag:label                     0.133
     owl:sameAs                          0.121
     wn20schema:gloss                    0.1
     opencyc:seeAlsoURI                  0.093
     rdfs:seeAlso                        0.082
     dbpedia-owl:abstract                0.0774
     rdfs:comment                        0.0676
     rdfs:range                          0.0667
     rdfs:subClassOf                     0.0656
     fb:documented object                0.0619
     dbpedia-owl:wikiPageWikiLink        0.0487
     dc:description                      0.0471
     Table: Top 23 properties used and their precision
  16. Evaluation: Example results of Test
     Query: [spacecraft]       Query: [engine]                 Query: [factory]
     dbpedia-owl:Spacecraft    dbpedia-owl:engine              dbpedia-owl:manufacturer
     dbpedia-owl:spacecraft    dbpedia-owl:gameEngine          dbpedia-owl:plant
     dbpedia-owl:satellite     dbpedia-owl:Artwork             dbpedia-owl:Canal
     dbpedia-owl:missions      dbpedia-owl:AutomobileEngine    dbpedia-owl:engine
     dbpedia-owl:launches      dbpedia-owl:Locomotive          dbpedia-owl:class
     dbpedia-owl:closed        dbpedia-owl:fuel                dbpedia-owl:Album
     dbpedia-owl:vehicle       dbpedia-owl:added               dbpedia-owl:product
     dbpedia-owl:Rocket        dbpedia-owl:Musical             dbpedia-owl:assembly

     Query: [bass]             Query: [wife]                   Query: [honda]
     dbpedia-owl:Fish          dbpedia-owl:spouse              dbpedia-owl:manufacturer
     dbpedia-owl:Instrument    dbpedia-owl:Criminal            dbpedia-owl:discovered
     dbpedia-owl:instrument    dbpedia-owl:person              dbpedia-owl:Asteroid
     dbpedia-owl:voice         dbpedia-owl:status              dbpedia-owl:engine
     dbpedia-owl:partner       dbpedia-owl:education           dbpedia-owl:season
     dbpedia-owl:note          dbpedia-owl:Language            dbpedia-owl:vehicle
     dbpedia-owl:Musical       dbpedia-owl:sex                 dbpedia-owl:Automobile
     dbpedia-owl:lowest        dbpedia-owl:family              dbpedia-owl:participant
  17. Evaluation: Results of Test
     Model                     MRR
     ESA                       0.6
     LOD Keyword Expansion     0.77
     Table: Mean reciprocal rank (MRR)

     Model                     Recall
     String match              0.45
     String match + WordNet    0.52
     ESA                       0.87
     LOD Keyword Expansion     0.90
     Table: Percentage queries answered (Recall)
  18. Conclusion
     • Method for automatic query expansion for Linked Data resources based on using properties between resources within the Linked Open Data cloud
         • “film”, “flim”, “films”, “movie” → dbpedia-owl:Film
     • Evaluation showed how useful these different properties are for finding semantic similarities and thereby finding expanded keywords
     • Improvement of 17 percentage points in MRR over the state of the art (0.6 → 0.77)
  19. Future Work
     • Related work [6] shows that best results can be achieved by following a multi-strategy approach (String similarity + WordNet + ESA), into which we could also integrate our approach
     • Adjust labelling properties (add the ones discovered in training)
     • Treat labelling properties separately from other properties
     • Fine-tune tokenisation of literals
     • Perform in vivo evaluation of the task, e.g. in the context of question answering
     • Reevaluate for instances
  20. Bibliography I
     [1] Cheng, G., Ge, W., Qu, Y.: Falcons: searching and browsing entities on the semantic web. In: Proceedings of the 17th international conference on World Wide Web. pp. 1101–1102. ACM (2008)
     [2] Freitas, A., Curry, E., Oliveira, J.G., O’Riain, S.: A distributional structured semantic space for querying rdf graph data. International Journal of Semantic Computing 5(04), 433–462 (2011)
     [3] Lopez, V., Nikolov, A., Fernandez, M., Sabou, M., Uren, V., Motta, E.: Merging and ranking answers in the semantic web: The wisdom of crowds. The semantic web pp. 135–152 (2009)
  21. Bibliography II
     [4] Oren, E., Delbru, R., Catasta, M., Cyganiak, R., Stenzhorn, H., Tummarello, G.: Sindice.com: a document-oriented lookup index for open linked data. International Journal of Metadata, Semantics and Ontologies 3(1), 37–52 (2008)
     [5] Tran, T., Wang, H., Haase, P.: Searchwebdb: Data web search on a pay-as-you-go integration infrastructure. Technical report, University of Karlsruhe (2008)
     [6] Walter, S., Unger, C., Cimiano, P., Bär, D.: Evaluation of a layered approach to question answering over linked data. In: The Semantic Web–ISWC 2012, pp. 362–374. Springer (2012)