Large knowledge bases consisting of entities and relationships between them have become vital sources of information for many applications. Most of these knowledge bases adopt the Semantic Web data model RDF as a representation model. Querying them is typically done with structured queries in graph-pattern languages such as SPARQL. However, such structured queries require some expertise from users, which limits access to these data sources. To overcome this, keyword search must be supported. In this paper, we propose a retrieval model for keyword queries over RDF graphs. Our model retrieves a set of subgraphs that match the query keywords and ranks them based on statistical language models. We show that our retrieval model outperforms state-of-the-art IR and DB models for keyword search over structured data, using experiments over two real-world datasets.
From April 1st, 2011 to March 31st, 2012, the Partnership's SmartBusiness team consulted with 251 businesses in Halifax, NS, the majority of which were small and medium-sized enterprises (SMEs). This report is a compilation of the 213 retention visits to local businesses and their experiences with the Halifax economy. Retention visits cover a variety of issues, ranging from perceptions of the local business climate to the company's local workforce, sales, and even immigration issues.
#ForoEGovAR | Foundations for Knowledge Society Policies - CESSI Argentina
Document prepared by Susana Finquelievich and Paul Hector for the Foro Argentino de Transformación Digital, organized by CESSI and the United Nations University (UNU-EGOV). Buenos Aires, March 7, 2016.
Networking is the surest way to find a job and build a career. These Networking 101 tips can help you make a great impression and connections in your community.
Entity Linking via Graph-Distance Minimization - Roi Blanco
Entity linking is a natural-language-processing task that consists of identifying strings of text that refer to a particular item in some reference knowledge base.
One instance of entity-linking can be formalized as an optimization problem on the underlying concept graph, where the quantity to be optimized is the average distance between chosen items.
Inspired by this application, we define a new graph problem that is a natural variant of the Maximum Capacity Representative Set. We prove that our problem is NP-hard for general graphs; nonetheless, it turns out to be solvable in linear time under some more restrictive assumptions. For the general case, we propose several heuristics: one of these tries to enforce the above assumptions, while the others try to optimize similar, easier objective functions. We show experimentally how these approaches perform with respect to some baselines on a real-world dataset.
Comparing Index Structures for Completeness Reasoning - Fariz Darari
Data quality is a major issue in the development of knowledge graphs. Data completeness is a key factor in data quality that concerns the breadth, depth, and scope of information contained in knowledge graphs. Given the amount of information contained in large-scale knowledge graphs (e.g., DBpedia, Wikidata), it is conceivable that they may be complete for a wide range of topics, such as children of Donald Trump, cantons of Switzerland, and presidents of Indonesia. Previous research has shown how one can augment knowledge graphs with statements about their completeness, stating which parts of the data are complete. Such meta-information can be leveraged to check query completeness, that is, whether the answer returned by a query is complete. Yet, it is still unclear how such a check can be done in practice, especially when a large number of completeness statements is involved. We devise implementation techniques to make completeness reasoning in the presence of large sets of completeness statements feasible, and experimentally evaluate their effectiveness in realistic settings based on the characteristics of real-world knowledge graphs.
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing - Maribel Acosta Deibe
Best Student Paper Award at the 8th International Conference on Knowledge Capture (K-CAP 2015).
http://tinyurl.com/hare-paper
Abstract:
Due to the semi-structured nature of RDF data, missing values affect the answer completeness of queries posed against RDF. To overcome this limitation, we present HARE, a novel hybrid query processing engine that brings together machine and human computation to execute SPARQL queries. We propose a model that exploits the characteristics of RDF in order to estimate the completeness of portions of a data set. The completeness model, complemented by crowd knowledge, is used by the HARE query engine to decide on the fly which parts of a query should be executed against the data set or via crowd computing. To evaluate HARE, we created and executed a collection of 50 SPARQL queries against the DBpedia data set. Experimental results clearly show that our solution accurately enhances answer completeness.
(The HARE logo is based on artwork by icons8: https://icons8.com/.)
Introduction to Knowledge Graphs with Grakn and Graql - Vaticle
Cognitive/AI systems process knowledge that is far too complex for current databases. They require an expressive data model and an intelligent query language to perform knowledge engineering over complex datasets.
In this talk, we will discuss how Grakn, a database to organise complex networks of data and make it queryable, provides the knowledge graph foundation for intelligent systems to manage complex data.
We will discuss how Graql, Grakn's reasoning (through OLTP) and analytics (through OLAP) query language, provides the tools required to do the job: a knowledge schema, a logical inference language, and a distributed analytics framework.
And finally, we will discuss how Graql serves as a unified representation of data for cognitive systems.
2011 Search Query Rewrites - Synonyms & Acronyms - Brian Johnson
July 27, 2011 Bay Area Search Presentation
Brian Johnson, Engineering Director, Query Services @ eBay
Query expansion is an important part of search recall for all search engines. In this talk I'll discuss some of the general trends driving Hadoop adoption within the Search Query Services team at eBay, and the types of algorithms/techniques we've moved to Hadoop at eBay. Over time we've moved from smaller, editorial data sets to large machine-generated data sets mined from behavior log data, items/listings, catalogs, etc. One common workflow is to mine large candidate rewrite/expansion data sets from multiple data sources, use crowd-sourced human judgment to classify a subset of the candidates (true positive, false positive), use machine learning techniques to discard false positives, run automated validation on the final data set, and automatically push to production.
Ravi Jammalakadaka, Senior Applied Researcher, Query Services @ eBay
Ravi is a real engineer, not a pointy-haired manager like the previous speaker. Expect some real engineering :-) He'll be doing a literature review of acronym mining and discussing a real-world implementation.
Title: Mining Acronyms From Raw Text
Abstract: A significant number of eBay products are known by their acronyms. eBay's query expansion service expands user queries with their acronym equivalents to increase recall. The challenge is to mine acronyms from either seller (e.g., item descriptions, titles) or buyer (e.g., queries) data.
Ravi will present state-of-the-art algorithms from recent conferences that mine acronyms from raw text, along with their limitations. He will then present a new acronym mining algorithm that seeks to address those limitations, and a machine learning classifier that removes the false positives generated by the mining algorithm.
Presentation of the paper titled "Leveraging Semantic Parsing for Relation Linking over Knowledge Bases" at the ISWC 2020 - Research Track.
@inproceedings{mihindu-sling-2020,
  title = "Leveraging Semantic Parsing for Relation Linking over Knowledge Bases",
  author = "Mihindukulasooriya, Nandana and Rossiello, Gaetano and Kapanipathi, Pavan and Abdelaziz, Ibrahim and Ravishankar, Srinivas and Yu, Mo and Gliozzo, Alfio and Roukos, Salim and Gray, Alexander",
  booktitle = "The Semantic Web -- ISWC 2020",
  year = "2020",
  publisher = "Springer International Publishing",
  address = "Cham",
  pages = "402--419",
  url = "https://link.springer.com/chapter/10.1007/978-3-030-62419-4_23",
  doi = "10.1007/978-3-030-62419-4_23"
}
Mining Interesting Trivia for Entities from Wikipedia PART-II - Abhay Prakash
The following presentation is on my Master's graduate thesis work, "Mining Interesting Trivia for Entities from Wikipedia".
This presentation is the second part, continuing my earlier presentation of the same title ending in 'PART-I'.
Slides used for the keynote at the event Big Data & Data Science http://eventos.citius.usc.es/bigdata/
Some slides are borrowed from random hadoop/big data presentations
Slides for the iDB summer school (Sapporo, Japan) http://db-event.jpn.org/idb2013/
Typically, Web mining approaches have focused on enhancing or learning about user seeking behavior from query log analysis and click-through usage, employing the web graph structure for ranking, or detecting spam and web page duplicates. Lately, there is a trend toward mining web content semantics and dynamics in order to enhance search capabilities, either by providing direct answers to users or by allowing for advanced interfaces and capabilities. In this tutorial we will look into different ways of mining textual information from Web archives, with a particular focus on how to extract and disambiguate entities, and how to put them to use in various search scenarios. Further, we will discuss how web dynamics affect information access and how to exploit them in a search context.
Influence of Timeline and Named-entity Components on User Engagement - Roi Blanco
Nowadays, successful applications are those with features that captivate and engage users. Using an interactive news retrieval system as a use case, in this paper we study the effect of timeline and named-entity components on user engagement. This is in contrast with previous studies, where the importance of these components was examined from a retrieval-effectiveness point of view. Our experimental results show significant improvements in user engagement when named-entity and timeline components were installed. Further, we investigate whether we can predict user-centred metrics through users' interaction with the system. Results show that we can successfully learn a model that predicts all dimensions of user engagement and whether users will like the system or not. These findings might steer systems toward a more personalised user experience, tailored to users' preferences.
Broad introduction to information retrieval and web search, used for teaching at the Yahoo Bangalore Summer School 2013. Slides are a mash-up from my own and other people's presentations.
Beyond document retrieval using semantic annotations - Roi Blanco
Traditional information retrieval approaches deal with retrieving full-text documents in response to a user's query. However, applications that go beyond the "ten blue links" and make use of additional information to display and interact with search results are becoming increasingly popular and have been adopted by all major search engines. In addition, recent advances in text extraction allow for inferring semantic information about particular items present in textual documents. This talk presents how enhancing a document with structures derived from shallow parsing can convey a different user experience in search and browsing scenarios, and what challenges we face as a consequence.
Extending BM25 with multiple query operators - Roi Blanco
Traditional probabilistic relevance frameworks for information retrieval refrain from taking positional information into account, due to the hurdles of developing a sound model while avoiding an explosion in the number of parameters. Nonetheless, the well-known BM25F extension of the successful Okapi ranking function can be seen as an embryonic attempt in that direction. In this paper, we proceed along the same line, defining the notion of a virtual region: a virtual region is a part of the document that, like a BM25F field, can provide (larger or smaller, depending on a tunable weighting parameter) evidence of relevance of the document; differently from BM25F fields, though, virtual regions are generated implicitly by applying suitable (usually, but not necessarily, position-aware) operators to the query. This technique fits nicely into the eliteness model behind BM25 and provides a principled explanation of BM25F; it specializes to BM25(F) for some trivial operators, but has a much more general appeal. Our experiments (both on standard collections, such as TREC, and on Web-like repertoires) show that the use of virtual regions is beneficial for retrieval effectiveness.
Energy-Price-Driven Query Processing in Multi-center Web Search Engines - Roi Blanco
Concurrently processing thousands of web queries, each with a response time under a fraction of a second, necessitates maintaining and operating massive data centers. For large-scale web search engines, this translates into high energy consumption and a huge electric bill. This work takes the challenge to reduce the electric bill of commercial web search engines operating on data centers that are geographically far apart. Based on the observation that energy prices and query workloads show high spatio-temporal variation, we propose a technique that dynamically shifts the query workload of a search engine between its data centers to reduce the electric bill. Experiments on real-life query workloads obtained from a commercial search engine show that significant financial savings can be achieved by this technique.
Effective and Efficient Entity Search in RDF Data - Roi Blanco
Triple stores have long provided RDF storage as well as data access using expressive, formal query languages such as SPARQL. The new end users of the Semantic Web, however, are mostly unaware of SPARQL and overwhelmingly prefer imprecise, informal keyword queries for searching over data. At the same time, the amount of data on the Semantic Web is approaching the limits of the architectures that provide support for the full expressivity of SPARQL. These factors combined have led to an increased interest in semantic search, i.e. access to RDF data using Information Retrieval methods. In this work, we propose a method for effective and efficient entity search over RDF data. We describe an adaptation of the BM25F ranking function for RDF data, and demonstrate that it outperforms other state-of-the-art methods in ranking RDF resources. We also propose a set of new index structures for efficient retrieval and ranking of results. We implement these results using the open-source MG4J framework.
Caching Search Engine Results over Incremental Indices - Roi Blanco
A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naive approaches, such as flushing the entire cache upon every index update, lead to poor performance and in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index.
To this end, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo! search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results compared to a TTL-only scheme for similar freshness of results.
We study the problem of finding sentences that explain the relationship between a named entity and an ad-hoc query, which we refer to as entity support sentences. This is an important sub-problem of entity ranking which, to the best of our knowledge, has not been addressed before. In this paper we give the first formalization of the problem, how it can be evaluated, and present a full evaluation dataset. We propose several methods to rank these sentences, namely retrieval-based, entity-ranking based and position-based. We found that traditional bag-of-words models perform relatively well when there is a match between an entity and a query in a given sentence, but they fail to find a support sentence for a substantial portion of entities. This can be improved by incorporating small windows of context sentences and ranking them appropriately.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions), and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... - Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
GridMate - End to end testing is a critical piece to ensure quality and avoid... - ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
3. Shady Elbassuoni, Keyword Search over RDF Graphs, CIKM 2011
Searching RDF Data
Structured triple-pattern queries (SPARQL)
Example: comedies that have won an academy award
SELECT ?m
WHERE {?m hasGenre Comedy . ?m hasWonPrize Academy_Award}
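As a rough illustration, the conjunctive pattern query above can be evaluated over a toy triple set in plain Python. The triples and names below are illustrative, mirroring the slides' example rather than any real dataset:

```python
# Hypothetical toy triple set; entity and predicate names mirror the slides.
triples = {
    ("Innerspace", "hasGenre", "Comedy"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Road_Trip", "hasGenre", "Comedy"),
    ("Traffic", "hasWonPrize", "Academy_Award"),
}

def bindings(predicate, obj):
    """Subjects ?m that satisfy the single triple pattern (?m, predicate, obj)."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}

# SELECT ?m WHERE {?m hasGenre Comedy . ?m hasWonPrize Academy_Award}
# A conjunction of two patterns over ?m is the intersection of their bindings.
answers = bindings("hasGenre", "Comedy") & bindings("hasWonPrize", "Academy_Award")
print(answers)  # {'Innerspace'}
```

Only Innerspace satisfies both triple patterns, so it is the single answer binding for ?m.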
Triple-pattern queries are very expressive but are not that usable
Most users and search APIs prefer keyword queries
Support keyword search over RDF graphs
Keyword Search over RDF Data
How to process keyword queries?
Translate keyword queries into SPARQL
Directly process the queries over the RDF graph
What are the results of a keyword query?
Resources
Triples
Tuples of triples (subgraphs)
Processing Keyword Queries
Construct a document D(t) for each triple t
D(t) contains all literals in t and any text associated with the URIs in t
Example:
t: Innerspace hasGenre Comedy
innerspace USA1987 science fiction comedy film Joe Dante Michael Finnell Dennis Quaid Martin Short Meg Ryan academy award best visual effects …
We can now create triple-term indexes
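The D(t) construction and the triple-term index can be sketched as follows; the uri_text mapping is an illustrative stand-in for whatever text is associated with each URI:

```python
from collections import defaultdict

# Sketch: build a "document" D(t) for each triple from its literals and
# any text attached to its URIs, then index triples by their terms.
# uri_text is a hypothetical per-URI text mapping, not the paper's data.
uri_text = {
    "Innerspace": "innerspace 1987 science fiction comedy film",
    "hasGenre": "genre",
    "Comedy": "comedy",
}

def document(triple):
    """D(t): concatenate text for each URI (or the literal itself)."""
    return " ".join(uri_text.get(part, part.lower()) for part in triple).split()

index = defaultdict(set)  # term -> set of triples whose D(t) contains it
t = ("Innerspace", "hasGenre", "Comedy")
for term in document(t):
    index[term].add(t)

print(sorted(index["comedy"]))
```

With such an index, each query keyword maps directly to a posting list of candidate triples.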
Retrieving Query Results
Retrieve a list of triples matching a query keyword
Join the triples from different lists based on their URIs
comedy award
Innerspace hasGenre Comedy
Road_Trip hasGenre Comedy
Toy_Story hasGenre Comedy
Diner type Comedy_films
Police_Academy type Comedy_films
The_Darwin_Awards type Comedy_films
...
Traffic hasWonPrize Academy_Award
Innerspace hasWonPrize Academy_Award
Toy_Story hasWonPrize Academy_Award
Diner hasWonPrize Academy_Award
The_Darwin_Awards type Comedy_films
...
Result Ranking is crucial!!
T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
T: Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
T: Police_Academy type Comedy_Films . The_Darwin_Awards type Comedy_Films
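The retrieve-then-join step can be sketched like this, joining on a shared subject URI (the slides join on URIs generally; subject-subject is just the simplest case). Triple lists mirror the example above:

```python
from itertools import product

# Sketch: one triple list per query keyword (from a term index), then
# pair up triples that share a URI — here, the subject.
lists = {
    "comedy": [("Innerspace", "hasGenre", "Comedy"),
               ("Toy_Story", "hasGenre", "Comedy"),
               ("Road_Trip", "hasGenre", "Comedy")],
    "award":  [("Innerspace", "hasWonPrize", "Academy_Award"),
               ("Toy_Story", "hasWonPrize", "Academy_Award"),
               ("Traffic", "hasWonPrize", "Academy_Award")],
}

results = [(t1, t2) for t1, t2 in product(lists["comedy"], lists["award"])
           if t1[0] == t2[0]]  # join condition: shared subject URI
for t1, t2 in results:
    print(t1, ".", t2)
```

Each joined pair is one candidate subgraph result; as the slide stresses, ranking these candidates is where the real work lies.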
Language Models for Triples
t: Innerspace hasGenre Comedy
Estimate P(w|D(t)) from the triple's document D(t):
w           P(w|D(t))
innerspace  0.234
1987        0.123
science     0.012
fiction     0.020
comedy      0.111
film        0.179
classic     0.111
meg         0.019
ryan        0.019
oscar       0.148
...         ...
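One common way to estimate such a distribution is a smoothed unigram language model; the sketch below uses linear (Jelinek-Mercer-style) smoothing against a collection model, with an illustrative weight and toy documents:

```python
from collections import Counter

# Sketch: P(w|D(t)) as a maximum-likelihood estimate over the triple's
# document, linearly smoothed with the collection model. alpha and the
# documents are illustrative, not the paper's actual parameters.
doc = "innerspace 1987 science fiction comedy film comedy".split()
collection = doc + "drama thriller film film award".split()
alpha = 0.5  # smoothing weight (assumption)

tf, cf = Counter(doc), Counter(collection)

def p_w_given_doc(w):
    return (alpha * tf[w] / len(doc)
            + (1 - alpha) * cf[w] / len(collection))
```

Smoothing keeps P(w|D(t)) nonzero for words outside D(t), which matters when a result must be scored against every query keyword.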
Ranking Model
comedy award
T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
but we treat triples as bags of words!
Ranking Model
comedy award
T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
probability of the structure of triple t
being relevant to keyword w
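The slide's scoring formula itself is an image and is not reproduced here; as a hedged stand-in, a generic query-likelihood ranking under the bag-of-words view could look like the sketch below, with made-up per-result probabilities:

```python
import math

# Hedged sketch of query-likelihood ranking (bag-of-words view): a joined
# result T is scored by how likely its combined document generates the
# query keywords. Result names and probabilities are illustrative.
p_w_given_T = {
    ("Innerspace-result", "comedy"): 0.111,
    ("Innerspace-result", "award"): 0.148,
    ("Darwin-result", "comedy"): 0.090,
    ("Darwin-result", "award"): 0.020,
}

def score(result, query):
    # log-space product of per-keyword probabilities; tiny floor for
    # unseen keywords stands in for proper smoothing
    return sum(math.log(p_w_given_T.get((result, w), 1e-9)) for w in query)

query = ["comedy", "award"]
ranked = sorted(["Innerspace-result", "Darwin-result"],
                key=lambda r: score(r, query), reverse=True)
print(ranked)  # → ['Innerspace-result', 'Darwin-result']
```

The next slide shows what this pure bag-of-words view misses: which predicate matched the keyword.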
Estimating Structural Relevance
For each keyword, construct a probability
distribution over predicates
Example: award
r                P(r|w)
hasWonPrize      0.459
wasNominatedFor  0.387
type             0.112
directed         0.020
actedIn          0.021
producedIn       0.025
bornIn           0.008
...              ...
estimated from the whole dataset
P(Innerspace hasWonPrize Academy_Award|award) = P(hasWonPrize|award)
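Given the P(r|w) table above, the structural relevance of a triple to a keyword reduces to a lookup on its predicate; the probabilities below copy the slide's example values, and the function name is illustrative:

```python
# Sketch: structural relevance via the per-keyword predicate
# distribution. Values are the slide's example numbers for "award".
p_pred_given_kw = {
    "award": {"hasWonPrize": 0.459, "wasNominatedFor": 0.387,
              "type": 0.112, "directed": 0.020, "actedIn": 0.021,
              "producedIn": 0.025, "bornIn": 0.008},
}

def structural_relevance(triple, keyword):
    """P(triple's structure relevant to keyword) = P(predicate | keyword)."""
    return p_pred_given_kw.get(keyword, {}).get(triple[1], 0.0)

t = ("Innerspace", "hasWonPrize", "Academy_Award")
print(structural_relevance(t, "award"))  # → 0.459
```

This is why a triple matching "award" through hasWonPrize outranks one matching it only through a type edge to Comedy_films.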
Example Ranked Query Results
comedy award
Bag of Words
Combat_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
Police_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
Structure Aware
Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
Shrek hasWonPrize Academy_Award_Best_Animated_Feature . Shrek hasGenre Comedy
Experimental Setup
User study over two RDF datasets:
movies from IMDB
books from LibraryThing
Models compared:
Structure Aware Approach
Bag of Words Approach
Language-model-based Object Retrieval
BANKS (keyword search over databases)
Experimental Setup
30 evaluation queries
Gathered relevance assessments for the top-50 results retrieved by each model
Conclusion
Keyword Search over RDF data is crucial
To support keyword search over RDF data
Combine structured triples with text
Construct a document for each triple
Retrieve meaningful query results
Tuples of joined triples
Can be extended to larger subgraphs of the RDF
graph
Rank the retrieved results
A language model approach that uses both text and
structure
If we zoom in on one of these datasets, it is basically just a set of triples with three fields: subject, predicate, and object. RDF is a very flexible means of encoding structured information in a machine-readable format. For instance, the first triple here states that the movie Traffic has won an Academy Award. Note that subjects and objects are URIs or literals, and predicates are URIs.
So how do we search RDF data? We use structured query languages like SPARQL, where a query is a set of triple patterns. A triple pattern is just a triple with one or more variables. Let's look at an example. Suppose we are looking for comedies that have won an Academy Award. This can be expressed using the triple-pattern query in the pink box: the ?m is a variable, and the triples in the curly braces are triple patterns. In particular, the first one has the predicate hasGenre and
So triple patterns are really powerful and can be used to find very interesting information, but do we really expect regular users to use them? Unfortunately not! And since we computer scientists are nice people who always try to make the lives of casual users easier, we need to enable them to search RDF data using keywords.
An RDF dataset can also be viewed as a graph where subjects and objects are nodes and predicates are labeled edges. For example, the triple about Traffic winning an Academy Award is represented by this edge.