DBtrends: Exploring Query Logs for Ranking RDF Data
AKSW
Edgard Marx, Amrapali Zaveri, Diego Moussallem, Sandro Rautenberg
12th International Conference on Semantic Systems
Outline
• Motivation
• Background
• Ranking using Query Logs
• Evaluation
• Results
• Discussion
• Conclusion
• Future Works
Motivation
• Personal Data
• Enterprise Data
• Open Data
We Have Data
• "The size of LOD by 2014 was 31 billion triples" (http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/)
• "Facebook users generate 2.7 billion Like actions per day and 300 million new photos are uploaded daily" (Josh Constine, 2012, techcrunch.com)
• "Google Processing 20,000 Terabytes A Day, And Growing" (Erick Schonfeld, 2008, techcrunch.com)
Not all data is relevant
Ranking
Scenarios
• Search
• Machine Learning
• Link Discovery
Background
Resource Description Framework (RDF)
Concrete
E=MC²
Abstract
Web of Data
Things
Web of Data
Applications that use RDF data:
• Semantic Search
• Entity Search
• Question Answering
• Named Entity Recognition
• Link Discovery
• Machine Learning
Ranking Functions (Types)
"Give me all persons"
Retrieve (Persons) → Processing & Ranking (Sort) → Answer
• Query dependent
• Query independent
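The two types of ranking function named above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the entity list, the `POPULARITY` scores and the term-overlap measure are invented and are not taken from the slides.

```python
from typing import Dict, List

# Hypothetical toy data: entity label -> precomputed, query-independent score.
POPULARITY: Dict[str, float] = {
    "Albert Einstein": 0.9,
    "Alan Turing": 0.7,
    "Ada Lovelace": 0.5,
}


def rank_query_independent(entities: List[str]) -> List[str]:
    """Sort by a score that was computed without looking at the query."""
    return sorted(entities, key=lambda e: POPULARITY.get(e, 0.0), reverse=True)


def rank_query_dependent(entities: List[str], query: str) -> List[str]:
    """Sort by how many query terms occur in the entity label."""
    terms = set(query.lower().split())

    def overlap(entity: str) -> int:
        return len(terms & set(entity.lower().split()))

    return sorted(entities, key=overlap, reverse=True)


# "Give me all persons": retrieve the candidates, then sort them.
persons = ["Ada Lovelace", "Alan Turing", "Albert Einstein"]
by_popularity = rank_query_independent(persons)
by_match = rank_query_dependent(persons, "alan turing")
```

A query-independent score can be computed once and reused for every query, whereas a query-dependent score has to be recomputed per query.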
Ranking RDF Data
• 1999: Page et al.
• 2001: Lee et al. (Web of Data)
• 2011: Cheng et al. (Property)
• 2014: Thalhammer et al.
Benchmarks
DBtrends Benchmark (Marx, 2016)
• 60 users from different countries (USA, India)
• 9 entity ranking functions applied to the DBpedia knowledge base
• Users sort relevant classes, properties and entities extracted from the top twenty entities belonging to the top four classes
• Tasks were executed using Amazon Mechanical Turk
Previous Benchmarks
• Not publicly available
• Evaluate performance of 30 profiles
Ranking using Query Logs
Why use query logs?
• Query logs provide relevant information about users' preferences
• They refer to real-world entities
Questions
• How to map real-world entities to the Web of Data?
• How to measure their relevance?
• Where to find a good and trustworthy query log?
How to map real-world resources?
• Rocha et al. (2004)
• Ding et al. (2005)
• Hogan et al. (2006)
• Alsarem et al. (2015)
How to measure the resource's relevance?
• Users search more often for things that are relevant
• Query logs register how often something is searched
• Query logs can be used to better estimate a resource's relevance by looking at how often it is searched
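The idea that search frequency can stand in for relevance can be sketched as follows. This is a minimal illustration, assuming the query log is available as a plain list of query strings; the log entries are invented and the normalization (trimming and lower-casing) is an assumption rather than something the slides prescribe.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_search_frequency(query_log: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how often each normalized query string occurs, most frequent first."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return counts.most_common()


# Hypothetical log entries, invented for illustration only.
log = ["New York", "new york", "Leipzig", "New York ", "Berlin", "leipzig"]
ranking = rank_by_search_frequency(log)
```

Here "new york" ranks first because it occurs three times in the toy log.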
Where to find a good and trustworthy query log?
• Public API
• Filters
  • Geographic: Country, State, City
  • Period: Day, Week, Month, Year
DBtrends Ranking Function
[Figure: the label "New York" of dbr:New_York_City (types dbo:City, dbo:Place) is looked up in Trends; steps 1-4 are marked in the diagram, together with the values 36, 18 and 9]
• First, the labels of the entities are extracted and used to acquire the search history in query logs, e.g. Google Trends (steps 1-2)
• Thereafter, the entity ranks are used as a base to propagate the rank to the classes (steps 3-4)
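The two stages described for the DBtrends ranking function can be sketched roughly as below. This is not the authors' implementation: the slides do not state how ranks are propagated to classes, so plain summation is used as an assumption. The function name `rank_entities_and_classes`, the `search_interest` callback, the entity dbr:Leipzig and all numeric values are hypothetical stand-ins for a real query-log lookup.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def rank_entities_and_classes(
    labels: Dict[str, str],
    types: Dict[str, List[str]],
    search_interest: Callable[[str], float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    # Steps 1-2: look up each entity's label in the query log.
    entity_rank = {entity: search_interest(label) for entity, label in labels.items()}

    # Steps 3-4: propagate entity ranks to their classes.
    # ASSUMPTION: plain summation; the slides do not state the formula.
    class_rank: Dict[str, float] = defaultdict(float)
    for entity, score in entity_rank.items():
        for cls in types.get(entity, []):
            class_rank[cls] += score
    return entity_rank, dict(class_rank)


# Made-up interest values standing in for a real query-log lookup.
FAKE_INTEREST = {"New York": 80.0, "Leipzig": 20.0}

labels = {"dbr:New_York_City": "New York", "dbr:Leipzig": "Leipzig"}
types = {
    "dbr:New_York_City": ["dbo:City", "dbo:Place"],
    "dbr:Leipzig": ["dbo:City", "dbo:Place"],
}
entity_rank, class_rank = rank_entities_and_classes(
    labels, types, lambda label: FAKE_INTEREST.get(label, 0.0)
)
```

Passing the lookup in as a callback keeps the sketch independent of any particular query-log service.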
Evaluation
Entity Ranking Functions
• DBtrends
• MIXED-RANK
• DB-IN
• DB-OUT
• DB-RANK
• PAGE-IN
• PAGE-OUT
• PAGE-RANK
• E-PAGE-IN
• SEO-PA
• SHARED-LINKS
Property/Class Ranking Functions
Property
• Relin
• RandomRank
• Instances
Class
• Instances
Results (Entity)
• MIXED-RANK
• PAGE-RANK
• E-PAGE-IN
• SHARED-LINKS
• SEO-PA
• DB-OUT
• PAGE-IN
• DBtrends
• PAGE-OUT
• DB-IN
• DB-RANK
Discussion (Entity)
• Functions that take external information into consideration provide more insight into a resource's relevance
• RDF links reflect natural connections rather than a resource's relevance
• There is no pattern in the impact distribution of query logs
• Queries do not necessarily help to improve a ranking function
• Internal agreement ~63%
Results (Property and Class)
Property
• RandomRank
• Relin
• Instances
Class
• Instances
Discussion (Property)
• Internal agreement ~37%
• Ranks are very sparse
• Not conclusive
Discussion (Class)
• Internal agreement ~67%
• dbo:PopulatedPlace, dbo:Settlement, dbo:Place, owl:Thing
• A simple sort by number of instances can be very effective
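The "simple sort" mentioned above can be sketched as counting instances per class and sorting the counts. This is a minimal illustration under stated assumptions: the type statements are given as (entity, class) pairs, the `ex:` entities are invented, and ties are broken alphabetically, which the slides do not specify.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_classes_by_instances(
    type_statements: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int]]:
    """Rank classes by how many distinct entities are typed with them."""
    distinct = set(type_statements)  # ignore repeated (entity, class) pairs
    counts = Counter(cls for _entity, cls in distinct)
    # Sort by count (descending), then by class name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical (entity, class) pairs; the entities are invented for illustration.
statements = [
    ("ex:a", "owl:Thing"), ("ex:a", "dbo:Place"), ("ex:a", "dbo:PopulatedPlace"),
    ("ex:b", "owl:Thing"), ("ex:b", "dbo:Place"),
    ("ex:c", "owl:Thing"),
]
class_ranking = rank_classes_by_instances(statements)
```

In the toy data owl:Thing comes first, since every entity is an instance of it.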
Discussion (Caveats)
• Confidence in executing the tasks:
  • Indians 90%
  • Americans 60%
• Ranks produced by Indians were sparser
• Abstract entities appear before entities
Summary
• Entity ranking functions produce better results when considering external information
• A simple sort by the number of instances can be very effective for ranking classes
• Query logs can, but do not necessarily, improve entity ranking functions
Benchmark
• Benchmark
• Ranking functions
• Library (Java)
dbtrends.aksw.org
Future Works
• Extend the evaluation to other countries and ranking functions
• Evaluate the impact of context-aware ranking functions
• Use other similarity ranking functions
Acknowledgements
Contact
http://emarx.org

Edgard Marx, Amrapali Zaveri, Diego Moussallem and Sandro Rautenberg | DBtrends: Exploring query logs for ranking RDF data