PhD Presentation: Exploring Semantic Relationships in the Web of Data

Exploring Semantic Relationships in the Web of Data
Laurens De Vocht – 3.7.2017
DEPARTMENT OF ELECTRONICS AND INFORMATION SYSTEMS
IDLAB

I. Searching the web
II. Reveal relationships
III. Visually explore relationships

EXAMPLE: FIND OUT MORE ABOUT EINSTEIN

How to search so many pages this fast?

SEARCHING IN PHYSICAL DOCUMENTS

CLASSICAL RETRIEVAL MODEL OF SEARCH ENGINES
Information ‘need’ → Query
Document → Document representation → Index
Search engine: ‘match’ between query and index
[Bates, 1989; Robertson, 1977]
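The classical model above (documents turned into representations, stored in an index, and matched against a query) can be sketched as a tiny inverted index. This is a minimal bag-of-words illustration under simplifying assumptions (lowercased words, score = number of distinct query terms a document contains); the functions `tokenize`, `build_index` and `match` and the example documents are hypothetical and say nothing about how real web search engines rank results.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?'\"()"

def tokenize(text: str) -> list[str]:
    """Document/query representation: lowercase words with punctuation stripped."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [t for t in tokens if t]

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index: map every term to the ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def match(query: str, index: dict[str, set[str]]) -> list[str]:
    """'Match': rank documents by the number of distinct query terms they contain."""
    scores: dict[str, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    "d1": "Albert Einstein developed the theory of special relativity.",
    "d2": "Isaac Newton formulated the laws of motion.",
    "d3": "Einstein and Newton both worked in physics.",
}
index = build_index(docs)
print(match("Einstein relativity", index))  # ['d1', 'd3']
```

Here d1 contains both query terms and d3 only one, so d1 ranks first; d2 contains neither and is not returned.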

SEARCHING THE WEB
Impressive state of the art:
Millions of results, almost always with relevant results among the first 10
Incredibly fast: < 1 s
Billions of documents, spread across the globe, within a few
>50 billion documents estimated in the index of the largest search engines
[van den Bosch, Bogers & Kunder, 2016]

LIMITATIONS OF CURRENT WEB SEARCH ENGINES
A. Further explore search results?
Exploratory search
B. What if the keywords intended something else?
Semantics
C. Combine different search results?
Find relationships

SEARCHING THE WEB: NEXT STEPS
A. Exploratory search
B. Semantics
C. Find relationships

DIFFERENT TYPES OF SEARCH ACTIVITIES
Lookup: classic information retrieval model
Learn, Investigate: exploratory search
[Marchionini, 2006]

B. Semantics

SEMANTICS BRIEFLY EXPLAINED
I can eat an apple.
But I can’t eat Apple.

SEMANTICS IN WEB DOCUMENTS
$E = mc^2$
General relativity
Theory of special relativity
Albert Einstein
Twin paradox

ANALOGY ON HOW MACHINES FIND THINGS IN WEB DOCUMENTS
Unique identification on the web: Uniform Resource Identifier (URI)
Unique identification in a printed atlas (e.g. 7 L)

URIS FOR DATA ON THE WEB
$E = mc^2$
http://dbpedia.org/resource/Albert_Einstein
http://dbpedia.org/resource/Special_relativity
http://dbpedia.org/resource/Twin_Paradox
http://dbpedia.org/resource/General_relativity

EVERY CONCEPT IS ASSIGNED PROPERTIES/ATTRIBUTES
$E = mc^2$
http://dbpedia.org/resource/Special_relativity
http://dbpedia.org/page/Special_relativity
$E = mc^2$
(…)
dbr:Special_relativity dct:subject dbc:Concepts_in_physics .
(…)
dbr:Albert_Einstein dbp:knownFor dbr:Special_relativity .
(…)

TRIPLE
dbr:Albert_Einstein dbp:knownFor dbr:Special_relativity .
subject predicate object
Resource Description Framework (RDF)
Namespace vocabularies:
dbr: http://dbpedia.org/resource/
dbp: http://dbpedia.org/property/
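To make the subject/predicate/object structure and the namespace prefixes concrete, here is a minimal sketch that expands the prefixed names from the slide into full URIs and stores the result as a triple. The `Triple` type and the helpers `expand` and `parse_simple_triple` are hypothetical and deliberately naive (prefixed names only, no literals or blank nodes); a real application would use an RDF library such as rdflib to parse Turtle.

```python
from typing import NamedTuple

# Namespace prefixes as listed on the slide.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbp": "http://dbpedia.org/property/",
}

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # 'object' in RDF terminology

def expand(name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'dbr:Albert_Einstein' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local

def parse_simple_triple(line: str) -> Triple:
    """Parse 'subject predicate object .' written with prefixed names only.
    Deliberately naive: no literals, blank nodes, full URIs or escaping."""
    parts = line.split()
    if len(parts) != 4 or parts[3] != ".":
        raise ValueError(f"not a simple triple: {line!r}")
    subject, predicate, obj = (expand(part) for part in parts[:3])
    return Triple(subject, predicate, obj)

t = parse_simple_triple("dbr:Albert_Einstein dbp:knownFor dbr:Special_relativity .")
print(t.subject)    # http://dbpedia.org/resource/Albert_Einstein
print(t.predicate)  # http://dbpedia.org/property/knownFor
print(t.obj)        # http://dbpedia.org/resource/Special_relativity
```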

B. Semantics

EXAMPLE: CONNECTION BETWEEN EINSTEIN AND NEWTON?

FIND RELATIONSHIPS
Properties that are not shared? More distant relationships?
Not all things are related to each other and described within a single document.

PURPOSE OF THIS PHD RESEARCH
1. Efficiently reveal relationships between data.
2. Allow users to gradually refine their search queries.
3. Map the influence of different search actions on the search precision.
4. Determine the contribution of revealed relationships while searching.
Addressed in part II and part III.

REVEALING RELATIONSHIPS: SERENDIPITY

EXAMPLE: REVEALING RELATIONSHIPS
Resources: Einstein, Newton, Physics, Hume
Relationships between them: :influences, :discipline, :birthplace, :residence, :discipline, :influences (…)

MANY, MANY POSSIBILITIES, EVEN FOR ‘SIMPLE’ RELATIONSHIPS

ITERATIVE ALGORITHM TO FIND RELATIONSHIPS
initialisation → filtering → find relationships → score relationships → next iteration? (…)
Data: index, RDF
Result: relationships with different path lengths

ITERATIVE ALGORITHM TO FIND RELATIONSHIPS
expand search space → filtering → find relationships → score relationships → next iteration? (…)
Data: index, RDF
Result: relationships with different path lengths
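The loop above (initialise, expand the search space, filter, find and score relationships, decide on a next iteration) can be sketched as an iterative path search over a small RDF graph. This is a simplified illustration, not the thesis implementation: the index is not modelled, the graph is treated as undirected, `keep` stands in for the filtering heuristics, and scoring defaults to preferring shorter paths. The triples are hypothetical toy data loosely based on the Einstein/Newton example.

```python
from collections import defaultdict

def build_adjacency(triples):
    """Undirected view of an RDF graph: resource -> list of (predicate, neighbour)."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))
        adjacency[o].append((p, s))
    return adjacency

def find_relationships(triples, source, target, max_length=3, wanted=3,
                       keep=lambda resource: True, score=len):
    """Iteratively grow paths from `source` until `wanted` paths to `target`
    are found or paths reach `max_length` relationships."""
    adjacency = build_adjacency(triples)
    frontier = [[source]]                      # initialisation
    found = []
    for _ in range(max_length):
        next_frontier = []
        for path in frontier:                  # expand search space by one hop
            for _predicate, neighbour in adjacency[path[-1]]:
                if neighbour in path:          # avoid cycles
                    continue
                if neighbour != target and not keep(neighbour):
                    continue                   # filtering
                if neighbour == target:        # find relationships
                    found.append(path + [neighbour])
                else:
                    next_frontier.append(path + [neighbour])
        frontier = next_frontier
        if len(found) >= wanted or not frontier:  # next iteration?
            break
    return sorted(found, key=score)            # score relationships (shortest first)

# Hypothetical toy triples, for illustration only.
triples = [
    ("Einstein", ":discipline", "Physics"),
    ("Newton", ":discipline", "Physics"),
    ("Hume", ":influences", "Einstein"),
    ("Newton", ":influences", "Hume"),
]
print(find_relationships(triples, "Einstein", "Newton"))
# [['Einstein', 'Physics', 'Newton'], ['Einstein', 'Hume', 'Newton']]
```

Passing a different `keep` or `score` function is where heuristics and weights such as those on the following slides would plug in.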

INCREASING COMPLEXITY
Number of data elements (resources) to check with increasing path length
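A back-of-the-envelope count shows why the number of resources to check explodes with path length. Assuming, hypothetically, that every resource has the same number of neighbours d and ignoring overlap between neighbourhoods, expanding k hops from one resource reaches up to d + d² + … + dᵏ resources. The degree 50 below is purely illustrative, not a measured DBpedia value.

```python
def resources_to_check(avg_degree: int, path_length: int) -> int:
    """Upper bound on resources reached when expanding `path_length` hops from
    one resource, if every resource has `avg_degree` neighbours:
    d + d**2 + ... + d**k (overlaps between neighbourhoods are ignored)."""
    return sum(avg_degree ** hop for hop in range(1, path_length + 1))

for k in range(1, 5):
    print(k, resources_to_check(50, k))
# 1 50
# 2 2550
# 3 127550
# 4 6377550
```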

ARE THE RELATIONSHIPS RANDOM OR ARBITRARY?

A PRIORI ESTIMATION
Heuristics: “the art of finding”
Examples:
• Jaccard distance: difference in semantic relationships
• Normalized (DBpedia) distance: based on common references
• Confidence: probability that a resource does not occur if another one already does
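The first two heuristics can be sketched as set computations. Jaccard distance is the standard 1 − |A ∩ B| / |A ∪ B|, applied here to hypothetical sets of resources related to Einstein and Newton. For the normalized distance, the sketch assumes the Normalized Google Distance formula applied to the sets of resources that refer to x and to y out of N resources; how the thesis counts "common references" exactly is not stated on the slide, so treat this as an approximation. Confidence is left out because its precise definition is not given here.

```python
import math

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def normalized_distance(refs_x: set, refs_y: set, total: int) -> float:
    """NGD-style distance (assumed formulation) from the sets of resources
    referring to x and to y, out of `total` resources:
    (max(log fx, log fy) - log fxy) / (log N - min(log fx, log fy))."""
    fx, fy, fxy = len(refs_x), len(refs_y), len(refs_x & refs_y)
    if fxy == 0:
        return math.inf                        # no common references at all
    log_x, log_y = math.log(fx), math.log(fy)
    denominator = math.log(total) - min(log_x, log_y)
    if denominator == 0:
        return 0.0                             # both referenced by every resource
    return (max(log_x, log_y) - math.log(fxy)) / denominator

print(jaccard_distance({"Physics", "Hume", "Germany"}, {"Physics", "Hume", "England"}))  # 0.5
print(round(normalized_distance({"a", "b", "c", "d"}, {"c", "d", "e"}, total=100), 4))  # 0.1977
```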

A PRIORI ESTIMATION
Weights: assign a value to a relationship
Examples:
• Jaccard distance: difference in properties
• Combined node degree: favours rare resources
• Jiang & Conrath: relations on the same level of abstraction
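Two of these weights can be written down in a few lines. For combined node degree, the sketch assumes one common formulation, the sum of the logarithms of both endpoints' degrees, so that edges through hubs cost more and paths through rare resources are preferred; that this matches the thesis exactly is an assumption. Jiang & Conrath is the standard information-content distance IC(c1) + IC(c2) − 2·IC(lcs), with IC(c) = −log p(c); the concept probabilities passed in are hypothetical inputs.

```python
import math

def combined_node_degree(degree_u: int, degree_v: int) -> float:
    """Edge cost log(deg u) + log(deg v), degrees >= 1 (assumed formulation):
    edges between rare, low-degree resources are cheap, edges touching hubs
    are expensive."""
    return math.log(degree_u) + math.log(degree_v)

def information_content(probability: float) -> float:
    """IC(c) = -log p(c), with p(c) the probability of encountering concept c."""
    return -math.log(probability)

def jiang_conrath_distance(p_c1: float, p_c2: float, p_lcs: float) -> float:
    """IC(c1) + IC(c2) - 2 * IC(lcs), where lcs is the lowest common subsumer
    of c1 and c2 in a class hierarchy; 0 for identical concepts."""
    return (information_content(p_c1) + information_content(p_c2)
            - 2 * information_content(p_lcs))

print(combined_node_degree(3, 4) < combined_node_degree(3, 4000))  # True
print(jiang_conrath_distance(0.1, 0.1, 0.1))                        # 0.0
```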

A POSTERIORI SCORING
Semantic ranking
The score includes all relationships and all
resources along the entire path

EVALUATION: TRIVIA ABOUT (KNOWN) SCIENTISTS
A priori estimates evaluated according to some semantic ranking mechanisms and a user study.
Different relationships combined in a short ‘story’ about pairs drawn from:
Carl Linnaeus
Charles Darwin
Albert Einstein
Isaac Newton
Dataset

PATH SCORE: SEMANTIC RANKING
Focus on semantic commonalities
Focus on semantic differences
Mixed
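A posteriori scoring looks at the whole path at once, with every relationship and every resource on it. The sketch below is a hypothetical illustration of the three focuses: each path element (resource or relationship) is described by a set of features, "commonalities" rewards high average Jaccard similarity over all pairs of elements, "differences" rewards the opposite, and "mixed" is an assumed combination that rewards similar consecutive steps along an overall diverse path. The feature sets and the `mixed` formula are assumptions, not the semantic ranking used in the thesis.

```python
from itertools import combinations

def similarity(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def path_score(path: list[str], features: dict[str, set[str]],
               focus: str = "commonalities") -> float:
    """Score a whole path. `path` lists every resource AND every relationship
    on it, e.g. ["Einstein", ":discipline", "Physics", ...]."""
    if len(path) < 2:
        raise ValueError("a path needs at least two elements")
    sets = [features[element] for element in path]
    pairs = list(combinations(sets, 2))
    overall = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    steps = list(zip(sets, sets[1:]))
    local = sum(similarity(a, b) for a, b in steps) / len(steps)
    if focus == "commonalities":
        return overall                  # reward what the whole path shares
    if focus == "differences":
        return 1.0 - overall            # reward variety along the path
    if focus == "mixed":
        return local * (1.0 - overall)  # coherent steps, diverse path overall
    raise ValueError(f"unknown focus: {focus!r}")

# Hypothetical features for a path resource -> relationship -> resource.
features = {"A": {"x"}, "B": {"x", "y"}, "C": {"y"}}
for focus in ("commonalities", "differences", "mixed"):
    print(focus, round(path_score(["A", "B", "C"], features, focus), 3))
# commonalities 0.333
# differences 0.667
# mixed 0.333
```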

USER STUDY RESULTS
% preference relative to the median in pairwise A/B assessments.

EVALUATION: CONFERENCES & DIGITAL LIBRARIES IN COMPUTER SCIENCE
Check the precision of search results during the search.
Comparison between:
our own method (minimal-cost paths with optimal estimates)
the de facto baseline for many semantic applications, ‘Virtuoso’ (shortest paths)
Datasets

SEARCH PRECISION
Average precision: own method vs. Virtuoso (baseline)
Baseline: more stable and similar on average
Own method: notably high scores for Q1, Q4 and Q7

WHEN TO USE EXPLORATORY SEARCH?
When users
(i) do not know exactly how to formulate the most suitable search query;
(ii) would rather browse or surf information than look up something specific.

FROM SEARCHING IN DOCUMENTS TO SEARCHING IN DATA
Search engine
Search results
Query formulation
(…)

FROM SEARCHING IN DOCUMENTS TO SEARCHING IN DATA
Search engine
Search results
Query formulation
(…)
?

EXPLORATORY SEARCH IN DATA
[Tvarožek et al., 2010] [Smith et al., 2005]
By interacting with the underlying data structure:
Network-based
Tabular or faceted

PROPOSED DATA INTERACTION FLOW

EVALUATION: SCENARIO
Hypothesis
Revealing relationships among indirectly related computer science publications, conferences and researchers facilitates adding new relevant results to the results already found.
Testing
A. Added value of revealing relationships among search results
B. Effectiveness and productivity of different search actions
Datasets

A. ADDED VALUE OF REVEALING RELATIONSHIPS AMONG SEARCH RESULTS
Effect measured with a simple and a complex query.
Simple
“Find a publication. Find a number of publications that have common co-authors with the found publication.”
Complex
“Find multiple persons who had a publication in two consecutive years in the same conference series.”
Search details to be filled in by the users.
The users were not aware whether the pathfinding functionality was activated or not.

A. ADDED VALUE OF REVEALING RELATIONSHIPS AMONG SEARCH RESULTS
Chart: negative (%) and positive (%) assessments for the simple query and the complex query

B. EFFECTIVENESS AND PRODUCTIVITY OF DIFFERENT SEARCH ACTIONS
Tested actions:
1. Keyword-based search query
2. Add a top related resource
3. Expand neighbours
4. Expand neighbour of neighbour
5. Expand further related resource
Illustration: search query ‘Einstein’; top related: Special Relativity, General Relativity, Twin Paradox
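Actions 2 to 4 can be read as operations on the set of resources currently shown. The sketch below is a hypothetical illustration on a toy adjacency list: "add top related" is assumed to add the unseen resource linked to the most shown resources (ties broken alphabetically), while the neighbour expansions add one or two hops of neighbours. The graph, the ranking rule and all function names are assumptions; action 1 (keyword lookup) is represented only by the starting set, and action 5 is not sketched because the slides do not define it.

```python
from collections import defaultdict

def neighbours(graph: dict[str, list[str]], resource: str) -> set[str]:
    return set(graph.get(resource, ()))

def add_top_related(graph, shown: set[str]) -> set[str]:
    """Action 2 (assumed ranking): add the unseen resource linked to the most
    shown resources; ties are broken alphabetically."""
    counts: dict[str, int] = defaultdict(int)
    for resource in shown:
        for nb in neighbours(graph, resource) - shown:
            counts[nb] += 1
    if not counts:
        return set(shown)
    best = max(sorted(counts), key=counts.__getitem__)
    return shown | {best}

def expand_neighbours(graph, shown: set[str]) -> set[str]:
    """Action 3: add every direct neighbour of the shown resources."""
    expanded = set(shown)
    for resource in shown:
        expanded |= neighbours(graph, resource)
    return expanded

def expand_neighbours_of_neighbours(graph, shown: set[str]) -> set[str]:
    """Action 4: neighbour expansion applied twice (two hops)."""
    return expand_neighbours(graph, expand_neighbours(graph, shown))

# Hypothetical toy graph (undirected, stored in both directions).
graph = {
    "Einstein": ["Special Relativity", "General Relativity"],
    "Special Relativity": ["Einstein", "Twin Paradox"],
    "General Relativity": ["Einstein"],
    "Twin Paradox": ["Special Relativity"],
}
shown = {"Einstein"}  # result of action 1, the keyword lookup
print(sorted(expand_neighbours(graph, shown)))
# ['Einstein', 'General Relativity', 'Special Relativity']
print(sorted(expand_neighbours_of_neighbours(graph, shown)))
# ['Einstein', 'General Relativity', 'Special Relativity', 'Twin Paradox']
```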

EFFECTIVENESS OF A SEARCH ACTION
‘All’ data ⊇ shown data ⊇ relevant shown data
Effectiveness here equals precision:
$E = \frac{|\text{relevant shown data}|}{|\text{shown data}|}$
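Since effectiveness is the precision of what a search action shows, it reduces to a ratio of set sizes. A minimal sketch follows; the resource sets are hypothetical, and returning 0 when nothing is shown is an assumption the slide does not address.

```python
def effectiveness(shown: set[str], relevant: set[str]) -> float:
    """E = |relevant ∩ shown| / |shown|, i.e. the precision of what is shown."""
    if not shown:
        return 0.0  # nothing shown: E defined as 0 (an assumption)
    return len(shown & relevant) / len(shown)

shown = {"Special Relativity", "General Relativity", "Twin Paradox", "Hume"}
relevant = {"Special Relativity", "Twin Paradox", "Newton"}
print(effectiveness(shown, relevant))  # 0.5
```

Two of the four shown resources are relevant, so E = 2/4; the relevant but unshown "Newton" does not affect precision.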

PRODUCTIVITY OF CONSECUTIVE SEARCH ACTIONS
Consecutive search actions 0, 1, 2, …, k, starting at effectiveness $E_0$
P = average increase of effectiveness after k search actions, measured from the second search action onwards (I)
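One possible reading of the productivity definition, assumed here rather than taken from the thesis, is the mean of the successive effectiveness increases from action 1 (the second action) up to action k: P = (1/k) · Σᵢ₌₁..ₖ (Eᵢ − Eᵢ₋₁). Under this reading the sum telescopes to (Eₖ − E₀)/k; if the thesis instead averages relative increases, the formula differs. The effectiveness values below are hypothetical.

```python
def productivity(effectiveness_per_action: list[float]) -> float:
    """Assumed reading of P: the mean increase in effectiveness per action
    over actions 0..k, counting increases from the second action (index 1) on:
    P = (1/k) * sum_{i=1..k} (E_i - E_{i-1})."""
    k = len(effectiveness_per_action) - 1
    if k < 1:
        raise ValueError("need the effectiveness of at least two actions")
    increases = [after - before for before, after
                 in zip(effectiveness_per_action, effectiveness_per_action[1:])]
    return sum(increases) / k

print(round(productivity([0.2, 0.3, 0.5]), 2))  # 0.15
```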

B. EFFECTIVENESS AND PRODUCTIVITY OF DIFFERENT SEARCH ACTIONS
Chart: effectiveness (%) and productivity (%) per search action: lookup, add top related, neighbour expand, neighbour-of-neighbour expand, expand further related resource

EXPLORING SEMANTIC RELATIONSHIPS ON THE WEB
Searching the web compared with searching physical documents: an impressive state of the art.
From searching to exploring via ‘berrypicking’: more possibilities than pure ‘lookup’.
Semantics:
the meaning of resources, aside from their expression, description or representation;
documents describe resources and consist of data;
‘linked’ data uses a three-part structure, ‘triples’, to express semantics.
Exploring relationships between resources is not trivial for properties they do not share.

MOST IMPORTANT TAKEAWAYS
• Alternative to searching different data sources, each with its own search interface:
→ exploratory search via semantic relationships between data
• The choice of heuristics and weights contributes to and influences the serendipity among results.
• Focus on revealing semantic relationships
→ supporting visual exploratory search in data on the web
• The techniques are mainly tested with data on:
→ encyclopedic facts from Wikipedia (DBpedia)
→ academic digital libraries (DBLP) and conferences (COLINDA)
• The proposed techniques remain close to the structure of linked data (RDF),
→ methods applicable in other domains that have linked data.
