Discovery hub: an exploratory search engine on the top of DBpedia
1. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery hub - a discovery engine on the top of DBpedia
Nicolas Marie, Fabien Gandon, Damien Legrand, Myriam Ribière
2. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
CONTEXT
RESEARCH QUESTION
RESEARCH
- Proposition
- Implementation
- Operational prototypes
- Users evaluations
PUBLICATION
4. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Search is only a partially
solved problem
[White, 2006]
5. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Gary Marchionini, 2006
6. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Lookup today: « Claude Monet » + birthday
Exploration/discovery today: « Claude Monet » + impressionism
9. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Search is only a partially
solved problem, White 2006
The degree of structure of the web
content is the determining factor
for the type of functionality that
search engines can provide,
Bizer et al., 2012
10. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
SEMANTIC WEB
• “The Semantic Web is a mesh of information linked up in such a way as to
be easily processable by machines, on a global scale. You can think of it
as being an efficient way of representing data on the World Wide Web, or
as a globally linked database.” Marianna Sigala, Luisa Mich, Jamie Murphy
11. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Tim Berners-Lee, WWW1994
[Stankovic, 2012]
12. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Wikipedia article: 3,000+ words, 13 pages
http://en.wikipedia.org/wiki/Claude_Monet
DBpedia resource: 191 triples
http://dbpedia.org/resource/Claude_Monet
13. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
•Accessible through:
- Browsers
- Dumps
- SPARQL endpoint
SELECT * WHERE {
  <http://dbpedia.org/resource/Claude_Monet>
  <http://dbpedia.org/property/influencedBy>
  ?x
}
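The query above can be sent to an endpoint over HTTP. The following is a minimal sketch, assuming the public DBpedia endpoint at https://dbpedia.org/sparql (the slide does not name one) and the `query` parameter of the SPARQL protocol. The helper name `build_query_url` is hypothetical, and the network call is left commented out.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia endpoint; the slide only says "SPARQL endpoint".
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
SELECT * WHERE {
  <http://dbpedia.org/resource/Claude_Monet>
  <http://dbpedia.org/property/influencedBy>
  ?x
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)

# To actually run the query (requires network access):
# import urllib.request
# request = urllib.request.Request(
#     url, headers={"Accept": "application/sparql-results+json"})
# print(urllib.request.urlopen(request).read().decode())
```

Building the URL separately from sending it keeps the sketch usable offline and makes the percent-encoding of the query easy to inspect.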
14. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
DBpedia (open): 3.77 million things, 270 million facts
Google Knowledge Graph (closed): 500 million things, 3.5 billion facts
Linked Open Data cloud (open): 31+ billion facts
15. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Google knowledge graph, 2012
16. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Since 1995 Since 2007
2001 2007
17. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
• Linked-data based exploratory
search systems
User interest, start point
Interactive result space
Results choice
Ranking
Sorting/categorization
Explanation
dbpedia:
Claude Monet
18. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub State of the art: Seevl, 2010
19. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub State of the art: Yovisto, 2010
20. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub State of the art: LED, 2010
21. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub State of the art: MORE, 2012
22. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub State of the art: Aemoo, 2011
23. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub State of the art: Kaminskas et al., 2011
24. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub State of the art: Google Knowledge Panel, 2012
25. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
CONTEXT
RESEARCH QUESTION
RESEARCH
- Proposition
- Implementation
- Operational prototypes
- Users evaluations
PUBLICATION
26. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
•Challenge: enable on-the-fly linked data processing
for exploratory search
•3 major benefits:
- Fresh results
- Support for composite exploration
- Fine-grained querying capabilities
27. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Freshness
28. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
• 3.77 million resources
• Possible combinations of 2 resources: 14,212,900,000,000
• Possible combinations of 3 resources: 53,582,633,000,000,000,000
Composite interest exploration: knowing my interest
in X and Y, what can I discover or learn that is related
to all these resources?
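The two figures on this slide can be checked with a few lines of arithmetic. They equal n² and n³ for n = 3.77 million, that is, ordered selections with repetition; unordered combinations of distinct resources would be smaller. The variable names below are illustrative only.

```python
import math

n = 3_770_000  # number of DBpedia resources stated on the slide

seed_pairs = n ** 2     # ordered pairs, repetition allowed
seed_triplets = n ** 3  # ordered triplets, repetition allowed

print(f"{seed_pairs:,}")     # 14,212,900,000,000
print(f"{seed_triplets:,}")  # 53,582,633,000,000,000,000

# Unordered pairs of distinct resources, for comparison (about half of n**2).
distinct_pairs = math.comb(n, 2)
```

Either way, the order of magnitude makes it impractical to pre-compute results for every composite query, which is the motivation for on-the-fly processing.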
29. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Fine-grained querying capabilities
• Artists_from_Paris
• French_painters --
• Impressionist_painters ++
painted ++
influenced by ++
30. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Fine-grained querying
• 1960s_science_fiction_films
• American_epic_films
• Films_set_on_the_Moon
• Artificial_intelligence_in_fiction
• Space_adventure_films, …
directed by
31. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
CONTEXT
RESEARCH QUESTION
RESEARCH
- Proposition
- Implementation
- Operational prototypes
- Users evaluations
PUBLICATION
32. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Refer to publications for
the complete algorithm
33. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Spreading activation basis – monocentric
[Figure: activation spreading outward from Claude_Monet at iteration 0, iteration 1 and iteration 2]
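For readers unfamiliar with the technique, the sketch below shows a generic, textbook-style spreading activation from a single seed. It is not the Discovery Hub algorithm (the slides refer to the publications for that): the `decay` factor, the equal split among neighbors and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def spread(graph, seed, iterations, decay=0.5):
    """Propagate activation from `seed` for a fixed number of iterations.

    graph: dict mapping each node to the list of its neighbors.
    Returns the activation accumulated by every node except the seed.
    """
    current = {seed: 1.0}       # activation that fires during this iteration
    total = defaultdict(float)  # activation accumulated so far
    for _ in range(iterations):
        received = defaultdict(float)
        for node, value in current.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            share = decay * value / len(neighbors)  # split among neighbors
            for neighbor in neighbors:
                received[neighbor] += share
        for node, value in received.items():
            total[node] += value
        current = received
    total.pop(seed, None)
    return dict(total)


# Toy graph with hypothetical links, for illustration only.
graph = {
    "Claude_Monet": ["Impressionism", "Gustave_Courbet"],
    "Impressionism": ["Claude_Monet", "Pierre-Auguste_Renoir"],
    "Gustave_Courbet": ["Claude_Monet"],
    "Pierre-Auguste_Renoir": ["Impressionism"],
}
scores = spread(graph, "Claude_Monet", iterations=2)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes reached at iteration 1 receive more activation than nodes first reached at iteration 2, so the ranking favors resources close to the seed.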
34. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Spreading activation basis – polycentric
[Figure: graph with the nodes Claude_Monet, Musée d’Orsay, Musée de l’Orangerie, Vincent Van Gogh, …]
35. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
[Figure: neighbors of the seed, annotated with their types, their categories and numeric weights.
Nodes: Wheeler_School, Art Institute of Chicago, Gustave_Courbet, Cadmium_sulfide, Farmington_Mountain.
Types: DBO:School, DBO:Museum, DBO:Artist, DBO:ChemicalSubstance, DBO:Mountain.
Categories: cat:Impressionist painters, cat:Alumni_of_Beaux_Arts.]
Propagation domain: artist, book,
film, museum, river, television show,
university, writer,…
36. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
• How to execute it fast
on a very large graph?
37. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Very large graph
Locate the processing
39. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
sparql endpoint = http://xxx/sparql
seed(s) = xxx_Beatles
compute the propagation domain (w(i,o))
(find a path between the seeds)
import path nodes & their neighbors
for (i = 1; i <= maxPulse; i++) {
    pulse
    if (sampleSize <= maxSampleSize) {
        extend the sample
    }
}
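The loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: an in-memory dictionary `REMOTE` stands in for the SPARQL endpoint, `fetch_neighbors` is a hypothetical placeholder for a remote query, the sample size is counted in nodes rather than triples, and the pulse simply splits a node's activation equally among its known neighbors. The propagation-domain and path steps are omitted.

```python
# Stand-in for the remote endpoint: node -> neighbors (hypothetical data).
REMOTE = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def fetch_neighbors(node):
    """Placeholder for a SPARQL query returning the neighbors of `node`."""
    return REMOTE.get(node, [])


def explore(seed, max_pulse, max_sample_size):
    # Local sample: the part of the remote graph imported so far.
    sample = {seed: list(fetch_neighbors(seed))}
    activation = {seed: 1.0}
    for _ in range(max_pulse):
        # Pulse: each activated node passes its value to its known neighbors.
        received = {}
        for node, value in activation.items():
            neighbors = sample.get(node, [])
            for neighbor in neighbors:
                received[neighbor] = received.get(neighbor, 0.0) + value / len(neighbors)
        activation = received
        # Extend the sample with newly activated nodes while under the limit.
        for node in activation:
            if len(sample) > max_sample_size:
                break
            if node not in sample:
                sample[node] = list(fetch_neighbors(node))
    return activation, sample


activation, sample = explore("A", max_pulse=2, max_sample_size=3)
_, small_sample = explore("A", max_pulse=2, max_sample_size=1)
```

Capping the sample bounds both the number of remote calls and the memory used, at the cost of ignoring parts of the graph that were never imported.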
40. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
select distinct ?x ?y where {
  service <sparqlEndpoint> {
    select * where {
      ?a (<…wikiPageWikiLink> | ^<…wikiPageWikiLink>){0,X} :: $path ?b
      filter (?a = <resource1> && ?b = <resource2>)
    }
  }
  graph $path { ?x ?p ?y }
  filter (?x != <resource1> && ?x != <resource2>)
}
Path query using KGRAM for polycentric spreading activation
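As a rough analogue of the path query, the sketch below runs a bounded breadth-first search between two seeds, following links in both directions (as `p | ^p` does in a property path). It is plain Python over an in-memory edge list, not KGRAM; `find_path`, the `max_len` bound and the sample links are hypothetical, and it returns one shortest path only.

```python
from collections import deque


def find_path(links, start, goal, max_len):
    """Breadth-first search for a shortest path of at most `max_len` edges.

    links: iterable of (source, target) pairs, followed in both directions.
    Returns the list of nodes on the path, or None if no such path exists.
    """
    adjacency = {}
    for source, target in links:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_len:
            continue  # length bound reached, do not extend this path
        for neighbor in sorted(adjacency.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical links, for illustration only.
links = [
    ("Claude_Monet", "Impressionism"),
    ("Musée_d'Orsay", "Impressionism"),
    ("Claude_Monet", "Giverny"),
]
path = find_path(links, "Claude_Monet", "Musée_d'Orsay", max_len=2)
too_short = find_path(links, "Claude_Monet", "Musée_d'Orsay", max_len=1)
```

The nodes on the returned path, together with their neighbors, would form the initial sample from which a polycentric propagation starts.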
41. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
CONTEXT
RESEARCH QUESTION
RESEARCH
- Proposition
- Implementation
- Operational prototypes
- Users evaluations
PUBLICATION
42. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
• Analysis method
Analysis performed on a set of 100,000 queries
43. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
[Figure: similarity of the top 100 results (Kendall-Tau, KT, scale 0 to 1) from one triples loading limit to the next, and response time (ms, scale 0 to 4,500), plotted against the triples loading limit (0 to 20,000). Parameter studied: maxSampleSize]
44. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
[Figure: similarity of the top 100 results from one iteration to the next, measured by the number of shared results (scale 0 to 100) and by Kendall-Tau (KT, scale 0 to 1), for iterations 1 to 10. Parameter studied: maxPulse]
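The two similarity measures named on these slides, shared results and Kendall-Tau, can be sketched as follows. The slides do not say how items missing from one of the two lists are handled, so this version makes an assumption: Kendall-Tau is computed only over the items common to both rankings, and rankings contain no duplicates. Function names are illustrative.

```python
from itertools import combinations


def shared_results(a, b):
    """Number of items present in both top-k lists."""
    return len(set(a) & set(b))


def kendall_tau(a, b):
    """Kendall tau-a over the items common to rankings `a` and `b`."""
    in_b = set(b)
    common = [item for item in a if item in in_b]
    if len(common) < 2:
        return 1.0  # nothing to compare; treated as identical by convention
    position_b = {item: index for index, item in enumerate(b)}
    concordant = discordant = 0
    for first, second in combinations(common, 2):  # `first` precedes `second` in a
        if position_b[first] < position_b[second]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


previous = ["w", "x", "y", "z"]
current = ["w", "y", "x", "v"]
overlap = shared_results(previous, current)
tau = kendall_tau(previous, current)
```

A value of 1 means the common items appear in the same order in both lists, and -1 means the order is fully reversed; values close to 1 from one step to the next indicate that the ranking has stabilized.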
45. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
[Figure: histogram of query response times, from 0 to 20,000 milliseconds]
46. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Algorithm visualization, available @
http://www.youtube.com/user/wearediscoveryhub
47. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Polycentric query propagation
visualization, iteration 0
• In red: Claude Monet
• In blue: Musée d’Orsay
• In purple: overlap
48. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Polycentric query propagation
visualization, iteration 6
• In red: Claude Monet
• In blue: Musée d’Orsay
• In purple: overlap
49. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Semantic spreading activation
Polycentric queries, average distances of top results from each seed.
                   Distance A   Distance B   Gap A - B   Max distance A   Max distance B
poly1 - top 10     1.53         1.68         0.34        /                /
poly2 - top 10     1.52         1.66         0.33        /                /
poly1 - top 100    1.90         2.12         0.49        2.60             2.60
poly2 - top 100    1.88         2.11         0.48        2.58             2.58
50. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
• We studied the convergence of the algorithm with respect to various graph
metrics. For this purpose we generated many graphs with the
GraphStream graph library. Conclusion: the diameter is crucial.
Discovery Hub Influence of graph diameter
on algorithm convergence
http://graphstream-project.org/doc/Generators/
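As a reminder of the metric involved, the diameter of a graph is the length of its longest shortest path. The sketch below computes it by breadth-first search on small hand-written graphs; it uses plain Python rather than GraphStream, assumes a connected, undirected, unweighted graph, and its function names are illustrative.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Longest shortest-path distance from `source` to any other node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return max(distance.values())


def diameter(adjacency):
    """Largest eccentricity over all nodes of a connected graph."""
    return max(eccentricity(adjacency, node) for node in adjacency)


# Two toy graphs with four nodes each: a chain and a star.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(diameter(path_graph))  # 3
print(diameter(star_graph))  # 2
```

Intuitively, activation needs more iterations to reach distant nodes in a graph with a large diameter, which is consistent with the slower convergence reported for such graphs.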
51. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
[Figure: results similarity (scale 0 to 1) per iteration (1 to 20) for generated graphs of diameter 4.14, 6.72, 9.94, 13.28, 15.43, 19.59, 22.03, 24.87 and 28.85]
Influence of graph diameter
on algorithm convergence
52. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
[Figure: average rank variation (scale 0 to 80) per iteration (1 to 20) for generated graphs of diameter 4.14, 6.72, 9.94, 13.28, 15.43, 19.59, 22.03, 24.87 and 28.85]
Influence of graph diameter
on algorithm convergence
53. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
• Discovery Hub is an exploratory search engine that helps you
discover things you might like or be interested in. It widens your
cultural and knowledge horizons by revealing and explaining
unexpected information.
• It lets you perform queries in an innovative way and helps you
navigate rich results. As a hub, it redirects you to other
platforms (YouTube, Deezer and more) so that you can benefit from
your discoveries. The results are explained in depth thanks to 3
explanatory features.
• Discovery Hub supports simple and composite explorations, i.e.
starting from one or several items of interest. It offers, and is able
to combine, advanced exploration modes such as serendipitous,
multi-lingual, and fine-grained ones.
Discovery Hub
54. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
CONTEXT
RESEARCH QUESTION
RESEARCH
- Proposition
- Implementation
- Operational prototypes
- Users evaluations
PUBLICATION
55. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
1. Start from what you like or are interested in
2. Explore, discover, understand
3. Be redirected to great platforms to experience your discoveries
58. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
• 3 features to understand the results: common properties
Discovery Hub
59. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
• 3 features to understand the results: Wikipedia crossed references
60. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
• 3 features to understand the results: explanatory graph
61. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Internationalization
62. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Serendipitous mode
[Figure: activation spreading from Claude_Monet at iteration 0, iteration 1 and iteration 2, with nodes marked “?”]
63. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Multi-lingual mode
dbpedia:Claude_Monet sameAs fr.dbpedia:Claude_Monet
64. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Fine search mode
1960s_science_fiction_films
Films_set_on_the_Moon
Artificial_intelligence_in_fiction
Space_adventure_films
Top 4 films
65. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Directed by Stanley Kubrick
Top 4 films
Fine search mode
66. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
1960s_science_fiction_films
Films_set_on_the_Moon
Artificial_intelligence_in_fiction
Space_adventure_films
Directed by Stanley Kubrick
Top 4 films
Fine search mode
67. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub Multi-criteria mode
68. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
« Surprise mode »
Multi-lingual
Fine-search
Multi-criteria mode
69. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
Demo videos, available @
http://www.youtube.com/user/wearediscoveryhub
70. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
CONTEXT
RESEARCH QUESTION
RESEARCH
- Proposition
- Implementation
- Operational prototypes
- Users evaluations
PUBLICATION
71. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
• Mono-centric queries: evaluated positively on the movie domain against
another algorithm: the sVSM implemented in the MORE movie recommender
[Figure: ratings on a scale from “Not interesting at all” to “Very interesting”]
72. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Scores for partial lists
Discovery Hub
73. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
- Films_about_criticism_and_refusal_of_work
- Anti-modernist_films
- Fiction_with_unreliable_narrators
[Figure: bar chart of “Interesting” scores (scale 0 to 2), Neutral vs. Personalized]
74. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub • Poly-centric queries: evaluated positively
[Figure: relevance scores (scale 0 to 3, from “Not interesting at all” to “Very interesting”) and discovery scores (scale 0 to 3, from “Not surprising at all” to “Very surprising”)]
75. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
•Overlap between relevance and unexpectedness:
- 61.6% of results were rated as strongly relevant or relevant
by the participants.
- 65% of results were rated as strongly unexpected or
unexpected.
- 35.42% of results were rated both as strongly relevant or
relevant and strongly unexpected or unexpected.
76. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub •Explanation features
[Figure: helpfulness scores (scale 0 to 3, from “Not helpful at all” to “Very helpful”) of the common properties, Wikipedia-based and graph-based explanations, and overall, for monocentric and polycentric queries]
77. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Publications:
• Nicolas Marie, Fabien Gandon, Myriam Ribière, Florentin Rodio. Discovery Hub:
on-the-fly linked data exploratory search. I-Semantics 2013, Graz, 4-6
September (paper).
• Nicolas Marie, Fabien Gandon, Damien Legrand, Myriam Ribière. Exploratory
Search on the top of DBpedia chapters with the Discovery Hub Application.
ESWC 2013, Montpellier, 26-30 May (demo + poster).
• Nicolas Marie, Olivier Corby, Fabien Gandon, Myriam Ribière. Composite interests
exploration thanks to on-the-fly linked data spreading activation. Hypertext 2013,
Paris, 1-3 May (paper).
• Clare J. Hooper, Nicolas Marie, Evangelos Kalampokis. Dissecting the Butterfly:
Representation of Disciplines Publishing at the Web Science Conference Series.
Web Science 2012, Northwestern University, Evanston, United States, 22-24 June
(paper).
• Nicolas Marie, Fabien Gandon. Advanced social objects recommendation in
multidimensional social networks. Social Object Workshop 2011, MIT, Boston, USA
(paper).
Discovery Hub
78. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
CONTEXT
RESEARCH QUESTION
RESEARCH
- Proposition
- Implementation
- Operational prototypes
- Users evaluation
PUBLICATION
80. COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub
ncmarie3&@gmail.com
http://ncmarie.tumblr.com
http://discoveryhub.co
werarediscoveryhub@gmail.com
Thank you !