PIKM 2010 – Workshop for Ph.D. Students in Information and Knowledge Management
October 30, 2010 – Fairmont Royal York, Toronto, Canada
FROM EXPLORATORY SEARCH
TO WEB SEARCH AND BACK
Politecnico di Bari
Via Orabona, 4
70125 Bari (ITALY)
Roberto Mirizzi, Tommaso Di Noia
mirizzi@deemail.poliba.it, t.dinoia@poliba.it
Outline
Tags to improve Web Search
Exploratory Search
LED (Lookup Explore Discover): exploratory search in the Web (of Data)
DBpediaRanker: RDF ranking in DBpedia
Conclusion and Future work
Why do we use tags?
…and many more
What is Exploratory Search?
[Gary Marchionini. Exploratory Search: From Finding to Understanding. Communications of the ACM, 49(4): 41-46, 2006]
Can semantic tags support exploratory search?
Plugged into the Web 3.0
Disambiguation
Relations among tags
Machine understandable
Semantic-aided query refinement
LED: Lookup Explore Discover
http://sisinflab.poliba.it/led/
If semantic tags helped 10% of Internet users save 10 minutes per month on their searches, this would save over 4,000,000 working hours per year globally
LED: Lookup Explore Discover
Objectives
• Enable users to explore the semantics of a keyword
• Guide users in refining a query by suggesting related topics/keywords
Improve lookup search to support knowledge exploration
What is behind LED? (i)
What is behind LED? (ii)
Comments
• DBpedia resources are highly interconnected in the RDF graph
• Not all the relevant resources for a given node are its direct neighbors
1. Explore the neighborhood of a resource to discover new relevant resources not directly connected to it
2. Rank the results
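The two steps above can be illustrated with a depth-bounded breadth-first exploration followed by a ranking pass. This is a minimal sketch, not LED's implementation: it assumes the RDF graph is available as an in-memory adjacency dictionary and that some pairwise score function is given. The names `explore_neighborhood`, `rank`, `max_depth` and `score` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def explore_neighborhood(graph: Dict[str, List[str]], start: str,
                         max_depth: int) -> Set[str]:
    """Step 1: return every node reachable from `start` within `max_depth`
    hops (excluding `start`), so that relevant resources that are not
    direct neighbors are also discovered. Edges are followed as listed in
    the adjacency dict; make the lists symmetric for undirected traversal."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


def rank(start: str, candidates: Set[str],
         score: Callable[[str, str], float]) -> List[str]:
    """Step 2: order the discovered resources by decreasing score
    (ties broken alphabetically for a deterministic result)."""
    return sorted(candidates, key=lambda c: (-score(start, c), c))
```

Bounding the depth keeps the exploration tractable in a highly interconnected graph; the ranking function is deliberately left pluggable.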
DBpedia graph exploration in LED
[Figure: a fragment of the DBpedia graph explored by LED. Nodes include Semantic_Web, XML-based_standards, Knowledge_representation, Data_management, Internet_architecture, Triplestores, Folksonomy, XML, Computer_and_telecommunication_standards, Web_services, User_interface_markup_languages, Scalable_Vector_Graphics, Microformats, Resource Description Framework, Microformat, RDFa, … Legend: skos:subject and skos:broader edges; Category and Article nodes.]
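In the figure, related articles are reached through categories (skos:subject) and the category hierarchy (skos:broader) rather than only through direct links. A hedged sketch of one such hop, written as a Python helper that builds a SPARQL query; it assumes the endpoint exposes article categories through skos:subject, as in the figure's legend, and the function name `related_via_categories` is hypothetical.

```python
def related_via_categories(resource_uri: str, use_broader: bool = False) -> str:
    """Build a SPARQL query returning articles that share a category with
    `resource_uri`, or, if `use_broader` is True, articles filed under a
    broader category of one of its categories (one skos:broader step)."""
    if use_broader:
        pattern = (f"<{resource_uri}> skos:subject ?cat .\n"
                   "  ?cat skos:broader ?parent .\n"
                   "  ?related skos:subject ?parent .")
    else:
        pattern = (f"<{resource_uri}> skos:subject ?cat .\n"
                   "  ?related skos:subject ?cat .")
    return ("PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
            "SELECT DISTINCT ?related WHERE {\n"
            f"  {pattern}\n"
            # exclude the starting resource from its own neighborhood
            f"  FILTER (?related != <{resource_uri}>)\n"
            "}")


print(related_via_categories("http://dbpedia.org/resource/RDFa"))
```

Chaining several such hops, with a depth limit, gives the kind of multi-step exploration shown in the figure.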
The functional architecture
[Figure: functional architecture. Components: GUI, Interface, Back-end, Query engine, Storage, Graph Explorer, SPARQL, Context Analyzer, Ranker, Tag Cloud Generator, Meta-search engine. External information sources: DBpedia, DBpedia Lookup Service, Delicious, Yahoo!, Bing, Google.]
Offline computation
1. Linked Data graph exploration
2. Rank nodes exploiting external information
3. Store results as pairs of nodes together with their similarity
Runtime search
1. Start typing a query
2. Query the system for relevant tags (corresponding to DBpedia resources) and aggregate results
3. Show the semantic tag cloud and the results
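The split between offline computation and runtime search can be sketched as a precomputed similarity store queried at typing time. This is an illustrative sketch only: `SimilarityStore`, `tag_cloud`, prefix matching on resource names, symmetric storage of pairs and aggregation by summing similarities are all assumptions, not details given on the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class SimilarityStore:
    """Hypothetical in-memory stand-in for the Storage component:
    holds (resource, resource, similarity) results computed offline."""

    def __init__(self) -> None:
        self._pairs: Dict[str, Dict[str, float]] = defaultdict(dict)

    def add(self, r1: str, r2: str, similarity: float) -> None:
        # Offline step 3; assumes a symmetric similarity, so store both ways.
        self._pairs[r1][r2] = similarity
        self._pairs[r2][r1] = similarity

    def related(self, resource: str, k: int) -> List[Tuple[str, float]]:
        """Top-k stored neighbors of `resource`, best first."""
        neighbors = self._pairs.get(resource, {})
        return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def tag_cloud(store: SimilarityStore, typed: str,
              resources: Iterable[str], k: int = 5) -> List[Tuple[str, float]]:
    """Runtime steps 1-3: match the typed prefix against known resources,
    aggregate their related tags, and return the top-k for the tag cloud."""
    scores: Dict[str, float] = defaultdict(float)
    matches = [r for r in resources if r.lower().startswith(typed.lower())]
    for match in matches:
        for tag, sim in store.related(match, k):
            scores[tag] += sim  # summing is an assumed aggregation rule
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Precomputing the pairs keeps the expensive graph exploration and external-source queries out of the interactive path; at runtime only lookups and a cheap aggregation remain.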
DBpediaRanker: ranking
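[Figure: ?r1 isSimilar ?r2, with the similarity value v attached via hasValue.]
Graph-based and text-based ranking
$$
\mathit{wikilinkScore}(r_1, r_2) =
\begin{cases}
2 & \text{if there is a wikilink between } r_1 \text{ and } r_2 \text{ and vice versa}\\
1 & \text{if there is a wikilink between } r_1 \text{ and } r_2 \text{ or vice versa}\\
0 & \text{if there is no wikilink between } r_1 \text{ and } r_2
\end{cases}
$$
$$
\mathit{abstractScore}(r_1, r_2) = \frac{l(r_2, r_1)}{l(r_2)}
$$
Ranking based on external sources
$$
\mathit{sim}(r_1, r_2)_{\mathit{info\_source}} = \frac{f_{\mathit{info\_source}}(r_1, r_2)}{f_{\mathit{info\_source}}(r_1)} + \frac{f_{\mathit{info\_source}}(r_1, r_2)}{f_{\mathit{info\_source}}(r_2)}
$$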
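The two graph/text-based scores can be sketched as small functions. The wikilink score follows the case definition directly. The slide does not define l, so the abstract score below uses a hypothetical reading: l(r2) is the number of words in r2's label, and l(r2, r1) is how many of those words occur in r1's abstract. All function and parameter names are illustrative.

```python
from typing import Dict, Set


def wikilink_score(r1: str, r2: str, wikilinks: Dict[str, Set[str]]) -> int:
    """2 if r1 and r2 link to each other, 1 if only one links to the other,
    0 if neither does. `wikilinks[r]` is the set of pages that r links to."""
    forward = r2 in wikilinks.get(r1, set())
    backward = r1 in wikilinks.get(r2, set())
    return int(forward) + int(backward)


def abstract_score(r1: str, r2: str, labels: Dict[str, str],
                   abstracts: Dict[str, str]) -> float:
    """Hypothetical reading of l(r2, r1) / l(r2): the fraction of the words
    in r2's label that also occur (case-insensitively) in r1's abstract."""
    label_words = labels[r2].lower().split()
    if not label_words:
        return 0.0
    abstract_words = set(abstracts[r1].lower().split())
    found = sum(1 for w in label_words if w in abstract_words)
    return found / len(label_words)
```

Under this reading the abstract score lies in [0, 1] and equals 1.0 when every word of r2's label appears in r1's abstract, while the wikilink score rewards reciprocal links over one-way links.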
DBpediaRanker: an example (i)
wikilinkScore(RDFa, Resource_Description_Framework) = 2
abstractScore(RDFa, Resource_Description_Framework) = 1.0
DBpediaRanker: an example (ii)
sim(RDFa, Resource_Description_Framework)Google = 1.67e5 / 4.42e5 + 1.67e5 / 1.19e7 = 0.39
[Figure: Delicious]
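The Google figure above is a direct instance of the external-source similarity. A minimal sketch, reading the slide's numbers as f(RDFa, Resource_Description_Framework) = 1.67e5, f(RDFa) = 4.42e5 and f(Resource_Description_Framework) = 1.19e7 hit counts; the function name and the zero-count guard are assumptions.

```python
def sim(f_r1_r2: float, f_r1: float, f_r2: float) -> float:
    """sim(r1, r2) = f(r1, r2)/f(r1) + f(r1, r2)/f(r2), where f(.) is the
    number of results an external source returns for one or both resources.
    Returns 0.0 if either single-resource count is zero (assumed guard)."""
    if f_r1 <= 0 or f_r2 <= 0:
        return 0.0
    return f_r1_r2 / f_r1 + f_r1_r2 / f_r2


print(round(sim(1.67e5, 4.42e5, 1.19e7), 2))  # 0.39
```

The first term (about 0.378) dominates: co-occurrence is large relative to the rarer resource, while the very common one contributes only about 0.014.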
DBpediaRanker: context analysis
The same similarity measure is used in the context analysis
[Figure: ?r1 belongsTo a context C = {?c1, ?c2, …, ?cN}; the membership score v is attached via hasValue.]
Example:
C = {Programming Languages, Databases, Software}
Does Dennis Ritchie belong to the given context?
Algorithm:
If (v > THRESHOLD) then
    r1 belongs to the context;
    add r1 to the graph exploration queue
Else
    r1 does not belong to the context;
    exclude r1 from graph exploration
EndIf
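The pseudocode above can be made runnable as a filter over candidate resources. The slide says the same similarity measure is reused but does not say how the per-element similarities are combined into v; this sketch assumes, hypothetically, that v is their mean. `context_score`, `filter_by_context` and the `threshold` parameter are illustrative names, and the similarity function is passed in.

```python
from typing import Callable, Iterable, List


def context_score(resource: str, context: Iterable[str],
                  sim: Callable[[str, str], float]) -> float:
    """Hypothetical aggregation: v is the mean similarity between the
    resource and each element of the context C."""
    values = [sim(resource, c) for c in context]
    return sum(values) / len(values) if values else 0.0


def filter_by_context(candidates: Iterable[str], context: List[str],
                      sim: Callable[[str, str], float],
                      threshold: float) -> List[str]:
    """Return the graph exploration queue: candidates with v > threshold."""
    queue: List[str] = []
    for r1 in candidates:
        v = context_score(r1, context, sim)
        if v > threshold:
            queue.append(r1)  # r1 belongs to the context
        # otherwise r1 is excluded from graph exploration
    return queue
```

Filtering before expansion keeps the exploration from drifting into parts of the graph unrelated to the chosen context, for example C = {Programming Languages, Databases, Software}.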
Evaluation (i)
http://sisinflab.poliba.it/evaluation
• Comparison of 5 different algorithms
• 50 volunteers
• Researchers in the ICT area
• 244 votes collected (about 5 votes per user on average)
• Average time to vote: 1 min 40 s
Evaluation (ii)
http://sisinflab.poliba.it/evaluation/data
3.91 - Good
Conclusion
• LED: a system for exploratory search and query refinement on the (Semantic) Web
• DBpediaRanker: ranking algorithms for resources in DBpedia
Future work
• Expose a RESTful API for building novel mashups and for comparison with other systems
• Improve the ranking algorithms
• Deal with cases where a single knowledge base is not sufficient
• Combine content-based recommendation with a collaborative-filtering approach
FROM EXPLORATORY SEARCH TO WEB SEARCH AND BACK (PIKM 2010)
If you're interested in learning more…
1. Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia, Eugenio Di Sciascio. Semantic tags generation and retrieval for online advertising. 19th ACM International Conference on Information and Knowledge Management (CIKM 2010)
2. Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia, Eugenio Di Sciascio. Ranking the Linked Data: the case of DBpedia. 10th International Conference on Web Engineering (ICWE 2010)
3. Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia, Eugenio Di Sciascio. Semantic tag cloud generation via DBpedia. 11th International Conference on Electronic Commerce and Web Technologies (EC-Web 2010)
4. Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia, Eugenio Di Sciascio. Semantic tagging for crowd computing. 18th Italian Symposium on Advanced Database Systems (SEBD 2010)
5. Roberto Mirizzi, Azzurra Ragone, Tommaso Di Noia, Eugenio Di Sciascio. Semantic Wonder Cloud: exploratory search in DBpedia. 2nd International Workshop on Semantic Web Information Management (SWIM 2010) - Best Workshop Paper at International Conference on Web Engineering (ICWE 2010)
Roberto Mirizzi - mirizzi@deemail.poliba.it
Thanks for your attention!