10th International Conference on Web Engineering, Vienna
July 5-9, 2010
Ranking the Linked Data:
the case of DBpedia
Roberto Mirizzi¹, Azzurra Ragone¹˒²,
Tommaso Di Noia¹, Eugenio Di Sciascio¹
¹ Politecnico di Bari
Via Orabona, 4
70125 Bari (ITALY)
² University of Trento
Via Sommarive, 14
38100 Trento (ITALY)
Outline
• Tags are all around
• NOT (Not Only Tag): what is it?
• NOT: a look behind the curtains
– Ranking of RDF resources: a hybrid approach
• Evaluation
• Conclusion and Future Work
Tags are all around
Tag cloud
and many more…
Tagging: two faces
• Annotation phase
• Retrieval phase
Problems with annotation
• Insert as many tags as possible (time-consuming):
– different versions of the same tag to catch all possible searches
– multilingual tags
Problem with retrieval
• Exact (syntactic) match among tags: "web service" is different from
"web services", "webservices", …
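The retrieval problem can be illustrated with a minimal Python sketch. The tag set and the queries are toy values taken from the example on this slide; the function name `exact_match` is hypothetical and is not part of NOT.

```python
# Toy illustration: exact string matching treats spelling variants as different tags.
resource_tags = {"web services"}  # tags attached to one annotated resource (toy data)
queries = ["web service", "web services", "webservices"]


def exact_match(query, tags):
    """Return True only if the query string is literally one of the tags."""
    return query in tags


# Only the variant that was typed at annotation time is found at retrieval time.
matches = {query: exact_match(query, resource_tags) for query in queries}
```

Two of the three variants miss the resource, which is why annotators end up inserting several versions of the same tag.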
Why not use semantic tags?
Plugged into the Web 3.0
Disambiguation
Relations among tags
Machine understandable
NOT: Not Only Tag
http://sisinflab.poliba.it/not-only-tag/
Demo
• Suppose we want to tag the book:
NOT
http://sisinflab.poliba.it/not-only-tag/
Smarter tagging
• Annotation phase
• Retrieval phase
What is behind NOT?
• DBpedia graph exploration
• Computation of a similarity value between each
pair of RDF resources using external
information sources (search engines,
bookmarking systems)
What is behind NOT? (II)
What is behind NOT? (III)
What is behind NOT? (IV)
[Figure: fragment of the DBpedia graph.
Category nodes: Semantic_Web, XML-based_standards, Knowledge_representation, Data_management, Internet_architecture, Triplestores, Folksonomy, XML, Computer_and_telecommunication_standards, Web_services, User_interface_markup_languages, Scalable_Vector_Graphics, Microformats, …
Article nodes: Resource Description Framework, Microformat, RDFa, …
Legend: skos:subject, skos:broader, Category, Article]
DBpediaRanker: hybrid ranking
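[Diagram: ?r1 isSimilar ?r2, where the isSimilar relation hasValue v]

\[ sim(r_1, r_2) = \frac{f(r_1, r_2)}{f(r_1)} + \frac{f(r_1, r_2)}{f(r_2)} \]

\[ wikilinkScore(r_1, r_2) = \begin{cases} 0, & \text{no wikilink between } r_1 \text{ and } r_2 \\ 1, & \text{wikilink between } r_1 \text{ and } r_2 \text{ or vice versa} \\ 2, & \text{wikilink between } r_1 \text{ and } r_2 \text{ and vice versa} \end{cases} \]

\[ abstractScore(r_1, r_2) = \frac{l(r_2, r_1)}{l(r_2)} \]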
Graph-based ranking
External sources-based ranking
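As a rough illustration of two of the scores on this slide, here is a minimal Python sketch. It assumes that f(r) and f(r1, r2) are result counts returned by an external source for a single resource label and for the pair of labels; the slide does not spell this out. The function names, the zero-count guard, and the example counts are all hypothetical.

```python
def sim(f_r1, f_r2, f_r1_r2):
    """sim(r1, r2) = f(r1, r2) / f(r1) + f(r1, r2) / f(r2)."""
    if f_r1 == 0 or f_r2 == 0:
        # Assumption: a resource with no results contributes no evidence.
        return 0.0
    return f_r1_r2 / f_r1 + f_r1_r2 / f_r2


def wikilink_score(r1_links_to_r2, r2_links_to_r1):
    """0 = no wikilink, 1 = wikilink in one direction only, 2 = wikilinks both ways."""
    return int(r1_links_to_r2) + int(r2_links_to_r1)


# Hypothetical counts: 400 results for r1, 200 for r2, 100 for both together.
example_sim = sim(400, 200, 100)            # 100/400 + 100/200 = 0.75
example_links = wikilink_score(True, False)  # one-directional wikilink -> 1
```

The sketch keeps the two scores separate; how they are combined into the final similarity value v is not shown on the slide, so it is not modelled here.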
Functional Architecture
[Architecture diagram. Components shown: GUI, Back-end, Query engine, Storage, Cloud Generator, Graph Explorer, SPARQL, Context Analyzer, Ranker, DBpedia Lookup Service, external information sources (Delicious, Yahoo!, Bing, Google)]
Offline computation
1. Linked Data graph exploration
2. Rank nodes exploiting external information
3. Store results as pairs of nodes together with their similarity
Runtime search
1. Start typing a tag
2. Query the system for relevant tags (corresponding to DBpedia resources)
3. Show the semantic tag cloud
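The split between offline computation and runtime search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stored pairs and their similarity values are made-up toy data, a simple case-insensitive prefix match stands in for the DBpedia Lookup Service, and the names `stored_pairs`, `index`, and `related_tags` are hypothetical.

```python
from collections import defaultdict

# Offline step (toy data): results stored as pairs of nodes with their similarity.
stored_pairs = [
    ("Semantic_Web", "RDFa", 0.9),
    ("Semantic_Web", "Folksonomy", 0.4),
    ("RDFa", "Microformat", 0.7),
]

# Index every pair in both directions so either resource can be the entry point.
index = defaultdict(list)
for r1, r2, score in stored_pairs:
    index[r1].append((r2, score))
    index[r2].append((r1, score))


def related_tags(prefix, top_k=10):
    """Runtime step: map the typed prefix to resources, then rank their neighbours."""
    results = []
    for resource, neighbours in index.items():
        # Assumption: prefix matching replaces the real lookup service.
        if resource.lower().startswith(prefix.lower()):
            results.extend(neighbours)
    return sorted(results, key=lambda pair: pair[1], reverse=True)[:top_k]


# Typing "sem" resolves to Semantic_Web and yields its ranked neighbours.
cloud = related_tags("sem")
```

Because the similarity values are precomputed, the runtime step reduces to a lookup and a sort, which is what makes suggestions feasible while the user is still typing.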
Evaluation
We evaluate five different algorithms:
1. DBpediaRanker
2. DBpediaRanker minus Wikipedia info
3. DBpediaRanker minus external info sources
4. Co-occurrence
5. Similarity Distance

\[ coOcc(r_1, r_2) = \frac{f(r_1, r_2)}{f(r_1) + f(r_2) - f(r_1, r_2)} \]

\[ ngd(r_1, r_2) = \frac{\max\{\log f(r_1), \log f(r_2)\} - \log f(r_1, r_2)}{\log N - \min\{\log f(r_1), \log f(r_2)\}} \]
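The two baseline measures from the Evaluation slide, co-occurrence and similarity distance, can be sketched in Python as follows. The sketch assumes that f denotes result counts from an external source and that N is the total number of items that source indexes; neither is defined on the slide. The function names, the guards, and the example counts are hypothetical.

```python
import math


def co_occ(f_r1, f_r2, f_r1_r2):
    """coOcc(r1, r2) = f(r1, r2) / (f(r1) + f(r2) - f(r1, r2))."""
    denominator = f_r1 + f_r2 - f_r1_r2
    return f_r1_r2 / denominator if denominator > 0 else 0.0


def ngd(f_r1, f_r2, f_r1_r2, n):
    """Similarity distance: (max log f - log f(r1, r2)) / (log N - min log f)."""
    if min(f_r1, f_r2, f_r1_r2) <= 0:
        # Assumption: resources that never occur (together) are infinitely distant.
        return math.inf
    smaller, larger = sorted((math.log(f_r1), math.log(f_r2)))
    # Assumption: n is strictly greater than both counts, so the denominator is positive.
    return (larger - math.log(f_r1_r2)) / (math.log(n) - smaller)


# Hypothetical counts: 400 results for r1, 200 for r2, 100 for both together.
example_co_occ = co_occ(400, 200, 100)  # 100 / (400 + 200 - 100) = 0.2
# Two resources that always appear together have distance 0.
example_ngd = ngd(100, 100, 100, 10_000)
```

Note that the two measures point in opposite directions: a larger co-occurrence value means more related, while a larger distance means less related.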
Evaluation (II)
http://sisinflab.poliba.it/evaluation
• 50 volunteers
• Researchers in the ICT area
• 244 votes collected (on average 5 votes per user)
• Time to vote: 1 min 40 s
Evaluation (III)
http://sisinflab.poliba.it/evaluation/data
3.91 - Good
Conclusion
• NOT *is* useful in the annotation phase:
– suggestion of semantically related tags
– tag enrichment
• NOT *is* useful in the retrieval phase:
– Semantic match among tags
Future Work
Impakt Revolution
http://sisinflab.poliba.it/impakt-revolution/
Inspiration: Google Wonder Wheel
Exploratory Search in Google…
…nice, but there is no “semantics” in it.
You cannot discover new knowledge by exploiting the meaning of a term (keyword/tag/query)
SWOC: Semantic Wonder Cloud
http://sisinflab.poliba.it/semantic-wonder-cloud/index/
Q&A
a.ragone@poliba.it
Thanks for being here on Friday! :-)
http://sisinflab.poliba.it/not-only-tag/
http://sisinflab.poliba.it/semantic-wonder-cloud/index/
http://sisinflab.poliba.it/impakt-revolution/
Conclusion
• NOT: a tool for smarter tagging
• Ranking algorithm for RDF graphs
Future work
• Test our algorithms with different domains
• Extract more fine-grained contexts
• Enrich the extracted context by also using relevant properties
• Integrate our approach with existing real-world systems
• Use the core system to automatically extract relevant tags
(concepts) from a document (or from a collection of
documents) exploiting tools for named-entity extraction


Editor's Notes

  • #10  Search: owl. Then add rdf. Then add owl.