Seminar by Cristian Lai, September 6, 2012

The seminar presents the emerging topic of the Web of Data, within the context of the Semantic Web. It examines the difficulties encountered in accessing the enormous amount of information currently available on the Web, and the advantages of an approach based on the interactive construction of queries.


  1. Query modeling and information retrieval within the Web of Data. Cristian LAI (clai@crs4.it), CRS4. September 6, 2012.
  2. Outline: Motivation, Unstructured Data, Structured Data, Query building, Applications, Conclusion
  3. Context: Semantic Web. http://www.w3.org/2006/Talks/1023-sb-W3CTechSemWeb/
  4. Motivation: Search on the Web. http://www.slideshare.net/novaspivack/web-evolution-nova-spivack-twine
  5. Outline: Motivation, Unstructured Data, Structured Data, Query building, Applications, Conclusion
  6. Wikipedia
      • Started in 2001.
      • Is a multilingual, web-based, free-content encyclopedia project based on an openly editable model.
      • Is the 5th-ranked site on the web and served 454 million unique visitors monthly as of March 2011.
      • Has fewer than 100 employees.
      • Holds an annual fundraiser instead of accepting advertising. You may have seen "A personal appeal from Wikipedia founder Jimmy Wales" if you used the online encyclopedia during the last weeks of 2011. Google co-founder Sergey Brin and his wife, Anne Wojcicki, have given a 500,000-dollar grant to help Wikipedia fund its 28.3-million-dollar annual budget.
  7. Wikipedia
      • Pros:
          - Is a highly efficient not-for-profit organization.
          - Is the finest example of truly collaboratively created content: >19M articles, >270 languages, >82k active contributors.
          - Covers many topics and domains; articles are the result of community consensus.
      • Cons:
          - Contains many inconsistencies. Disclaimer: Wikipedia cannot guarantee the validity of the information found here.
          - Is not very well integrated with other data sources.
          - Queries and search are hampered by the lack of a structured representation.
  8. Issues
      • Unstructured data, keyword-based search.
      • Simple questions are hard to answer:
          - People who were born in Rome before 1900.
          - Italian musicians with English and French descriptions.
          - The official websites of companies with more than 500 employees.
      • The information required to answer these questions is contained in Wikipedia.
      • Transforming Wikipedia into a knowledge base:
          - To reveal the structure and semantics of Wikipedia content.
          - The DBpedia project.
  9. Structure in Wikipedia
      • Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates, categorisation information, images, geo-coordinates, and links to external Web pages.
      • Title
      • Abstract
      • Infobox template
      • Geo-coordinates
      • Categories
      • Images
      • Links
          - other language versions
          - other Wikipedia pages
          - redirects
          - disambiguation
  10. Structured Information in Wikipedia
  11. Structured Information in Wikipedia
  12. Structured Information in Wikipedia
  13. Outline: Motivation, Unstructured Data, Structured Data, Query building, Applications, Conclusion
  14. RDF representation: Knowledge Base
          dbp:Cagliari rdf:type dbp:City
          dbp:Cagliari dbp:Title "Cagliari"
          dbp:Cagliari dbp:Country dbp:Italy
          dbp:Cagliari dbp:postalCode 09100
          dbp:Cagliari geo:lat "39.246387"^^xsd:float
          dbp:Cagliari geo:long "9.057500"^^xsd:float
          dbp:Cagliari rdf:type yago:MediterraneanPortCitiesAndTownsInItaly
          ...
      • An environment for collecting and structuring data.
      • A well-defined classification structure.
  15. RDF
      • Triples: (subject, predicate, object)
      • Subject and object
          - are either both URIs that each identify a resource, or a URI and a string literal respectively.
      • Predicate
          - specifies how the subject and object are related, and is also represented by a URI.
      • For example:
          - A knows B
          - C isAuthorOf D
      • Two resources linked in this fashion can be drawn from different data sets on the Web, allowing data in one data source to be linked to data in another, thereby creating a Web of Data.
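
The triple model on the RDF slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, the `ex:` names are hypothetical, and the second statement uses `ex:B` as its subject (rather than the slide's `C`) so that the two data sets share a resource. No RDF library works exactly this way.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI identifying a resource
    predicate: str  # URI naming the relation
    obj: str        # URI or string literal


# Hypothetical data: two separate data sets that mention the same resource ex:B.
dataset_a = [Triple("ex:A", "ex:knows", "ex:B")]
dataset_b = [Triple("ex:B", "ex:isAuthorOf", "ex:D")]


def linked_resources(first, second):
    """Return resources that appear as an object in `first` and a subject in `second`."""
    objects = {t.obj for t in first}
    subjects = {t.subject for t in second}
    return sorted(objects & subjects)


links = linked_resources(dataset_a, dataset_b)
print(links)
```

The shared identifier is what lets a consumer follow a statement in one data set into another, which is the linking idea the slide describes.
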
  16. DBpedia
      • Started in 2007.
      • Is the result of a community effort to extract structured information from Wikipedia.
      • Makes Wikipedia data available as RDF.
      • Results: the DBpedia Data Set
          - describes 3.64 million "things" with over half a billion "facts" (July 2011), including 364k persons, 462k places, 99k music albums, 54k films, and 148k organisations;
          - is extracted from 97 different language editions;
          - contains 672M RDF triples.
      • It is maintained by Universität Leipzig, Freie Universität Berlin, and OpenLink Software, Inc.
      • See http://wiki.dbpedia.org/Team
  17. Nucleus of the Web of Data
      • Within the W3C Linking Open Data (LOD) community effort.
      • Tim Berners-Lee's Linked Data principles:
          - URI
          - HTTP
          - RDF, SPARQL
          - Interlinking among data providers
      • An increasing number of data providers have started to publish and interlink data on the Web.
      • The resulting Web of Data contains several billion RDF triples and covers domains such as geographic information, people, companies, online communities, films, music, books and scientific publications.
  18. LOD Datasets
  19. LOD Datasets
  20. Outline: Motivation, Unstructured Data, Structured Data, Query building, Applications, Conclusion
  21. SPARQL Query Language
      • RDF is a directed, labeled graph data format for representing information (including on the Web).
      • SPARQL is a language for querying RDF graphs by specifying templates against which graph components are compared. Data that matches a template is returned by the query.
      • A triple template contains variables that stand for triple components (e.g., ?s, ?p, or ?o within a triple).
      • Example:
          - ?person ex:age "20"^^xsd:integer .
          - Identifies the triple subjects that have an ex:age property of "20". Analogous to asking "Who has age 20?".
          - The SPARQL query engine returns the subject of every triple that satisfies the template, substituting each matching value for the variable.
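
The template matching described on the SPARQL slide can be illustrated with a toy matcher. This is a minimal sketch, not how a real SPARQL engine is implemented: terms are compared as plain strings, and the three `ex:` triples are hypothetical data made up for the example.

```python
def is_variable(term):
    """SPARQL variables start with a question mark, e.g. ?person."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one binding dict per triple that satisfies the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_variable(p_term):
                # A variable used twice must bind to the same value both times.
                if binding.get(p_term, t_term) != t_term:
                    break
                binding[p_term] = t_term
            elif p_term != t_term:
                break  # a constant term did not match
        else:
            results.append(binding)
    return results


# Hypothetical data set (illustrative names only).
triples = [
    ("ex:alice", "ex:age", '"20"^^xsd:integer'),
    ("ex:bob", "ex:age", '"35"^^xsd:integer'),
    ("ex:carol", "ex:age", '"20"^^xsd:integer'),
]

pattern = ("?person", "ex:age", '"20"^^xsd:integer')
people = [b["?person"] for b in match_pattern(pattern, triples)]
print(people)
```

Each result is a set of variable bindings, which is why the answer to "Who has age 20?" comes back as a list of values for `?person`.
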
  22. SPARQL Queries
      • General form:
          SELECT variables_list
          FROM <RDF_source_URL>
          WHERE {
            triple_pattern_1 .
            ...
            triple_pattern_n .
          }
      • Example:
          SELECT ?person
          FROM <http://ex.com>
          WHERE {
            ?person ex:age "20"^^xsd:integer .
          }
      • Result:
          ?person
          -------
          _p1
          _p2
          ...
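
The general SELECT / FROM / WHERE shape shown on the SPARQL Queries slide can be assembled programmatically. The helper below is a hypothetical sketch (`build_select` is not part of any library), and it assumes the `ex:` and `xsd:` prefixes are declared elsewhere, since the slide's example omits PREFIX declarations too.

```python
def build_select(variables, patterns, source=None):
    """Assemble a SELECT query string from variables and triple patterns."""
    lines = ["SELECT " + " ".join(variables)]
    if source is not None:
        lines.append(f"FROM <{source}>")  # optional RDF source URL
    lines.append("WHERE {")
    lines.extend(f"  {pattern} ." for pattern in patterns)
    lines.append("}")
    return "\n".join(lines)


query = build_select(
    ["?person"],
    ['?person ex:age "20"^^xsd:integer'],
    source="http://ex.com",
)
print(query)
```

Building queries from parts like this is one simple way to approach interactive query construction, where a user picks variables and patterns instead of typing the full syntax.
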
  23. The DBpedia SPARQL endpoint
      • All data sets are available for queries via the DBpedia SPARQL endpoint (http://dbpedia.org/sparql).
      • Querying the data set:
          - ...
          - Abstracts of movies starring Tom Cruise, released before 1999.
          - The official websites of companies with more than 50000 employees.
          - Cities with more than 2 million inhabitants.
          - ...
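
A query such as "cities with more than 2 million inhabitants" can be sent to the endpoint over plain HTTP. The sketch below uses only the Python standard library and follows the SPARQL Protocol convention of passing the query in a `query` parameter. The choice of `dbo:City` and `dbo:populationTotal` to express the question is an assumption about the DBpedia ontology, the helper names are hypothetical, and `run_query` needs network access and an available endpoint.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint URL given on the slide

# Assumed modeling of the question; class and property names may need adjusting.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
  FILTER(?population > 2000000)
}
LIMIT 20
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for results in the SPARQL JSON format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(query):
    """Send the query and return the list of result bindings (requires network)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]


request = build_request(QUERY)
print(request.full_url)
```

Calling `run_query(QUERY)` would return one dictionary per result row, keyed by variable name, if the endpoint answers as expected.
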
  24. Abstracts of movies starring Tom Cruise, released before 1999 (SPARQL)
          SELECT ?subject ?label ?released ?abstract
          WHERE {
            ?subject rdf:type <http://dbpedia.org/ontology/Film> .
            ?subject dbpedia2:starring <http://dbpedia.org/resource/Tom_Cruise> .
            ?subject rdfs:comment ?abstract .
            ?subject rdfs:label ?label .
            FILTER(lang(?abstract) = "en" && lang(?label) = "en") .
            ?subject <http://dbpedia.org/ontology/releaseDate> ?released .
            FILTER(xsd:date(?released) < "2000-01-01"^^xsd:date) .
          }
          ORDER BY ?released
  25. Outline: Motivation, Unstructured Data, Structured Data, Query building, Applications, Conclusion
  26. Linked Data Search Engines and Indexes
      • A number of search engines have been developed that crawl Linked Data from the Web by following RDF links, and provide query capabilities over aggregated data. Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.
      • Google, Bing and Yahoo! have agreed to create and support a common vocabulary for structured data markup on web pages.
      • Facebook has started to support RDF and Linked Data URIs and now provides access to parts of its user data via a Linked Data API.
  27. Google rich snippets
  28. Twitter, #annotations: Twitter API based client
  29. Twitter, #annotations: Lookup annotations
  30. Twitter, #annotations: Resource #dbpedia:Cagliari
  31. Twitter, #annotations: Resource #dbpedia:Cagliari
  32. Question answering: Resource Cagliari
  33. Question answering: Template
  34. Question answering: RDF/XML
  35. Outline: Motivation, Unstructured Data, Structured Data, Query building, Applications, Conclusion
  36. Conclusion
      • Data on the Web poses a major challenge; technologies are needed to use it, interact with it, and integrate it.
      • Semantic Web technologies (RDF, SPARQL, etc.) can play a major role in publishing and using data on the Web.
      • Users can benefit greatly from the wide world of structured content.
      • Content providers joining the Linking Open Data project are helping to create more meaningful navigation paths, not only within websites but across the whole Web.
  37. Q&A
