Hack U Barcelona 2011


Very brief intro to Semantic Web and BOSS for a Yahoo! Hack U event at UPC in Barcelona, Spain.

Published in: Technology, Education

  1. 1. Fun with the Semantic Web<br />Peter Mika<br />Yahoo! Research Barcelona<br />pmika@yahoo-inc.com<br />
  2. 2. Vague, but exciting… Berners-Lee and the dawn of the Web<br />
  3. 3. Semantic Web<br />Publish data on the Web<br />Linked Data: a web of data instead of a web of documents<br />Query databases over the Web<br />Two main architectural challenges<br />A common format for sharing data<br />Sharing the meaning of data<br />Semantic Web standards from W3C<br />Data and schema languages (RDF, OWL, RIF)<br />Document formats (RDF/XML, RDFa)<br />Protocols (SPARQL, HTTP)<br />Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics<br />Community efforts to publish data and develop schemas (Linked Data)<br />
  4. 4. RDF (Resource Description Framework)<br />The basic data model of the Semantic Web<br />A universal model to capture all sorts of data: networks, relational, object-oriented…<br />Basic unit of information is a triple <br />A tuple of (subject, predicate, object)<br />Example: (Joe, loves, Mary)<br />Each triple gives the value of a property for a given resource or relates two objects to one another<br />Object is either a resource or a literal<br />An RDF model is a set of triples<br />Ordering of statements in an RDF document is irrelevant (unlike XML)<br />
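The triple model on this slide can be sketched in a few lines of Python. This is only an illustration: plain strings stand in for resources and literals, and the names `model_a` and `model_b` are made up for the example.

```python
# An RDF model is a set of (subject, predicate, object) triples.
# Plain strings stand in for resources and literals in this sketch.
model_a = {
    ("Joe", "loves", "Mary"),
    ("Joe", "name", "Joe A."),
}

# The same statements, written down in a different order.
model_b = {
    ("Joe", "name", "Joe A."),
    ("Joe", "loves", "Mary"),
}

# Sets ignore ordering, so both describe the same model.
same_model = model_a == model_b
```

Using a set rather than a list mirrors the point that the ordering of statements is irrelevant, and it also drops duplicate statements automatically.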
  5. 5. Resources vs. literals<br />Resources are identified by a URI; a resource without a URI is called a blank node<br />URIs are a generalization of URLs<br />Notation: <http://www.example.org/Person> or ex:Person<br />Literals have an optional language and datatype (string, integer etc.)<br />Literals cannot be subjects of statements<br />Datatypes are identified by URIs, e.g. XML Schema datatypes<br />Two literals are the same if their components are the same<br />Notation: “Joe B.” or Joe@en^^http://…#string<br />
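As a rough sketch of the rule that two literals are the same when their components are the same, a literal can be modelled as a small value type. The `Literal` class and its field names are assumptions made for this example, not part of any RDF library.

```python
from typing import NamedTuple, Optional

# XML Schema datatype URI for integers.
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    """Hypothetical literal: a value plus optional language and datatype."""
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None


english_a = Literal("Joe B.", language="en")
english_b = Literal("Joe B.", language="en")
plain = Literal("Joe B.")
number = Literal("42", datatype=XSD_INTEGER)

# Tuples compare component by component, which matches the slide's rule.
same_literal = english_a == english_b   # all components equal
different_literal = english_a != plain  # language tag differs
```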
  6. 6. Graphical and textual notation<br />foaf:Person<br />type<br />my:Joe<br />name<br />“Joe A.”<br />A number of ways to serialize an RDF model into an RDF document<br />RDF/XML, Turtle, N3, N-Triples<br />Example: http://www.cs.vu.nl/~pmika/foaf.rdf<br />
  7. 7. RDF is designed for the Web<br />URIs provide web-wide global identification across datasets<br />A resource may be described by multiple documents<br />We know it’s the same resource because the same URI is used or through reasoning (advanced topic…)<br />URIs are intended to be reused<br />Unique, but not single identifiers: two URIs may denote the same thing<br />URIs are dereferenceable (can be retrieved)<br />A well-behaved URI returns a description of the resource <br />Provides authority: the definition of foaf:Person lives at that URI<br />Ontologies can be looked up as well<br />Typically at the root of the URIs, also known as the namespace<br />Example: http://xmlns.com/foaf/0.1/Person redirects to the specification<br />
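The relationship between the prefixed notation (foaf:Person), the full URI, and the namespace can be sketched as follows. The helper names `expand` and `namespace_of` are hypothetical, and splitting at the last `#` or `/` is a common convention rather than a rule.

```python
# Prefix table: the FOAF namespace URI comes from the example on the slide.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(qname, prefixes=PREFIXES):
    """Turn a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, local_name = qname.split(":", 1)
    return prefixes[prefix] + local_name


def namespace_of(uri):
    """Return the part of the URI up to and including the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1]


person_uri = expand("foaf:Person")
person_namespace = namespace_of(person_uri)
```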
  8. 8. URIs implicitly link data together <br />(#joe, #loves, #mary)<br />(#joe, #name, “Joe A.”)<br />(#joe, #email, mailto:joe@joe.com)<br />A dating site<br />(#mary, #name, “Mary B.”)<br />(#mary, #gender, “female”)<br />Joe’s homepage<br />Mary’s homepage<br />(#name, #type, #Property)<br />(#name, #domain, #Person)<br />Schema doc<br />
  9. 9. Put together, triples form a single ‘global’ graph<br />“Joe A.”<br />#name<br />#joe<br />#email<br />“joe@joe.com”<br />#loves<br />“Mary B.”<br />#name<br />#mary<br />#gender<br />“female”<br />
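A minimal sketch of how triples from separate documents combine into one graph. Which triple comes from which document is an assumption here (the slide only lists the sources), every predicate is written with a leading `#` for consistency, and the helper `objects` is made up for the example.

```python
# Triples as they might be published by four separate documents
# (the assignment of triples to documents is assumed for illustration).
dating_site = {("#joe", "#loves", "#mary")}
joes_homepage = {
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@joe.com"),
}
marys_homepage = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}
schema_doc = {
    ("#name", "#type", "#Property"),
    ("#name", "#domain", "#Person"),
}

# Merging is a plain set union: shared URIs make the pieces line up.
global_graph = dating_site | joes_homepage | marys_homepage | schema_doc


def objects(graph, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# A question no single document can answer: the name of the person Joe loves.
loved_names = {
    name
    for person in objects(global_graph, "#joe", "#loves")
    for name in objects(global_graph, person, "#name")
}
```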
  10. 10. Linked Data<br />Open your data<br />Publish it in RDF, the lingua franca of the data web<br />Data first, schema second<br />Worry about linking, data integration later… someone else can do it for you!<br />Optionally, provide query access using the SPARQL query language and protocol<br />Powerful, SQL-like query language<br />HTTP or SOAP protocol to communicate with SPARQL servers <br />
  11. 11. Linked Data cloud: interlinked RDF datasets on the Web<br />http://linkeddata.org/<br />
  12. 12. DBpedia<br />DBpedia is a dataset that contains much of the structured data in Wikipedia<br />Data from the info-boxes<br />Links between Wikipedia pages<br />Categories<br />Disambiguation and redirect pages<br />Links to other datasets<br />
  13. 13. Fetching individual resources<br />Use your web browser<br />http://dbpedia.org/resource/Yahoo redirects to http://dbpedia.org/page/Yahoo<br />You can plug this URI into other Linked Data browsers<br />HTTP GET to fetch data<br />Using curl: add the header Accept: application/rdf+xml to request RDF, and use -L to follow redirects<br />curl -L -H 'Accept:application/rdf+xml' 'http://dbpedia.org/resource/Berlin'<br />Data dumps<br />http://wiki.dbpedia.org/Datasets<br />
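The same request can be made from Python's standard library. This is a sketch only: the function names are made up, decoding the response as UTF-8 is an assumption, and whether the server still answers as it did in 2011 is not guaranteed. `urllib` follows redirects by default, which corresponds to curl's `-L`.

```python
import urllib.request


def build_rdf_request(uri):
    """Prepare a GET request that asks for RDF/XML via content negotiation."""
    request = urllib.request.Request(uri)
    request.add_header("Accept", "application/rdf+xml")
    return request


def fetch_rdf(uri, timeout=10):
    """Send the request and return the body as text (UTF-8 assumed)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network.
request = build_rdf_request("http://dbpedia.org/resource/Berlin")
```

Calling `fetch_rdf("http://dbpedia.org/resource/Berlin")` would then perform the actual download.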
  14. 14. Querying using SPARQL<br />Interactive query builders<br />SPARQL Explorer: http://dbpedia.org/snorql/<br />Examples at: http://wiki.dbpedia.org/OnlineAccess<br />Using HTTP GET<br />GET /sparql/?query=EncodedQuery HTTP/1.1<br />Example:<br />curl 'http://dbpedia.org/sparql?query=SELECT%20%3Ffilm%20WHERE%20%7B%20%3Ffilm%20%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fsubject%3E%20%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3AFrench_films%3E%20%7D'<br />The result is returned as an XML document<br />
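The encoded URL in the curl example is easier to read when built from the plain query text. The sketch below percent-encodes the query with the standard library; the helper name `sparql_get_url` is an assumption, and nothing is sent over the network.

```python
from urllib.parse import quote

ENDPOINT = "http://dbpedia.org/sparql"

# The query from the curl example, before encoding.
QUERY = (
    "SELECT ?film WHERE { ?film <http://purl.org/dc/terms/subject> "
    "<http://dbpedia.org/resource/Category:French_films> }"
)


def sparql_get_url(endpoint, query):
    """Build the GET URL: the query goes, percent-encoded, in the 'query' parameter."""
    # safe="" also encodes '/' so the whole query is escaped.
    return endpoint + "?query=" + quote(query, safe="")


url = sparql_get_url(ENDPOINT, QUERY)
```

The resulting `url` is the same string that is passed to curl on the slide.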
  15. 15. More data<br />New York Times<br />http://data.nytimes.com/<br />Example URI:<br />http://data.nytimes.com/60694995023816375851<br />Also supports JSON<br />Append .json or set Accept:text/javascript<br />Freebase<br />http://freebase.com<br />Example URI <br />http://rdf.freebase.com/rdf/en.tron_legacy<br />Data dump<br />http://download.freebase.com<br />
  16. 16. And more data…<br />Geonames: open geo data<br />Geonames.org<br />http://sws.geonames.org/5130561/<br />Download: <br />http://www.geonames.org/export/<br />Open Government data efforts<br />Data.gov: <br />Data.gov.uk<br />http://data.gov.uk/sparql<br />
  17. 17. Spanish open gov’t data and linked data efforts<br />Spanish open data efforts<br />La Asociación Española de Linked Data (AELID) <br />http://aelid.es/<br />Proyecto Aporta<br />aporta.es<br />Regional/local efforts<br />risp.asturias.es (RDF, SPARQL)<br />datos.zaragoza.es (RDF, SPARQL)<br />opendata.euskadi.net (RDF)<br />dadesobertes.gencat.cat (RDF)<br />Competition AbreDatos 2010<br />abredatos.es<br />
  18. 18. More info<br />Segaran et al.: Programming the Semantic Web, O’Reilly, 2010.<br />linkeddata.org<br />W3C Semantic Web Activity<br />Presentations, guides etc.<br />RDF Primer<br />http://www.w3.org/TR/2004/REC-rdf-primer-20040210/<br />SPARQL query language and protocol specs<br />http://www.w3.org/TR/rdf-sparql-protocol/<br />http://www.w3.org/TR/rdf-sparql-query/<br />Search SlideShare etc. for more intro material<br />
  19. 19. Build your Own Search Service (BOSS)<br />Peter Mika<br />Yahoo! Research Barcelona<br />pmika@yahoo-inc.com<br />
  20. 20. Innovate with Search!<br />It’s really simple…<br />Example: <br />Pay $0.0008 per query, earn $0.01 per query<br />100,000 users a day, each making 1 query a day<br />Earn $920 a day!<br />
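The arithmetic behind the $920 figure, worked through with the numbers from the slide. The variable names are illustrative; `Decimal` is used so the small per-query amounts are not affected by floating-point rounding.

```python
from decimal import Decimal

COST_PER_QUERY = Decimal("0.0008")    # paid for each query
REVENUE_PER_QUERY = Decimal("0.01")   # earned on each query
USERS_PER_DAY = 100_000
QUERIES_PER_USER = 1

queries_per_day = USERS_PER_DAY * QUERIES_PER_USER

# Margin per query is 0.01 - 0.0008 = 0.0092 dollars.
margin_per_query = REVENUE_PER_QUERY - COST_PER_QUERY

# 100,000 queries x $0.0092 = $920 per day.
daily_profit = queries_per_day * margin_per_query
```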
  21. 21. Yahoo BOSS: Yahoo’s Search API<br />Ability to re-order results and blend in additional content<br />No restrictions on presentation<br />No branding or attribution<br />Access to multiple verticals (web search, image, news)<br />Spelling suggestions<br />40+ supported language and region pairs<br />Pricing (BOSS)<br />10,000 free queries a day<br />Pay for more queries<br />Serve any ads you want<br />For more info, http://developer.yahoo.com/search/boss/<br />New in BOSS v2<br />Powered by Bing<br />Retrieve ads from Yahoo! and earn money ;)<br />
  22. 22. Using BOSS<br />Simple HTTP GET calls, no authentication<br />Get an Application ID at <br />http://developer.yahoo.com/search/boss/<br />Example:<br />http://boss.yahooapis.com/ysearch/web/v1/{query}?appid={appid}&format=xml<br />http://boss.yahooapis.com/ysearch/spelling/v1/{query}?appid={appid}&format=xml<br />Documentation<br />http://developer.yahoo.com/search/boss/boss_guide/<br />
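A sketch of filling in the URL templates above from Python. The function name `boss_url`, the sample query, and the placeholder application ID are all assumptions; the code only builds the URL and sends nothing.

```python
from urllib.parse import quote, urlencode

BOSS_BASE = "http://boss.yahooapis.com/ysearch"


def boss_url(vertical, query, appid, fmt="xml"):
    """Fill in the URL template: /{vertical}/v1/{query}?appid={appid}&format={fmt}."""
    # The query is part of the path, so it is percent-encoded in full.
    path = f"{BOSS_BASE}/{vertical}/v1/{quote(query, safe='')}"
    return path + "?" + urlencode({"appid": appid, "format": fmt})


# "YOUR_APP_ID" is a placeholder for the Application ID you register for.
web_search = boss_url("web", "semantic web", "YOUR_APP_ID")
spelling = boss_url("spelling", "semantic web", "YOUR_APP_ID")
```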
  23. 23. Queries you can play with<br />Yahoo!’s WebScope program <br />Data sharing with universities and research institutions <br />Some of the most exciting data that we have!<br />Request access online<br />http://webscope.sandbox.yahoo.com/<br />Requires approval by Department Chair<br />For HackU, you can sign up here for access to a dataset containing real world user queries<br />Yahoo! Search Tiny Sample v1.0: a set of 4,500 queries<br />Ideal for testing and demonstrating your search-based apps<br />Can you really show something interesting for all these users?<br />