Semantic Web - Introduction


Published in: Education, Technology
  • This is just a generic slide set. It should be adapted and reviewed, possibly with slides removed, for a specific event. Rule of thumb: on average, a slide takes a minute…
  • Semantic Web - Introduction

    1. 1. What is the Semantic Web? for Codecamp 2009 Kyiv, Ukraine 2009-01-15, Amsterdam, The Netherlands Ivan Herman, W3C ReadWriteWeb Microformats
    2. 2. Let’s organize a trip to Budapest using the Web!
    3. 3. You try to find a proper flight with …
    4. 4. … a big, reputable airline, or …
    5. 5. … the airline of the target country, or …
    6. 6. … a low-cost one
    7. 7. You have to find a hotel, so you look for…
    8. 8. … a really cheap accommodation, or …
    9. 9. … a really luxurious one, or …
    10. 10. … an intermediate one …
    11. 11. oops, that is no good: the page is in Hungarian, which almost nobody understands, but…
    12. 12. … this one could work
    13. 13. Of course, you could decide to trust a specialized site…
    14. 14. … like this one, or…
    15. 15. … or this one
    16. 16. You may want to know something about Budapest; look for some photographs…
    17. 17. … on flickr …
    18. 18. … on Google …
    19. 19. … or you can look at mine
    20. 20. but you can also look at a (social) travel site
    21. 21. What happened here? <ul><li>You had to consult a large number of sites, all different in style, purpose, possibly language… </li></ul><ul><li>You had to mentally integrate all that information to achieve your goals </li></ul><ul><li>We all know that, sometimes, this is a long and tedious process! </li></ul>
    22. 22. <ul><li>All those pages are only tips of respective icebergs: </li></ul><ul><ul><li>the real data is hidden somewhere in databases, XML files, Excel sheets, … </li></ul></ul><ul><ul><li>you only have access to what the Web page designers allow you to see </li></ul></ul>
    23. 23. <ul><li>Specialized sites (Expedia, TripAdvisor) do a bit more: </li></ul><ul><ul><li>they gather and combine data from other sources (usually with the approval of the data owners) </li></ul></ul><ul><ul><li>but they still control how you see those sources </li></ul></ul><ul><li>But sometimes you want to personalize: access the original data and combine it yourself! </li></ul>
    24. 24. Another example: social sites. I have a list of “friends” by…
    25. 25. … Dopplr,
    26. 26. … Twine,
    27. 27. … LinkedIn,
    28. 28. … and, of course, the ubiquitous Facebook
    29. 29. <ul><li>I had to type in and connect with friends again and again for each site independently </li></ul><ul><li>This is even worse than before: I feed the icebergs, but I still do not have easy access to the data… </li></ul>
    30. 30. What would we like to have? <ul><li>Use the data on the Web the same way as we do with documents: </li></ul><ul><ul><li>be able to link to data (independently of their presentation) </li></ul></ul><ul><ul><li>use that data the way I want (present it, mine it, etc) </li></ul></ul><ul><ul><li>agents, programs, scripts, etc, should be able to interpret part of that data </li></ul></ul>
    31. 31. But wait! Isn’t this what mashup sites are already doing?
    32. 32. A “mashup” example:
    33. 33. <ul><li>In some ways, yes, and that shows the huge power of what such a Web of Data provides </li></ul><ul><li>But mashup sites are forced to do very ad-hoc jobs </li></ul><ul><ul><li>various data sources expose their data via Web Services </li></ul></ul><ul><ul><li>each with a different API, a different logic, different structure </li></ul></ul><ul><ul><li>these sites are forced to reinvent the wheel many times because there is no standard way of doing things </li></ul></ul>
    34. 34. Let us put it together <ul><li>What we need for a Web of Data: </li></ul><ul><ul><li>use URI-s to publish data, not only full documents </li></ul></ul><ul><ul><li>allow the data to link to other data </li></ul></ul><ul><ul><li>characterize/classify the data and the links (the “terms”) to convey some extra meaning </li></ul></ul><ul><ul><li>and use standards for all these! </li></ul></ul>
    35. 35. So What is the Semantic Web?
    36. 36. It is a collection of standard technologies to realize a Web of Data WWW -> GGG (Giant Global Graph)
    37. 37. <ul><li>It is that simple… </li></ul><ul><li>Of course, the devil is in the details </li></ul><ul><ul><li>a common model has to be provided for machines to describe, query, etc, the data and their connections </li></ul></ul><ul><ul><li>the “classification” of the terms can become very complex for specific knowledge areas: this is where ontologies, thesauri, etc, enter the game… </li></ul></ul><ul><ul><li>but these details are fleshed out by experts as we speak! </li></ul></ul>
    38. 38. Towards a Semantic Web <ul><li>The current Web represents information using </li></ul><ul><ul><li>natural language (English, Hungarian, Chinese,…) </li></ul></ul><ul><ul><li>graphics, multimedia, page layout </li></ul></ul><ul><li>Humans can process this easily </li></ul><ul><ul><li>can deduce facts from partial information </li></ul></ul><ul><ul><li>can create mental associations </li></ul></ul><ul><ul><li>are used to various sensory information </li></ul></ul><ul><ul><ul><li>(well, sort of… people with disabilities may have serious problems on the Web with rich media!) </li></ul></ul></ul>
    39. 39. Towards a Semantic Web <ul><li>Tasks often require combining data on the Web: </li></ul><ul><ul><li>hotel and travel information may come from different sites </li></ul></ul><ul><ul><li>searches in different digital libraries </li></ul></ul><ul><ul><li>etc. </li></ul></ul><ul><li>Again, humans combine this information easily </li></ul><ul><ul><li>even if different terminologies are used! </li></ul></ul>
    40. 40. However… <ul><li>However: machines are ignorant! </li></ul><ul><ul><li>partial information is unusable </li></ul></ul><ul><ul><li>difficult to make sense of, e.g., an image </li></ul></ul><ul><ul><li>drawing analogies automatically is difficult </li></ul></ul><ul><ul><li>difficult to combine information automatically </li></ul></ul><ul><ul><ul><li>is <foo:creator> the same as <bar:author>? </li></ul></ul></ul><ul><ul><li>… </li></ul></ul>
    41. 41. Example: automatic airline reservation <ul><li>Your automatic airline reservation </li></ul><ul><ul><li>knows about your preferences </li></ul></ul><ul><ul><li>builds up a knowledge base using your past </li></ul></ul><ul><ul><li>can combine the local knowledge with remote services: </li></ul></ul><ul><ul><ul><li>airline preferences </li></ul></ul></ul><ul><ul><ul><li>dietary requirements </li></ul></ul></ul><ul><ul><ul><li>calendaring </li></ul></ul></ul><ul><ul><ul><li>etc </li></ul></ul></ul><ul><li>It communicates with remote information </li></ul><ul><ul><li>(M. Dertouzos: The Unfinished Revolution) </li></ul></ul>
    42. 42. What is needed? <ul><li>(Some) data should be available for machines for further processing </li></ul><ul><li>It should be possible to combine and merge data on a Web scale </li></ul><ul><li>Sometimes, data may describe other data… </li></ul><ul><li>… but sometimes the data is to be exchanged by itself, like my calendar or my travel preferences </li></ul><ul><li>Machines may also need to reason about that data </li></ul>
    43. 43. The rough structure of data integration <ul><li>Map the various data onto an abstract data representation </li></ul><ul><ul><li>make the data independent of its internal representation… </li></ul></ul><ul><li>Merge the resulting representations </li></ul><ul><li>Start making queries on the whole! </li></ul><ul><ul><li>queries not possible on the individual data sets </li></ul></ul>
    44. 44. Simplified bookstore data (dataset “A”)
    45. 45. 1st: export your data as a set of relations
    46. 46. Some notes on exporting the data <ul><li>Data export does not necessarily mean physical conversion of the data </li></ul><ul><ul><li>relations can be generated on-the-fly at query time </li></ul></ul><ul><ul><ul><li>via SQL “bridges” </li></ul></ul></ul><ul><ul><ul><li>scraping HTML pages </li></ul></ul></ul><ul><ul><ul><li>extracting data from Excel sheets </li></ul></ul></ul><ul><ul><ul><li>etc. </li></ul></ul></ul><ul><li>One can export part of the data </li></ul>
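The idea of an SQL “bridge” that generates relations on the fly can be sketched roughly as follows. This is only an illustration, not the mechanism used in the slides: the `book` table, its columns, and the `triples_from_table` helper are made up for the example, and a relation is modelled as a plain (subject, relation, object) tuple.

```python
import sqlite3


def triples_from_table(conn, table, key_column):
    """Yield (subject, relation, object) tuples for every non-key cell of a table.

    Nothing is converted or stored up front: the tuples are produced lazily,
    row by row, while the caller iterates.
    """
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{table}:{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                yield (subject, column, value)


# Hypothetical in-memory bookstore table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("INSERT INTO book VALUES ('isbn-1', 'Example Title', 2000)")

exported = list(triples_from_table(conn, "book", "isbn"))
print(exported)
```

Because the function is a generator, exporting only part of the data amounts to filtering rows or columns before yielding.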
    47. 47. Data from another bookstore (dataset “F”)
    48. 48. 2nd: export your second set of data
    49. 49. 3rd: start merging your data
    50. 50. 3rd: start merging your data (cont.)
    51. 51. 3rd: merge identical resources
    52. 52. Start making queries… <ul><li>User of data “F” can now ask queries like: </li></ul><ul><ul><li>“give me the title of the original” </li></ul></ul><ul><li>This information is not in the dataset “F”… </li></ul><ul><li>…but can be retrieved by merging with dataset “A”! </li></ul>
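A minimal sketch of why the merge makes this query answerable, assuming each dataset has been exported as a set of (subject, relation, object) tuples. All identifiers and values are hypothetical, and the relation names `a:title`, `f:titre` and `f:original` are assumptions made for the example.

```python
# Hypothetical exports of the two datasets.
dataset_a = {
    ("urn:example:book-1", "a:title", "Example Title"),
    ("urn:example:book-1", "a:author", "urn:example:person-1"),
}
dataset_f = {
    ("urn:example:book-2", "f:titre", "Titre d'exemple"),
    ("urn:example:book-2", "f:original", "urn:example:book-1"),
}


def objects(graph, subject, relation):
    """Return every object o such that (subject, relation, o) is in the graph."""
    return {o for s, r, o in graph if s == subject and r == relation}


def title_of_original(graph, translation):
    """Follow f:original from the translation, then read a:title of the target."""
    return {
        title
        for original in objects(graph, translation, "f:original")
        for title in objects(graph, original, "a:title")
    }


# Identical identifiers denote the same resource, so merging is a set union.
merged = dataset_a | dataset_f

alone = title_of_original(dataset_f, "urn:example:book-2")
combined = title_of_original(merged, "urn:example:book-2")
print(alone, combined)
```

Here `alone` is empty because dataset “F” knows the identifier of the original but not its title; `combined` contains the title contributed by dataset “A”.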
    53. 53. However, more can be achieved… <ul><li>We “feel” that a:author and f:auteur should be the same </li></ul><ul><li>But an automatic merge does not know that! </li></ul><ul><li>Let us add some extra information to the merged data: </li></ul><ul><ul><li>a:author same as f:auteur </li></ul></ul><ul><ul><li>both identify a “Person” </li></ul></ul><ul><ul><li>a term that a community may have already defined: </li></ul></ul><ul><ul><ul><li>a “Person” is uniquely identified by his/her name and, say, homepage </li></ul></ul></ul><ul><ul><ul><li>it can be used as a “category” for certain types of resources </li></ul></ul></ul>
    54. 54. 3rd revisited: use the extra knowledge
    55. 55. Start making richer queries! <ul><li>User of dataset “F” can now query: </li></ul><ul><ul><li>“give me the home page of the original’s author” </li></ul></ul><ul><li>The information is not in datasets “F” or “A”… </li></ul><ul><li>…but was made available by: </li></ul><ul><ul><li>merging datasets “A” and “F” </li></ul></ul><ul><ul><li>adding three simple extra statements as an extra “glue” </li></ul></ul>
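The effect of the “glue” can be illustrated with a small sketch. It models only the first extra statement (a:author same as f:auteur), not the “Person” classification, and it applies the equivalence in a single pass rather than doing real inference. The data, the `a:homepage` relation and the helper names are hypothetical.

```python
# Hypothetical merged data, as (subject, relation, object) tuples.
merged = {
    ("urn:example:book-2", "f:original", "urn:example:book-1"),
    ("urn:example:book-1", "a:author", "urn:example:person-1"),
    ("urn:example:person-1", "a:homepage", "http://www.example.org/home"),
}

# The "glue": pairs of relations declared to mean the same thing.
same_relation = {("a:author", "f:auteur")}


def with_glue(graph, equivalences):
    """Return the graph plus every tuple implied by the relation equivalences."""
    expanded = set(graph)
    for left, right in equivalences:
        for s, r, o in graph:
            if r == left:
                expanded.add((s, right, o))
            elif r == right:
                expanded.add((s, left, o))
    return expanded


def follow(graph, start, path):
    """Follow a chain of relations from a start node; return what is reached."""
    frontier = {start}
    for relation in path:
        frontier = {o for s, r, o in graph if s in frontier and r == relation}
    return frontier


# A user of dataset "F" phrases the question with the term f:auteur.
path = ["f:original", "f:auteur", "a:homepage"]
before = follow(merged, "urn:example:book-2", path)
after = follow(with_glue(merged, same_relation), "urn:example:book-2", path)
print(before, after)
```

Without the glue the chain breaks at `f:auteur`, so `before` is empty; with it, `after` reaches the home page.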
    56. 56. Combine with different datasets <ul><li>Via, e.g., the “Person”, the dataset can be combined with other sources </li></ul><ul><li>For example, data in Wikipedia can be extracted using dedicated tools </li></ul>
    57. 57. Merge with Wikipedia data
    58. 58. Merge with Wikipedia data
    59. 59. Merge with Wikipedia data
    60. 60. It could become even more powerful <ul><li>We could add extra knowledge to the merged datasets </li></ul><ul><ul><li>e.g., a full classification of various types of library data </li></ul></ul><ul><ul><li>geographical information </li></ul></ul><ul><ul><li>etc. </li></ul></ul><ul><li>This is where ontologies, extra rules, etc., come in </li></ul><ul><ul><li>ontologies/rule sets can be relatively simple and small, or huge, or anything in between… </li></ul></ul><ul><li>Even more powerful queries can be asked as a result </li></ul>
    61. 61. Simple SPARQL example SELECT ?isbn ?price ?currency # note: not ?x! WHERE {?isbn a:price ?x. ?x rdf:value ?price. ?x p:currency ?currency.}
    62. 62. Simple SPARQL example <ul><li>Returns: [[<..409X>,33,£], [<..409X>,50,€], [<..6682>,60,€], [<..6682>,78,$]] </li></ul>SELECT ?isbn ?price ?currency # note: not ?x! WHERE {?isbn a:price ?x. ?x rdf:value ?price. ?x p:currency ?currency.}
    63. 63. Pattern constraints SELECT ?isbn ?price ?currency # note: not ?x! WHERE { ?isbn a:price ?x. ?x rdf:value ?price. ?x p:currency ?currency. FILTER(?currency = €) } <ul><li>Returns: [[<..409X>,50,€], [<..6682>,60,€]] </li></ul>
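To make the evaluation of such a query more concrete, here is a toy pattern matcher. It is a sketch of the general idea only, not how a SPARQL engine is implemented. The data is reconstructed from the result rows shown on the slides; the intermediate node names (`_:p1` and so on) are hypothetical.

```python
# Data reconstructed from the result rows; the _:pN node names are made up.
triples = [
    ("<..409X>", "a:price", "_:p1"), ("_:p1", "rdf:value", 33), ("_:p1", "p:currency", "£"),
    ("<..409X>", "a:price", "_:p2"), ("_:p2", "rdf:value", 50), ("_:p2", "p:currency", "€"),
    ("<..6682>", "a:price", "_:p3"), ("_:p3", "rdf:value", 60), ("_:p3", "p:currency", "€"),
    ("<..6682>", "a:price", "_:p4"), ("_:p4", "rdf:value", 78), ("_:p4", "p:currency", "$"),
]


def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, data, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        candidate = dict(binding)
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield from match(rest, data, candidate)


where = [
    ("?isbn", "a:price", "?x"),
    ("?x", "rdf:value", "?price"),
    ("?x", "p:currency", "?currency"),
]

# SELECT: ?x takes part in the matching but is not projected into the result.
rows = [(b["?isbn"], b["?price"], b["?currency"]) for b in match(where, triples)]

# FILTER: keep only the rows whose currency is the euro.
euro_rows = [row for row in rows if row[2] == "€"]
print(rows)
print(euro_rows)
```

The shared variable `?x` is what ties a price value to its currency; the filter is applied to the bindings after the patterns have matched.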
    64. 64. What did we do? (cont)
    65. 65. The network effect <ul><li>Through URI-s we can link any data to any data </li></ul><ul><li>The “network effect” is extended to the (Web) data </li></ul><ul><li>“Mashups on steroids” become possible </li></ul>
    66. 66. Semantic Web technologies stack
    67. 67. Yahoo’s SearchMonkey <ul><li>Search results may be customized via small applications using content metadata in, eg, RDFa </li></ul><ul><li>Users can customize their search pages </li></ul>
    68. 68. Linking Open Data Project <ul><li>Goal: “expose” open datasets in RDF </li></ul><ul><li>Set RDF links among the data items from different datasets </li></ul><ul><li>Billions of triples, millions of “links” </li></ul>
    69. 69. DBpedia: Extracting structured data from Wikipedia Kolkata < Kolkata > dbpedia:native_name “Kolkata (Calcutta)”@en; dbpedia:altitude “9”; dbpedia:populationTotal “4580544”; dbpedia:population_metro “14681589”; geo:lat “22.56970024108887”^^xsd:float; ...
    70. 70. Automatic links among open datasets DBpedia Geonames Processors can switch automatically from one to the other… < Kolkata > owl:sameAs <>; ... <> owl:sameAs <> wgs84_pos:lat “22.5697222”; wgs84_pos:long “88.3697222”; sws:population “4631392” ...
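How a processor might “switch automatically” between datasets can be sketched as follows: collect every identifier connected by owl:sameAs, then gather the properties stated under any of them. The two `ex:` identifiers are placeholders (the real URIs are not shown above); the property values are the ones on the slides.

```python
# The ex: identifiers are hypothetical stand-ins for the two dataset URIs.
triples = [
    ("ex:dbpedia/Kolkata", "dbpedia:populationTotal", "4580544"),
    ("ex:dbpedia/Kolkata", "owl:sameAs", "ex:geonames/Kolkata"),
    ("ex:geonames/Kolkata", "wgs84_pos:lat", "22.5697222"),
    ("ex:geonames/Kolkata", "wgs84_pos:long", "88.3697222"),
]


def same_as_closure(data, start):
    """Collect every identifier reachable from start via owl:sameAs, in either direction."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, r, o in data:
            if r != "owl:sameAs":
                continue
            for a, b in ((s, o), (o, s)):
                if a == node and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen


def describe(data, start):
    """Gather the properties of a resource under all of its known identifiers."""
    aliases = same_as_closure(data, start)
    return {(r, o) for s, r, o in data if s in aliases and r != "owl:sameAs"}


description = describe(triples, "ex:dbpedia/Kolkata")
print(sorted(description))
```

Starting from either identifier yields the same combined description, which is the point of the link.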
    71. 71. Faviki: social bookmarking with Wiki tagging <ul><li>Tag bookmarks via Wikipedia terms/DBpedia URIs </li></ul><ul><li>Helps disambiguate tag usage </li></ul>
    72. 72. Lots of Tools (not an exhaustive list!) <ul><li>Categories: </li></ul><ul><ul><li>Triple Stores </li></ul></ul><ul><ul><li>Inference engines </li></ul></ul><ul><ul><li>Converters </li></ul></ul><ul><ul><li>Search engines </li></ul></ul><ul><ul><li>Middleware </li></ul></ul><ul><ul><li>CMS </li></ul></ul><ul><ul><li>Semantic Web browsers </li></ul></ul><ul><ul><li>Development environments </li></ul></ul><ul><ul><li>Semantic Wikis </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Some names: </li></ul><ul><ul><li>Jena, AllegroGraph, Mulgara, Sesame, flickcurl, … </li></ul></ul><ul><ul><li>TopBraid Suite, Virtuoso environment, Falcon, Drupal 7, Redland, Pellet, … </li></ul></ul><ul><ul><li>Disco, Oracle 11g, RacerPro, IODT, Ontobroker, OWLIM, Talis Platform, … </li></ul></ul><ul><ul><li>RDF Gateway, RDFLib, Open Anzo, DartGrid, Zitgist, Ontotext, Protégé, … </li></ul></ul><ul><ul><li>Thetus publisher, SemanticWorks, SWI-Prolog, RDFStore… </li></ul></ul><ul><ul><li>… </li></ul></ul>
    73. 73. Application patterns <ul><li>It is fairly difficult to “categorize” applications (there are always overlaps) </li></ul><ul><li>With this caveat, some of the application patterns: </li></ul><ul><ul><li>data integration (ie, integrating data from major databases) </li></ul></ul><ul><ul><li>intelligent (specialized) portals (with improved local search based on vocabularies and ontologies) </li></ul></ul><ul><ul><li>content and knowledge organization </li></ul></ul><ul><ul><li>knowledge representation, decision support </li></ul></ul><ul><ul><li>X2X integration (often combined with Web Services) </li></ul></ul><ul><ul><li>data registries, repositories </li></ul></ul><ul><ul><li>collaboration tools (eg, social network applications) </li></ul></ul>
    74. 74. Microformats currently supported <ul><li>hCalendar – Putting Event & Todo data on the web (iCalendar) </li></ul><ul><li>hCard – electronic business card/self-identification (vCard) </li></ul><ul><li>rel-license – To declare licenses for content </li></ul><ul><ul><li>Example: <a rel="license" href="…"> </li></ul></ul><ul><li>rel-tag – Allow authors to assign keywords to stuff. </li></ul><ul><ul><li>Example: <a rel="tag" href="tagspace/tag">...</a> </li></ul></ul><ul><li>VoteLinks </li></ul><ul><li>XFN – Distributed Social Networks (“XHTML Friends Network”) </li></ul><ul><ul><li>Example: <a rel="friend met" href="…">Molly Holzschlag</a> </li></ul></ul><ul><li>XOXO - eXtensible Open XHTML Outlines (you are looking at one!) </li></ul>
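Since the rel-based microformats above are ordinary HTML attributes, a consumer can read them with a standard HTML parser. The sketch below uses Python's built-in `html.parser`; the `RelLinkCollector` class and the sample markup (including the example.org addresses) are made up for illustration.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect rel values, href and link text for every <a> element that has a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("rel"):
            self._current = {
                "rels": attributes["rel"].split(),  # rel holds space-separated values
                "href": attributes.get("href"),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


# Hypothetical page fragment.
sample = (
    '<a rel="tag" href="http://example.org/tagspace/semweb">semweb</a> '
    '<a rel="friend met" href="http://example.org/molly">Molly</a> '
    '<a href="http://example.org/plain">no rel</a>'
)

collector = RelLinkCollector()
collector.feed(sample)
collector.close()

tags = [link["href"] for link in collector.links if "tag" in link["rels"]]
friends = [link["text"] for link in collector.links if "friend" in link["rels"]]
print(tags, friends)
```

Splitting the `rel` value on whitespace matters for XFN, where one link can carry several relationship values such as `friend met`.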
    75. 75. Microformats coming in the not-so-distant future <ul><li>adr - for marking up address information </li></ul><ul><li>geo - for marking up geographic coordinates (latitude; longitude) </li></ul><ul><li>hAtom - format to standardize feeds/syndicating episodic content (e.g. weblog postings) </li></ul><ul><li>hAudio </li></ul><ul><li>hProduct </li></ul><ul><li>hRecipe </li></ul><ul><li>hResume - for publishing resumes and CVs </li></ul>
    76. 76. Microformats coming in the not-so-distant future (contd) <ul><li>hReview - publishing reviews of products, events, people, etc. </li></ul><ul><li>rel-directory - distributed directory building </li></ul><ul><li>rel-enclosure - for indicating attachments (e.g. files) to download and cache </li></ul><ul><li>rel-home - indicate a hyperlink to the homepage of the site </li></ul><ul><li>rel-payment - indicate a payment mechanism </li></ul><ul><li>xFolk </li></ul>
    77. 77. Semantic Web <ul><li>Machines talking to machines </li></ul><ul><li>Making the Web more “intelligent” </li></ul><ul><li>Tim Berners-Lee: computers “analyzing all the data on the Web – the content, links, and transactions between people and computers.” </li></ul><ul><li>Bottom Up = annotate, metadata, RDF! </li></ul><ul><li>Top Down = Simple </li></ul>Image credit: dullhunk <ul><li>Top-down: </li></ul><ul><li>Leverage existing web information </li></ul><ul><li>Apply specific, vertical semantic knowledge </li></ul><ul><li>Deliver the results as a consumer-centric web app </li></ul>
    78. 78. Semantic Apps <ul><li>What is a Semantic App? </li></ul><ul><li>- Not necessarily W3C Semantic Web </li></ul><ul><li>An app that determines the meaning of text and other data, and then creates connections for users </li></ul><ul><li>Data portability and connectibility are keys (ref: Nova Spivack) </li></ul>Example: Calais Reuters, the international business and financial news giant, launched an API called Open Calais in Feb 08. The API does a semantic markup on unstructured HTML documents - recognizing people, places, companies, and events. Ref: Reuters Wants The World To Be Tagged ; Alex Iskold, ReadWriteWeb, Feb 08
    79. 79. Top 10 Semantic Web Products of 2008 <ul><li>Yahoo! SearchMonkey </li></ul><ul><ul><li>SearchMonkey allows developers to build applications on top of Yahoo! search, including allowing site owners to share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction. </li></ul></ul><ul><li>Powerset </li></ul><ul><ul><li>Powerset is a natural language search engine. It had a great 2008, having been acquired by Microsoft in July of that year. </li></ul></ul>
    80. 80. Top 10 Semantic Web Products of 2008 <ul><li>Open Calais (Thomson Reuters) </li></ul><ul><li>Calais - a toolkit of products that enable users to incorporate semantic functionality within a blog, content management system, website or application. </li></ul><ul><li>Dapper MashupAds </li></ul><ul><li>MashupAds serve up a banner ad that's related to whatever movie the page happens to be about. </li></ul>
    81. 81. Top 10 Semantic Web Products of 2008 <ul><li>BooRah </li></ul><ul><li>BooRah is a restaurant review site. BooRah uses semantic analysis and natural language processing to aggregate reviews from food blogs. Because of this, BooRah can recognize praise and criticism in these reviews and then rate restaurants accordingly. </li></ul><ul><li>BlueOrganizer (AdaptiveBlue) </li></ul><ul><li>AdaptiveBlue are the makers of the Firefox plugin BlueOrganizer. The basic idea behind it is that it gives you added information about webpages you visit and offers useful links based on the subject matter. </li></ul>
    82. 82. Top 10 Semantic Web Products of 2008 <ul><li>Hakia </li></ul><ul><li>Hakia is a search engine focusing on natural language processing methods to try to deliver 'meaningful' search results. Hakia attempts to analyze the concept of a search query, in particular by doing sentence analysis. </li></ul><ul><li>TripIt </li></ul><ul><li>TripIt is an app that manages your travel planning. </li></ul>
    83. 83. Top 10 Semantic Web Products of 2008 <ul><li>Zemanta </li></ul><ul><li>Zemanta is a blogging tool to add relevant content to your posts. Users can now incorporate their own social networks, RSS feeds, and photos into their blog posts. </li></ul><ul><li>UpTake </li></ul><ul><li>Semantic search startup UpTake (formerly Kango) aims to make the process of booking travel online easier. It covers over 400,000 hotels and activities from more than 1,000 different travel sites, along with over 20 million reviews and opinions. </li></ul>
    84. 84. Thanks! [email_address] Credits: * 2009-01-15, What is the Semantic Web? (in 15 minutes), Ivan Herman, ISOC New Years Reception in Amsterdam, the Netherlands * 2008-09-24, Introduction to the Semantic Web (tutorial) Ivan Herman, 2nd European Semantic Technology Conference in Vienna, Austria * ReadWriteWeb - Web Technology Trends for 2008 and Beyond (, 10 best semantic applications * Microformats (