What is the Semantic Web? www.pleso.net, for Codecamp 2009, Kyiv, Ukraine. Based on: Ivan Herman (W3C), 2009-01-15, Amsterdam, The Netherlands; ReadWriteWeb; Microformats.
Let’s organize a trip to Budapest using the Web!
You try to find a proper flight with …
… a big, reputable airline, or …
… the airline of the target country, or …
… a low-cost one
You have to find a hotel, so you look for…
… a really cheap accommodation, or …
… a really luxurious one, or …
… an intermediate one …
oops, that is no good: the page is in Hungarian, which almost nobody understands, but…
… this one could work
Of course, you could decide to trust a specialized site…
… like this one, or…
… or this one
You may want to know something about Budapest; look for some photographs…
… on flickr …
… on Google …
… or you can look at mine
but you can also look at a (social) travel site
What happened here? You had to consult a large number of sites, all different in style, purpose, possibly language… You had to mentally integrate all that information to achieve your goals. We all know that, sometimes, this is a long and tedious process!
All those pages are only the tips of their respective icebergs: the real data is hidden somewhere in databases, XML files, Excel sheets, … You only have access to what the Web page designers allow you to see.
Specialized sites (Expedia, TripAdvisor) do a bit more: they gather and combine data from other sources (usually with the approval of the data owners), but they still control how you see those sources. Sometimes, though, you want to personalize: access the original data and combine it yourself!
Another example: social sites. I have a list of “friends” on…
… Dopplr,
… Twine,
… LinkedIn,
… and, of course, the ubiquitous Facebook
I had to type in and connect with friends again and again for each site independently. This is even worse than before: I feed the icebergs, but I still do not have easy access to the data…
What would we like to have? Use the data on the Web the same way as we do with documents: be able to link to data (independently of its presentation); use that data the way I want (present it, mine it, etc.); agents, programs, scripts, etc., should be able to interpret part of that data.
But wait! Isn’t that what mashup sites are already doing?
A “mashup” example:
In some ways, yes, and that shows the huge power of what such a Web of data provides. But mashup sites are forced to do very ad-hoc jobs: various data sources expose their data via Web Services, each with a different API, a different logic, a different structure. These sites are forced to reinvent the wheel many times because there is no standard way of doing things.
Let us put it together. What we need for a Web of Data: use URIs to publish data, not only full documents; allow the data to link to other data; characterize/classify the data and the links (the “terms”) to convey some extra meaning; and use standards for all of this!
So what is the Semantic Web?
It is a collection of standard technologies to realize a Web of Data: WWW -> GGG (Giant Global Graph).
It is that simple… Of course, the devil is in the details: a common model has to be provided for machines to describe, query, etc., the data and their connections; the “classification” of the terms can become very complex for specific knowledge areas (this is where ontologies, thesauri, etc., enter the game)… but these details are fleshed out by experts as we speak!
Towards a Semantic Web. The current Web represents information using natural language (English, Hungarian, Chinese, …), graphics, multimedia, and page layout. Humans can process this easily: they can deduce facts from partial information, create mental associations, and are used to various kinds of sensory information (well, sort of… people with disabilities may have serious problems on the Web with rich media!).
Towards a Semantic Web. Tasks often require combining data on the Web: hotel and travel information may come from different sites, searches span different digital libraries, etc. Again, humans combine this information easily, even if different terminologies are used!
However… machines are ignorant! Partial information is unusable; it is difficult to make sense of, e.g., an image; drawing analogies automatically is difficult; combining information automatically is difficult: is <foo:creator> the same as <bar:author>? …
Example: automatic airline reservation. Your automatic airline reservation agent knows about your preferences, builds up a knowledge base using your past, and can combine that local knowledge with remote services: airline preferences, dietary requirements, calendaring, etc. It communicates with remote information (M. Dertouzos: The Unfinished Revolution).
What is needed? (Some) data should be available to machines for further processing; it should be possible to combine and merge data on a Web scale. Sometimes data may describe other data… but sometimes the data is to be exchanged by itself, like my calendar or my travel preferences. Machines may also need to reason about that data.
The rough structure of data integration: (1) map the various data onto an abstract data representation, making the data independent of its internal representation; (2) merge the resulting representations; (3) start making queries on the whole: queries that are not possible on the individual data sets!
A simplified bookstore dataset (dataset “A”)
1st: export your data as a set of relations
Some notes on exporting the data. Data export does not necessarily mean physical conversion of the data: relations can be generated on the fly at query time, via SQL “bridges”, by scraping HTML pages, by extracting data from Excel sheets, etc. One can also export only part of the data.
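To make the idea concrete, here is a sketch of what such an export could look like in Turtle; every URI, prefix, and value below is invented for illustration, with only the a:author name and the a:price / rdf:value / p:currency pattern chosen to match the vocabulary used on the later slides:

  @prefix a:   <http://bookstore-a.example/terms#> .
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix p:   <http://currencies.example/terms#> .

  # one row of dataset "A", exported as a set of relations (triples)
  <urn:isbn:0123456789>
      a:title  "An Example Title" ;
      a:author <http://bookstore-a.example/person/1> ;
      a:price  [ rdf:value "33" ; p:currency "£" ] .

  <http://bookstore-a.example/person/1>
      a:name     "Doe, Jane" ;
      a:homepage <http://www.example.org/jdoe/> .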
Another bookstore dataset (dataset “F”)
2nd: export your second set of data
3rd: start merging your data
3rd: start merging your data (cont.)
3rd: merge identical resources
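As a sketch of what the merge relies on (again with invented URIs and values), dataset “F” could describe a translation whose f:original property points at the very same <urn:isbn:…> URI used in the export of dataset “A”; merging is then simply the union of the two triple sets, and the shared URI joins them:

  @prefix f: <http://bookstore-f.example/terms#> .

  # dataset "F" (the second bookshop), hypothetical export
  <http://bookstore-f.example/livre/1>
      f:titre    "Un titre traduit"@fr ;
      f:original <urn:isbn:0123456789> ;
      f:auteur   <http://bookstore-f.example/personne/1> .

  # after the merge, <urn:isbn:0123456789> carries the a:title, a:author and
  # a:price triples from "A" and is also the object of f:original from "F":
  # the two datasets are connected without any extra work.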
Start making queries… The user of dataset “F” can now ask queries like: “give me the title of the original”. This information is not in dataset “F”… …but it can be retrieved by merging with dataset “A”!
However, more can be achieved… We “feel” that a:author and f:auteur should be the same, but an automatic merge does not know that! Let us add some extra information to the merged data: a:author is the same as f:auteur; both identify a “Person”, a term that a community may have already defined (a “Person” is uniquely identified by his/her name and, say, homepage); it can be used as a “category” for certain types of resources.
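One possible way to write that extra “glue” down, using standard OWL/RDFS terms (a sketch: the a: and f: prefixes are the invented ones from the sketches above, and foaf:Person / foaf:homepage stand in for the community-defined “Person” vocabulary):

  @prefix a:    <http://bookstore-a.example/terms#> .
  @prefix f:    <http://bookstore-f.example/terms#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  # the three extra statements of "glue"
  a:author owl:equivalentProperty f:auteur .       # same relation, two names
  a:author rdfs:range foaf:Person .                # its values are Persons
  foaf:homepage a owl:InverseFunctionalProperty .  # a homepage identifies its Person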
3rd revisited: use the extra knowledge
Start making richer queries! The user of dataset “F” can now query: “give me the home page of the original’s author”. The information is not in dataset “F” or “A”… …but it was made available by merging datasets “A” and “F” and adding three simple extra statements as extra “glue”.
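In SPARQL, that richer query might look roughly as follows; the property names are the invented ones from the sketches above, and the second pattern only matches once a reasoner (or a rewrite rule) has applied the a:author / f:auteur equivalence to the merged data:

  PREFIX a: <http://bookstore-a.example/terms#>
  PREFIX f: <http://bookstore-f.example/terms#>

  # "give me the home page of the original's author", asked in F's vocabulary
  SELECT ?homepage
  WHERE {
    ?translation f:original ?original .
    ?original    f:auteur   ?person .     # entailed from a:author via the glue
    ?person      a:homepage ?homepage .
  }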
Combine with different datasets Via, e.g., the “Person”, the dataset can be combined with other sources For example, data in Wikipedia can be extracted using dedicated tools
Merge with Wikipedia data
Merge with Wikipedia data
Merge with Wikipedia data
It could become even more powerful. We could add extra knowledge to the merged datasets: e.g., a full classification of the various types of library data, geographical information, etc. This is where ontologies, extra rules, etc., come in; ontologies/rule sets can be relatively simple and small, or huge, or anything in between… Even more powerful queries can be asked as a result.
Simple SPARQL example:
  SELECT ?isbn ?price ?currency    # note: not ?x!
  WHERE { ?isbn a:price    ?x .
          ?x    rdf:value  ?price .
          ?x    p:currency ?currency . }
Simple SPARQL example. Returns: [[<..49X>,33,£], [<..49X>,50,€], [<..6682>,60,€], [<..6682>,78,$]]
  SELECT ?isbn ?price ?currency    # note: not ?x!
  WHERE { ?isbn a:price    ?x .
          ?x    rdf:value  ?price .
          ?x    p:currency ?currency . }
Pattern constraints:
  SELECT ?isbn ?price ?currency    # note: not ?x!
  WHERE { ?isbn a:price    ?x .
          ?x    rdf:value  ?price .
          ?x    p:currency ?currency .
          FILTER (?currency = "€")
        }
Returns: [[<..409X>,50,€], [<..6682>,60,€]]
What did we do? (cont)
The network effect: through URIs we can link any data to any data. The “network effect” is extended to (Web) data; “mashups on steroids” become possible.
Semantic Web technologies stack
Yahoo’s SearchMonkey: search results may be customized via small applications using content metadata in, e.g., RDFa. Users can customize their search pages.
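As a reminder of what such embedded metadata looks like, here is a minimal RDFa fragment in the RDFa 1.0 style of the period; the Dublin Core vocabulary is real, but the URL and values are invented:

  <div xmlns:dc="http://purl.org/dc/elements/1.1/"
       about="http://www.example.org/report/42">
    <span property="dc:title">A hypothetical report title</span>
    by <span property="dc:creator">Jane Doe</span>
  </div>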
Linking Open Data Project. Goal: “expose” open datasets in RDF and set RDF links among the data items from different datasets. Billions of triples, millions of “links”.
DBpedia: extracting structured data from Wikipedia (http://en.wikipedia.org/wiki/Kolkata)
  <http://dbpedia.org/resource/Kolkata>
      dbpedia:native_name      "Kolkata (Calcutta)"@en ;
      dbpedia:altitude         "9" ;
      dbpedia:populationTotal  "4580544" ;
      dbpedia:population_metro "14681589" ;
      geo:lat                  "22.56970024108887"^^xsd:float ;
      ...
Automatic links among open datasets (DBpedia, Geonames): processors can switch automatically from one to the other…
  <http://dbpedia.org/resource/Kolkata>
      owl:sameAs <http://sws.geonames.org/1275004/> ;
      ...
  <http://sws.geonames.org/1275004/>
      owl:sameAs     <http://DBpedia.org/resource/Kolkata> ;
      wgs84_pos:lat  "22.5697222" ;
      wgs84_pos:long "88.3697222" ;
      sws:population "4631392" ;
      ...
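A SPARQL query that follows such a link might look like this (a sketch: it assumes both datasets are loaded into one store, and that wgs84_pos is the usual W3C geo vocabulary prefix):

  PREFIX owl:       <http://www.w3.org/2002/07/owl#>
  PREFIX wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>

  # start from the DBpedia resource, hop to Geonames via owl:sameAs,
  # and read the coordinates published there
  SELECT ?lat ?long
  WHERE {
    <http://dbpedia.org/resource/Kolkata> owl:sameAs ?same .
    ?same wgs84_pos:lat  ?lat ;
          wgs84_pos:long ?long .
  }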
Faviki: social bookmarking with Wiki tagging. Tag bookmarks via Wikipedia terms/DBpedia URIs; this helps disambiguate tag usage.
Lots of Tools ( not  an exhaustive list!) Categories: Triple Stores Inference engines Converters Search engines Middleware CMS Semantic Web browsers Development environments Semantic Wikis … Some names: Jena, AllegroGraph, Mulgara, Sesame, flickurl, … TopBraid Suite, Virtuoso environment, Falcon, Drupal 7, Redland, Pellet, … Disco, Oracle 11g, RacerPro, IODT, Ontobroker, OWLIM, Tallis Platform, … RDF Gateway, RDFLib, Open Anzo, DartGrid, Zitgist, Ontotext, Protégé, … Thetus publisher, SemanticWorks, SWI-Prolog, RDFStore… …
Application patterns. It is fairly difficult to “categorize” applications (there are always overlaps). With this caveat, some of the application patterns: data integration (i.e., integrating data from major databases); intelligent (specialized) portals (with improved local search based on vocabularies and ontologies); content and knowledge organization; knowledge representation, decision support; X2X integration (often combined with Web Services); data registries, repositories; collaboration tools (e.g., social network applications).
Microformats currently supported:
  hCalendar – putting event & todo data on the web (iCalendar)
  hCard – electronic business card / self-identification (vCard); a minimal example follows after this list
  rel-license – to declare licenses for content. Example: <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">
  rel-tag – allows authors to assign keywords to content. Example: <a rel="tag" href="tagspace/tag">...</a>
  VoteLinks
  XFN – distributed social networks (“XHTML Friends Network”). Example: <a rel="friend met" href="http://molly.com">Molly Holzschlag</a>
  XOXO – eXtensible Open XHTML Outlines (you are looking at one!)
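A minimal hCard, for instance, is just ordinary XHTML with the microformat’s class names (vcard, fn, url, org); the name, URL, and organisation below are invented:

  <div class="vcard">
    <a class="url fn" href="http://www.example.org/">Jane Doe</a>,
    <span class="org">Example Organisation</span>
  </div>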
Microformats coming in the not-so-distant future: adr – for marking up address information; geo – for marking up geographic coordinates (latitude, longitude); hAtom – a format to standardize feeds/syndication of episodic content (e.g., weblog postings); hAudio; hProduct; hRecipe; hResume – for publishing resumes and CVs.
Microformats coming in the not-so-distant future (cont.): hReview – publishing reviews of products, events, people, etc.; rel-directory – distributed directory building; rel-enclosure – for indicating attachments (e.g., files) to download and cache; rel-home – indicates a hyperlink to the homepage of the site; rel-payment – indicates a payment mechanism; xFolk.
Semantic Web: machines talking to machines, making the Web more 'intelligent'. Tim Berners-Lee: computers "analyzing all the data on the Web – the content, links, and transactions between people and computers." Bottom-up = annotate, metadata, RDF! Top-down = simple (image credit: dullhunk). Top-down means: leverage existing web information, apply specific, vertical semantic knowledge, and deliver the results as a consumer-centric web app.
Semantic Apps. What is a Semantic App? Not necessarily W3C Semantic Web: an app that determines the meaning of text and other data, and then creates connections for users. Data portability and connectibility are key (ref: Nova Spivack). Example: Calais. Reuters, the international business and financial news giant, launched an API called Open Calais in Feb 2008. The API does semantic markup on unstructured HTML documents, recognizing people, places, companies, and events. Ref: Reuters Wants The World To Be Tagged; Alex Iskold, ReadWriteWeb, Feb 2008.
Top 10 Semantic Web Products of 2008: Yahoo! SearchMonkey, Powerset. SearchMonkey allows developers to build applications on top of Yahoo! search, including allowing site owners to share structured data with Yahoo! using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction. Powerset is a natural language search engine. It's fair to say that Powerset had a great 2008, having been acquired by Microsoft in July of that year.
Top 10 Semantic Web Products of 2008: Open Calais (Thomson Reuters), Dapper MashupAds. Calais is a toolkit of products that enables users to incorporate semantic functionality within a blog, content management system, website, or application. Dapper MashupAds serve up a banner ad that is related to whatever movie the page happens to be about.
Top 10 Semantic Web Products of 2008: BooRah, BlueOrganizer (AdaptiveBlue). BooRah is a restaurant review site. BooRah uses semantic analysis and natural language processing to aggregate reviews from food blogs. Because of this, BooRah can recognize praise and criticism in these reviews and then rate restaurants accordingly. AdaptiveBlue are the makers of the Firefox plugin BlueOrganizer. The basic idea behind it is that it gives you added information about the webpages you visit and offers useful links based on the subject matter.
Top 10 Semantic Web Products of 2008: Hakia, TripIt. Hakia is a search engine focusing on natural language processing methods to try to deliver 'meaningful' search results. Hakia attempts to analyze the concept of a search query, in particular by doing sentence analysis. TripIt is an app that manages your travel planning.
Top 10 Semantic Web Products of 2008: Zemanta, UpTake. Zemanta is a blogging tool that adds relevant content to your posts. Users can now incorporate their own social networks, RSS feeds, and photos into their blog posts. UpTake, a semantic search startup (formerly Kango), aims to make the process of booking travel online easier: it covers hotels and activities (over 400,000 of them) from more than 1,000 different travel sites, with over 20 million reviews and opinions.
Thanks! http://www.pleso.net/ [email_address]
Credits:
* 2009-01-15, What is the Semantic Web? (in 15 minutes), Ivan Herman, ISOC New Year's Reception in Amsterdam, the Netherlands
* 2008-09-24, Introduction to the Semantic Web (tutorial), Ivan Herman, 2nd European Semantic Technology Conference in Vienna, Austria
* ReadWriteWeb – Web Technology Trends for 2008 and Beyond (http://www.readwriteweb.com/), 10 best semantic applications
* Microformats (http://microformats.org/)

