
Linked Data Tutorial



This tutorial explains the Data Web vision, some preliminary standards and technologies, and some tools and technological building blocks developed by the AKSW research group at Universität Leipzig.

Published in: Technology


  1. From Document Web to a Web of Linked Data
     Dr. Sören Auer, AKSW, Institut für Informatik
  2. Overview
     • The Linked Data Web Vision
     • Data Web Technologies
     • Publishing relational data on the Web
     • DBpedia – transforming Wikipedia into a knowledge base
     • OntoWiki – a Linked Data Wiki
     • OpenStreetMap – linked open geo data
  3. From the Document Web to the Linked Open Data Web (and beyond)
     • Web (since 1992): HTTP; HTML/CSS/JavaScript
     • Semantic Web (vision 1998, start ???): reasoning; logic, rules; trust
     • Social Web (since 2003): folksonomies/tagging; reputation, sharing; groups, relationships
     • Data Web (since 2006): URI de-referenceability; CBD (Concise Bounded Description); RDF serializations
  4. Conceptual Level Data Access and Integration
     • Data models
         • RDBMS: organize data in relations, rows, cells (Oracle, DB2, MS-SQL)
         • Triple/quad stores: RDF data model (Virtuoso, Oracle, Sesame)
         • Column-oriented DBMS: collocate column values rather than row values (Vertica, C-Store, MonetDB)
         • Entity-attribute-value (EAV): HELP medical record system, TrialDB
         • Others: XML, hierarchical, tree, graph-oriented DBMS
     • Data access
         • Object-relational mappings (ORM): NeXT's EOF / WebObjects, ADO.NET Entity Framework, Hibernate
         • Procedural APIs: ODBC, JDBC
         • Query languages: Datalog, SQL; SPARQL; XPath/XQuery
         • Linked Data: de-referenceable URIs, RDF serialization formats
     • Data integration
         • Enterprise Information Integration: sets of heterogeneous data sources appear as a single, homogeneous data source
         • Data warehousing: based on extract, transform, load (ETL); Global-As-View (GAV)
         • Data Web: URIs as entity identifiers, HTTP as data access protocol; Local-As-View (LAV)
         • Research: mediators, ontology-based, P2P, Web service-based
  5. Web 1.0, Web 2.0, Web 3.0
     • Web 1.0: many Web sites containing unstructured, textual content
     • Web 2.0: few large Web sites specialized in specific content types (pictures, video, encyclopedic articles)
     • Web 3.0: many Web sites containing and semantically syndicating arbitrarily structured content
  6. The Long Tail of Information Domains
     • Chris Anderson's Long Tail (Wired, Oct. 2004), adapted to information domains: popularity plotted against content type
     • Head: currently supported structured content types (pictures, news, video, recipes, calendar, …)
     • Tail: content types that are not or insufficiently supported (gene sequences, itinerary of King George, talent management, requirements engineering, special-interest communities, …), to be covered by SemWeb-supported structured content
  7. Why Do We Need Another Web?
     • Try to search for these things on the current Web:
         • Apartments near German-French bilingual childcare in Leipzig.
         • ERP service providers with offices in Vienna and Berlin.
         • Researchers working on DB-related topics in south-east Asia.
     • The information needed to answer such queries is available on the Web, but opaque to current Web search.
     • The (Semantic) Data Web lets Web pages complement their text with structured data, and it lets such structured information from different sources be intelligently combined and integrated.
     • Diagram: DB-backed Web servers (one "has everything about childcare in Leipzig", another "knows all about real estate offers in Germany") publish both HTML and RDF; a search engine integrates the RDF.
  8. Overview
     • The Linked Data Web Vision
     • Data Web Technologies
     • Publishing relational data on the Web
     • DBpedia – transforming Wikipedia into a knowledge base
     • OntoWiki – a Linked Data Wiki
     • Virtuoso – Knowledge Store
     • OpenStreetMap – free and open geo data
  9. RDF – Resource Description Framework
     • Distinguishes two fundamental base types:
         • Resources
             • complex abstract or concrete entities
             • uniquely identified by a URI
         • Literals
             • concrete data values
             • optionally typed (e.g. xsd:string, xsd:dateTime) or language-tagged (e.g. en, de):
               "2008-05-31T09:30:00"^^xsd:dateTime
               "Wien"@de
  10. RDF Statement / Triple Paradigm
     • Subject (resource) → predicate (resource, here dc:creator) → object (resource or literal, here "Sören Auer")
     • RDF/XML:
         <?xml version="1.0"?>
         <rdf:RDF xmlns:rdf="…" xmlns:dc="…">
           <rdf:Description rdf:about="…">
             <dc:creator>Sören Auer</dc:creator>
           </rdf:Description>
         </rdf:RDF>
     • RDF/N3:
         <…> dc:creator "Sören Auer" .
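To make the triple paradigm concrete, here is a minimal Python sketch that does not depend on any RDF library. It models IRIs, typed and language-tagged literals, and triples, and serializes them as N-Triples. The subject IRI under `example.org` is a hypothetical placeholder. The Dublin Core and XML Schema namespace IRIs are the standard ones. Escaping is deliberately minimal.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
DC = "http://purl.org/dc/elements/1.1/"


@dataclass(frozen=True)
class IRI:
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # full datatype IRI, e.g. XSD + "dateTime"
    lang: Optional[str] = None      # language tag, e.g. "de"

    def n3(self) -> str:
        # Minimal N-Triples escaping: backslash, double quote, line breaks
        text = (self.lexical.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))
        if self.lang:
            return f'"{text}"@{self.lang}'
        if self.datatype:
            return f'"{text}"^^<{self.datatype}>'
        return f'"{text}"'


@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]  # the object is a resource or a literal

    def to_ntriples(self) -> str:
        return f"{self.subject.n3()} {self.predicate.n3()} {self.obj.n3()} ."


doc = IRI("http://example.org/slides/linked-data-tutorial")  # hypothetical subject
print(Triple(doc, IRI(DC + "creator"), Literal("Sören Auer")).to_ntriples())
print(Literal("2008-05-31T09:30:00", datatype=XSD + "dateTime").n3())
print(Literal("Wien", lang="de").n3())
```

In the compact N3 notation on the slide, `dc:creator` and `xsd:dateTime` are prefixed names. N-Triples always spells out full IRIs.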
  11. RDF Document / Model / Graph
     • Simple knowledge base
     • Combines multiple RDF statements
     • Diagram: a resource linked via dc:creator to a person node that has foaf:name "Sören Auer" and foaf:mbox [email_address]
  12. RDF Serialization
     • RDF/XML of the graph above:
         <?xml version="1.0"?>
         <rdf:RDF xmlns:rdf="…" xmlns:dc="…" xmlns:foaf="…">
           <rdf:Description rdf:about="…">
             <dc:creator>
               <rdf:Description rdf:about="…">
                 <foaf:name>Sören Auer</foaf:name>
                 <foaf:mbox rdf:resource="…"/>
               </rdf:Description>
             </dc:creator>
           </rdf:Description>
         </rdf:RDF>
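The nested RDF/XML is only one way of writing down a set of triples. Parsing it yields the dc:creator link plus two statements about the inner description. This minimal sketch holds that graph as a Python set of (subject, predicate, object) tuples and follows the nesting with a wildcard pattern match. The `example.org` IRIs and the mailbox are hypothetical stand-ins for the values elided on the slides.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"

# Hypothetical IRIs standing in for the elided values.
# Literals keep their surrounding quotes to keep them apart from IRIs.
DOC = "http://example.org/doc"
AUTHOR = "http://example.org/people/auer"

graph: Set[Triple] = {
    (DOC, DC_CREATOR, AUTHOR),
    (AUTHOR, FOAF_NAME, '"Sören Auer"'),
    (AUTHOR, FOAF_MBOX, "mailto:author@example.org"),
}


def match(g: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Set[Triple]:
    """All triples matching the pattern; None acts as a wildcard."""
    return {t for t in g
            if (s is None or t[0] == s) and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def objects(g: Set[Triple], s: str, p: str) -> List[str]:
    return sorted(t[2] for t in match(g, s=s, p=p))


# Follow the nesting: document --dc:creator--> person --foaf:name--> literal
for person in objects(graph, DOC, DC_CREATOR):
    print(person, objects(graph, person, FOAF_NAME))
```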
  13. RDF Schema
     • Restrict combinations of resources / literals
     • Structuring of vocabularies
     • Instantiation / classification
     • Provisioning of special resources:
         • Classes (concepts, frames)
         • Attributes (properties, slots, roles)
         • Instances (objects)
     • Diagram: a statement using dc:creator with the value 16.11.2007, marked "?"
  14. RDF-S Class & Property Hierarchies
     • Class hierarchy:
         Beer rdf:type rdfs:Class
         BottomFermentedBeer rdfs:subClassOf Beer
         Bock rdfs:subClassOf BottomFermentedBeer
         Lager rdfs:subClassOf BottomFermentedBeer
         Pilsner rdfs:subClassOf BottomFermentedBeer
     • Property hierarchy:
         hasContent rdf:type rdf:Property
         hasAlcoholicContent rdfs:subPropertyOf hasContent
         hasOriginalWortContent rdfs:subPropertyOf hasContent
  15. RDF-S Properties
     • … are defined and used independently of classes
     • Domain: associates the property with one or multiple classes
     • Range: defines the values the property can assume:
         • instances of a certain class
         • literals typed with a certain XML Schema datatype
     • Example (using OWL property types):
         hasAlcoholicContent rdf:type owl:DatatypeProperty
         hasAlcoholicContent rdf:type owl:FunctionalProperty
         hasAlcoholicContent rdfs:domain Beer
         hasAlcoholicContent rdfs:range xsd:float
         hasAlcoholicContent rdfs:subPropertyOf hasContent
         brews rdf:type owl:ObjectProperty
         brews rdfs:domain Brewery
         brews rdfs:range Beer
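In RDFS, domain and range declarations are not validation constraints. They license inferences. For example, `X brews Y` entails `X rdf:type Brewery` and `Y rdf:type Beer` (RDFS entailment rules rdfs2 and rdfs3). A statement using a subproperty also holds for its superproperty (rule rdfs7). The following is a minimal forward-chaining sketch over plain string triples that runs these three rules to a fixpoint. The OWL statements are ignored. The instance names and the alcohol value are hypothetical, made up for illustration.

```python
RDF_TYPE = "rdf:type"
DOMAIN, RANGE, SUBPROP = "rdfs:domain", "rdfs:range", "rdfs:subPropertyOf"


def is_literal(term: str) -> bool:
    return term.startswith('"')  # literals are written in quotes in this sketch


def rdfs_infer(triples):
    """Forward-chain rdfs2 (domain), rdfs3 (range) and rdfs7 (subPropertyOf) to a fixpoint."""
    g = set(triples)
    while True:
        domains = {(s, o) for s, p, o in g if p == DOMAIN}
        ranges = {(s, o) for s, p, o in g if p == RANGE}
        subprops = {(s, o) for s, p, o in g if p == SUBPROP}
        new = set()
        for s, p, o in g:
            new |= {(s, RDF_TYPE, c) for prop, c in domains if prop == p}
            if not is_literal(o):  # do not type literals via rdfs:range
                new |= {(o, RDF_TYPE, c) for prop, c in ranges if prop == p}
            new |= {(s, sup, o) for sub, sup in subprops if sub == p}
        if new <= g:
            return g
        g |= new


schema = {
    ("hasAlcoholicContent", DOMAIN, "Beer"),
    ("hasAlcoholicContent", RANGE, "xsd:float"),
    ("hasAlcoholicContent", SUBPROP, "hasContent"),
    ("brews", DOMAIN, "Brewery"),
    ("brews", RANGE, "Beer"),
}
# Hypothetical instance data
data = {
    ("ExampleBrewery", "brews", "ExampleBeer"),
    ("ExampleBeer", "hasAlcoholicContent", '"5.0"^^xsd:float'),
}
inferred = rdfs_infer(schema | data) - (schema | data)
for t in sorted(inferred):
    print(t)
```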
  16. RDF-S Instances
     • Are associated with one (or multiple) class(es):
         Boddingtons rdf:type Ale
         Grafentrunk rdf:type Bock
         Hoegaarden rdf:type White
         Jever rdf:type Pilsner
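Combined with the class hierarchy from slide 14, these rdf:type statements entail more types. Subclass relations are transitive (rule rdfs11), and an instance of a class is also an instance of each of its superclasses (rule rdfs9). This minimal sketch computes the entailed types with a cycle-safe traversal, using only the class and instance names from the slides.

```python
# rdfs:subClassOf edges from slide 14 (child -> parents)
SUBCLASS_OF = {
    "BottomFermentedBeer": {"Beer"},
    "Bock": {"BottomFermentedBeer"},
    "Lager": {"BottomFermentedBeer"},
    "Pilsner": {"BottomFermentedBeer"},
}
# rdf:type assertions from this slide
TYPES = {
    "Boddingtons": {"Ale"},
    "Grafentrunk": {"Bock"},
    "Hoegaarden": {"White"},
    "Jever": {"Pilsner"},
}


def superclasses(cls, subclass_of):
    """Transitive superclasses of cls (rdfs11); the visited set makes it cycle-safe."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entailed_types(instance, types, subclass_of):
    """Asserted classes plus all of their superclasses (rdfs9)."""
    result = set(types.get(instance, ()))
    for cls in list(result):
        result |= superclasses(cls, subclass_of)
    return result


print(sorted(entailed_types("Jever", TYPES, SUBCLASS_OF)))
print(sorted(entailed_types("Boddingtons", TYPES, SUBCLASS_OF)))
```

Jever ends up as a Pilsner, a BottomFermentedBeer and a Beer. Boddingtons stays only an Ale, because the slides never declare Ale a subclass of Beer.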
  17. Semantic Web Layer Cake
  18. Linked Data – Paradigm
     • Use URIs as names for things.
     • Use HTTP URIs so that people can look up those names.
     • When someone looks up a URI, provide useful information.
     • Include links to other URIs so that they can discover more things.
  19. Linked Data – Publishing RDF
     • De-referenceable RDF URIs, e.g.: …
     • Different HTTP response depending on the HTTP Accept header
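A widely used way to implement the Accept-dependent response is content negotiation plus a 303 See Other redirect. The redirect points from the URI of the thing to either an HTML page or an RDF document about it. This minimal sketch only decides where to redirect. The `/resource/`, `/page/` and `/data/` URL layout is a hypothetical convention. Accept parsing is simplified: it handles q-values, but only exact media types plus `*/*` (no `text/*` wildcards).

```python
RDF_MEDIA_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")
HTML_MEDIA_TYPES = ("text/html", "application/xhtml+xml", "*/*")


def parse_accept(header: str):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return sorted(prefs, key=lambda mq: -mq[1])  # stable sort: ties keep header order


def negotiate(resource_uri: str, accept: str):
    """Return (status, location): 303 to the RDF or HTML view, or 406 if nothing fits."""
    if not accept.strip():
        accept = "*/*"  # a missing Accept header means "anything"
    base, _, name = resource_uri.rpartition("/resource/")  # hypothetical URI layout
    for media, q in parse_accept(accept):
        if q <= 0:
            continue
        if media in RDF_MEDIA_TYPES:
            return 303, f"{base}/data/{name}"
        if media in HTML_MEDIA_TYPES:
            return 303, f"{base}/page/{name}"
    return 406, None


print(negotiate("http://example.org/resource/Leipzig", "application/rdf+xml"))
print(negotiate("http://example.org/resource/Leipzig", "text/html;q=0.5, text/turtle"))
```

The 303 status signals that the response describes the thing rather than being the thing itself. That keeps the URI of the city distinct from the URIs of the documents about it.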
  20. Benefits of Using the RDF Data Model in the Linked Data Context
     • Clients can look up every URI in an RDF graph over the Web to retrieve additional information.
     • Information from different sources merges naturally.
     • The data model enables you to set RDF links between data from different sources.
     • The data model allows you to represent information that is expressed using different schemata in a single model.
     • Combined with schema languages such as RDF-S or OWL, the data model allows you to use as much or as little structure as you need, meaning that you can represent tightly structured data as well as semi-structured data.
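The claim that information "merges naturally" can be made concrete. Because RDF graphs are sets of triples named by global URIs, merging two ground graphs (graphs without blank nodes) is plain set union. An RDF link such as owl:sameAs then connects the identifiers that two sources use for the same thing. In this minimal sketch, treating owl:sameAs as symmetric and transitive follows OWL semantics, not plain RDF. Both sources and their URIs are hypothetical. Merging graphs with blank nodes would additionally require renaming the blank nodes apart, which is omitted here.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two hypothetical sources describing Leipzig under different URIs.
source_a = {
    ("http://a.example/Leipzig", RDFS_LABEL, '"Leipzig"@de'),
    ("http://a.example/Leipzig", OWL_SAME_AS, "http://b.example/city/42"),  # RDF link
}
source_b = {
    ("http://b.example/city/42", "http://b.example/vocab/federalState", '"Saxony"'),
}

merged = source_a | source_b  # merging ground graphs is set union


def aliases(graph, uri):
    """uri plus all URIs connected to it by owl:sameAs, in either direction."""
    seen, todo = {uri}, [uri]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p == OWL_SAME_AS and current in (s, o):
                other = o if s == current else s
                if other not in seen:
                    seen.add(other)
                    todo.append(other)
    return seen


def describe(graph, uri):
    """All (property, value) pairs stated about uri or any of its aliases."""
    ids = aliases(graph, uri)
    return {(p, o) for s, p, o in graph if s in ids and p != OWL_SAME_AS}


for prop, value in sorted(describe(merged, "http://a.example/Leipzig")):
    print(prop, value)
```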
  21. Linking Open Data (LOD) Cloud
  22. Data Web Moving Targets
     • Base technologies (RDF, SPARQL, HTTP etc.) are developed, standardized and ready to use
     • Big issues:
         • Scalability
         • User interfaces
         • Search engines
         • Business models
         • (Reasoning)
  23. Data Web Business Models
     • Advertisement-based (page view) businesses will probably not be first movers
     • Large Web companies will probably not be first movers
     • The Data Web should focus on fragmented markets with many players that require the widest distribution of information, e.g. realtors, online shops, transportation service providers, public information, geo data etc.
  24. Overview
     • The Linked Data Web Vision
     • Data Web Technologies
     • Publishing relational data on the Web
     • DBpedia – transforming Wikipedia into a knowledge base
     • OntoWiki – a Linked Data Wiki
     • OpenStreetMap – free and open geo data
  25. Triplify Motivation
     • The growth of semantic representations is still outpaced by the traditional Web
     • Overcome the chicken-and-egg dilemma of missing semantic representations and search facilities on the Web
     • Triplify leverages the relational representations behind existing Web applications:
         • often open source, deployed hundreds of thousands of times
         • the structure and semantics encoded in the relational database schemas (behind Web apps) are not accessible to Web search engines, mashups etc.
     • Chart: monthly Web application downloads at SourceForge
  26. Triplify Big Picture
  27. Triplify Approach: Simplicity
     • Expose semantics as simply as possible
         • No (new) mapping languages
         • Few lines of code – easy to plug in
         • Simple, reusable configurations
     • Available for the most popular Web app languages
         • PHP (ready), Ruby/Python under development
     • Works with the most popular Web app DBs
         • MySQL (extensively tested); PHP-PDO DBs (SQLite, Oracle, DB2, MS SQL, PostgreSQL etc.) should work; not needed for Virtuoso
     • Triplify exposes RDF/N-Triples, Linked Data and RDF/JSON
  28. Triplify Solution: SQL SELECT Queries Map Relational Data to RDF
     • Triplify configuration: a number of SQL queries selecting the information that should be made publicly available
     • A special SQL query result structure is required (in order to convert the results into RDF):
         • the first column must contain identifiers for generating instance URIs (i.e. the primary key of the DB table)
         • column names are used to generate property URIs; renaming columns allows reusing properties from existing vocabularies such as Dublin Core, FOAF, SIOC,
           e.g. SELECT id, name AS 'foaf:name' FROM users
         • individual cells contain data values or references to other instances (these become the objects of the resulting triples)
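The result-structure convention above can be illustrated with Python's standard `sqlite3` module. The sketch runs one configured SELECT, turns the first column into an instance URI, and uses every other column name (expanded if it carries a known prefix) as a property. This illustrates the convention only; it is not Triplify's PHP code. The base URI, the local vocabulary path and the `users` rows are hypothetical. The query uses SQLite's standard double-quoted alias instead of MySQL's single-quoted one.

```python
import sqlite3

BASE = "http://example.org/triplify/"              # hypothetical base URI
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}  # prefixes usable in column aliases


def expand(column_name: str) -> str:
    """'foaf:name' -> full FOAF URI; unprefixed names go into a local vocabulary."""
    prefix, sep, local = column_name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return BASE + "vocabulary/" + column_name


def query_to_triples(conn, entity: str, sql: str):
    """Run one configured query: first column -> instance URI, other columns -> properties."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        subject = f"{BASE}{entity}/{row[0]}"
        for column, value in zip(columns[1:], row[1:]):
            if value is not None:  # NULL cells produce no triple
                triples.append((subject, expand(column), str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, nick TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "Alice", "al"), (2, "Bob", None)])
for triple in query_to_triples(conn, "user", 'SELECT id, name AS "foaf:name", nick FROM users ORDER BY id'):
    print(triple)
```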
  29. Example: WordPress Blog Posts
     • Associate the URL path fragment 'post' with a number of SQL patterns:
         (1) SELECT id, post_author AS 'sioc:has_creator->user', post_title AS 'dc:title',
                    post_content AS 'sioc:content', post_date AS 'dcterms:created^^xsd:dateTime',
                    post_modified AS 'dcterms:modified^^xsd:dateTime'
             FROM posts
             WHERE post_status='publish' (AND id=xxx)
         (2) SELECT post_id id, tag_label AS 'tag:taggedWithTag'
             FROM post2tag INNER JOIN tag ON (post2tag.tag_id=tag.tag_id)
             (WHERE post_id=xxx)
         (3) SELECT post_id id, category_id AS 'belongsToCategory->category'
             FROM post2cat
             (WHERE post_id=xxx)
     • '->user' and '->category' mark object properties (the cell references another instance); '^^xsd:dateTime' marks a typed datatype property.
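The two alias conventions on this slide can be read as follows: `property->entity` for object properties and `property^^datatype` for typed literals. The sketch below turns one result cell into a property plus an N-Triples-style object term. The parsing rules are inferred from the examples on the slide, not taken from Triplify's source. The base URI is hypothetical, and prefixed names are left unexpanded for brevity.

```python
BASE = "http://example.org/triplify/"  # hypothetical base URI


def parse_alias(alias: str):
    """Split a column alias into (property, target_entity, datatype)."""
    if "->" in alias:
        prop, target = alias.split("->", 1)
        return prop, target, None
    if "^^" in alias:
        prop, datatype = alias.split("^^", 1)
        return prop, None, datatype
    return alias, None, None


def cell_to_object(alias: str, value):
    """Return (property, object term) for one result cell."""
    prop, target, datatype = parse_alias(alias)
    if target is not None:  # object property: link to another instance
        return prop, f"<{BASE}{target}/{value}>"
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    if datatype is not None:  # typed datatype property
        return prop, f'"{text}"^^{datatype}'
    return prop, f'"{text}"'  # plain literal


print(cell_to_object("sioc:has_creator->user", 5))
print(cell_to_object("dcterms:created^^xsd:dateTime", "2008-10-20T16:35:00"))
print(cell_to_object("dc:title", "New DBpedia release"))
```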
  30. RDF Conversion
     • Source rows:
         posts: id=1, post_author=5, post_title="New DBpedia release", post_content="Today we released …", post_date=200810201635, post_modified=200810201635
         tags: (id=1, tag:taggedWithTag="DBpedia"), (id=1, tag:taggedWithTag="Release"), …
         categories: (id=1, belongsToCategory=34), …
     • Resulting statements about post 1:
         sioc:has_creator → user 5
         dc:title "New DBpedia release"
         sioc:content "Today we released …"
         dcterms:modified "20081020T1635"^^xsd:dateTime
         dcterms:created "20081020T1635"^^xsd:dateTime
         tag:taggedWithTag "DBpedia"
         tag:taggedWithTag "Release"
         belongsToCategory → category 34
  31. Example Config
         <?php
         include('../wp-config.php');  // defines DB_HOST, DB_NAME, DB_USER, DB_PASSWORD and $table_prefix

         $triplify['namespaces'] = array(
             'vocabulary' => '',
             'foaf'       => '',
             // …
         );

         // One or more SELECT queries per URL path fragment; the first column is the instance ID
         $triplify['queries'] = array(
             'post' => array(
                 "SELECT id, post_author 'sioc:has_creator->user', post_date 'dcterms:created', post_title 'dc:title',
                         post_content 'sioc:content', post_modified 'dcterms:modified'
                  FROM {$table_prefix}posts WHERE post_status='publish'",
                 "SELECT post_id id, tag_id 'tag:taggedWithTag' FROM {$table_prefix}post2tag",
                 "SELECT post_id id, category_id 'belongsToCategory' FROM {$table_prefix}post2cat",
             ),
             'tag' => "SELECT tag_ID id, tag 'tag:tagName' FROM {$table_prefix}tags",
             // category_parent is the parent, i.e. the broader concept
             'category' => "SELECT cat_ID id, cat_name 'skos:prefLabel', category_parent 'skos:broader'
                            FROM {$table_prefix}categories",
             'user' => array(
                 "SELECT id, user_login 'foaf:accountName', SHA(CONCAT('mailto:', user_email)) 'foaf:mbox_sha1sum',
                         user_url 'foaf:homepage', display_name 'foaf:name'
                  FROM {$table_prefix}users",
                 "SELECT user_id id, meta_value 'foaf:firstName' FROM {$table_prefix}usermeta WHERE meta_key='first_name'",
                 "SELECT user_id id, meta_value 'foaf:family_name' FROM {$table_prefix}usermeta WHERE meta_key='last_name'",
             ),
             'comment' => "SELECT comment_ID id, comment_post_id 'sioc:reply_of', comment_author AS 'foaf:name',
                                  SHA(CONCAT('mailto:', comment_author_email)) 'foaf:mbox_sha1sum', comment_author_url 'foaf:homepage',
                                  comment_date AS 'dcterms:created', comment_content 'sioc:content', comment_karma, comment_type
                           FROM {$table_prefix}comments WHERE comment_approved='1'",
         );

         // Properties whose values reference instances of another entity
         $triplify['objectProperties'] = array(
             'sioc:has_creator'  => 'user',
             'tag:taggedWithTag' => 'tag',
             'belongsToCategory' => 'category',
             'skos:broader'      => 'category',
             'sioc:reply_of'     => 'post',
         );

         $triplify['classMap'] = array('user' => 'foaf:Person', 'post' => 'sioc:Post', 'tag' => 'tag:Tag', 'category' => 'skos:Concept');
         $triplify['TTL'] = 0;  // caching time-to-live
         $triplify['db'] = new PDO('mysql:host='.DB_HOST.';dbname='.DB_NAME, DB_USER, DB_PASSWORD);
         ?>
  32. Triplify Temporal Extension
     • Problem: how do next-generation search engines know that something changed on the Data Web?
     • Different solutions:
         • Always try to crawl everything: currently deployed on the Web
         • Ping a central update notification service: will probably not scale once the Data Web is really deployed
         • Each Linked Data endpoint publishes an update log: Triplify Update Logs
  33. Triplify Temporal Extension
     • Special update path and vocabulary:
         <…> rdf:type update:UpdateCollection .
         <…> rdf:type update:UpdateCollection .
         <…> rdf:type update:UpdateCollection .
         <…> rdf:type update:UpdateCollection .
     • Nesting continues until we finally reach a URL that exposes all updates performed in a certain second:
         <…> update:updatedResource <…> ;
             update:updatedAt "2008-01-01T17:58:06"^^xsd:dateTime ;
             update:updatedBy <…> .
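The nested update collections can be pictured as a time hierarchy. Each level narrows the period, down to a leaf that lists the updates of a single second. This minimal sketch groups hypothetical update records into year/month/day/hour/minute/second buckets and lists the intermediate collections above each leaf. The `update/YYYY/MM/DD/hh/mm/ss` path layout is an assumption, because the concrete update URLs are not preserved on the slide. The entry keys mirror the update vocabulary shown above.

```python
from collections import defaultdict
from datetime import datetime


def leaf_path(ts: datetime) -> str:
    """Hypothetical layout: one path segment per level, down to the second."""
    return ts.strftime("update/%Y/%m/%d/%H/%M/%S")


def build_update_log(updates):
    """updates: iterable of (resource, timestamp, agent) -> {leaf path: [entries]}."""
    leaves = defaultdict(list)
    for resource, ts, agent in updates:
        leaves[leaf_path(ts)].append({
            "updatedResource": resource,
            "updatedAt": ts.isoformat(),
            "updatedBy": agent,
        })
    return dict(leaves)


def update_collections(leaves):
    """Intermediate collection paths (year, month, day, hour, minute) above each leaf."""
    paths = set()
    for leaf in leaves:
        parts = leaf.split("/")
        for i in range(2, len(parts)):
            paths.add("/".join(parts[:i]))
    return sorted(paths)


log = build_update_log([
    ("http://example.org/post/1", datetime(2008, 1, 1, 17, 58, 6), "http://example.org/user/5"),
])
print(update_collections(log))
print(log["update/2008/01/01/17/58/06"])
```

A crawler that remembers the time of its last visit only has to descend into the collections newer than that time, instead of re-crawling everything.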
  34. Triplify Spatial Extension
     • How to publish geo data using Triplify?
     • OpenStreetMap: 160 GB of geo data, lots of POIs (hotels, gas stations, universities …)
     • Example request: …,16.359722/1000/Hotel (path segments annotated on the slide as Lon, Lat, Radius, Tag)
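The spatial request combines a point, a radius and a tag. This minimal sketch covers the filtering step with the haversine great-circle distance. It assumes the radius is given in metres and uses a mean Earth radius of 6,371 km. The POIs are hypothetical. A linear scan is fine for a sketch, but a real deployment would use a spatial index.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pois_near(pois, lat, lon, radius_m, tag):
    """pois: iterable of (name, lat, lon, tag); returns (distance_m, name), nearest first."""
    hits = []
    for name, plat, plon, ptag in pois:
        if ptag != tag:
            continue
        d = haversine_m(lat, lon, plat, plon)
        if d <= radius_m:
            hits.append((round(d), name))
    return sorted(hits)


pois = [  # hypothetical points of interest
    ("Hotel A", 48.2000, 16.3600, "Hotel"),
    ("Hotel B", 48.2100, 16.3600, "Hotel"),  # about 1.1 km north
    ("Hotel C", 48.2050, 16.3600, "Hotel"),  # about 0.56 km north
    ("Cafe D", 48.2000, 16.3600, "Cafe"),
]
print(pois_near(pois, 48.2, 16.36, 1000, "Hotel"))
```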
  35. RDB2RDF Tool Comparison
                          | Triplify                    | D2RQ           | Virtuoso RDF Views
     Technology           | Scripting languages (PHP)   | Java           | Whole middleware solution
     SPARQL endpoint      | -                           | X              | X
     Mapping language     | SQL                         | RDF-based      | RDF-based
     Mapping generation   | Manual                      | Semi-automatic | Manual
     Scalability          | Medium-high (but no SPARQL) | Medium         | High
     More at:
  36. Marrying DBs with RDF & Ontologies
     • Using DBs for storage and querying of RDF & ontologies
     • Publishing DB content as RDF
     • Relational databases vs. RDF & ontologies:
         Data model: relational (tables, columns, rows) vs. triples (subject, predicate, object)
         Further criteria compared (ratings shown as symbols that are not preserved): schema and data separation, implicit information, scalability, schema flexibility, Web data integration readiness
  37. Overview
     • The Linked Data Web Vision
     • Data Web Technologies
     • Publishing relational data on the Web
     • DBpedia – transforming Wikipedia into a knowledge base
     • OntoWiki – a Linked Data Wiki
     • OpenStreetMap – free and open geo data
  38. Transforming Wikipedia into a Knowledge Base
     • Pros:
         • Wikipedia is the 8th most popular website (according to …)
         • Maybe the finest example of truly collaboratively created content (>8M articles in >200 languages written by >300,000 authors)
         • Covers all possible topics and domains; articles are the result of a "community consensus"
     • Cons:
         • Many inconsistencies can be found across different pages/language versions
         • Not very well integrated with other data sources
         • Lacks structured representations of content that facilitate querying and search
     • Simple questions, hard to answer:
         • What do Art Nouveau and Berlin have in common?
         • Who are the mayors of Central European towns at an elevation of more than 1000 m?
         • Which films are longer than 4 hours and had a budget of less than $1 million?
     • The information required to answer these is contained in Wikipedia!
     • How can we reveal the structure and semantics of Wikipedia content?
  39. Structure in Wikipedia
     • Title
     • Abstract
     • Infoboxes
     • Geo-coordinates
     • Categories
     • Images
     • Links
         • other language versions
         • other Wikipedia pages
         • the Web
         • redirects
         • disambiguations
  40. Infobox Templates
     • Wikitext syntax:
         {{Infobox Korean settlement
         | title = Busan Metropolitan City
         | img = Busan.jpg
         | imgcaption = A view of the [[Geumjeong]] district in Busan
         | hangul = 부산 광역시
         ...
         | area_km2 = 763.46
         | pop = 3635389
         | popyear = 2006
         | mayor = Hur Nam-sik
         | divs = 15 wards (Gu), 1 county (Gun)
         | region = [[Yeongnam]]
         | dialect = [[Gyeongsang]]
         }}
     • RDF representation:
         dbp:Busan dbpp:title "Busan Metropolitan City" .
         dbp:Busan dbpp:hangul "부산 광역시"@ko .
         dbp:Busan dbpp:area_km2 "763.46"^^xsd:float .
         dbp:Busan dbpp:pop "3635389"^^xsd:int .
         dbp:Busan dbpp:region dbp:Yeongnam .
         dbp:Busan dbpp:dialect dbp:Gyeongsang .
         ...
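Here is a much simplified sketch of the infobox-to-RDF step. It splits the template into `key = value` lines and turns a value that is exactly one `[[Link]]` into a `dbp:` resource. Integer and decimal values become typed literals, and everything else stays a plain string. These typing heuristics and the space-to-underscore rule for resource names are simplifying assumptions. The actual DBpedia extractor handles nested templates, units, lists and much more.

```python
import re

LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")  # value is exactly [[Link]] or [[Link|label]]
INTEGER = re.compile(r"^-?\d+$")
DECIMAL = re.compile(r"^-?\d+\.\d+$")


def parse_infobox(wikitext: str):
    """Return (template name, [(key, value), ...]) for a flat, one-pair-per-line infobox."""
    body = wikitext.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("not a template")
    lines = body[2:-2].split("\n")
    pairs = []
    for line in lines[1:]:
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            pairs.append((key.strip(), value.strip()))
    return lines[0].strip(), pairs


def to_object(value: str) -> str:
    link = LINK.match(value)
    if link:
        return "dbp:" + link.group(1).strip().replace(" ", "_")
    if INTEGER.match(value):
        return f'"{value}"^^xsd:int'
    if DECIMAL.match(value):
        return f'"{value}"^^xsd:float'
    return '"' + value.replace('"', '\\"') + '"'


def infobox_triples(subject: str, wikitext: str):
    _, pairs = parse_infobox(wikitext)
    return [(subject, "dbpp:" + key, to_object(value)) for key, value in pairs if value]


sample = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""
for triple in infobox_triples("dbp:Busan", sample):
    print(triple)
```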
  41. Class Hierarchy
     • 200k people (70k athletes, 65k artists, 18k office holders)
     • 193k places (100k areas, 40k cities, 10k rivers)
     • 187k works (71k music albums, 24k singles, 31k films, 15k books)
     • 87k species
     • 70k organisations (20k educational institutions, 18k companies, 12k radio stations)
     • 22k buildings (8k airports, 5k stations, 2k stadiums, 1k bridges)
     • 12k planets
     • And more… (events, diseases, proteins, drugs, aircraft, automobiles, ships, astronauts, architects, scientists)
  42. Extraction Results
     • Extraction algorithm applied to the English Wikipedia content
     • <1 h needed to extract templates and convert them to RDF (>2M English Wikipedia articles, >10 GB raw data)
     • Roughly 30M facts extracted from infobox templates alone
     • Sample checks reveal: ~90% accuracy, 9% redundant information, 1% erroneous
     • Multi-domain ontology covering a large body of domains
     • Extraction results and source code of the extraction algorithm are available at …
     • Datasets (en) and number of triples:
         Articles                 7.6M
         Abstracts                2.1M
         External links           3.2M
         Categories               7.3M
         Infoboxes                29.3M
         Persons                  560k
         YAGO classes             2M
         WordNet classes          338k
         Geo-coordinates          450k
         Mappings to Flickr, DBLP, Eurostat, CIA Factbook, MusicBrainz, Project Gutenberg, US Census, …   100k
         Mapping to OpenCyc       45k
  43. DBpedia Components
     • Extraction: Wikipedia dumps (article texts, DB tables) → extraction of infoboxes, articles, categories, … → DBpedia datasets
     • Storage: datasets loaded into Virtuoso and MySQL
     • Publication: via a SPARQL endpoint (Query Builder, SNORQL browser) and as Linked Data (traditional Web browsers, Semantic Web browsers, Web 2.0 mashups), …
     • Interlinking: interlinked with other open data (OpenCyc, WordNet, Freebase, GeoNames, …)
  44. User Interfaces
  45. DBpedia SPARQL Endpoint (1)
     • Hosted on an OpenLink Virtuoso server
     • Can answer SPARQL queries like:
         • All sitcoms that are set in NYC?
         • All tennis players from Moscow?
         • All films by Quentin Tarantino?
         • All German musicians who were born in Berlin in the 19th century?
         • All soccer players with shirt number 11 who play for a club whose stadium has over 40,000 seats and who were born in a country with over 10 million inhabitants?
  46. DBpedia SPARQL Endpoint (2)
         PREFIX dbp:  <http://dbpedia.org/resource/>
         PREFIX dbpp: <http://dbpedia.org/property/>
         PREFIX foaf: <http://xmlns.com/foaf/0.1/>
         PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
         PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
         SELECT ?name ?birth ?description ?person WHERE {
           ?person dbpp:birthPlace dbp:Berlin .
           ?person skos:subject <http://dbpedia.org/resource/Category:German_musicians> .
           ?person dbpp:birth ?birth .
           ?person foaf:name ?name .
           ?person rdfs:comment ?description .
           FILTER (LANG(?description) = 'en')
         } ORDER BY ?name
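Queries like this reach an endpoint through the SPARQL Protocol. One common form is an HTTP GET with the query in the `query` parameter and an Accept header asking for `application/sparql-results+json`. This sketch builds such a request with the Python standard library and flattens a JSON result set into dictionaries. `ENDPOINT` is a placeholder for the endpoint URL elided on the previous slide. The network call is left out so that the example runs offline, and the sample result is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # placeholder, not the real endpoint URL


def build_request(query: str) -> Request:
    """SPARQL Protocol query via HTTP GET, asking for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(result_json: str):
    """Flatten SPARQL JSON results into dicts var -> value; unbound variables are omitted."""
    data = json.loads(result_json)
    variables = data["head"]["vars"]
    return [{v: binding[v]["value"] for v in variables if v in binding}
            for binding in data["results"]["bindings"]]


# Online use (not executed here):
#   with urllib.request.urlopen(build_request(query)) as resp:
#       print(rows(resp.read().decode("utf-8")))
sample = ('{"head": {"vars": ["name", "birth"]}, "results": {"bindings": '
          '[{"name": {"type": "literal", "value": "Example Musician"}}]}}')
print(rows(sample))
```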
  47. Overview
     • The Linked Data Web Vision
     • Data Web Technologies
     • Publishing relational data on the Web
     • DBpedia – transforming Wikipedia into a knowledge base
     • OntoWiki – a Linked Data Wiki
     • Virtuoso – Knowledge Store
     • OpenStreetMap – free and open geo data
  48. OntoWiki
     • Semantic Wiki
     • Differences
     • Similarities
     • Architecture
     • Use Cases
  49. Semantic Wiki
     • Wiki with added semantics
     • Goal: wiki pages + background knowledge base
     • Examples: Semantic MediaWiki, Rhizome, IkeWiki
  50. Conceptual Differences: Views over Articles (wiki articles vs. resource views)
  51. Conceptual Differences: Forms over Code (wiki code vs. forms)
  52. Conceptual Similarities: Wikiwiki Concepts (Ward Cunningham)
     • Everyone can edit anything
     • Content is edited in the same way as structure is
     • Activity can be watched and reviewed by everyone
  53. Versioning
     • Everything can be undone
     • Philosophy: make it easy to correct mistakes
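Here is a minimal sketch of how "everything can be undone" can work for RDF data. Each edit is recorded as a changeset of added and removed triples, and an undo applies the inverse changeset. Only the effective part of an edit is recorded, so undo restores the exact previous state. This is an illustrative model, not a description of OntoWiki's actual versioning implementation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Changeset:
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]

    def inverse(self) -> "Changeset":
        return Changeset(added=self.removed, removed=self.added)


@dataclass
class VersionedGraph:
    triples: Set[Triple] = field(default_factory=set)
    history: List[Changeset] = field(default_factory=list)

    def apply(self, cs: Changeset, record: bool = True) -> None:
        # Keep only what actually changes, so the inverse restores the exact prior state
        effective = Changeset(added=frozenset(cs.added - self.triples),
                              removed=frozenset(cs.removed & self.triples))
        self.triples -= effective.removed
        self.triples |= effective.added
        if record:
            self.history.append(effective)

    def undo(self) -> None:
        if self.history:
            self.apply(self.history.pop().inverse(), record=False)


g = VersionedGraph({("ex:Jever", "rdf:type", "ex:Lager")})
g.apply(Changeset(added=frozenset({("ex:Jever", "rdf:type", "ex:Pilsner")}),
                  removed=frozenset({("ex:Jever", "rdf:type", "ex:Lager")})))
print(g.triples)
g.undo()
print(g.triples)
```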
  54. OntoWiki Application Framework: Interfaces
     • SPARQL endpoint
     • Linked Data endpoint
     • WebDAV
     • REST API
     • Command line interface
     • LDAP
  55. Extensibility
     • Plugins
     • Views/Templates
     • Themes
     • Localizations
  56. Access Control
     • Model-based
     • Action-based
     • (Statement-based)
  57. Other Features
     • Facet-based browsing
     • Inline editing
     • Auto-adaptive user interface
     • Resource auto-suggestion
     • SPARQL query editor
  58. Architecture
  59. Vision
     • Generic data wiki for RDF models
         • no data model mismatch (structured vs. unstructured)
     • Application framework for:
         • knowledge-intensive applications
         • agile processes
         • distributed user groups
  60. SoftWiki*
     • Problem: requirements engineering with large, spatially distributed stakeholder groups
     • Solution: comprehensive ontology for representing RE-relevant knowledge + adapted OntoWiki application
     • Application of text-mining methods for duplicate detection
     * Work in a BMBF-funded project with UniDuE, T-Systems, QA-Systems, LeCoS, ProDV
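The slide does not say which text-mining methods SoftWiki uses for duplicate detection. As a stand-in, this sketch applies a simple, well-known baseline: Jaccard similarity of word-token sets. It flags requirement pairs whose similarity reaches a threshold. The threshold of 0.6 and the sample requirements are hypothetical.

```python
import re
from itertools import combinations


def tokens(text: str) -> set:
    """Lower-cased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def likely_duplicates(requirements: dict, threshold: float = 0.6):
    """Return (id1, id2, similarity) for all pairs at or above the threshold."""
    toks = {rid: tokens(text) for rid, text in requirements.items()}
    pairs = []
    for x, y in combinations(sorted(toks), 2):
        sim = jaccard(toks[x], toks[y])
        if sim >= threshold:
            pairs.append((x, y, round(sim, 2)))
    return pairs


reqs = {  # hypothetical requirements
    "R1": "The system shall export data as RDF",
    "R2": "System shall export the data as RDF",
    "R3": "Users can log in with LDAP",
}
print(likely_duplicates(reqs))
```

Token-set similarity ignores word order and synonyms. That makes it a cheap first filter that surfaces candidate pairs for a human reviewer, not a final verdict.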
  62. Caucasian Spiders
     • Faunistic database on spiders of the Caucasus
     • Taxonomy
     • Localities
     • 240k triples
  64. Professor Catalogue
     • Professor catalogue with 800 entries and 60 schema elements
     • OntoWiki used as back end for data entry
     • Custom front end
  67. Semantic Wikis: Related Work
                       | OntoWiki                       | Semantic MediaWiki               | IkeWiki
     Main developer    | Uni Leipzig, AKSW              | AIFB Karlsruhe                   | Salzburg Research
     Technology        | PHP/MySQL                      | PHP/MySQL (MediaWiki extension)  | Java/Postgres
     Base artifacts    | Facts                          | (Annotated) texts                | (Annotated) texts
     Authoring         | WYSIWYG facts / forms          | Wiki syntax / semantic forms     | WYSIWYG / forms
     Other             | Data Web development framework | Planned Wikipedia deployment     | Visual KB browser
  68. Vakantieland*
     • One of the largest tourist information sites in NL (>100,000 daily page views, >20,000 points of interest)
     • The traditional relational DB system was too inflexible to capture the increasingly heterogeneous content types
     • Development of an OntoWiki-based Data Web application
     • Geo-data integration from OpenStreetMap
     • Semantic search
     • Integration of DBpedia data
     • Comprehensive performance tuning
     * Work with Ceriel Jakobs and Michael Martin, partially funded by SenterNovem
  69. Overview
     • The Linked Data Web Vision
     • Data Web Technologies
     • Publishing relational data on the Web
     • DBpedia – transforming Wikipedia into a knowledge base
     • OntoWiki – a Linked Data Wiki
     • OpenStreetMap – linked open geo data
  70. Linked Open Geo Data
     • Spatial data is crucial for the Data Web in order to interlink geographically linked resources.
     • The OpenStreetMap project (OSM) collects, organizes and publishes geo data the wiki way:
         • 80,000 OSM users have collected data about 22M km of ways (roads, highways etc.) on earth; 25,000 km are added daily
         • OSM contains a vast amount of point-of-interest descriptions, e.g. shops, amenities, sports venues, businesses, touristic and historic sights
     • Goal: publish OSM geo data, interlink it with other data sources and provide efficient means for browsing and authoring:
         • OpenStreetMap data extraction works on the basis of OSM database dumps; a bi-directional live integration of OSM and our Linked Geo Data browser and editor is currently in the works.
         • Triplify spatial data publishing: the Triplify script for publishing Linked Data from relational databases is extended for publishing geo data, in particular with regard to the retrieval of information about geographical areas.
         • The LinkedGeoData browser and editor is a facet-based browser for geo content. It uses an OLAP-inspired hypercube to quickly retrieve aggregated information about any user-selected area on earth.
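One way to read the "OLAP-inspired hypercube" is as pre-aggregation. POI counts are stored per (grid cell, tag), so the counts for a user-selected area come from summing a few cells instead of scanning every point. In this sketch, the 0.1-degree cell size and the bounding-box semantics are assumptions. Cells that only partly overlap the box are counted whole, so results are approximate at the edges. The POIs are hypothetical.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 0.1  # assumed grid resolution in degrees


def cell_of(lat: float, lon: float):
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_cube(pois):
    """pois: iterable of (lat, lon, tag) -> {cell: Counter(tag -> count)}."""
    cube = defaultdict(Counter)
    for lat, lon, tag in pois:
        cube[cell_of(lat, lon)][tag] += 1
    return cube


def counts_in_box(cube, south, west, north, east):
    """Sum pre-aggregated tag counts over all cells touching the bounding box."""
    lo_row, lo_col = cell_of(south, west)
    hi_row, hi_col = cell_of(north, east)
    totals = Counter()
    for row in range(lo_row, hi_row + 1):
        for col in range(lo_col, hi_col + 1):
            totals.update(cube.get((row, col), Counter()))
    return totals


pois = [  # hypothetical POIs: three near Leipzig, one near Vienna
    (51.34, 12.37, "hotel"),
    (51.35, 12.38, "hotel"),
    (51.33, 12.36, "pub"),
    (48.21, 16.37, "hotel"),
]
cube = build_cube(pois)
print(counts_in_box(cube, 51.25, 12.25, 51.45, 12.45))
```

The cube only needs updating when POIs change. Queries cost time proportional to the number of cells in the box, independent of how many POIs fall inside it.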
  71. Faceted Linked Geo Data Browser
  72. AKSW Linked Data Web Building Blocks
     • Foundations: marrying databases with RDF and ontologies
         • RDF query subsumption & view maintenance
         • Scaling database-backed triple stores
         • OWLDB: extending DBs for ontology handling / revealing implicit information
         • DL-Learner: machine learning for ontologies
     • Tools
         • DBpedia: "semantification" of Wikipedia
         • Triplify: "semantification" of (small) Web applications
         • OntoWiki: collaborative creation of explicit knowledge via semantic wikis
         • xOperator: combining instant messaging with the Data Web
     • Applications: bringing the Data Web to end users
         • Vakantieland: building Data Web applications
         • SoftWiki: distributed, stakeholder-driven requirements engineering
         • A semantic wiki for the sciences …
  73. Thanks!
     • Dr. Sören Auer
     • [email_address]
     • Research group Agile Knowledge Engineering & Semantic Web (AKSW)