Your SlideShare is downloading. ×
The Semantic Data Web, Sören Auer, University of Leipzig
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

The Semantic Data Web, Sören Auer, University of Leipzig

3,553
views

Published on

Presentation: The Semantic Data Web by Sören Auer of University of Leipzig at the Semantics & Media Conference in Mainz in July 2011

Presentation: The Semantic Data Web by Sören Auer of University of Leipzig at the Semantics & Media Conference in Mainz in July 2011

Published in: Education

0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,553
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
106
Comments
0
Likes
6
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Initially, the Web consisted of many Websites containing only unstructured/textual content.The Web 2.0 extended this traditional Web with few extremely large Websites specializing on certain specific content types. Examples are:Wikipedia for encyclopedic articlesDel.icio.us for Web linksFlickr for picturesYouTube for VideosDigg for news…These websites provide specialized searching, querying, sharing, authoring specifically adopted to the content type of their concern.
  • Popular content types such as pictures, movies, calendars, encyclopedic articles, news recipes etc. are already sufficiently well supported on the Web.However, there is a long tail of special-interest content (profiles of expertise, historic data and events, bio-medical knowledge, intra-corporational knowledge etc.) which has very low or no current support (for filtering, aggregation, searching, querying, collaborative editing) on the Web.
  • http://www.flickr.com/photos/residae/2560241604/#/
  • http://www.flickr.com/photos/alasis/3541341601/sizes/l/in/photostream/
  • Transcript

    • 1. The Semantic Data WebSören Auer
    • 2. Problem: Try to search for these things on the current Web:
      • Apartments near German-Russian bilingual childcare in Berlin.
      • 3. ERP service providers with offices in Vienna and London.
      • 4. Researchers working on multimedia topics in Eastern Europe.
      Informationis available on the Web, but opaque to current search.
      Why Semantic Data Web?
      Solution: complement text on Web pages with structured linked open data & intelligently combine/integrate such structured information from different sources:
      Search engine
      HTML
      HTML
      RDF
      RDF
      Web server
      Web server
      Web server
      Web server
      berlin.de
      Has everything about childcare in Berlin.
      Immobilienscout.de
      Knows all about real estate offers in Germany
      DB
      DB
    • 5. Vom Web derDokumentezumSemantic Data Web
      Semantic Web(Vision 1998, starting ???)
      Data Web (since 2006)
      • URI de-referencability
      • 8. Web Data integration
      • 9. RDF serializations
      Social Web (since 2003)
      • Folksonomies/Tagging
      • 10. Reputation, sharing
      • 11. Groups, relationships
      Web (since 1992)
      • HTTP
      • 12. HTML/CSS/JavaScript
    • Web 1.0 Web 2.0 Web 3.0
      Many Web sitescontaining unstructured,textual content
      Few large Web sites are specialized onspecific content types
      Many Web sites containing & semantically syndicating arbitrarily structured content
      Video
      Pictures
      Encyclopedicarticles
      +
      +
    • 13. The Long Tail of Information Domains
      Pictures
      The Long Tail by Chris Anderson (Wired, Oct. ´04) adopted to information domains
      Recipes
      News
      Video
      Calendar
      Popularity
      SemWeb supported structured content
      Requirements-Engineering
      Talentmanagement
      Special interest
      communities
      Itinerary ofKing George
      Gene
      sequences




      Currently supportedstructuredcontent types
      Not or insufficiently supported content types
    • 14. 1. Uses RDF Data Model
      Linked Data Web Technology
      14.7.2011
      takesPlaceAt
      organizes
      Uni Mainz
      Semedia2011
      Mainz
      takesPlaceIn
      2. Is serialised in triples:
      UniMainzorganizes SeMedia2011
      SeMedia2011 takesPlaceAt “20110714”^^xsd:date
      SeMedia2011 takesPlaceAtMainz
      3. Uses Content-negotiation
    • 15. The emerging Web of Data
      interlink
      2009
      2007
      SILK
      DXX Engine
      fuse
      create
      2008
      poolparty
      SemMF
      OntoWiki
      2008
      Sigma
      WiQA
      2008
      2008
      ORE
      repair
      classify
      Virtouso
      2009
      DL-Learner
      MonetDB
      Sindice
      enrich
    • 16. What works now? What has to be done?
      • Web - a global, distributed platform for data, information and knowledge integration
      • 17. exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
      Achievements
      Extension of the Web with a data commons (25B facts
      vibrant, global RTD community
      Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
      Emerging governmental adoption in sight
      Establishing Linked Data as a deployment path for the Semantic Web.
      • Challenges
      Coherence: Relatively few, expensively maintained links
      Quality: partly low quality data and inconsistencies
      Performance: Still substantial penalties comparedto relational
      Data consumption: large-scale processing, schema mapping and data fusion still in its infancy
      Usability: Establishing direct end-user tools and network effect
      April 2008
      July 2007
      September 2008
      July 2009
    • 18. Linked DataLifecycle
    • 19. Extraction
    • 20. From unstructured sources
      • NLP, text mining, annotation
      From semi-structured sources
      • DBpedia, LinkedGeoData, SCOVO/DataCube
      From structured sources
      • RDB2RDF
      Extraction
    • 21. extract structured information from Wikipedia & make this information available on the Web as LOD:
      • ask sophisticated queries against Wikipedia (e.g. universities in brandenburg, mayors of elevated towns, soccer players),
      • 22. link other data sets on the Web to Wikipedia data
      • 23. Represents a community consensus
      Recently launched DBpedia Live transforms Wikipedia into a structred knowledge base
      Transforming Wikipedia into an Knowledge Base
    • 24. Structure in Wikipedia
      Title
      Abstract
      Infoboxes
      Geo-coordinates
      Categories
      Images
      Links
      other language versions
      other Wikipedia pages
      To the Web
      Redirects
      Disambiguations
      14.07.2011
      Sören Auer - The emerging Web of Linked Data
      13
    • 25. Infobox templates
      Wikitext-Syntax
      {{Infobox Korean settlement
      | title = Busan Metropolitan City
      | img = Busan.jpg
      | imgcaption = A view of the [[Geumjeong]] district in Busan
      | hangul = 부산 광역시
      ...
      | area_km2 = 763.46
      | pop = 3635389
      | popyear = 2006
      | mayor = Hur Nam-sik
      | divs = 15 wards (Gu), 1 county (Gun)
      | region = [[Yeongnam]]
      | dialect = [[Gyeongsang]]
      }}
      http://dbpedia.org/resource/Busan
      dbp:Busan dbpp:title ″Busan Metropolitan City″
      dbp:Busan dbpp:hangul ″부산 광역시″@Hang
      dbp:Busan dbpp:area_km2 ″763.46“^xsd:float
      dbp:Busan dbpp:pop ″3635389“^xsd:int
      dbp:Busan dbpp:region dbp:Yeongnam
      dbp:Busan dbpp:dialect dbp:Gyeongsang
      ...
      RDF representation
      14.07.2011
      Sören Auer - The emerging Web of Linked Data
      14
    • 26. A vast multi-lingual, multi-domain knowledge base
      DBpedia extraction results in:
      descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases
      labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories
      altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English edition, 766M from other language editions
      DBpedia made a substantial impact in science, technology and society
      DBpedia became the central interlinking hub on the Data Web
      Scientific publications attracted more than 500 citations
      More than 15.000 monthly visits on DBpedia.org,numerous press articles, blog posts …
      Ecosystem of commercial and community applications:ThomsonReuters, BBC, Neofonie, Openlink, Faviki …
      Motivates further research need wrt. scalability, quality
      14.07.2011
      Sören Auer - The emerging Web of Linked Data
      15
    • 27. Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)
      No agreement on a formalsemantics of RDF2RDFmapping
      • LOD readiness, SPARQL-SQL translation
      W3C RDB2RDF WG
      Extraction Relational Data
    • 28. From unstructured sources
      • Deploy existing NLP approaches (OpenCalais, Ontos API)
      • 29. Develop standardized, LOD enabled interfaces between NLP tools (NLP2RDF)
      From semi-structured sources
      • Efficient bi-directional synchronization
      From structured sources
      • Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)
      Orthogonal challenges
      • Using LOD as background knowledge
      • 30. Provenance
      Extraction Challenges
    • 31. Storage and Querying
    • 32. Still by a factor 5-50 slower than relational data management (BSBM)
      Performance increases steadily
      Comprehensive, well-supported open-soure and commercial implementations are available:
      • OpenLink’s Virtuoso (os+commercial)
      • 33. Big OWLIM (commercial), Swift OWLIM (os)
      • 34. 4store (os)
      • 35. Talis (hosted)
      • 36. Bigdata (distributed)
      • 37. Allegrograph (commercial)
      • 38. Mulgara (os)
      RDF Data Management
    • 39.
      • Reduce the performance gap between relational and RDF data management
      • 40. SPARQL Query extensions
      • 41. Spatial/semantic/temporal data management
      • 42. More advanced query result caching
      • 43. View maintenance / adaptive reorganization based on common access patterns
      • 44. More realistic benchmarks
      Storage and Querying Challenges
    • 45. Authoring
    • 46. 1. Semantic (Text) Wikis
      • Authoring of semanticallyannotated texts
      2. Semantic Data Wikis
      • Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
      Two Kinds of Semantic Wikis
    • 47. Versatile domain-independent tool
      Serves as Linked Data / SPARQL endpoint on the Data Web
      Open-source project hosted at Google code
      Not just a Wiki UI, but a whole framework for the development of Semantic Web applications
      Developed in PHP based on the Zend framework
      Very active developer and user community
      More than 500 downloads monthly
      Large number of use cases
      Management of multimedia content for a record label in the UK
      OntoWiki – a semantic data wiki
    • 48. Dynamische Sichten auf die Wissensbasis
      OntoWiki
      14.07.2011
      Sören Auer - The emerging Web of Linked Data
      24
    • 49. OntoWiki
      RDF-Triple auf Resourcen-Detailseite
      14.07.2011
      Sören Auer - The emerging Web of Linked Data
      25
    • 50. OntoWiki
      Dynamische Vorschläge aus dem Daten Web
      14.07.2011
      Sören Auer - The emerging Web of Linked Data
      26
    • 51. RDFauthor in OntoWiki
    • 52. 14.07.2011
      Sören Auer - The emerging Web of Linked Data
      28
      OntoWiki: Caucasian Spiders
    • 53. OntoWiki: Supporting Requirements engineering
    • 54. Semantic Portal with OntoWiki: Vakantieland
    • 55. Linking
      © CC-BY-NC-ND by ~Dezz~ (residae on flickr)
    • 56. Automatic
      Semi-automatic
      Manual
      • Sindice integration into UIs
      • 58. Semantic Pingback
      LOD Linking
    • 59. update and notification services for LOD
      Downward compatible with Pingback (blogosphere)
      http://aksw.org/Projects/SemanticPingBack
      Creating a network effect aroundLinking Data: Semantic Pingback
    • 60. Visualizing Pingbacks in OntoWiki
    • 61. Only 5% of the information on the Data Web is actually linked
      • Make sense of work in the de-duplication/record linkage literature
      • 62. Consider the open world nature of Linked Data
      • 63. Use LOD background knowledge
      • 64. Zero-configuration linking
      • 65. Explore active learning approaches, which integrate users in a feedback loop
      • 66. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (LATC-project.eu)
      Interlinking Challenges
    • 67. Enrichment
    • 68. Linked Data is mainly instance data and !!!
      ORE (Ontology Repair and Enrichment) tool allows to improve an OWL ontology by fixing inconsistencies & making suggestions for adding further axioms.
      • Ontology Debugging: OWL reasoning to detect inconsistencies and satisfiable classes + detect the most likely sources for the problems. user can create a repair plan, while maintaining full control.
      • 69. Ontology Enrichment: uses the DL-Learner framework to suggest definitions & super classes for existing classes in the KB. works if instance data is available for harmonising schema and data.
      http://aksw.org/Projects/ORE
      Enrichment & Repair
    • 70. Quality
      Analysis
    • 71. Quality on the Data Web is varying a lot
      • Hand crafted or expensively curated knowledge base (e.g. DBLP, UMLS) vs. extracted from text or Web 2.0 sources (DBpedia)
      Research Challenge
      • Establish measures for assessing the authority, provenance, reliability of Data Web resources
      Linked Data Quality Analysis
    • 72. Evolution
      © CC-BY-SA by alasis on flickr)
    • 73.
      • unified method, for both data evolution and ontology refactoring.
      • 74. modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution
      • 75. allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks
      • 76. Combined with RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem
      • 77. patterns can be shared and reused on the Data Web.
      • 78. declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality.
      EvoPat – Pattern based KB Evolution
    • 79. Evolution Patterns
    • 80. 14.07.2011
      Sören Auer - The emerging Web of Linked Data
      43
    • 81. Exploration
    • 82. CatalogusProfessorumLipsiensis
    • 83. Visual Query Builder
    • 84. Relationship Finder in CPL
    • 85. PublicData.eu
    • 86. http://fintrans.publicdata.eu
    • 87. http://energy.publicdata.eu
    • 88. http://scoreboard.lod2.eu
    • 89. Anwendung BBC Music
      39 meist besuchte Webseite
      Integriert verschiedenste Linked Data Quellen (DBpedia, Musicbrainz, Geonames)
    • 90. LODLifecycle
      supported byDebian based
      LOD2 Stack
      (to be released in September)
    • 91.
      • Linked Data isrelativelysimple touse
      • 92. The LOD ecosystemcomprisesextraction, authoring, enrichment, exploration & search
      • 93. More workhastobedoneto bring variousbitsandpiecestogether
      • 94. Debian based LOD2 Stackfaciliatesmanagingthe LOD lifecycle
      • 95. CKAN.net will evolveinto a registrytordatasets, visualizations, apps, wrappers, mashups, ..
      • 96. Linked Data is a perfecttechnologyforMedia applications
      Take homemessages
    • 97. Thanksforyourattention!
      Sören Auer
      http://www.informatik.uni-leipzig.de/~auer/ | http://aksw.org | http://lod2.org
      auer@uni-leipzig.de
    • 98. AKSW Data Web Building Blocks
      Vakantieland
      Building Data Web applications
      OntoWiki
      Collaborative creation of explicit knowledge via Semantic Wikis
      RDB2RDF
      Mapping relational data to RDF
      SoftWiki
      Distributed, stakeholder driven Requirements Engineering
      DL-Learner
      Machine Learning for Ontologies
      DBpedia
      “Semantification” ofWikipedia
      OpenResearch.org
      A semantic Wiki for the sciences
      ORE
      Ontology Enrichment & Repair
      LinkedGeoData
      “Semantification” of OpenStreetMaps
      xOperator
      Combining Instant Messaging with the Data Web
      LIMES
      Link Discovery Framework
      formetricspaces)
      Triplify
      “Semantification” of (small) Web Applications
      CatalogusProfessorum
      Prosopographical knowledge base
      RDF Query Subsumption& View Maintenance
      Scaling triple stores
      LESS
      Semantification Syndication

      Foundations
      Marrying databases with RDFand ontologies
      Applications
      Bringing the Data Web to end users
      Tools
      14.07.2011
      Sören Auer - The emerging Web of Linked Data
      56
    • 99. LOD2 in a Nutshell
      57
      Research focus
      • Very large RDF datamanagement
      • 100. Enrichment &Interlinking
      • 101. Fusion & InformationQuality
      • 102. Adaptive UI interfaces
      Use Cases
      • Media & Publishing
      • 103. Enterprise Data Webs
      • 104. Open Gov Data
      Partners
      Uni Leipzig, DERI Galway, FU Berlin, Semantic Web Company, OpenLink, Tenforce, Exalead, WoltersKluwer, OKFN
    • 105. Open Governmental Data –and ideal testbed for Linked Data?
      • European registry & collaboration platform for open governmental data
      • 106. Outreach & involve original data providers - local, regional, national and European
      Close cooperation with W3C eGov IG, OKFN’s OpenEUdata, PSI & grassroots efforts
      CKAN.org | OKFN’s EuOpenData group |ICT2010 Networking Session
      UIs and Personalization
      individual mashups of data with other sources
      Notification/subscription service based on personal prefs
      Transparency wishlists, upload revisions, derivates
      create and publish queries, reports and visualizations
      58
    • 107. Linked Enterprise Intra Data Webs can fill the gap between Intra-/Extranets and EIS/ERP
      Facilitates data integration along value-chains within and across enterprises
      The pragmatic, incremental, vocabulary based Linked Data approach reduces data integration costs significantly
      The wealth of knowledge available as Linked Open Data can be leveraged as background knowledge for Enterprise applications
      Linked Enterprise Data

    ×