Creating knowledge out of interlinked data

  • 1,797 views
Uploaded on

 

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,797
On Slideshare
0
From Embeds
0
Number of Embeds
2

Actions

Shares
Downloads
46
Comments
0
Likes
5

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • http://www.flickr.com/photos/residae/2560241604/#/
  • http://www.flickr.com/photos/alasis/3541341601/sizes/l/in/photostream/

Transcript

  • 1. Creating Knowledge out of Interlinked DataSören Auer
  • 2. 1. Reasoningdoes not scale on the Web
    • IR / one dimensional indexingscales (Google)
    • 3. Next stepconjunctivequerying (OWL-QL?, dynamicscale-out / clustering)
    • 4. Web scalable DL reasoningis out-of-sight (maybefragment, fuzzyreasoninghassomechances)
    2. Ifitwouldscaleitwould not beaffordable
    • “What is the only former Yugoslav republic in theEuropean Union?”
    • 5. 2880 POWER7 cores, 16 Terabytes memory, 4 Terabytes clustered storage (IBM Watson) still can not answer this question
    WhytheSemantic Web won‘twork (soon)
  • 6. Whatshallwe do
    inbetween?
    Howcanwemakeit happen faster?
    IftheSemantic Web does not happen soon…
    Dayton BRANDFIELD (American)Old Hill Road, April 2, 1937, (historical depression)
  • 7. … andtryto find an shallowmigrationpath
    Wecan do whatworksalreadynow…
    http://www.flickr.com/photos/jurvetson/
  • 8. What works now? What has to be done?
    • Web - a global, distributed platform for data, information and knowledge integration
    • 9. exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF
    Achievements
    Extension of the Web with a data commons (25B facts
    vibrant, global RTD community
    Industrial uptake begins (e.g. BBC, Thomson Reuters, Eli Lilly)
    Emerging governmental adoption in sight
    Establishing Linked Data as a deployment path for the Semantic Web.
    • Challenges
    Coherence: Relatively few, expensively maintained links
    Quality: partly low quality data and inconsistencies
    Performance: Still substantial penalties comparedto relational
    Data consumption: large-scale processing, schema mapping and data fusion still in its infancy
    Usability: Establishing direct end-user tools and network effect
    April 2008
    July 2007
    September 2008
    July 2009
  • 10. Linked DataLifecycle
  • 11. Extraction
  • 12. From unstructured sources
    • NLP, text mining, annotation
    From semi-structured sources
    • DBpedia, LinkedGeoData, SCOVO/DataCube
    From structured sources
    • RDB2RDF
    Extraction
  • 13. extract structured information from Wikipedia & make this information available on the Web as LOD:
    • ask sophisticated queries against Wikipedia (e.g. universities in brandenburg, mayors of elevated towns, soccer players),
    • 14. link other data sets on the Web to Wikipedia data
    • 15. Represents a community consensus
    Recently launched DBpedia Live transforms Wikipedia into a structured knowledge base
    Transforming Wikipedia into an Knowledge Base
  • 16. Structure in Wikipedia
    Title
    Abstract
    Infoboxes
    Geo-coordinates
    Categories
    Images
    Links
    other language versions
    other Wikipedia pages
    To the Web
    Redirects
    Disambiguations
    29.05.2011
    Sören Auer - The emerging Web of Linked Data
    10
  • 17. Infobox templates
    Wikitext-Syntax
    {{Infobox Korean settlement
    | title = Busan Metropolitan City
    | img = Busan.jpg
    | imgcaption = A view of the [[Geumjeong]] district in Busan
    | hangul = 부산 광역시
    ...
    | area_km2 = 763.46
    | pop = 3635389
    | popyear = 2006
    | mayor = Hur Nam-sik
    | divs = 15 wards (Gu), 1 county (Gun)
    | region = [[Yeongnam]]
    | dialect = [[Gyeongsang]]
    }}
    http://dbpedia.org/resource/Busan
    dbp:Busan dbpp:title ″Busan Metropolitan City″
    dbp:Busan dbpp:hangul ″부산 광역시″@Hang
    dbp:Busan dbpp:area_km2 ″763.46“^xsd:float
    dbp:Busan dbpp:pop ″3635389“^xsd:int
    dbp:Busan dbpp:region dbp:Yeongnam
    dbp:Busan dbpp:dialect dbp:Gyeongsang
    ...
    RDF representation
    29.05.2011
    Sören Auer - The emerging Web of Linked Data
    11
  • 18. A vast multi-lingual, multi-domain knowledge base
    DBpedia extraction results in:
    descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases
    labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages;4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories
    altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English edition, 766M from other language editions
    DBpediaLive (http://live.dbpedia.org/sparql/) &Mappings Wiki (http://mappings.dbpedia.org)integrate the community into a refinement cycle
    UpcommingDBpedia inline
    29.05.2011
    Sören Auer - The emerging Web of Linked Data
    12
  • 19. OpenStreetMaps
    • Wikipedia forGeoData
    • 20. Outperformescommercialmapproviders in manyregions
    • 21. Extremlyrichsourceofdata (shophours, trashbins, excavations, …)
    LinkedGeoData – revealingthedatabehindOpenStreetMaps
    LinkedGeoDataArchitecture
  • 22.
  • 23. Stevie
    http://tiny.cc/stevie10
    LinkedGeoData Apps
    BeAware
    http://beaware.at/
    Vicibit
    http://vicibit.linkedgeodata.org
  • 24. DataCube – Publishing Statistical Data
    http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html
  • 25. DataCube Importer – Linked Statistical Data
  • 26. Many different approaches (D2R, Virtuoso RDF Views, Triplify, …)
    No agreement on a formalsemantics of RDF2RDFmapping
    • LOD readiness, SPARQL-SQL translation
    W3C RDB2RDF WG
    Extraction Relational Data
  • 27. From unstructured sources
    • Deploy existing NLP approaches (OpenCalais, Ontos API)
    • 28. Develop standardized, LOD enabled interfaces between NLP tools (NLP2RDF)
    From semi-structured sources
    • Efficient bi-directional synchronization
    From structured sources
    • Declarative syntax and semantics of data model transformations (W3C WG RDB2RDF)
    Orthogonal challenges
    • Using LOD as background knowledge
    • 29. Provenance
    Extraction Challenges
  • 30. Storage and Querying
  • 31. Still by a factor 5-50 slower than relational data management (BSBM, DBpedia Benchmark)
    Performance increases steadily
    Comprehensive, well-supported open-soure and commercial implementations are available:
    • OpenLink’s Virtuoso (os+commercial)
    • 32. Big OWLIM (commercial), Swift OWLIM (os)
    • 33. 4store (os)
    • 34. Talis (hosted)
    • 35. Bigdata (distributed)
    • 36. Allegrograph (commercial)
    • 37. Mulgara (os)
    RDF Data Management
  • 38.
    • UsesDBpediaasdataand a selectionof 25 frequentlyexecutedqueries
    • 39. Can generatefractionsand multiples ofDBpedia‘ssize
    • 40. Does not resemble relational data
    Performance differences, observedwithotherbenchmarksareamplified
    DBpedia Benchmark
    GeometricMean
  • 41.
    • Reduce the performance gap between relational and RDF data management
    • 42. SPARQL Query extensions
    • 43. Spatial/semantic/temporal data management
    • 44. More advanced query result caching
    • 45. View maintenance / adaptive reorganization based on common access patterns
    • 46. More realistic benchmarks
    Storage and Querying Challenges
  • 47. Authoring
  • 48. 1. Semantic (Text) Wikis
    • Authoring of semanticallyannotated texts
    2. Semantic Data Wikis
    • Direct authoring of structured information (i.e. RDF, RDF-Schema, OWL)
    Two Kinds of Semantic Wikis
  • 49. Versatile domain-independent tool
    Serves as Linked Data / SPARQL endpoint on the Data Web
    Open-source project hosted at Google code
    Not just a Wiki UI, but a whole framework for the development of Semantic Web applications
    Developed in PHP based on the Zend framework
    Very active developer and user community
    More than 500 downloads monthly
    Large number of use cases
    OntoWiki – a semantic data wiki
  • 50. Dynamische Sichten auf die Wissensbasis
    OntoWiki
    29.05.2011
    Sören Auer - The emerging Web of Linked Data
    27
  • 51. OntoWiki
    RDF-Triple auf Resourcen-Detailseite
    29.05.2011
    Sören Auer - The emerging Web of Linked Data
    28
  • 52. OntoWiki
    Dynamische Vorschläge aus dem Daten Web
    29.05.2011
    Sören Auer - The emerging Web of Linked Data
    29
  • 53. CatalogusProfessorumLipsiensis
  • 54. 29.05.2011
    Sören Auer - The emerging Web of Linked Data
    31
    OntoWiki: Caucasian Spiders
  • 55. RDFauthor in OntoWiki
  • 56. OntoWiki: Supporting Requirements engineering
  • 57. Semantic Portal with OntoWiki: Vakantieland
  • 58. RDFaCE- RDFa Content Editor
  • 59. Linking
    © CC-BY-NC-ND by ~Dezz~ (residae on flickr)
  • 60. Automatic
    Semi-automatic
    Manual
    • Sindice integration into UIs
    • 62. Semantic Pingback
    LOD Linking
  • 63. update and notification services for LOD
    Downward compatible with Pingback (blogosphere)
    http://aksw.org/Projects/SemanticPingBack
    Creating a network effect aroundLinking Data: Semantic Pingback
  • 64. Visualizing Pingbacks in OntoWiki
  • 65. Only 5% of the information on the Data Web is actually linked
    • Make sense of work in the de-duplication/record linkage literature
    • 66. Consider the open world nature of Linked Data
    • 67. Use LOD background knowledge
    • 68. Zero-configuration linking
    • 69. Explore active learning approaches, which integrate users in a feedback loop
    • 70. Maintain a 24/7 linking service: Linked Open Data Around-The-Clock project (LATC-project.eu)
    Interlinking Challenges
  • 71. Enrichment
  • 72. Linked Data is mainly instance data and !!!
    ORE (Ontology Repair and Enrichment) tool allows to improve an OWL ontology by fixing inconsistencies & making suggestions for adding further axioms.
    • Ontology Debugging: OWL reasoning to detect inconsistencies and satisfiable classes + detect the most likely sources for the problems. user can create a repair plan, while maintaining full control.
    • 73. Ontology Enrichment: uses the DL-Learner framework to suggest definitions & super classes for existing classes in the KB. works if instance data is available for harmonising schema and data.
    http://aksw.org/Projects/ORE
    Enrichment & Repair
  • 74. Quality
    Analysis
  • 75. Quality on the Data Web is varying a lot
    • Hand crafted or expensively curated knowledge base (e.g. DBLP, UMLS) vs. extracted from text or Web 2.0 sources (DBpedia)
    Research Challenge
    • Establish measures for assessing the authority, provenance, reliability of Data Web resources
    Linked Data Quality Analysis
  • 76. Evolution
    © CC-BY-SA by alasis on flickr)
  • 77.
    • unified method, for both data evolution and ontology refactoring.
    • 78. modularized, declarative definition of evolution patterns is relatively simple compared to an imperative description of evolution
    • 79. allows domain experts and knowledge engineers to amend the ontology structure and modify data with just a few clicks
    • 80. Combined with RDF representation of evolution patterns and their exposure on the Linked Data Web, EvoPat facilitates the development of an evolution pattern ecosystem
    • 81. patterns can be shared and reused on the Data Web.
    • 82. declarative definition of bad smells and corresponding evolution patterns promotes the (semi-)automatic improvement of information quality.
    EvoPat – Pattern based KB Evolution
  • 83. Evolution Patterns
  • 84. 29.05.2011
    Sören Auer - The emerging Web of Linked Data
    48
  • 85. Exploration
  • 86.
  • 87. Visual Query Builder
  • 88. Relationship Finder in CPL
  • 89. LODLifecycle
    supported byDebian based
    LOD2 Stack
    (to be released in September)
  • 90. Enables applications that allow for
    • getting a 360 ° view on an issue
    • 91. rapid fact-checking
    • 92. cross-referencing and checking of statistical claims (“how to lie with statistics”)
    • 93. increased transparency in public debate
    • 94. release of the creative potential of “the crowd”
    Help citizens in their daily life
    • to understand their governments better
    • 95. find good places to live (little pollution, good schools, close to protected natural sites…)
    • 96. locate public services (administrative offices, public toilets…)
    Brings citizens and government closer together
    Some Benefits of Linked Open Gov’t Data
  • 97. PublicData.eu
  • 98. http://fintrans.publicdata.eu
  • 99. http://energy.publicdata.eu
  • 100. http://scoreboard.lod2.eu
  • 101.
  • 102. 29.05.2011
    Sören Auer - The emerging Web of Linked Data
    60
  • 103. 29.05.2011
    Sören Auer - The emerging Web of Linked Data
    61
  • 104. Linked Enterprise Intra Data Webs can fill the gap between Intra-/Extranets and EIS/ERP
    Facilitates data integration along value-chains within and across enterprises
    The pragmatic, incremental, vocabulary based Linked Data approach reduces data integration costs significantly
    The wealth of knowledge available as Linked Open Data can be leveraged as background knowledge for Enterprise applications
    Linked Enterprise Data
  • 105. Thanksforyourattention!
    Sören Auer
    http://www.informatik.uni-leipzig.de/~auer/ | http://aksw.org | http://lod2.org
    auer@uni-leipzig.de