Linked Data Driven Data Virtualization for Web-scale Integration


Published in: Technology, Education


Transcript

  • 1.
      • Linked Data Driven Data Virtualization for Web-scale Integration
    Orri Erling, Program Manager, Virtuoso
    © 2009 OpenLink Software, All rights reserved
  • 2. Situation Analysis
    • Agility via ad hoc data access has prevailed throughout the history of IT.
    • Data volume and heterogeneity are growing exponentially across intranets, extranets, and the Internet
    • Processing windows remain static (we still have only 24 hours in a day for personal and professional activities)
    • Individual and enterprise agility remain totally dependent on data access, manipulation, and dissemination
    • Data remains dirty and its context remains necessary for extracting meaning.
    • Data Virtualization (in the form of heterogeneous Linked Data Spaces) remains the only viable way forward.
  • 3. What is Linked Data?
    • RDF (Resource Description Framework) Data Model - a graph model where records take the form of 3-tuples, i.e., subject-predicate-object or entity-attribute-value
    • RDF Data Serialization Formats - (X)HTML+RDFa, Turtle, N3, TriX, RDF/XML, and others
    • RDF Data Item Identity - is HTTP URI based
    • RDF is inherently schema-last and self-describing
    • Linked Data - an application of the RDF model where record identifiers, fields, and optionally field values are endowed with HTTP scheme URIs, whether for instance data (ABox) or data dictionary data (TBox)
    • Linked Data enables follow-your-nose traversal of RDF data records where every record identifier, field, or field value is a data pathway
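
The triple model and follow-your-nose traversal described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Virtuoso's implementation: the `example.org` URIs, the sample data, and the helper names `describe` and `follow_your_nose` are all hypothetical, and the lookup is done in memory where a real client would dereference each URI over HTTP.

```python
# Hypothetical example data: the example.org URIs are placeholders.
FOAF = "http://xmlns.com/foaf/0.1/"

# Each record is a 3-tuple: (subject, predicate, object).
TRIPLES = [
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
    ("http://example.org/people/alice", FOAF + "knows", "http://example.org/people/bob"),
    ("http://example.org/people/bob", FOAF + "name", "Bob"),
    ("http://example.org/people/bob", FOAF + "knows", "http://example.org/people/carol"),
    ("http://example.org/people/carol", FOAF + "name", "Carol"),
]


def describe(subject, triples):
    """Return every (predicate, object) pair recorded for a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


def follow_your_nose(start, triples, max_hops=3):
    """Breadth-first walk: any object that is itself a subject is a pathway."""
    subjects = {s for s, _, _ in triples}
    seen, frontier = [start], [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for _, obj in describe(node, triples):
                if obj in subjects and obj not in seen:
                    seen.append(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return seen
```

Calling `follow_your_nose("http://example.org/people/alice", TRIPLES)` visits Alice, then Bob, then Carol, because each `foaf:knows` value is itself a record identifier that can be looked up.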
  • 4. The Linked Data Landscape
    • Core vocabularies - common terms facilitate integration:
      • FOAF for Personal Profile
      • SIOC for Social Networking
      • Dublin Core for Bibliography
      • GoodRelations for eCommerce
      • Geonames
    • Domain specific vocabularies for all verticals:
      • OBO Foundry for biology
    • DBpedia, OpenCyc, YAGO, SUMO, GeoNames, etc. define URIs for talking about almost any well-known real-world entity or class of entities.
  • 5. The Linked Open Data Cloud
  • 6. What Linked Data Offers for Data Integration
    • In RDF, all things have a single-part, global, HTTP-based identifier: anything can join with anything else through its URI.
    • Many people will use a different identifier for the same thing. Whether two things can be considered the same depends on context. owl:sameAs is a generic way of stating identity (co-reference).
    • Literal values can be tagged by type or language, allowing explicit representation of units of measure, etc.
    • RDF triples are contained in Named Graphs. The graph usually denotes provenance, and it has a URI, about which further statements can be made.
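
To make the co-reference and named-graph points concrete, here is a small Python sketch under stated assumptions. Statements are stored as quads (subject, predicate, object, graph), and identifiers are merged only through `owl:sameAs` links found in graphs the caller chooses to trust, which is one simple way to model "sameness depends on context". The data, the `example.org` and `example.com` URIs, and the function names are hypothetical; this is not how any particular store implements it.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical quads: (subject, predicate, object, graph).
# All example.org / example.com URIs are placeholders.
QUADS = [
    ("http://example.org/id/item-1", "http://example.org/vocab/label",
     "Item one", "http://example.org/graphs/source-a"),
    ("http://example.com/things/1", "http://example.org/vocab/category",
     "http://example.org/id/widgets", "http://example.org/graphs/source-b"),
    ("http://example.org/id/item-1", OWL_SAME_AS,
     "http://example.com/things/1", "http://example.org/graphs/links"),
]


def canonical_ids(quads, trusted_graphs):
    """Return a function mapping each identifier to one representative,
    using only sameAs statements that come from trusted graphs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o, g in quads:
        if p == OWL_SAME_AS and g in trusted_graphs:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Pick the lexically smaller URI as the representative.
                parent[max(root_s, root_o)] = min(root_s, root_o)
    return find


def merged_description(subject, quads, trusted_graphs):
    """Collect (predicate, object, graph) for every alias of subject.
    The graph URI is kept so provenance stays visible after merging."""
    find = canonical_ids(quads, trusted_graphs)
    target = find(subject)
    return [(p, o, g) for s, p, o, g in quads
            if p != OWL_SAME_AS and find(s) == target]
```

With `http://example.org/graphs/links` in the trusted set, the description of `item-1` includes the statement made about `http://example.com/things/1`; with an empty trusted set the two identifiers stay separate.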
  • 7. RDF vs. Relational
    • When the data is ragged and highly heterogeneous, with schema-last needs, use RDF and Linked Data
    • The more distinct data sources you have, the more you will need RDF and Linked Data
    • If data is highly regular and uniform, relational offers higher performance: application-specific indices, etc. are faster than putting everything in a generic index scheme
  • 8. Incentives for Publishing
    • If one is on the web, one is there in order to be found
    • Publishing data in standard vocabularies allows applications to mesh data from many Web-addressable Data Spaces (e.g., pages)
    • In the end, Linked Data will enhance the end user experience through added serendipitous discovery and increased relevance
  • 9. Models for Publishing
    • Linked data is usually published in large dumps which have a release cycle
    • Any relational database's contents can be published as linked data by generating RDF on demand via a relational-to-RDF schema mapping
    • Whether one generates RDF on demand or extracts relational data into RDF via ETL depends on the use case
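
The on-demand approach can be illustrated with a short Python sketch that uses only the standard library's `sqlite3` module. It assumes a hypothetical `customer` table and a hypothetical mapping dictionary (`MAPPING`) that assigns a subject URI template to the key column and a predicate URI to each other column. Real mapping languages and products are far richer; this only shows the idea that triples are produced from rows at request time rather than stored.

```python
import sqlite3

# Hypothetical mapping: table name, key column, and the predicate URI for
# each column. The example.org URIs are placeholders.
MAPPING = {
    "table": "customer",
    "key": "id",
    "subject_template": "http://example.org/customer/{key}",
    "columns": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "city": "http://example.org/vocab/city",
    },
}


def make_demo_db():
    """Build a small in-memory relational source to map from."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
        [(1, "Acme", "Boston"), (2, "Globex", None)],
    )
    return conn


def rows_to_triples(conn, mapping):
    """Generate triples on demand; nothing is stored as RDF.
    The mapping is trusted configuration, so its identifiers are
    interpolated into the SQL text directly."""
    cols = list(mapping["columns"])
    sql = "SELECT {key}, {cols} FROM {table} ORDER BY {key}".format(
        key=mapping["key"], cols=", ".join(cols), table=mapping["table"]
    )
    for row in conn.execute(sql):
        subject = mapping["subject_template"].format(key=row[0])
        for col, value in zip(cols, row[1:]):
            if value is not None:  # NULL columns simply yield no triple
                yield (subject, mapping["columns"][col], value)
```

An ETL-style publication would run the same generator once and load the output into a triple store, while the on-demand style calls it per request, so the relational database stays the single source of truth.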
    If one publishes data (whether as a product, for promotion, or for regulatory compliance), RDF/Linked Data is attractive because of a critical mass of reusable terms and a ready base of technology. As more data is published, the link density increases, leading to more novel ways of deriving value from the data.
  • 10. Use Case: CRM and MIS
    • In OpenLink's internal IT, all CRM, support, blog, and wiki data is available as linked data
    • Interactive drill-down from products to support tickets to customers to docs, etc.
    • Currently working on projects to expose enterprise CRM as linked data
  • 11. Use Case: The Neurocommons
    • Diagram labels: AddGene Plasmids, NeuronDB, BAMS, Neurocommons text mining, Homologene, SWAN, Entrez Gene, Gene ontology annotations, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MESH, Reactome, Allen Brain Atlas, Publications, CCDB, Neuronbank, OBO Ontologies, NeuroMorpho, SAO, Coriell catalog
  • 12. Bio2RDF - some of the larger datasets
      Name            Triple count
      PubMed *        797,000,000
      NCBI GeneID     172,931,628
      Uniprot         797,000,000
      UniRef *        242,000,000
      UniParc *       490,000,000
      IproClass       149,342,977
  • 13. Use Case: BBC Programs and Music Service
    • Data Harvested via Sitemap and Web Crawling
    • 20M Triples
    • Integrated with Last.fm, DBpedia, and MusicBrainz: see what any of these has to say about an artist or work.
    • http://bbc.openlinksw.com
  • 14. Use Case: Linked Open Data Cloud Service
    • DBpedia, Freebase, Geodata, Neurocommons, Bio2RDF, GovTrack, US Census, RKB Explorer
    • PingTheSemanticWeb, GoodRelations, and more
    • Entity Ranks
    • Full Text, SPARQL, Faceted Browsing
    • http://lod2.openlinksw.com , http://lod.openlinksw.com
    • 7.59 billion triples
  • 15. The Generations of the Web
    • Web 1.0 - Publishing for all
    • Web 2.0 - User generated content, mashups, the citizen journalist
    • Linked Data Web - Big data, integration and analytics for all.
