Exploration of a Data Landscape using a Collaborative Linked Data Framework.

In Proceedings of The Future of the Web for Collaborative Science, WWW2010 (Raleigh, North Carolina, April 26, 2010)


Published in: Technology, Education

  • Laurent Alquier, Project Lead in the Informatics CoE group at J&J Pharma R&D. The group sits between R&D and IT and provides informatics solutions to help researchers.
  • Scientists want to ask translational questions, such as: How can patient subgroups be identified based on genetics/genomics? Which biomarkers predict disease progression in people in their 50s and 60s?
  • But they end up spending too much time on questions such as these…
  • knowIT was designed to help capture what we know about IT systems and facilitate answering real questions about scientific data.
  • We have used knowIT to catalogue information about data sources. It is built on top of Semantic MediaWiki. As a wiki, it is easy to use and allows the customization and community features necessary to encourage participation. Data sources are described using selected metadata, and content is updated directly by IT, scientists and data source owners. 170 internal and external data sources have been catalogued.
  • As a semantic wiki, its data can be accessed through search, faceted navigation, and SPARQL queries. SMW gives structure to the content of wiki pages and enables queries.
  • SMW allows direct translation of page content into RDF triples, includes knowledge management tools (from MediaWiki’s experience with Wikipedia), and provides many ways to import and export content.
  • The data source catalog in knowIT is only a first step in a larger Linked Data Framework at J&J, with a first application to Neuroscience. Translational medicine and open innovation are key drivers, and both require a better understanding of data sources.
  • Here are the details of the architecture: a data source catalog and mapping of SPARQL endpoints (knowIT); relational-to-RDF mapping (D2R); triple stores (4store, OpenRDF); and a repository for ontological concepts, with ontologies uploaded into an OpenRDF Sesame triple store.
  • We are using NIF, but are exploring adding TMO as a high-level ontology framework. This is used to dereference URIs discovered within the data cloud.
  • The framework also provides a C# API for easy access within our application development platform
  • This API allows the development of visual browsing and query applications. 3DX can be used to find data sources.
  • Data can be brought into a spreadsheet environment, from which a rich array of analytics can be invoked.
  • The SPARQL interface to knowIT content is used to provide incremental updates for our enterprise search engine
  • … and to analyze relationships within the content of each data source (RelFinder)
  • ... as well as to visualize relationships between data sources
  • Next steps will include automation, integration with existing ontologies, self-describing data sources, and dynamic visualizations, as well as broader adoption.
  • Finally, I would like to take this occasion to thank …
  • Questions ?
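
The notes for slides 6 and 7 describe SMW turning page annotations into page, property, value triples. A minimal Python sketch of that idea follows. The base URIs, the page title, and the property names (`Has owner`, `Has interface`) are illustrative assumptions, not knowIT's actual vocabulary, and this is not how SMW itself implements its RDF export.

```python
from urllib.parse import quote

# Hypothetical base URIs; the real knowIT namespace is not given in the slides.
PAGE_BASE = "http://example.org/knowit/page/"
PROP_BASE = "http://example.org/knowit/property/"


def page_to_triples(page_title, annotations):
    """Turn one wiki page's annotations into (subject, predicate, object) triples."""
    subject = PAGE_BASE + quote(page_title.replace(" ", "_"))
    triples = []
    for prop, values in annotations.items():
        predicate = PROP_BASE + quote(prop.replace(" ", "_"))
        for value in values:
            triples.append((subject, predicate, value))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, treating every object as a plain literal."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes first, then quotes and newlines.
        escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


# Illustrative page content, not taken from the actual catalog.
annotations = {
    "Has owner": ["Neuroscience IT"],
    "Has interface": ["SPARQL", "Web UI"],
}
triples = page_to_triples("Example Data Source", annotations)
print(to_ntriples(triples))
```

Each wiki page becomes the subject of its own annotations, so a property with several values simply yields several triples that share the same subject and predicate.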
  • Transcript

    • 1. Exploration of a Data Landscape using a Collaborative Linked Data Framework. Laurent Alquier, Project Lead, Informatics CoE
    • 2. Scientists are looking for answers to translational questions. [Figure: research pipeline stages: Basic Research; Prototype Design or Discovery; Preclinical Development; Clinical Development; FDA Filing, Approval & Launch Prep. Also labeled: Translational Research; Critical Path Research. Background images from Flickr, CC Share Alike 2.0.]
    • 3. Instead, they research how to find and access data: Which data source should I choose? Can I access it? Who should I talk to? Which interfaces are available? Scientists want to do science, not data management.
    • 4. knowIT helps scientists and IT manage what they know about Information Systems.
    • 5. Collaborative cataloging of data sources
    • 6. Structured data inside wiki pages
    • 7. Advantages of Semantic MediaWiki (SMW): wiki pages are subjects for semantic annotations (triples: page – property – value); MediaWiki knowledge management tools; many ways to input and output content.
    • 8. Linked Data Pilot for Neuroscience
      • Provide a flexible integrative data layer
        • Linked Data concepts can be used for flexible integration of disparate internal and external data sources
      • Enable scientists to answer novel translational questions
        • Linked Data queries support diverse translational scientific questions
    • 9. Overview of the Linked Data Framework. [Diagram labels: RDF, CSV, XML, RDBMS, Docs; RDB2RDF mapping; TM; J&J Ontol.; Data catalog; Data browser & Visualization; Ontology Curation; J&J RDF; SPARQL; Semantic Wiki]
    • 10. Reviewing W3C’s Translational Medicine Ontology (TMO) as a base for J&J Ontology
    • 11. C# API allows programmatic abstraction of…
      • Data Source discovery
        • Provides framework with provenance information
      • Resource discovery
        • Extracts relationships between concepts across data sources
      • SPARQL Querying
        • Constructs and captures queries, parses results
      • Inferencing
        • Pulls pre-constructed rule sets hosted in server directory
    • 12. Ask questions using federated queries
    • 13. Integration with analytical tools
    • 14. Improved enterprise search
    • 15. Analysis of data source content (RelFinder)
    • 16. Visualization of relationships between data sources
    • 17. Next steps: automation and integration with existing ontologies; enable self-describing data sources; dynamic, layered map of the data landscape; incorporate additional data sources; drive broader awareness and adoption of the tool.
    • 18.
      • We would like to thank current and past contributors for their patience, ideas and support:
        • Tim Schultz
        • Susie Stephens
        • Dimitris Agrafiotis
        • Rudi Verbeek
        • John Stong
        • Joe Ciervo
        • Keith McCormick
    • 19.
      • Laurent Alquier
      • Software engineer, Project lead
      • Informatics CoE
      • [email_address]
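
Slide 11 lists SPARQL querying (constructing queries and parsing results) among the tasks the C# API abstracts. The Python sketch below illustrates that role only in outline; it is not the J&J API. The `ex:` vocabulary, the `build_query` and `parse_results` helpers, and the sample response are hypothetical. The response layout follows the W3C SPARQL JSON results format.

```python
import json


def build_query(interface):
    """Build a SPARQL query listing data sources that expose a given interface."""
    # Hypothetical vocabulary; escape quotes so the literal stays well formed.
    safe = interface.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ex: <http://example.org/knowit/property/>\n"
        "SELECT ?source ?owner WHERE {\n"
        f'  ?source ex:Has_interface "{safe}" .\n'
        "  OPTIONAL { ?source ex:Has_owner ?owner }\n"
        "}"
    )


def parse_results(response_text):
    """Flatten a SPARQL JSON result set into a list of plain dicts."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Variables left unbound by OPTIONAL are absent from the binding.
        rows.append(
            {v: (binding[v]["value"] if v in binding else None) for v in variables}
        )
    return rows


# Canned response standing in for what an endpoint might return (made-up data).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["source", "owner"]},
    "results": {"bindings": [
        {
            "source": {"type": "uri", "value": "http://example.org/knowit/page/Example_Data_Source"},
            "owner": {"type": "literal", "value": "Neuroscience IT"},
        },
        {
            "source": {"type": "uri", "value": "http://example.org/knowit/page/Other_Source"},
        },
    ]},
})

print(build_query("SPARQL"))
for row in parse_results(SAMPLE_RESPONSE):
    print(row)
```

Keeping query construction and result parsing in separate functions means the parsing step can be exercised against a stored response without contacting an endpoint.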