Laurent Alquier Project Lead in the Informatics CoE group, in J&J Pharma R&D. Between R&D and IT. We provide informatics solutions to help researchers.
Scientists want to ask translational questions such as…. Identify patient subgroups based on genetics/genomics. Which biomarkers predict disease progression of people in their 50s and 60s
But they end up spending too much time on questions such as these…
knowIT was designed to help capture what we know about IT systems and facilitate answering real questions about scientific data.
We have used knowIT to catalogue information about data sources…. ++++++++++++++++++++ Built on top of Semantic MediaWiki As a wiki, it is easy to use And allows customization and community features necessary to encourage participation Description of data sources using selected metadata Content updated directly by IT, scientists and data source owners 170 internal and external data sources catalogued
As a Semantic Wiki Data can be accessed through search, faceted navigation, and SPARQL queries SMW gives structure to content of wiki pages And enables queries
SMW allows direct translation of page content into RDF triples Includes knowledge management tools (from MediaWiki’s experience with Wikipedia) Provides many ways to import and export content
The data source catalog in knowIT is only a first step in a larger Linked Data Framework in J&J With a first application to Neuroscience Translational medicine and open innovation are key drivers Require better understanding of data sources
Here are the details of the architecture Data source catalog and mapping of SPARQL endpoints (knowIT) Relational to RDF mapping (D2R) Triple stores (4store, OpenRDF) Repository for ontological concepts Ontologies uploaded into OpenRDF Sesame triple store
We are using NIF, but exploring adding TMO as a high level ontology framework Used to dereference URIs discovered within data cloud
The framework also provides a C# API for easy access within our application development platform
This API allows the development of visual browsing and query applications 3DX can be used to find data sources
Bring data into a spreadsheet environment, from which a rich array of analytics can be invoked
The SPARQL interface to knowIT content is used to provide incremental updates for our enterprise search engine
… and to analyze relationships within the content of each data source (RelFinder)
... as well as to visualize relationships between data sources
Next steps will include automation, integration with existing ontologies, self describing data sources, dynamic visualizations as well as a broader adoption.
Finally, I would like to take this occasion to thank …
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework Laurent Alquier, Project Lead, Informatics CoE
Scientists are looking for answers to translational questions Background images from Flickr - CC Share Alike 2.0 Basic Research Prototype Design or Discovery Preclinical Development Clinical Development FDA Filing, Approval & Launch Prep Translational Research Critical Path Research
Instead they research how to find and access data. Which data source should I choose ? Can I access it ? Who should I talk to ? Scientists want to do Science, not Data Management. Which interfaces are available ?
knowIT helps scientists and IT manage what they know about Information Systems.
Advantages of Semantic MediaWiki (SMW) Wiki pages are subjects for Semantic annotations (triples : page – property – value) MediaWiki knowledge management tools Many ways to input and output content.
<ul><li>Provide a flexible integrative data layer </li></ul><ul><ul><ul><li>Linked Data concepts can be used for flexible data integration of internal and external sources of disparate data </li></ul></ul></ul><ul><li>Enable scientists to answer novel translational questions </li></ul><ul><ul><li>Linked Data queries support diverse translational scientific questions </li></ul></ul>Linked Data Pilot for Neuroscience
Overview of the Linked Data Framework RDF CSV XML RDBMS Docs RDB2RDF mapping TM J&J Ontol. Data catalog Data browser & Visualization Ontology Curation J&J RDF SPARQL Semantic Wiki
Reviewing W3C’s Translational Medicine Ontology (TMO) as a base for J&J Ontology
<ul><ul><li>Data Source discovery </li></ul></ul><ul><ul><ul><li>Provides framework with provenance information </li></ul></ul></ul><ul><ul><li>Resource discovery </li></ul></ul><ul><ul><ul><li>Extracts relationships between concepts across data sources </li></ul></ul></ul><ul><ul><li>SPARQL Querying </li></ul></ul><ul><ul><ul><li>Constructs and captures queries, parses results </li></ul></ul></ul><ul><ul><li>Inferencing </li></ul></ul><ul><ul><ul><li>Pulls pre-constructed rule sets hosted in server directory </li></ul></ul></ul>C# API allows programmatic abstraction of….
Visualization of relationships between data sources
Next steps Automation and integration with existing ontologies Enable self-describing data sources Dynamic, layered map of data landscape Incorporate additional data sources Drive broader awareness and adoption of the tool
<ul><li>We would like to thank current and past contributors for their patience, ideas and support : </li></ul><ul><ul><li>Tim Schultz </li></ul></ul><ul><ul><li>Susie Stephens </li></ul></ul><ul><ul><li>Dimitris Agrafiotis </li></ul></ul><ul><ul><li>Rudi Verbeek </li></ul></ul><ul><ul><li>John Stong </li></ul></ul><ul><ul><li>Joe Ciervo </li></ul></ul><ul><ul><li>Keith McCormick </li></ul></ul>Acknowledgements