SlideShare a Scribd company logo
1 of 24
Incorporating Commercial and Private
Data into an Open Linked Data Platform
for Drug Discovery
Carole Goble, Alasdair J G Gray, Lee Harland,
Karen Karapetyan, Antonis Loizou, Ivan Mikhailov,
Yrjänä Rankka, Stefan Senger, Valery Tkachenko,
Antony J Williams, and Egon L Willighagen
www.openphacts.org
@open_phacts

A.J.G.Gray@hw.ac.uk
@gray_alasdair
Pre-competitive Informatics
Pharmaceutical companies are all accessing, processing,
storing & re-processing external research data

Literature Genbank
Patents PubChem

Data Integration

Databases

Data Analysis

Downloads

x

Repeat @
each
company

Firewalled Databases

Lowering industry firewalls: pre-competitive informatics in drug discovery
25/10/2013 Reviews Drug Discovery (2009) 2013
ISWC 8, 701-708 doi:10.1038/nrd2944
Nature

1
Open PHACTS objective
Apps

Interactive
responses

Open
Standards
Domain API

Provenance of
data

Drug Discovery Platform
Production
quality
25/10/2013

ISWC 2013

2
Drug Discovery Data
Pathways
Proteins
Interactions
Genes

Pharmacological
Activities

Transcripts
Clinical Drug
Applications
Biological
Processes
Pathological
Processes

25/10/2013

Drugs
Indications

Diseases
Compounds

ISWC 2013

3
Public Data
Pathways
Proteins
Interactions
Genes

Pharmacological
Activities

Transcripts
Clinical Drug
Applications
Biological
Processes
Pathological
Processes

25/10/2013

Drugs
Indications

Diseases
Compounds

ISWC 2013

4
Real Business Questions
“Let me compare
Proteins logP and PSA
MW,
for known
oxidoreductase
Genes
inhibitors”

Pathways
Interactions

“What is the
Pharmacological
selectivity profile of
Activities
known p38 inhibitors?”

Transcripts
Clinical Drug
Applications
Biological
Processes
Pathological
Processes

25/10/2013

“Find me compounds
that inhibit targets in
NFkB pathway assayed
in only functional assays
with a potency <1 μM”

Drugs
Indications

Diseases

Compounds

ISWC 2013

5
OPS Discovery Platform

Core Platform

Apps
Identity
Resolution
Service
Identifier
Management
Service

“Adenosine
receptor 2a”

Linked Data API (RDF/XML, TTL, JSON)

P12374
EC2.43.4
CS4532

Domain
Specific
Services

Semantic Workflow Engine
Chemistry
Registration
Normalisatio
n & Q/C

Data Cache
(Virtuoso Triple Store)

Indexing
VoID

VoID

VoID

Nanopub

Public
Ontologies

Db

Db

25/10/2013

VoID

Nanopub

Db

Nanopub

Db

ISWC 2013

Public Content

VoID

Commercial

User
Annotations

6
Present Content: Public Data
Source

Initial Records

Triples

Properties

ChEMBL

1,247,403

305,419,649

77

19,628

517,584

74

?

533,394,147

82

6,187

73,838

2

ChEBI

40,575

40,575

2

GeneOntology

38,137

1,265,273

26

?

23,489,501

15

ChemSpider

1,194,437

161,336,857

26

ConceptWiki

2,828,966

3,739,884

1

946

1,449,981

34

DrugBank
UniProt
ENZYME

GOA

WikiPathways
25/10/2013

ISWC 2013

7
Semantic Integration Methodology
1. Define use cases
2. Identify Data
–
–

Create RDF
VoID dataset descriptions

3. Create mappings
–
–

25/10/2013

between data set and known data sets
(instance level)
index for text to URL conversion

ISWC 2013

8
Semantic Integration Methodology
4. Ingest RDF into data cache
(i.e. triple store)
5. Define access paths to core concepts in data
6. Extend or create SPARQL queries for API calls
7. Publish API calls

25/10/2013

ISWC 2013

9
Commercial Data Use Case
“What is the
selectivity profile of
known p38 inhibitors?”

• Comprehensive data
coverage
– Commercial data
collections
– Extensive private
collections

“There is relevant
data in various
commercial datasets.”

• Control data responses
– Only authorised data

“My company X has its
own private dataset on
this topic.”

25/10/2013

ISWC 2013

10
Commercial Data Sets Pilot
Pathways
Proteins
Interactions
Genes

Pharmacological
Activities

Transcripts
Clinical Drug
Applications
Biological
Processes
Pathological
Processes

25/10/2013

Drugs
Indications

Diseases
Compounds

ISWC 2013

11
Linked Open Data
★

make your stuff available on the Web
(whatever format) under an open license
★★
make it available as structured data
(e.g. Excel instead of image scan of a table)
★★★
use non-proprietary formats
(e.g. CSV instead of Excel)
★★★★ use URIs to denote things,
so that people can point at your stuff
★★★★★ link your data to other data
to provide context
http://5stardata.info/
25/10/2013

ISWC 2013

12
Commercial Linked Data
• Same conversion challenges as Open Data!
– Goal to have 5★ linked data
– www.openphacts.org/specs/rdfguide/

• Pilot (sample) data provided as data dumps
– XML
– CSV
– RDF

• Structurally similar to ChEMBL
• Converted to interoperable RDF
25/10/2013

ISWC 2013

13
Data Modelling Challenges
• Contain private terminologies
– Mapped to public equivalents
– On going work

• Units represented as strings
– Not always consistent, e.g. IC50, IC_50, IC-50
– QUDT extended, e.g. IC50
– www.openphacts.org/specs/units/

25/10/2013

ISWC 2013

14
Dataset Descriptions
www.openphacts.org/specs/datadesc/

Enable
• Discovery
– Name
– Description
– Coverage

• Access control
– License
– File locations

• Answer Provenance
– Returned data links to
description
25/10/2013

Commercial Data
Description
• Publicly discoverable
– Advertisement for data
– Bring in more customers

• Restricted access by
license
Private Data Description
• Hidden to all but
authorised
• Restricted access
ISWC 2013

15
Chemical mappings
• Data is messy!
• Identify common
problems:
– Charge imbalance
– Stereochemistry

• Link based on structure

25/10/2013

ISWC 2013

16
Chemistry Registration
ChemSpider Service

Chemical Registration Service

• Validates and
standardizes chemical
representations
• Manual curation by
RSC staff
• Data loaded in
ChemSpider
• Open data: unsuitable for

• Utilizes ChemSpider
Validation and
Standardization platform
• Utilizes FDA rule set as
basis for standardization
• Generates OPSID for
chemicals
• Computes properties

– Commercial data
– Private data

25/10/2013

ISWC 2013

17
Access Requirements

25/10/2013

ISWC 2013

18
Data Access
• Each data set loaded into separate graph in
cache
• Pilot data same form as open ChEMBL data
– Extend queries with sub-queries for each set

• Restricted access
– Virtuoso offers graph-based access restriction
– Commercial data sets turned on/off

25/10/2013

ISWC 2013

19
Conclusions
• Drug discovery requires full data coverage
– Public/open data
• Open description
• Open data

– Commercial data
• Open description
• Restricted data

– Private data
• Restricted description
• Restricted data

• Pilot study with three commercial datasets
25/10/2013

ISWC 2013

20
Conclusions
• Data Modelling
– Similar challenges as public data

• Access restriction
– Provided by standard mechanisms
– Graph-based access

• Open PHACTS Discovery Platform
– Releasing version 1.3 (late 2013)
– Version 1.4 will contain commercial data (2014)
25/10/2013

ISWC 2013

21
Acknowledgements
• GVK Bio
GOSTAR
gostardb.com
• Thomson Reuters
Integrity
integrity.thomson-pharma.com
• Aureus Sciences Elsevier
AurSCOPE
www.aureus-sciences.com
25/10/2013

ISWC 2013

22
Questions
A.J.G.Gray@hw.ac.uk
www.macs.hw.ac.uk/~ajg33
@gray_alasdair

Open PHACTS Project

pmu@openphacts.org
www.openphacts.org
@open_phacts

More Related Content

What's hot

Data integrity
Data integrityData integrity
Data integrityKiran Kota
 
Liraglutide - Comprehensive patent search
Liraglutide  - Comprehensive patent searchLiraglutide  - Comprehensive patent search
Liraglutide - Comprehensive patent searchReportsnReports
 
Rifaximin - Comprehensive patent search
Rifaximin  - Comprehensive patent searchRifaximin  - Comprehensive patent search
Rifaximin - Comprehensive patent searchReportsnReports
 
FDA Data Integrity Issues - DMS hot fixes
FDA Data Integrity Issues - DMS hot fixesFDA Data Integrity Issues - DMS hot fixes
FDA Data Integrity Issues - DMS hot fixesVidyasagar P
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Kees van Bochove
 
Ensuring data integrity in pharmaceutical environment
Ensuring data integrity in pharmaceutical environmentEnsuring data integrity in pharmaceutical environment
Ensuring data integrity in pharmaceutical environmentdeepak mishra
 
2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...
2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...
2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...Kees van Bochove
 
Data Integrity Issues in Pharmaceutical Companies
Data Integrity Issues in Pharmaceutical CompaniesData Integrity Issues in Pharmaceutical Companies
Data Integrity Issues in Pharmaceutical CompaniesPiyush Tripathi
 
Data integrity: TGA expectations
Data integrity: TGA expectationsData integrity: TGA expectations
Data integrity: TGA expectationsTGA Australia
 
Vilazodone - Comprehensive patent search
Vilazodone  - Comprehensive patent searchVilazodone  - Comprehensive patent search
Vilazodone - Comprehensive patent searchReportsnReports
 
Planning for the New Individual Case Safety Report (ICSR) International Stand...
Planning for the New Individual Case Safety Report (ICSR) International Stand...Planning for the New Individual Case Safety Report (ICSR) International Stand...
Planning for the New Individual Case Safety Report (ICSR) International Stand...Perficient
 
Colesevelam - Comprehensive patent search
Colesevelam   - Comprehensive patent searchColesevelam   - Comprehensive patent search
Colesevelam - Comprehensive patent searchReportsnReports
 
Data Integrity; Ensuring GMP Six Systems Compliance Pharma Training
Data Integrity; Ensuring GMP Six Systems Compliance Pharma TrainingData Integrity; Ensuring GMP Six Systems Compliance Pharma Training
Data Integrity; Ensuring GMP Six Systems Compliance Pharma TrainingMarcep Inc.
 
USUGM 2014 - Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...
USUGM 2014 -  Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...USUGM 2014 -  Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...
USUGM 2014 - Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...ChemAxon
 
Data integrity - Regulatory Perspective and Challenges:
Data integrity - Regulatory Perspective and Challenges: Data integrity - Regulatory Perspective and Challenges:
Data integrity - Regulatory Perspective and Challenges: santoshnarla
 
Treprostinil - Comprehensive patent search
Treprostinil  - Comprehensive patent searchTreprostinil  - Comprehensive patent search
Treprostinil - Comprehensive patent searchReportsnReports
 
Nevirapine - Comprehensive patent search
Nevirapine  - Comprehensive patent searchNevirapine  - Comprehensive patent search
Nevirapine - Comprehensive patent searchReportsnReports
 

What's hot (20)

Data integrity
Data integrityData integrity
Data integrity
 
Data integrity
Data integrityData integrity
Data integrity
 
Liraglutide - Comprehensive patent search
Liraglutide  - Comprehensive patent searchLiraglutide  - Comprehensive patent search
Liraglutide - Comprehensive patent search
 
Rifaximin - Comprehensive patent search
Rifaximin  - Comprehensive patent searchRifaximin  - Comprehensive patent search
Rifaximin - Comprehensive patent search
 
FDA Data Integrity Issues - DMS hot fixes
FDA Data Integrity Issues - DMS hot fixesFDA Data Integrity Issues - DMS hot fixes
FDA Data Integrity Issues - DMS hot fixes
 
Gdp alcoa
Gdp  alcoaGdp  alcoa
Gdp alcoa
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019
 
Ensuring data integrity in pharmaceutical environment
Ensuring data integrity in pharmaceutical environmentEnsuring data integrity in pharmaceutical environment
Ensuring data integrity in pharmaceutical environment
 
Data integrity nishodh 01092016
Data integrity nishodh 01092016Data integrity nishodh 01092016
Data integrity nishodh 01092016
 
2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...
2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...
2019-10-11 The value of FAIR data in health data networks - The Hyve - ELIXIR...
 
Data Integrity Issues in Pharmaceutical Companies
Data Integrity Issues in Pharmaceutical CompaniesData Integrity Issues in Pharmaceutical Companies
Data Integrity Issues in Pharmaceutical Companies
 
Data integrity: TGA expectations
Data integrity: TGA expectationsData integrity: TGA expectations
Data integrity: TGA expectations
 
Vilazodone - Comprehensive patent search
Vilazodone  - Comprehensive patent searchVilazodone  - Comprehensive patent search
Vilazodone - Comprehensive patent search
 
Planning for the New Individual Case Safety Report (ICSR) International Stand...
Planning for the New Individual Case Safety Report (ICSR) International Stand...Planning for the New Individual Case Safety Report (ICSR) International Stand...
Planning for the New Individual Case Safety Report (ICSR) International Stand...
 
Colesevelam - Comprehensive patent search
Colesevelam   - Comprehensive patent searchColesevelam   - Comprehensive patent search
Colesevelam - Comprehensive patent search
 
Data Integrity; Ensuring GMP Six Systems Compliance Pharma Training
Data Integrity; Ensuring GMP Six Systems Compliance Pharma TrainingData Integrity; Ensuring GMP Six Systems Compliance Pharma Training
Data Integrity; Ensuring GMP Six Systems Compliance Pharma Training
 
USUGM 2014 - Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...
USUGM 2014 -  Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...USUGM 2014 -  Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...
USUGM 2014 - Gerald Wyckoff (Chemalytics): Development of the Chemalytics Pl...
 
Data integrity - Regulatory Perspective and Challenges:
Data integrity - Regulatory Perspective and Challenges: Data integrity - Regulatory Perspective and Challenges:
Data integrity - Regulatory Perspective and Challenges:
 
Treprostinil - Comprehensive patent search
Treprostinil  - Comprehensive patent searchTreprostinil  - Comprehensive patent search
Treprostinil - Comprehensive patent search
 
Nevirapine - Comprehensive patent search
Nevirapine  - Comprehensive patent searchNevirapine  - Comprehensive patent search
Nevirapine - Comprehensive patent search
 

Similar to Integrating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery

Late Binding: The New Standard For Data Warehousing
Late Binding: The New Standard For Data WarehousingLate Binding: The New Standard For Data Warehousing
Late Binding: The New Standard For Data WarehousingHealth Catalyst
 
7 Steps for Boosting R&D Outcomes
7 Steps for Boosting R&D Outcomes7 Steps for Boosting R&D Outcomes
7 Steps for Boosting R&D OutcomesTamrMarketing
 
Late Binding in Data Warehouses: Desiging for Analytic Agility
Late Binding in Data Warehouses: Desiging for Analytic AgilityLate Binding in Data Warehouses: Desiging for Analytic Agility
Late Binding in Data Warehouses: Desiging for Analytic AgilityHealth Catalyst
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...David Peyruc
 
Late Binding in Data Warehouses
Late Binding in Data WarehousesLate Binding in Data Warehouses
Late Binding in Data WarehousesDale Sanders
 
2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptionsAlasdair Gray
 
Transforming Big Data into Big Value
Transforming Big Data into Big ValueTransforming Big Data into Big Value
Transforming Big Data into Big ValueThomas Kelly, PMP
 
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...Frederik van den Broek
 
Seeing Is Believing: How Clinical Trial Data Transparency is Changing How an...
Seeing Is Believing:  How Clinical Trial Data Transparency is Changing How an...Seeing Is Believing:  How Clinical Trial Data Transparency is Changing How an...
Seeing Is Believing: How Clinical Trial Data Transparency is Changing How an...d-Wise Technologies
 
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...EMC
 
Hybrid Architecture with Ike & Data Libraries
Hybrid Architecture with Ike  & Data LibrariesHybrid Architecture with Ike  & Data Libraries
Hybrid Architecture with Ike & Data LibrariesStephen Allan Weitzman
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovationopen_phacts
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Introduction to healthcare and life sciences
Introduction to healthcare and life sciencesIntroduction to healthcare and life sciences
Introduction to healthcare and life sciencesFernando Mesa
 
Enterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for HealthcareEnterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for HealthcareDATA360US
 
CDISC & Risk Based Monitoring to Compress Clinical Trial Duration
CDISC & Risk Based Monitoring to Compress Clinical Trial DurationCDISC & Risk Based Monitoring to Compress Clinical Trial Duration
CDISC & Risk Based Monitoring to Compress Clinical Trial DurationClinical Data Inc .
 
Open PHACTS MIOSS may 2016
Open PHACTS MIOSS may 2016Open PHACTS MIOSS may 2016
Open PHACTS MIOSS may 2016open_phacts
 
Update regulatory standards landscape
Update   regulatory standards landscapeUpdate   regulatory standards landscape
Update regulatory standards landscapeFred Miller
 

Similar to Integrating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery (20)

Late Binding: The New Standard For Data Warehousing
Late Binding: The New Standard For Data WarehousingLate Binding: The New Standard For Data Warehousing
Late Binding: The New Standard For Data Warehousing
 
7 Steps for Boosting R&D Outcomes
7 Steps for Boosting R&D Outcomes7 Steps for Boosting R&D Outcomes
7 Steps for Boosting R&D Outcomes
 
Late Binding in Data Warehouses: Desiging for Analytic Agility
Late Binding in Data Warehouses: Desiging for Analytic AgilityLate Binding in Data Warehouses: Desiging for Analytic Agility
Late Binding in Data Warehouses: Desiging for Analytic Agility
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
 
Late Binding in Data Warehouses
Late Binding in Data WarehousesLate Binding in Data Warehouses
Late Binding in Data Warehouses
 
2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions
 
Transforming Big Data into Big Value
Transforming Big Data into Big ValueTransforming Big Data into Big Value
Transforming Big Data into Big Value
 
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Inform...
 
Seeing Is Believing: How Clinical Trial Data Transparency is Changing How an...
Seeing Is Believing:  How Clinical Trial Data Transparency is Changing How an...Seeing Is Believing:  How Clinical Trial Data Transparency is Changing How an...
Seeing Is Believing: How Clinical Trial Data Transparency is Changing How an...
 
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
Strata Rx 2013 - Data Driven Drugs: Predictive Models to Improve Product Qual...
 
Hybrid Architecture with Ike & Data Libraries
Hybrid Architecture with Ike  & Data LibrariesHybrid Architecture with Ike  & Data Libraries
Hybrid Architecture with Ike & Data Libraries
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Introduction to healthcare and life sciences
Introduction to healthcare and life sciencesIntroduction to healthcare and life sciences
Introduction to healthcare and life sciences
 
Enterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for HealthcareEnterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for Healthcare
 
CDISC & Risk Based Monitoring to Compress Clinical Trial Duration
CDISC & Risk Based Monitoring to Compress Clinical Trial DurationCDISC & Risk Based Monitoring to Compress Clinical Trial Duration
CDISC & Risk Based Monitoring to Compress Clinical Trial Duration
 
Open PHACTS MIOSS may 2016
Open PHACTS MIOSS may 2016Open PHACTS MIOSS may 2016
Open PHACTS MIOSS may 2016
 
Update regulatory standards landscape
Update   regulatory standards landscapeUpdate   regulatory standards landscape
Update regulatory standards landscape
 
Mark Barnes, "Data Sharing and Compensation for Clinical Trial Injuries in In...
Mark Barnes, "Data Sharing and Compensation for Clinical Trial Injuries in In...Mark Barnes, "Data Sharing and Compensation for Clinical Trial Injuries in In...
Mark Barnes, "Data Sharing and Compensation for Clinical Trial Injuries in In...
 
Webinar on the 21st Century Cures Act Final Rule
Webinar on the 21st Century Cures Act Final RuleWebinar on the 21st Century Cures Act Final Rule
Webinar on the 21st Century Cures Act Final Rule
 

More from Alasdair Gray

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Alasdair Gray
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Alasdair Gray
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAlasdair Gray
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesAlasdair Gray
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Alasdair Gray
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceAlasdair Gray
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsAlasdair Gray
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data TodayAlasdair Gray
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyAlasdair Gray
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data ContextAlasdair Gray
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataAlasdair Gray
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Alasdair Gray
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileAlasdair Gray
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked DataAlasdair Gray
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingAlasdair Gray
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Alasdair Gray
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSAlasdair Gray
 

More from Alasdair Gray (20)

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland Project
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life Sciences
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformance
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data Today
 
Project X
Project XProject X
Project X
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case Study
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
 
Data Linkage
Data LinkageData Linkage
Data Linkage
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry data
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
 
SensorBench
SensorBenchSensorBench
SensorBench
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked Data
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-being
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Integrating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery

  • 1. Incorporating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery Carole Goble, Alasdair J G Gray, Lee Harland, Karen Karapetyan, Antonis Loizou, Ivan Mikhailov, Yrjänä Rankka, Stefan Senger, Valery Tkachenko, Antony J Williams, and Egon L Willighagen www.openphacts.org @open_phacts A.J.G.Gray@hw.ac.uk @gray_alasdair
  • 2. Pre-competitive Informatics Pharmaceutical companies are all accessing, processing, storing & re-processing external research data Literature Genbank Patents PubChem Data Integration Databases Data Analysis Downloads x Repeat @ each company Firewalled Databases Lowering industry firewalls: pre-competitive informatics in drug discovery 25/10/2013 Reviews Drug Discovery (2009) 2013 ISWC 8, 701-708 doi:10.1038/nrd2944 Nature 1
  • 3. Open PHACTS objective Apps Interactive responses Open Standards Domain API Provenance of data Drug Discovery Platform Production quality 25/10/2013 ISWC 2013 2
  • 4. Drug Discovery Data Pathways Proteins Interactions Genes Pharmacological Activities Transcripts Clinical Drug Applications Biological Processes Pathological Processes 25/10/2013 Drugs Indications Diseases Compounds ISWC 2013 3
  • 6. Real Business Questions “Let me compare Proteins logP and PSA MW, for known oxidoreductase Genes inhibitors” Pathways Interactions “What is the Pharmacological selectivity profile of Activities known p38 inhibitors?” Transcripts Clinical Drug Applications Biological Processes Pathological Processes 25/10/2013 “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM” Drugs Indications Diseases Compounds ISWC 2013 5
  • 7. OPS Discovery Platform Core Platform Apps Identity Resolution Service Identifier Management Service “Adenosine receptor 2a” Linked Data API (RDF/XML, TTL, JSON) P12374 EC2.43.4 CS4532 Domain Specific Services Semantic Workflow Engine Chemistry Registration Normalisatio n & Q/C Data Cache (Virtuoso Triple Store) Indexing VoID VoID VoID Nanopub Public Ontologies Db Db 25/10/2013 VoID Nanopub Db Nanopub Db ISWC 2013 Public Content VoID Commercial User Annotations 6
  • 8. Present Content: Public Data Source Initial Records Triples Properties ChEMBL 1,247,403 305,419,649 77 19,628 517,584 74 ? 533,394,147 82 6,187 73,838 2 ChEBI 40,575 40,575 2 GeneOntology 38,137 1,265,273 26 ? 23,489,501 15 ChemSpider 1,194,437 161,336,857 26 ConceptWiki 2,828,966 3,739,884 1 946 1,449,981 34 DrugBank UniProt ENZYME GOA WikiPathways 25/10/2013 ISWC 2013 7
  • 9. Semantic Integration Methodology 1. Define use cases 2. Identify Data – – Create RDF VoID dataset descriptions 3. Create mappings – – 25/10/2013 between data set and known data sets (instance level) index for text to URL conversion ISWC 2013 8
  • 10. Semantic Integration Methodology 4. Ingest RDF into data cache (i.e. triple store) 5. Define access paths to core concepts in data 6. Extend or create SPARQL queries for API calls 7. Publish API calls 25/10/2013 ISWC 2013 9
  • 11. Commercial Data Use Case “What is the selectivity profile of known p38 inhibitors?” • Comprehensive data coverage – Commercial data collections – Extensive private collections “There is relevant data in various commercial datasets.” • Control data responses – Only authorised data “My company X has its own private dataset on this topic.” 25/10/2013 ISWC 2013 10
  • 12. Commercial Data Sets Pilot Pathways Proteins Interactions Genes Pharmacological Activities Transcripts Clinical Drug Applications Biological Processes Pathological Processes 25/10/2013 Drugs Indications Diseases Compounds ISWC 2013 11
  • 13. Linked Open Data ★ make your stuff available on the Web (whatever format) under an open license ★★ make it available as structured data (e.g. Excel instead of image scan of a table) ★★★ use non-proprietary formats (e.g. CSV instead of Excel) ★★★★ use URIs to denote things, so that people can point at your stuff ★★★★★ link your data to other data to provide context http://5stardata.info/ 25/10/2013 ISWC 2013 12
  • 14. Commercial Linked Data • Same conversion challenges as Open Data! – Goal to have 5★ linked data – www.openphacts.org/specs/rdfguide/ • Pilot (sample) data provided as data dumps – XML – CSV – RDF • Structurally similar to ChEMBL • Converted to interoperable RDF 25/10/2013 ISWC 2013 13
  • 15. Data Modelling Challenges • Contain private terminologies – Mapped to public equivalents – On going work • Units represented as strings – Not always consistent, e.g. IC50, IC_50, IC-50 – QUDT extended, e.g. IC50 – www.openphacts.org/specs/units/ 25/10/2013 ISWC 2013 14
  • 16. Dataset Descriptions www.openphacts.org/specs/datadesc/ Enable • Discovery – Name – Description – Coverage • Access control – License – File locations • Answer Provenance – Returned data links to description 25/10/2013 Commercial Data Description • Publicly discoverable – Advertisement for data – Bring in more customers • Restricted access by license Private Data Description • Hidden to all but authorised • Restricted access ISWC 2013 15
  • 17. Chemical mappings • Data is messy! • Identify common problems: – Charge imbalance – Stereochemistry • Link based on structure 25/10/2013 ISWC 2013 16
  • 18. Chemistry Registration ChemSpider Service Chemical Registration Service • Validates and standardizes chemical representations • Manual curation by RSC staff • Data loaded in ChemSpider • Open data: unsuitable for • Utilizes ChemSpider Validation and Standardization platform • Utilizes FDA rule set as basis for standardization • Generates OPSID for chemicals • Computes properties – Commercial data – Private data 25/10/2013 ISWC 2013 17
  • 20. Data Access • Each data set loaded into separate graph in cache • Pilot data same form as open ChEMBL data – Extend queries with sub-queries for each set • Restricted access – Virtuoso offers graph-based access restriction – Commercial data sets turned on/off 25/10/2013 ISWC 2013 19
  • 21. Conclusions • Drug discovery requires full data coverage – Public/open data • Open description • Open data – Commercial data • Open description • Restricted data – Private data • Restricted description • Restricted data • Pilot study with three commercial datasets 25/10/2013 ISWC 2013 20
  • 22. Conclusions • Data Modelling – Similar challenges as public data • Access restriction – Provided by standard mechanisms – Graph-based access • Open PHACTS Discovery Platform – Releasing version 1.3 (late 2013) – Version 1.4 will contain commercial data (2014) 25/10/2013 ISWC 2013 21
  • 23. Acknowledgements • GVK Bio GOSTAR gostardb.com • Thomson Reuters Integrity integrity.thomson-pharma.com • Aureus Sciences Elsevier AurSCOPE www.aureus-sciences.com 25/10/2013 ISWC 2013 22

Editor's Notes

  1. Big waste of resourcesNo competitive advantage
  2. A platform for integratedpharmacology data Reliedupon by pharma companiesPublic domain, commercial, and private data sourcesBasedon open standardsProvidesdomainspecific APIMakingiteasyto build multiple drugdiscoveryapplications:examplesdeveloped in the project
  3. Comprehensive picture requiredDrug discovery requires data from lots of domains:Chemistry: compound propertiesGenes:Proteins:Drug interactionsDiseasesBiological processes/interactionsPharmacology activities
  4. Already exist many public open datasets
  5. Driven by real business questions and use cases
  6. Cache copies of data: (1) Performance (2) Social (1000s hits a day)Chemistry data normalisation/alignment through ChemSpiderDomain specific APIAPI calls populate SPARQL queries
  7. Statistics to be added1,030,727,289 triplesHosted on beefy hardware; data in memory (aim)
  8. More data!Driven by users
  9. Sample of three commercial datasetsInformation on handful of targets only
  10. Commercial data does not have open license
  11. Same challenges as open data: data about the same things
  12. Statistics may affect the perception of the commercial dataset
  13. Data is messy; even chemicalsRequire linking between datasetsCommon backbone to integrate data
  14. FDA: US Food and Drug Administration Substance Registration SystemOPSID: Open PHACTS identifier for chemicals
  15. Data sets with license requirements