INCREASING THE EFFICIENCY OF PHARMACEUTICAL RESEARCH THROUGH DATA INTEGRATION
ICIC 2014, Heidelberg, 12-15 Oct. 2014
Dr. Roland Bauer
Project Manager Content Integration & Development
Elsevier Information Systems GmbH, Frankfurt
ro.bauer@elsevier.com
Matthew Clark Ph.D. 
Consultant, Life Science Services 
Elsevier Inc. Philadelphia, PA 
m.clark@elsevier.com
2 
ABOUT ME:
- Babeș-Bolyai University, Cluj-Napoca, Romania
- Max Planck Institute for Polymer Research, Mainz, Germany
- Elsevier
3
AGENDA
- Introduction & Setting the Stage
- Why?
- Content Integration: The Reaxys Case
- Integration Process: Project Overview
4 
INTRODUCTION & SETTING THE STAGE: THE DRUG DISCOVERY INFORMATION LANDSCAPE
5
INTRODUCTION & SETTING THE STAGE: TENDENCIES IN THE DRUG DISCOVERY INFORMATION LANDSCAPE
6
INTRODUCTION & SETTING THE STAGE: TENDENCIES IN THE DRUG DISCOVERY INFORMATION LANDSCAPE
WHY? 
7
ISSUE: CHEMICAL INFORMATION ACCESS IS FRAGMENTED
• End users must learn many interfaces
• Different data sources have different search capabilities
• Scientists may not search all appropriate data sources
[Diagram: separate access points for licensed databases, catalogs, e-notebooks, and references/full text]
INTEGRATION OF DATA PROVIDES BETTER ANSWERS
9
• Searching multiple sources with one search via a single interface increases efficiency
• Harmonized indexing allows the same question to be asked of all sources
Benefits: easier access, enhanced usage, better value for investment, better decisions, faster progress
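As a rough illustration of what harmonized indexing buys, the Python sketch below renames each source's native field names onto one shared vocabulary, so a single query function can ask the same question of every source. The source names (`licensed_db`, `eln_1`), the field names, and the `FIELD_MAP` table are hypothetical placeholders chosen for the example, not part of Reaxys.

```python
# Hypothetical mapping from each source's native field names to one shared index vocabulary.
FIELD_MAP: dict[str, dict[str, str]] = {
    "licensed_db": {"mp_C": "melting_point_c", "yield_pct": "yield_percent"},
    "eln_1": {"MeltPt": "melting_point_c", "Yield": "yield_percent"},
}


def harmonize(source: str, record: dict) -> dict:
    """Rename source-specific fields to the shared names; unknown fields pass through unchanged."""
    mapping = FIELD_MAP.get(source, {})
    return {mapping.get(field, field): value for field, value in record.items()}


def search_all(sources: dict[str, list[dict]], field: str, minimum: float) -> list[tuple[str, dict]]:
    """Ask one question (field >= minimum) of every source through the harmonized field name."""
    hits = []
    for source, records in sources.items():
        for record in records:
            doc = harmonize(source, record)
            value = doc.get(field)
            if value is not None and value >= minimum:
                hits.append((source, doc))
    return hits


if __name__ == "__main__":
    data = {
        "licensed_db": [{"id": "A", "yield_pct": 92}],
        "eln_1": [{"id": "B", "Yield": 75}, {"id": "C", "Yield": 88}],
    }
    for source, doc in search_all(data, "yield_percent", 80):
        print(source, doc["id"])
```

Without the mapping, the same question would need one variant per source (`yield_pct >= 80`, `Yield >= 80`), which is exactly the fragmentation that makes scientists skip sources.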
CONTENT INTEGRATION: THE REAXYS CASE
10
11
THE REAXYS DATABASE: CONTAINS INTEGRATED PUBLISHED CHEMISTRY DATA
12
THE REAXYS DATABASE: …ALONG WITH EXPANDED BIBLIOGRAPHICAL INFORMATION
13
THE REAXYS TREE: BROWSE CONTENT BY ONTOLOGY
TWO APPROACHES TOWARDS INTEGRATED CONTENT
14
[Diagram: FEDERATED MODEL vs. WAREHOUSE MODEL; labels: analysis system, end-user, central storage]
TWO APPROACHES TOWARDS INTEGRATED CONTENT
15
FEDERATED MODEL
Pros:
- Easy scalability when new data sources are added
- Delivery of short-term "wins"
- Lower maintenance costs
Cons:
- Lack of normalization and harmonized indexing
- Performance and availability depend on the source systems
WAREHOUSE MODEL
Pros:
- High data quality through normalization
- Unified queries and filters can be applied
Cons:
- Long implementation times and higher start-up costs
- Changes in data types are expensive and difficult to accommodate
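The trade-off above can be made concrete with a small Python sketch. It is a simplification under stated assumptions: each source is modeled as a hypothetical zero-argument callable returning a list of dict records, and "normalization" is reduced to a caller-supplied function. The federated query contacts every source at request time, so an unavailable source silently drops out and un-normalized values are compared as-is; the warehouse loads normalized copies once and then answers from central storage.

```python
from typing import Callable

Record = dict
Fetch = Callable[[], list[Record]]  # hypothetical: returns all records of one source
Predicate = Callable[[Record], bool]


def federated_query(sources: dict[str, Fetch], predicate: Predicate) -> list[Record]:
    """Federated model: query every live source at request time, without normalization."""
    results: list[Record] = []
    for name, fetch in sources.items():
        try:
            rows = list(fetch())
        except ConnectionError:
            continue  # availability depends on the source system: a down source is just missing
        results.extend({**row, "source": name} for row in rows if predicate(row))
    return results


class Warehouse:
    """Warehouse model: normalize and copy records into central storage, then query locally."""

    def __init__(self, normalize: Callable[[Record], Record]) -> None:
        self.normalize = normalize
        self.rows: list[Record] = []

    def load(self, sources: dict[str, Fetch]) -> None:
        # Stands in for the scheduled ETL step; here it simply reloads everything.
        self.rows = [
            {**self.normalize(row), "source": name}
            for name, fetch in sources.items()
            for row in fetch()
        ]

    def query(self, predicate: Predicate) -> list[Record]:
        return [row for row in self.rows if predicate(row)]


if __name__ == "__main__":
    def src_a() -> list[Record]:
        return [{"name": "Aspirin"}]

    def src_b() -> list[Record]:
        return [{"name": "aspirin"}]

    def src_down() -> list[Record]:
        raise ConnectionError("source offline")

    is_aspirin = lambda row: row["name"] == "aspirin"
    print(federated_query({"a": src_a, "b": src_b, "down": src_down}, is_aspirin))
    warehouse = Warehouse(lambda row: {**row, "name": row["name"].lower()})
    warehouse.load({"a": src_a, "b": src_b})
    print(warehouse.query(is_aspirin))
```

The warehouse finds both spellings because normalization happened at load time, but its answer is only as fresh as its last `load`; the federated query is always current but inherits each source's inconsistencies and outages.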
16
REAXYS EXTERNAL CONTENT INTEGRATION
[Diagram labels: RX content; external content: database, ELN 1, ELN 2, custom in-house reactions source; indexed storage; end-user]
17
REAXYS EXTERNAL CONTENT INTEGRATION: IN-HOUSE SCENARIO
[Diagram labels: RX content; external content: database, ELN 1, ELN 2, custom in-house reactions source; indexed storage; end-user; hosting: customer hosted]
18
REAXYS EXTERNAL CONTENT INTEGRATION: ELSEVIER HOSTED SCENARIO
[Diagram labels: RX content; external content: database, ELN 1, ELN 2, custom in-house reactions source; indexed storage; end-user; hosting: customer hosted, Elsevier hosted]
19
REAXYS EXTERNAL CONTENT INTEGRATION: HYBRID HOSTING SCENARIO
[Diagram labels: RX content; external content: database, ELN 1, ELN 2, custom in-house reactions source; indexed storage; end-user; hosting: Elsevier hosted, customer hosted]
REAXYS PROVIDES A UNIFIED INFORMATION PORTAL
• Provides a single powerful interface
• Can integrate several notebook systems
• Links chemistry, structures, sourcing, citations, and full text of articles
[Diagram labels: structures, reactions, and full text; licensed reaction and structure databases; e-notebooks; binding, properties; patents]
INTEGRATED SOLUTION SEARCH
21
• List of integrated sources: can include licensed databases and multiple e-notebooks from different organizational units
• All e-notebooks can be integrated and searched together
REACTION SEARCH RESULTS SEPARATED BY SOURCE
22
• Results from each source on a separate tab
• Show corresponding substances in …
• Cross-link to the substance in all other sources where it is found
[Screenshot labels: PubChem, eMolecules, Licensed, PharmaCo e-notebook, PharmaCo2 e-notebook, E-notebooks]
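One way to picture the per-source tabs and the cross-links is the Python sketch below. It assumes every hit carries a `source` name and a shared `substance_key`; the slides do not say how substances are matched across sources, so keying on a common structure identifier (for example a standard InChIKey) is an assumption here, and all function names are hypothetical.

```python
from collections import defaultdict


def tabs_by_source(hits: list[dict]) -> dict[str, list[dict]]:
    """Group hits into one tab per source, keeping the original hit order within each tab."""
    tabs: dict[str, list[dict]] = defaultdict(list)
    for hit in hits:
        tabs[hit["source"]].append(hit)
    return dict(tabs)


def cross_links(hits: list[dict]) -> dict[str, list[str]]:
    """Map each substance key to the sorted list of sources in which it was found."""
    where: dict[str, set[str]] = defaultdict(set)
    for hit in hits:
        where[hit["substance_key"]].add(hit["source"])
    return {key: sorted(sources) for key, sources in where.items()}


def other_sources(links: dict[str, list[str]], key: str, current: str) -> list[str]:
    """Sources to link to from a substance displayed on the `current` tab."""
    return [source for source in links.get(key, []) if source != current]


if __name__ == "__main__":
    hits = [
        {"source": "Licensed", "substance_key": "KEY-1"},  # dummy keys, not real InChIKeys
        {"source": "PharmaCo e-notebook", "substance_key": "KEY-1"},
        {"source": "PharmaCo e-notebook", "substance_key": "KEY-2"},
    ]
    links = cross_links(hits)
    for tab, rows in tabs_by_source(hits).items():
        for row in rows:
            print(tab, row["substance_key"], "->", other_sources(links, row["substance_key"], tab))
```

Building the cross-link index once per result set keeps the per-row lookup cheap, which matters when every row on every tab shows its links.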
SUBSTANCE RESULTS
23
• Results from each source on a separate tab, including PubChem and eMolecules
• All filters fully active
INTEGRATION CASE STUDY: ROCHE, IN-HOUSE HOSTED
24
Integrated Reaxys with several data sources:
• Medicinal chemistry e-notebooks
• Development chemistry e-notebooks
• Several e-notebook systems of acquired organizations
• Licensed databases
• Current Chemical Reactions
• Several other databases
Links to many more sources:
• Roche stockroom availability
• Patent/literature full text
• Links to original e-notebook pages
Reaxys integrates these e-notebooks with each other, while they are still maintained as separate systems.
CASE STUDY: ROCHE KEY DRIVERS 
From ACS presentation by Michael Kapler, Roche Pharma Research and Early Development
http://abstracts.acs.org/chem/245nm/program/view.php?obj_id=188977
INTEGRATION PROCESS: PROJECT OVERVIEW
26
PROCESS OVERVIEW FOR AN INTEGRATION PROJECT
Initialisation:
- Evaluation of data sources and needed resources
- Determine hosting scenario
- Commercial and legal framework
Kick-off:
- Requirements harvesting
- Establish milestones and top-down workstreams
- Refine and finalize plans
Execution:
- Implement automated ETL process
- Implement application customisation
- Install IT infrastructure and interfaces
Delivery:
- BETA release
- Refinement
- Go-live
Sprint 1, Sprint 2, Sprint 3, Sprint 4
Hand over to BAU/maintenance
- Sprint iterations of 3 weeks each
- Number of sprints depends on complexity (5-…)
DATA MODEL AND USER INTERFACE PROCESS
- Determine data sources (e.g., e-notebooks, licensed databases)
- Map to the Reaxys Integration Data Model (XML); this is a key step for the integration project
- Unit conversions and data cleaning (e.g., °C and K; moles and grams)
- Identify URL links to e-notebooks and other resources
- User interface configuration for new fields: design the location and display of fields, URLs, etc.
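The unit-conversion and data-cleaning step can be sketched in Python as follows. The target units (kelvin for temperatures, moles for amounts), the accepted unit spellings, and the function names are assumptions for illustration, not the Reaxys Integration Data Model's actual conventions; the conversions themselves are the standard ones, T(K) = T(°C) + 273.15 and n = m / M.

```python
import re

# Accepts strings such as "25 °C", "25°C", "-5.5 °C", "0.5 g" (assumed input format).
_QUANTITY = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(\S+)\s*$")


def parse_quantity(text: str) -> tuple[float, str]:
    """Clean a free-text notebook entry into (value, unit); raise if it cannot be parsed."""
    match = _QUANTITY.match(text)
    if match is None:
        raise ValueError(f"unparseable quantity: {text!r}")
    return float(match.group(1)), match.group(2)


def to_kelvin(value: float, unit: str) -> float:
    """Normalize a temperature to kelvin (assumed target unit)."""
    if unit == "K":
        return value
    if unit in ("°C", "degC"):
        return value + 273.15
    raise ValueError(f"unsupported temperature unit: {unit}")


def to_moles(value: float, unit: str, molar_mass_g_per_mol: float) -> float:
    """Normalize an amount to moles (assumed target unit); masses need the molar mass in g/mol."""
    if unit == "mol":
        return value
    if unit == "mmol":
        return value / 1000
    if unit == "g":
        return value / molar_mass_g_per_mol
    if unit == "mg":
        return value / 1000 / molar_mass_g_per_mol
    raise ValueError(f"unsupported amount unit: {unit}")


if __name__ == "__main__":
    value, unit = parse_quantity("25°C")
    print(to_kelvin(value, unit))
    value, unit = parse_quantity(" 18.0 g ")
    print(to_moles(value, unit, molar_mass_g_per_mol=18.0))
```

Raising on unknown units, rather than passing values through, keeps unconverted data out of the combined index, where a mixed °C/K field would silently break range filters.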
AUTOMATED FABRICATION PROCESS (ETL)
Sources: E-notebook 1, E-notebook 2, bioassay database
- Daily extraction to XML using the defined data model
- Transmission to the fabrication server (sftp, scp)
- Fabrication combines the data with Reaxys data for production
…
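A minimal sketch of the daily extraction step, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`integration_batch`, `record`, `field`) and the function name are hypothetical stand-ins, since the slides do not show the defined data model; a real export would follow that schema. Transmitting the resulting file to the fabrication server (sftp/scp) and the fabrication step itself are outside this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import date


def records_to_xml(source: str, records: list[dict], run_date: date) -> bytes:
    """Serialize one source's records for a daily batch (hypothetical element names)."""
    root = ET.Element("integration_batch", source=source, date=run_date.isoformat())
    for record in records:
        item = ET.SubElement(root, "record", id=str(record["id"]))
        for name, value in record.items():
            if name == "id":
                continue  # the identifier is already stored as an attribute
            field = ET.SubElement(item, "field", name=name)
            field.text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    batch = records_to_xml(
        "E-notebook 1",
        [{"id": 101, "yield_percent": 88, "temperature_k": 298.15}],
        date(2014, 10, 13),
    )
    print(batch.decode("utf-8"))
```

Stamping each batch with its source and run date makes it straightforward for the downstream fabrication step to tell which daily extract it is combining with the Reaxys data.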
30
THANK YOU – QUESTIONS?
Dr. Roland Bauer
Project Manager Content Integration & Development
Elsevier Information Systems GmbH, Frankfurt
ro.bauer@elsevier.com
Matthew Clark Ph.D.
Consultant, Life Science Services
Elsevier Inc. Philadelphia, PA
m.clark@elsevier.com
