MedChemica
What have we done? What could we do with Advanced Analytics in the Chemistry Industry?
Ed Griffen
MedChemica Ltd
Big Data – Focus on Benefits not Features
From the Gartner IT Glossary:
What is Big Data?
Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
Features: high-volume, high-velocity and/or high-variety information assets
Benefits: enhanced insight, decision making, and process automation
Where is Big Data proving most Successful?
• Customer analysis
• Targeted advertising
• Language translation
• What do these have in common?
  • Underlying theoretical model insufficiently accurate or unknown
  • Very, very large data sets
  • Straightforward statistical methods
  • Most users are unskilled and not interested in mechanics
What are the classes of chemical problem?
Problem class: typical tasks (what they determine)
• ‘Potency’: lead finding, potency improvement (product size)
• Properties: pharmacokinetics, solubility, off-target toxicity (duration of action, safety margin)
• Production: first successful route, ‘best’ route (speed, cost)
• Patents: freedom to operate (commercial position)
Common to all:
• Pharmaceuticals
• Agrochemicals
• Flavors and Fragrances
• Consumer products
• Materials science

• Underlying theoretical model insufficiently accurate or unknown
• Very, very large data sets
• Straightforward statistical methods
• Most users are unskilled and not interested in mechanics
‘Big Data’ analysis for Chemistry
Making and testing compounds is expensive!
• No new compounds to make
• No new testing to do
• Exploit the compounds and data you’ve already paid for
• Accelerate all new projects
• Augment the skills and experience of your chemists
• Mythbusting…
All very cost effective
Help the HiPPOs – or they’ll crush you
“Companies often make most of their important decisions by relying on ‘HiPPO’ – the highest-paid person’s opinion.” [1]
[1] McAfee & Brynjolfsson, “Big Data: The Management Revolution”, Harvard Business Review, October 2012
Chemistry HiPPOs:
• experts in pattern recognition
• judged on their ability to make the best decisions with partial data
• highly trained
• time poor
• delivery focused
• gatekeepers to the adoption of new approaches
Making a real textbook of Medicinal Chemistry
[Diagram] Multiple Pharma ADMET data → MMPA (run separately on each data set) → Combine and Extract Rules → >437000 rules
Outcomes: better project decisions; increased medicinal chemistry learning
Kramer, Robb, Ting, Zheng, Griffen, et al: J. Med. Chem 2017
http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935
Making the complicated simple: HOT-Fit
Learning from the development of clinical decision support software
Technology: Algorithms, Data, Speed
Human: System Use, User Satisfaction
Organization: Structure, Environment
→ Benefits
E. Kilsdonk, L. W. Peute, M. W. M. Jaspers, Factors Influencing Implementation Success of Guideline-based Clinical Decision Support Systems: a systematic review and gaps analysis, International Journal of Medical Informatics
http://dx.doi.org/10.1016/j.ijmedinf.2016.12.001
Chemistry Knowledge extraction methods
Remember: your HiPPO needs to understand
[Chart: Descriptors plotted against Method, shaded towards ‘Dark’ / ‘Black’]
Descriptors:
• Substructures
• Physical chemistry descriptors (Hansch, Taft, Fujita, Abraham)
• Atomic, pair, triplet descriptors
• Indices
Method:
• Counts & descriptive statistics
• MMPA
• (M)LR, Free Wilson
• PLS
• Trees / Forests
• SVM
• Bayesian NN
• Deep Learning
It’s a summit – but what else is out there?
Advanced MMPA with MCPairs
• Matched Molecular Pairs – molecules that differ only by a particular, well-defined structural transformation
• Transformation with environment capture – MMPs can be recorded as transformations from A → B, together with the Δ Data A-B
Environment is key – it must be captured in the chemical encoding
Griffen, E. et al. J. Med. Chem. 2011, 54(22), pp. 7739-7750.
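To make the idea of recording pairs as transformations with an environment concrete, here is a minimal sketch. It assumes each pair has already been found and labelled; the transformation strings, environment labels and values are hypothetical, and the sign convention (B minus A) is an assumption rather than something the slide states.

```python
from collections import defaultdict
from statistics import median

# Hypothetical matched pairs: (transformation, environment, value_A, value_B).
# Labels and numbers are illustrative only, not taken from any real data set.
pairs = [
    ("[H]>>C", "aromatic carbon", 1.20, 0.70),
    ("[H]>>C", "aromatic carbon", 2.00, 1.60),
    ("[H]>>C", "amide nitrogen", 0.50, 1.10),
    ("[H]>>C", "amide nitrogen", 0.90, 1.30),
]

def delta_by_environment(pairs):
    """Return the median of (B - A) for each (transformation, environment)."""
    grouped = defaultdict(list)
    for transformation, environment, value_a, value_b in pairs:
        grouped[(transformation, environment)].append(value_b - value_a)
    return {key: median(deltas) for key, deltas in grouped.items()}

summary = delta_by_environment(pairs)
for (transformation, environment), delta in summary.items():
    print(f"{transformation} in {environment}: median delta = {delta:+.2f}")
```

In this toy data the same H to methyl change lowers the property in one environment and raises it in the other, which is why the environment has to be part of the key.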
MMPA: Environment really matters
H → Me:
• Median Δlog(Solubility)
• 225 different environments
H → Me:
• Median Δlog(Clint), human microsomal clearance
• 278 different environments
[Plot annotations: 2.5 log, 1.5 log]
Matched Molecular Pair methods matter
If you don’t use both you’ll miss 12-56% of the pairs
2 Methods:
• Maximum Common SubStructure (MCSS): Warner, Sheridan. Strengths: ring replacement, macrocycle ring pairs
• Fragment and Index (FI): Hussain & Rea. Strengths: linker and core swaps
[Figure: fF+I plotted against fMCSS for the EGF, D1 and Cav3.2 data sets; axes run from 0.1 to 0.9]
Leach et al. J. Chem. Inf. Model. 2017 http://dx.doi.org/10.1021/acs.jcim.7b00335
Rule selection
1. Identify and group matching SMIRKS.
2. Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down).
3. Is n ≥ 6? Fail: not enough data, ignore the transformation. Pass: continue.
4. Is |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3? Yes: the transformation is classified as ‘neutral’. No: continue.
5. Perform a two-tailed binomial test on the transformation to determine the significance of the up/down frequency. Pass: the transformation is classified as ‘increase’ or ‘decrease’, depending on which direction the property is changing. Fail: the transformation is classified as ‘NED’ (No Effect Determined).
[Figure: median data difference axis running from -ve through 0 to +ve, with regions Decrease, Neutral, Increase and NED]
• No assumption of normal distribution
• Manage ‘censored’ = qualified / out-of-range data
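The decision flow above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the MedChemica implementation: the 0.05 significance level, the exclusion of zero differences from the up/down counts, and the function names are all assumptions, and censored data is not handled.

```python
from math import comb
from statistics import median, quantiles

def two_tailed_binomial_p(n_up, n_down):
    """Exact two-tailed binomial p-value against a fair up/down split (p = 0.5)."""
    n = n_up + n_down
    if n == 0:
        return 1.0
    k = max(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def classify(deltas, alpha=0.05):
    """Classify one transformation from its list of property differences.

    alpha is an assumed significance level; the slide does not state one.
    """
    if len(deltas) < 6:
        return "ignore"  # not enough data
    deciles = quantiles(deltas, n=10, method="inclusive")
    spread = deciles[-1] - deciles[0]  # 10-90% intercentile range
    if abs(median(deltas)) <= 0.05 and spread <= 0.3:
        return "neutral"
    n_up = sum(1 for d in deltas if d > 0)
    n_down = sum(1 for d in deltas if d < 0)
    if two_tailed_binomial_p(n_up, n_down) >= alpha:
        return "NED"  # No Effect Determined
    return "increase" if n_up > n_down else "decrease"

print(classify([0.5, 0.6, 0.4, 0.7, 0.5, 0.6, 0.8, 0.5]))
print(classify([0.5, -0.6, 0.4, -0.7, 0.5, -0.6]))
```

Using the median, an intercentile range and a sign-based test keeps the procedure free of any normality assumption, in line with the first bullet above.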
Making the complicated simple: HOT-Fit
Where to get data?
• Public data is unrepresentative
  • Censored by publication bias
• Pharma data – can’t share structures due to IP
• Use chemical transformations to encode knowledge from matched molecular pair (MMP) analysis → now sharable
Novartis: Kramer, C.; Kalliokoski, T. et al. The Experimental Uncertainty of Heterogeneous Public Ki Data. J. Med. Chem. 2012, 55, 5165
“If project data really looked like that, there would be no problem in the Pharma industry.”
Data Sources
[Diagram]
Inside each individual company firewall (AZ, Roche, Genentech): company data → company database → MMP finder
>500 million pairs
MedChemica aggregation → Grand Rule Database (0.5 million rules)
Grand Rule Database → exploitation at AZ, Roche and Genentech
Merging knowledge – GRDv1
Merge:
• Pharma 1: 100k rules
• Pharma 2: 92k rules
• Pharma 3: 37k rules
• 5.8k rules in common (pre-merge), ~2%
• New rules: 88k, ~26% of total
Combining data yields brand new rules
Gains: 300 - 900%
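One way pooling can yield brand new rules is that a transformation with too few pairs in any single company can cross the n ≥ 6 threshold once the pairs are combined. The sketch below illustrates only that counting effect; the transformation names and pair counts are hypothetical, and the real merge also applies the statistical tests from the rule selection slide.

```python
from collections import Counter

MIN_PAIRS = 6  # the n >= 6 threshold from the rule selection flow chart

# Hypothetical pair counts per transformation for three contributors.
pharma_1 = Counter({"T1": 9, "T2": 3, "T3": 2})
pharma_2 = Counter({"T1": 7, "T2": 2, "T4": 5})
pharma_3 = Counter({"T2": 4, "T3": 3, "T4": 1})

def valid_rules(counts):
    """Transformations with enough pairs to be considered at all."""
    return {name for name, n in counts.items() if n >= MIN_PAIRS}

# Rules each company could derive alone, versus rules from the pooled pairs.
individual = valid_rules(pharma_1) | valid_rules(pharma_2) | valid_rules(pharma_3)
merged = valid_rules(pharma_1 + pharma_2 + pharma_3)
new_rules = merged - individual

print("available to at least one company alone:", sorted(individual))
print("new after merging:", sorted(new_rules))
```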
Knowledge Extracted
Numbers of statistically valid transforms (Grand Rule Database v3)

Grouped Datasets                                                       Number of Rules
logD7.4                                                                153449
Merged solubility                                                      46655
In vitro microsomal clearance (human, rat, mouse, cyno, dog)           88423
In vitro hepatocyte clearance (human, rat, mouse, cyno, dog)           26627
MDCK permeability A-B / B-A efflux                                     1852
Cytochrome P450 inhibition (2C9, 2D6, 3A4, 2C19, 1A2)                  40605
Cardiac ion channels (NaV 1.5, hERG ion channel inhibition)            15636
Glutathione stability                                                  116
Plasma protein or albumin binding (human, rat, mouse, cyno, dog)       64622
Single company vs merged
Comparison between Roche-only and GRD rules for human microsomal clearance. Overall R² is 0.76 and RMSE is 0.11.
Chemists use logD as a benchmark:
• Standard to use lipophilicity as a design surrogate
• Provides a context for changes
• Key multi-objective design issues are centered around conflicting logD correlations:
  • Solubility & metabolic stability vs potency & permeability
• Particularly useful to look at chemical transformations that ‘break the dogma’ of logD correlation
Solubility : logD – trends & exceptions
≥20 examples per rule, n = 13,453
R² = 0.66, slope = -0.57, intercept = 0
[Figure key] Magenta line: line of slope -1, intercept 0. Dark blue line: linear best fit. The pale blue density ellipse contains 99% and the mid blue ellipse contains 50% of the transformations.
Exceptional Solubility transformations
[Transformation structures: images not recovered]

Row   median ΔlogD ± std (nPairs)   median ΔlogSol ± std (nPairs)
1     0.00 ± 0.67 (91)              0.73 ± 0.72 (87)
2     -0.10 ± 0.83 (83)             0.65 ± 0.96 (69)
3     0.07 ± 0.50 (108)             0.52 ± 0.77 (80)
4     -0.10 ± 0.54 (208)            0.40 ± 0.78 (115)
5     -0.59 ± 0.49 (82)             0.03 ± 0.72 (98)

Comment, rows 1-4: ΔlogD ==, Solubility ↑
Comment, row 5: ΔlogD ↓, Solubility ==
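A simple way to see why these rows count as exceptions is to compare each one with what the overall logD trend would suggest. The sketch below uses the best-fit slope of -0.57 (intercept 0) reported on the previous slide and the five median values listed above; treating the residual as the measure of "exceptional" is an assumption for illustration, not the selection method used in the talk.

```python
SLOPE = -0.57  # best-fit slope reported on the solubility : logD slide

# (median ΔlogD, median ΔlogSol) for the five transformations listed above
rows = [(0.00, 0.73), (-0.10, 0.65), (0.07, 0.52), (-0.10, 0.40), (-0.59, 0.03)]

def residual(delta_logd, delta_logsol, slope=SLOPE):
    """Observed ΔlogSol minus the value the logD trend line would suggest."""
    return delta_logsol - slope * delta_logd

residuals = [round(residual(x, y), 2) for x, y in rows]
for (x, y), r in zip(rows, residuals):
    print(f"ΔlogD {x:+.2f}, ΔlogSol {y:+.2f}: residual {r:+.2f}")
```

A positive residual means more solubility than the logD change alone would suggest; a negative one means less.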
Clearance : logD – trends & exceptions
≥20 examples per rule, n = 11,572
R² = 0.40, slope = 0.23, intercept = 0
[Figure key] Magenta line: line of slope 1, intercept 0. Dark blue line: linear best fit. The pale blue density ellipse contains 99% and the mid blue ellipse contains 50% of the transformations.
Exceptional HLM transformations
[Transformation structures: images not recovered]

Row   median ΔlogD ± std (nPairs)   HLM median Δlog(Clint) ± std (nPairs)
1     0.35 ± 0.45 (15)              -0.34 ± 0.71 (13)
2     0.70 ± 0.74 (117)             -0.32 ± 0.51 (53)
3     0.73 ± 0.61 (26)              -0.23 ± 0.36 (18)
4     0.00 ± 0.11 (19)              -0.59 ± 0.38 (14)
5     -0.69 ± 0.42 (8)              0.76 ± 0.59 (7)

Comment, rows 1-3: ΔlogD ↑, Clint ↓
Comment, row 4: ΔlogD ==, Clint ↓
Comment, row 5: ΔlogD ↓, Clint ↑
Making the complicated simple: HOT-Fit
MMPA: Engineering challenges
• Quick to implement on a small scale
• Always becomes an n² problem…
• ‘Challenging’ at enterprise scales (100,000+):
  - Cheminformatics ‘gotchas’
    • Tautomers, charge states
    • Unusual aromatic systems
    • Highly symmetric molecules
    • Capturing and coding environments accurately
  - Structure and data integrity
  - Assay ontologies
  - Database schema optimized for cluster I/O
Speed at scale essential – time poor users
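The n² growth is easy to quantify for the naive case where every compound is compared with every other one. The short sketch below assumes that "100,000+" refers to the number of compounds and counts unordered pairs as n(n-1)/2; the function name is illustrative.

```python
def pair_count(n_compounds):
    """Number of unordered compound pairs in an all-against-all comparison."""
    return n_compounds * (n_compounds - 1) // 2

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} compounds -> {pair_count(n):>13,} candidate pairs")
```

A hundredfold increase in compounds gives roughly a ten-thousandfold increase in candidate pairs, which is why an approach that is quick on a small scale becomes an engineering problem at enterprise scale.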
Interface Design depends on the User
• > 2 × 10¹² searches / year
• Totally unskilled users
• Simple consistent interface
• Rocket scientists
?
Meet your HiPPO where they’re skilled
• Intuitive (= fast & familiar)
• Summary data + option to drill into the detail
• Web browsers
• Excel
Exploiting Knowledge for Compound Optimization
[Diagram] Measured data → rule finder → Rule Database
[Diagram] Problem molecule + Rule Database → compounds from rules → new molecule suggestions
MCPairs = “...it’s like asking 150 of your peers for ideas in just a few seconds” – AZ Principal Scientist
Exploiting Knowledge for Compound Optimization
https://www.youtube.com/watch?v=nQxXddJDTfc
More examples of Success
Thompson, M. J. et al. J. Med. Chem., 2015, 58 (23), pp 9309–9333
DOI: 10.1021/acs.jmedchem.5b01312
“Me-Betters” on a Massive scale
[Diagram] 1162 marketed drugs + Grand Rule Database v3 → Enumerator System → ~425 improvement suggestions / drug → wealth of follow-on opportunities
Improve solubility & metabolism = lower dose = uid from bid/tid
Safer, better compliance
‘Instant’ SAR exploration
https://www.youtube.com/watch?v=_FGSnD6PG3I
There is so much more…
• MMP based clustering
• QSAR from MMPA
• Matched molecular series
• Interface design is key
?
What can we do with Advanced Analytics?
Accelerate Chemistry by using:
• the right algorithms, ones that our users understand
• as much data as possible
• fast, “user appropriate” interfaces
to deliver better products into development faster.
Collaborators and Users - experience

Randomised Optimisation Algorithms in DAPHNE
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 

SCI What can Big Data do for Chemistry 2017 MedChemica

  • 1. MedChemica What have we done? What could we do with Advanced Analytics in the Chemistry Industry? Ed Griffen MedChemica Ltd
  • 2. MedChemica Big Data – Focus on Benefits not Features From the Gartner IT Glossary: What is Big Data? Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. Features Benefits
  • 3. MedChemica Where is Big Data proving most Successful? • Customer analysis • Targeted advertising • Language translation • What do these have in common? • Underlying theoretical model insufficiently accurate or unknown • Very, very large data sets • Straightforward statistical methods • Most users are unskilled and not interested in mechanics
  • 4. MedChemica What are the classes of chemical problem? ‘Potency’ Properties Production Patents • Lead finding • Potency improvement • Pharmacokinetics • Solubility • Off target toxicity • First successful route • ‘Best’ route • Freedom to Operate Product Size Duration of action Safety margin Speed Cost Commercial Position Common to all • Pharmaceuticals • Agrochemicals • Flavors and Fragrances • Consumer products • Materials science • Underlying theoretical model insufficiently accurate or unknown • Very, very large data sets • Straightforward statistical methods • Most users are unskilled and not interested in mechanics
  • 5. MedChemica ‘Big Data’ analysis for Chemistry Making and testing compounds is expensive! • No new compounds to make • No new testing to do • Exploit the compounds and data you’ve already paid for • Accelerate all new projects • Augment the skills and experience of your chemists • Mythbusting… All very cost effective
  • 6. MedChemica Help the HiPPOs – or they’ll crush you 6 1. McAfee & Brynjolfsson “Big Data: The Management Revolution”, Harvard Business Review October 2012 “Companies often make most of their important decisions by relying on “HiPPO”—the highest- paid person’s opinion.”1 Chemistry HiPPs: • experts in pattern recognition • judged on their ability to make the best decisions with partial data • highly trained • time poor • delivery focused • gatekeepers to the adoption of new approaches
  • 7. MedChemica Making a real textbook of Medicinal Chemistry MMPA MMPA MMPA Combine and Extract Rules Multiple Pharma ADMET data >437000 rules Better Project decisions Increased Medicinal Chemistry learning Kramer, Robb, Ting, Zheng, Griffen, et al: J. Med. Chem 2017 http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935 ‘Potency’ Properties Production Patents • Lead finding • Potency improvement • Pharmacokinetics • Solubility • Off target toxicity • First successful route • ‘Best’ route • Freedom to Operate
  • 8. MedChemica Making the complicated simple: HOT-Fit Learning from the development of clinical decision support software Algorithms Technology Data Speed Benefits Human System Use User Satisfaction Organization Structure Environment E.Kilsdonk, L.W.Peute, M.W.M.Jaspers, Factors Influencing Implementation Success of Guideline-based Clinical Decision Support Systems: a systematic review and gaps analysis, International Journal of Medical Informatics http://dx.doi.org/10.1016/j.ijmedinf.2016.12.001
  • 9. MedChemica Chemistry Knowledge extraction methods Remember: your HiPPO needs to understand 9 substructures Physical chemistry descriptors(Hansch, Taft, Fujita, Abraham) Atomic, pair, triplet descriptors Indices Counts & descriptive statistics MMPA (M)LR Free Wilson PLS Trees / Forests SVM Bayesian NN Deep Learning Dark Black Descriptors Method It’s a summit – but what else is out there?
  • 10. MedChemica Advanced MMPA with MCPairs • Matched Molecular Pairs – Molecules that differ only by a particular, well-defined structural transformation Griffen, E. et al. J. Med. Chem. 2011, 54(22), pp.7739-7750. • Transformation with environment capture – MMPs can be recorded as transformations from A→B, with Δ Data A - B [Figure: atom-numbered structures A and B] Environment is key - must be captured in the chemical encoding
  • 11. MMPA: Environment really matters H→Me: • Median Δlog(Solubility) • 225 different environments [Figure labels: 2.5 log, 1.5 log] H→Me: • Median Δlog(Clint) Human microsomal clearance • 278 different environments MedChemica
  • 12. MedChemica Matched Molecular Pair methods matter If you don’t use both you’ll miss 12-56% of the pairs 2 Methods: Maximum Common SubStructure (MCSS; Warner, Sheridan) and Fragment and Index (FI; Hussein & Rea) Strengths: Ring replacement linker and core swaps Macrocycle ring pairs [Figure: fF+I against fMCSS, each from 0.1 to 0.9, for the EGF, D1 and Cav3.2 data sets] Leach et al J. Chem. Inf. Model. 2017 http://dx.doi.org/10.1021/acs.jcim.7b00335
  • 13. MedChemica Rule selection: Identify and group matching SMIRKS → Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down) → Is n ≥ 6? No: not enough data, ignore transformation. Yes: is the |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3? Yes: transformation is classified as ‘neutral’. No: perform two-tailed binomial test on the transformation to determine the significance of the up/down frequency. Pass: transformation classified as ‘increase’ or ‘decrease’ depending on which direction the property is changing. Fail: transformation classified as ‘NED’ (No Effect Determined). [Figure: median data difference axis from -ve through 0 to +ve, with regions Decrease, Neutral, Increase and NED] • No assumption of normal distribution • Manage ‘censored’ = qualified / out-of-range data
  • 14. MedChemica Making the complicated simple: HOT-Fit Algorithms Technology Human Organization Data Speed System Use User Satisfaction Structure Environment Benefits
  • 15. MedChemica Where to get data? • Public data is unrepresentative • Censored by publication bias • Pharma data – can’t share structures due to IP. • Use chemical transformations to encode knowledge from matched molecular pair (MMP) analysis → now sharable Novartis: Kramer, C.; Kalliokoski, T. et al. The Experimental Uncertainty of Heterogeneous Public Ki Data J. Med. Chem 2012, 55, 5165 If project data really looked like that, there would be no problem in the Pharma industry.
  • 16. MedChemica Data Sources Roche Database AZ Data MMP finder AZ Database MMP finder MMP finder Roche Data Genentech Data Grand Rule Database Grand Rule Database Grand Rule Database Grand Rule Database AZ Exploitation Roche Exploitation Genentech Exploitation >500 million pairs MedChemica Aggregation Individual company firewall Genentech Database 0.5 million rules
  • 17. MedChemica Merging knowledge – GRDv1 Merge: Pharma 1, 100k rules; Pharma 2, 92k rules; Pharma 3, 37k rules. 5.8k rules in common (pre-merge), ~2%. New Rules: 88k, ~26% of total. Combining data yields brand new rules. Gains: 300 - 900%
  • 18. MedChemica Knowledge Extracted: Numbers of statistically valid transforms, Grand Rule Database v3. Grouped Datasets: Number of Rules. logD7.4: 153449; Merged solubility: 46655; In vitro microsomal clearance (Human, rat, mouse, cyno, dog): 88423; In vitro hepatocyte clearance (Human, rat, mouse, cyno, dog): 26627; MDCK permeability A-B / B-A efflux: 1852; Cytochrome P450 inhibition (2C9, 2D6, 3A4, 2C19, 1A2): 40605; Cardiac ion channels (NaV 1.5, hERG ion channel inhibition): 15636; Glutathione Stability: 116; Plasma protein or albumin binding (Human, rat, mouse, cyno, dog): 64622
  • 19. MedChemica Single company vs merged Comparison between Roche-only and GRD rules for human microsomal clearance. Overall R2 is 0.76 and RMSE 0.11.
  • 20. MedChemica Chemists use logD as a benchmark: • Standard to use lipophilicity as a design surrogate • Provides a context for changes • Key multi-objective design issues are centered round conflicting logD correlations: • Solubility & metabolic stability vs. potency & permeability • Particularly useful to look at chemical transformations that ‘break the dogma’ of logD correlation
  • 21. MedChemica Solubility : logD – trends & exceptions >=20 examples per rule, n=13,453 R2 = 0.66, slope = -0.57, intercept = 0. Magenta line: line of slope -1, intercept 0, dark blue line linear best fit, pale blue density ellipse contains 99% and the mid blue ellipse contains 50% of the transformations.
  • 22. MedChemica Exceptional Solubility transformations. Columns: Transformation (structure images not captured); median ΔlogD ± std (nPairs); median ΔlogSol ± std (nPairs); Comment. Rows: 0.00 ± 0.67 (91); 0.73 ± 0.72 (87); ΔlogD unchanged, solubility increased | -0.10 ± 0.83 (83); 0.65 ± 0.96 (69) | 0.07 ± 0.50 (108); 0.52 ± 0.77 (80) | -0.10 ± 0.54 (208); 0.40 ± 0.78 (115) | -0.59 ± 0.49 (82); 0.03 ± 0.72 (98); ΔlogD decreased, solubility unchanged
  • 23. MedChemica Clearance : logD – trends & exceptions >=20 examples per rule, n=11,572 R2 = 0.40, slope 0.23, intercept = 0. Magenta line: line of slope 1, intercept 0, dark blue line linear best fit, pale blue density ellipse contains 99% and the mid blue ellipse contains 50% of the transformations.
  • 24. MedChemica Exceptional HLM transformations. Columns: Transformation (structure images not captured); median ΔlogD ± std (nPairs); HLM median Δlog(Clint) ± std (nPairs); Comment. Rows: 0.35 ± 0.45 (15); -0.34 ± 0.71 (13); ΔlogD increased, Clint decreased | 0.70 ± 0.74 (117); -0.32 ± 0.51 (53) | 0.73 ± 0.61 (26); -0.23 ± 0.36 (18) | 0.00 ± 0.11 (19); -0.59 ± 0.38 (14); ΔlogD unchanged, Clint decreased | -0.69 ± 0.42 (8); 0.76 ± 0.59 (7); ΔlogD decreased, Clint increased
  • 25. MedChemica Making the complicated simple: HOT-Fit Algorithms Technology Human Organization Data Speed System Use User Satisfaction Structure Environment Benefits
  • 26. MedChemica MMPA: Engineering challenges • Quick to implement on a small scale • Always becomes an n² problem… • ‘Challenging’ at enterprise scales (100,000+): cheminformatics ‘gotchas’ (tautomers, charge states; unusual aromatic systems; highly symmetric molecules; capturing and coding environments accurately); structure and data integrity; assay ontologies; database schema optimized for cluster I/O. Speed at scale essential – time poor users
  • 27. MedChemica Interface Design depends on the User • > 2 × 10¹² searches / year • Totally unskilled users • Simple consistent interface • Rocket scientists? Meet your HiPPO where they’re skilled • Intuitive (= fast & familiar) • Summary data + option to drill into the detail • Web browsers • Excel
  • 28. MedChemica Exploiting Knowledge for Compound Optimization Measured Data rule finder Rule Database Compounds from Rules Problem molecule New molecule suggestions rule finder MCPairs= “..it’s like asking 150 of your peers for ideas in just a few seconds” – AZ Principal Scientist
  • 29. MedChemica Exploiting Knowledge for Compound Optimization https://www.youtube.com/watch?v=nQxXddJDTfc
  • 30. MedChemica More examples of Success Thompson, M.J. et al. J. Med. Chem., 2015, 58 (23), pp 9309–9333 DOI: 10.1021/acs.jmedchem.5b01312
  • 31. MedChemica “Me-Betters” on a Massive scale Enumerator System 1162 Marketed Drugs Wealth of Follow-on opportunities Grand Rule Database v3 Improve solubility & metabolism = lower dose = uid from bid/tid Safer, better compliance ~425 improvement suggestions / drug
  • 33. MedChemica • MMP based clustering • QSAR from MMPA • Matched molecular series •Interface design is key There is so much more… ?
  • 34. MedChemica What can we do with Advanced Analytics? Accelerate Chemistry by using: • right algorithms that our users understand • as much data as possible • fast, “user appropriate” interfaces deliver better products into development faster.
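
The rule-selection flow on slide 13 can be sketched in a few lines of Python. This is a minimal illustration, not MedChemica's implementation: the function names, the percentile interpolation, and the significance level `alpha=0.05` are assumptions, and only the thresholds n ≥ 6, |median| ≤ 0.05 and intercentile range ≤ 0.3 come from the slide.

```python
from math import comb
from statistics import median


def binomial_two_tailed_p(n_up, n_down):
    """Two-tailed binomial p-value for an up/down split under p = 0.5."""
    n = n_up + n_down
    if n == 0:
        return 1.0
    k = min(n_up, n_down)
    # The p = 0.5 distribution is symmetric, so double the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def percentile(sorted_vals, q):
    """Percentile by linear interpolation (an assumed convention)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def classify_rule(deltas, alpha=0.05):
    """Classify one transformation from its list of property differences."""
    if len(deltas) < 6:
        return "ignore"  # not enough data
    vals = sorted(deltas)
    spread = percentile(vals, 0.9) - percentile(vals, 0.1)
    if abs(median(vals)) <= 0.05 and spread <= 0.3:
        return "neutral"
    n_up = sum(1 for d in vals if d > 0)
    n_down = sum(1 for d in vals if d < 0)
    if binomial_two_tailed_p(n_up, n_down) < alpha:  # alpha is an assumption
        return "increase" if n_up > n_down else "decrease"
    return "NED"  # No Effect Determined
```

Because the test only counts ups and downs, it makes no assumption that the differences are normally distributed, which matches the first bullet under the flow chart. Handling of censored (qualified or out-of-range) values is not covered by this sketch.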

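One way to read "Combining data yields brand new rules" (slide 17) is that a transformation with too few pairs in any single company can cross the n ≥ 6 threshold of slide 13 once the pair data are pooled. The sketch below illustrates that reading only. The transformation labels, the data values, and the `merge` / `valid_rules` helpers are hypothetical, and the real aggregation may work differently.

```python
from collections import defaultdict

MIN_PAIRS = 6  # the n >= 6 threshold from the rule-selection slide


def valid_rules(pairs_by_transform, min_pairs=MIN_PAIRS):
    """Return the transformations that have enough pairs to form a rule."""
    return {t for t, deltas in pairs_by_transform.items()
            if len(deltas) >= min_pairs}


def merge(*sources):
    """Pool property differences per transformation across contributors.

    Only transformation labels and differences are shared, not structures.
    """
    merged = defaultdict(list)
    for source in sources:
        for transform, deltas in source.items():
            merged[transform].extend(deltas)
    return dict(merged)


# Hypothetical contributions: label -> list of property differences.
pharma_1 = {"[H]>>C": [0.2, 0.3, 0.1, 0.4, 0.2, 0.3], "[H]>>F": [0.1, 0.0, 0.2]}
pharma_2 = {"[H]>>F": [0.1, 0.2, 0.1], "C>>CC": [0.4, 0.5]}

merged = merge(pharma_1, pharma_2)
# Rules that exist only after pooling.
new_rules = valid_rules(merged) - valid_rules(pharma_1) - valid_rules(pharma_2)
```

Here `[H]>>F` has three pairs in each source and so fails the threshold in both, but the pooled six pairs qualify it, so it is the only entry in `new_rules`.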
Editor's Notes

  1. Lots of people come forward with ideas to ‘revolutionise drug discovery’, but being more data driven is surprisingly cheap compared to most of them, e.g. ‘new modalities’ like therapeutic RNAs or chimeric antigen receptors, or even large ring macrocycles.
  2. We may be at the summit, but who can tell? And what is around us? Alternatively, we may want a completely clear view of potential cliffs and valleys, but by the time you get there so much has been published that compounds are probably in the clinic, if not on the market. Of course, there may still be opportunities.