SlideShare a Scribd company logo
Mining Compounds, Targets and Indications
from the Patent Corpus with SureChEMBL
and Open PHACTS
George Papadatos, PhD
ChEMBL Group, EMBL-EBI
georgep@ebi.ac.uk
#ShefChem16
Outline
•  Overview of chemical annotations in SureChEMBL
•  Biological annotations for Open PHACTS
•  Relevance scoring
•  Open PHACTS patent API
•  Use cases and examples
Bioactivity data
Compound
Assay/Target
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE
RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT
NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT
TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT
THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY
CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF
EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR
WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR
ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA
NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG
PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
3. Insight, tools and resources for translational drug discovery
2. Organization, integration, curation and standardization of pharmacology data
1. Scientific facts
Ki = 4.5nM
APTT = 11 min.
ChEMBL: Data for drug discovery
What do people do with ChEMBL?
•  Chemical space clustering & visualisation
•  (Q)SAR analysis and Virtual Screening
•  Data modeling, activity cliffs, Free-Wilson, MMP analysis
•  Bioisosteric replacements mining
•  De novo drug design
•  In silico evolutionary compound optimisation
•  Target prediction (target fishing)
•  Off-target prediction and ADR analysis
•  Polypharmacology networks
•  Druggability / Drug-likeness
•  Text mining
•  Third-party application integration
Why looking at patent documents?
•  Patent filing and searching
•  Legal, financial and commercial incentives & interests
•  Prior art, novelty, freedom to operate searches
•  Competitive intelligence
•  Unprecedented wealth of knowledge
•  Most of the knowledge will never be disclosed anywhere else
•  Compounds, scaffolds, reactions, formulations
•  Biological targets, sequences, diseases, indications
•  Average lag of 2-4 years between patent document and journal
publication disclosure for chemistry, 4-5 for biological targets
Rodriguez-Esteban & Bundschus. Text Mining Patents for Biomedical Knowledge. Drug Discovery Today 2016.
From SureChem to SureChEMBL
•  2007: SureChem is founded
•  2010: Acquired by Macmillan
•  2012: Revamped commercial product: SureChemDirect
•  Modular, cloud-based architecture
•  Dec 2013: Donated to EMBL-EBI
•  Sept 2014: SureChEMBL UI is launched
•  Jan 2015: Data client and map file dumps
•  April 2015: Bio-annotations are added
•  May 2016: SureChEMBL data available via Open PHACTS API
SureChEMBL data processing
WO	
EP	
Applica,ons
&	Granted	
US	
Applica,ons	
&	granted	
JP	
Abstracts	
Patent
Offices
Chemistry	
Database	
SureChEMBL System
Patent	
PDFs	
(service)	
Applica,on	
Users	
API	
Database	
En,ty	
Recogni,on	
SureChem IP
1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3-
propyl-1H-pyrazolo[4,3-d]pyrimidin-5-
yl)phenylsulfonyl]-4-methylpiperazine	
Image	to	
Structure	
(one	method)	
Molfiles	
Name	to	
Structure	
(five	methods)	
OCR
Processed	
patents	
(service)
Homepage
Help	
Search	by	keywords	and	
meta-data	
Search	by	
chemical	
structure	
(sketch	
compound)	
Search	by	SMILES,	
MOL,	SMARTS,	
name,	InChI	
Search	by	patent	number	
Filter	by	authority	
(US,	EP,	WO	and	JP)	
Filter	by	document	sec,on	
(,tle,	claims,	abstract,	
descrip,on	and	images)	
Chemical	
search	type	
(substructure,	
similarity,	
iden,cal)	
Filter	
by	date	
Filter	by	MW	
www.surechembl.org
www.surechembl.org
www.surechembl.org	
http://chembl.blogspot.co.uk/2015/10/advanced-keyword-and-structure-searches.html
https://www.surechembl.org/document/US-9359381-B2
https://www.surechembl.org/document/US-9359381-B2
On-the-fly bio-annotation with mapping to DB IDs and ontologies
https://www.surechembl.org/document/US-9359381-B2
https://www.surechembl.org/document/US-9359381-B2
https://www.surechembl.org/document/US-9359381-B2
SureChEMBL Data Content and Growth
•  > 18M unique structures
•  ~13M are in a drug-like phys-chem space
•  > 15M chemically annotated patents 
•  ~5M are life-science relevant (based on patent codes)
•  ~80,000 novel compounds extracted from ~50,000 new
patents monthly
•  1–7 days for a published patent to be chemically annotated,
indexed and searchable in SureChEMBL
•  SureChEMBL provides search access to all patents (not just
chemically annotated ones) ~120M patents
•  Connectivity match on single components - UniChem
ChEMBL-SureChEMBL overlap
21.4%
SureChEMBL
ChEMBL
1.5M
17M
Too granular? Try scaffolds instead
Level 1 scaffold overlap
57%
SureChEMBLChEMBL
61K
298K
Level 1 scaffold overlap
57%
SureChEMBLChEMBL
61K
298K
Can we have everything?
Cost
TimeQuality
Common sources of errors
•  Small, poor quality images
•  OCR errors in names (OCR done by IFI). There is an OCR correction
step, but cannot fix all errors
Becomes: ‘2,6-Difluoro-Λ/-{1 -r(4-iodo-2-methylphenyl)methvn-1 H-
pyrazol-3- vDbenzamide’
•  Reliability better for US patents due to inclusion of mol files
hps://www.surechembl.org/document/US-20150183799-A1	Table	2	
PDF	
XML	
SC	 Google	patents
What about testing?
•  Rigorous testing hampered by lack of good test sets
•  Akhondi corpus* annotates compounds, genes and diseases
•  à But not normalised to identifiers
•  à Does not convert to compound structures
•  à Identifies all entities, not relevant entities
•  BioCreative competition corpus covers only abstracts (21K)
•  Comparison with ‘gold standard’/’ground truth’ data sets
•  SciFinder for compounds / Other sets for relevance
•  GVK BIO and BindingDB for targets
•  BUT not necessarily comprehensive, so can’t assess precision
* Akhondi et al (2014)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107477
Comparison of compound coverage
•  Comparison of compounds with SciFinder
•  Set of 47 patents from Akhondi* corpus
•  SciFinder compounds for these extracted
•  65% of SciFinder compounds were found by SureChEMBL
•  ‘Missed’ compounds included some cases that were out of scope
for SureChEMBL pipeline (e.g., Markush, tables)
* Akhondi et al (2014)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107477
Accessing SureChEMBL data
Web Interface
Data client feed
Local compound-patent db schema
Daily updates
Compound-patent map file
Quarterly updates
UniChem
Weekly updates
Access to 135M structures
http://chembl.blogspot.co.uk/2015/08/accessing-surechembl-data-in-bulk.html
Identify key compounds in patent docs
•  What are key compounds in med. chem. patents?
•  ‘Best’, most promising/important compound
•  Drug candidate
•  Optimal property profile
•  Most active
•  Automated identification of key compounds in patents
•  Using compound information only
•  Main assumption: busy chemical space around key compounds
•  Number of NNs, MCS extraction and R-group frequency
Hattori, K., Wakabayashi, H., & Tamaki, K. (2008). JCIM, 48(1), 135–142. doi:10.1021/ci7002686
Tyrchan, C., Boström, J., Giordanetto, F., Winter, J., & Muresan, S. (2012). JCIM, 52(6), 1480–1489.
doi:10.1021/ci3001293
Workflow
1.  Read a file with chemistry extracted from the Levitra family of patents
•  US65663601
2.  Filter by different text-mining and chemoinformatics properties to
remove noise and enrich the genuinely novel structures
3.  Visualise the chemical space using MDS and dimensionality reduction
•  Identify and fix outliers in the chemistry space.
4.  Find the Murcko and MCS scaffolds
5.  Compare the derived MCS core with the actual Markush structure
6.  Identify key compounds using structural information only
•  Count number of NNs per compound
•  R-Group decomposition and frequency of R-Groups
1Sayle, R. et al. (2012). JCIM, 52(1), 51–62. doi:10.1021/ci200463r
Scaffold and chemical space analysis
hp://nbviewer.ipython.org/github/rdkit/UGM_2014/blob/master/Notebooks/Vardenafil.ipynb
Bio-annotations and Open PHACTS
SureChEMBL bio-annotation
•  SciBite’s Termite text-mining engine run on 4M life-science patents
from SureChEMBL corpus
•  Genes (identified by HGNC symbols, e.g. FDFT1) and diseases
(identified by MeSH IDs, e.g. D009765) annotated
•  Section/frequency information annotated (e.g., in title, abstract,
claims, total frequency)
•  Relevance score (0-3) to flag important chemical and biological
entities and remove noise
Relevance scoring
•  Patents designed to obfuscate – not all entities are equal
Over 50 proteins mentioned!
SRC proto-oncogene, non-receptor tyrosine kinase ; aurora kinase A ; insulin ; microtubule-associated protein tau ; catenin (cadherin-
associated protein), beta 1, 88kDa ; interleukin 1, alpha ; c-src tyrosine kinase ; phosducin-like 2 ; colony stimulating factor 2 (granulocyte-
macrophage) ; tumor necrosis factor ; glycogen synthase kinase 3 beta ; amyloid beta (A4) precursor protein ; asparaginase ; interferon,
beta 1, fibroblast ; jun proto-oncogene ; coagulation factor II (thrombin) ; acetylcholinesterase ; FGR proto-oncogene, Src family tyrosine
kinase ; BLK proto-oncogene, Src family tyrosine kinase ; angiotensin I converting enzyme ; albumin ; axin 1 ; heat shock transcription
factor 1 ; v-myb avian myeloblastosis viral oncogene homolog ; NADH dehydrogenase, subunit 1 (complex I) ; LYN proto-oncogene, Src
family tyrosine kinase ; LCK proto-oncogene, Src family tyrosine kinase ; CCAAT/enhancer binding protein (C/EBP), alpha ; HCK proto-
oncogene, Src family tyrosine kinase ; ATP citrate lyase ; Glycogen synthase kinase 3 ; beta-catenin ; interferon ; tnf superfamily ; lactate
dehydrogenase ; neurotrophic factor ; pyruvate kinase ; fibroblast growth factor ; glycogen synthase ; tumor necrosis factor alpha ; serine
threonine kinase ; tau protein ; interleukin ; albumin ; thrombin ; eif2b ; ion channel ; cytokine ; heat shock factor ; axin ; cAMP
Response element binding protein ; amyloid beta ; Microtubule Interacting Protein ; c-Jun N-terminal kinases ; src homology ; calcium
channel ; atpase ; acetylcholinesterase
Some more interesting than others…
The method of claim 27, wherein the method comprises inhibiting Aurora-2, GSK-3, or Src activity
Slide by Lee Harland
Same for compounds!
Real scope of patent
Same for compounds!
Tens of drugs trivially mentioned
Real scope of patent
Same for compounds!
Substituents
Tens of drugs trivially mentioned
Real scope of patent
Relevance scoring – genes/diseases
•  Various features used:
•  Term frequency
•  Position (title, abstract, figure, caption, table)
•  Frequency distribution
•  Scores range from 0 – 3
•  3 – most important entities in the patent
•  2 – important entities in the patent
•  1 – mentioned entities in the patent
•  0 – ambiguous entity/likely annotation error
Relevance scoring - compounds
•  Main assumptions for relevance:
1.  Very frequent compounds are usually irrelevant
2.  Compounds with busy chemical space around them are interesting
•  Use distribution of close analogues (NNs) among compounds found in the same
patent family – FCFP_4 similarity
•  Scores range from 0 – 3
•  3 – highest number of NNs: most important entities in the patent
•  2 – important entities in the patent
•  1 – few NNs: mentioned entities in the patent
•  0 – singletons or trivial entities, most likely errors or reagents, solvents,
substituents
Hatori, K., Wakabayashi, H., & Tamaki, K. (2008). JCIM, 48(1), 135–142. doi:10.1021/ci7002686
Tyrchan, C., Boström, J., Giordaneto, F., Winter, J., & Muresan, S. (2012). JCIM, 52(6), 1480–1489.
doi:10.1021/ci3001293
Open PHACTS Sources
Open PHACTS Architecture
Drug Discovery Today 2012, 17:21 (doi:10.1016/j.drudis.2012.05.016)
Open PHACTS Patent API
https://dev.openphacts.org/docs/2.1
Open PHACTS Patent API
Disease	
Compound	
Target	
Patent	
extracted	
links	
https://dev.openphacts.org/docs/2.1
Open PHACTS Patent API
Disease	
Compound	
Target	
Patent	
inferred	links	
extracted	
links
Use case #1: Patent to Entities
1.  From a patent get compounds, genes and diseases
2.  Filter to remove noise
•  Frequency and relevance score
3.  Process and visualise
Disease	
Compound	
Target	
Patent	
?
?
?
https://www.surechembl.org/document/US-7718693-B2/
https://www.surechembl.org/document/US-7718693-B2/
201 relevant structures:
23 relevant targets and diseases:
Does it make sense?
Does it make sense?
US-7718693-B2
Does it make sense?
•  Calculate MCS
Does it make sense?
•  Calculate MCS
US-7718693-B2
Use case #2: Drug targets & indications for compound
1.  Search patents for a compound (approved drug)
2.  Filter to remove noise
•  Frequency, relevance score and classification code
3.  For remaining patents, get disease and target entities
4.  Filter to remove noise
•  Frequency and relevance score
5.  Visualise results
Disease	
Compound	
Target	
Patent
Eluxadoline (JNJ-27018966, VIBERZI)
CHEMBL2159122
SCHEMBL12971682
FDA Approval: 2015
1) Get patents for Eluxadoline
•  UniChem call à SCHEMBL12971682
•  Compound URI:
•  http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL12971682
•  API call:
•  Relevance score >=1 à 17 patents (patentome):
1) Get patents for Eluxadoline
•  17 relevant patents (patentome):
Results
Relevant targets:
Relevant diseases:
Does it make sense?
http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
Does it make sense?
http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
New PARP inhibitors
From target to scaffolds
PARP1
Niraparib, Phase 3
Olaparib, FDA-approved
Talazoparib, Phase 3
WO-2008020180-A2
US-20100035883-A1
US-20080167345-A1
Take home message
•  It is now possible to search and interlink the key structures,
scaffolds, targets and diseases from med. chem. patent
corpus automatically
•  By high-throughput text-mining only
•  Thanks to simple heuristics (relevance scores and frequency)
•  Using the Open PHACTS API
•  For the first time in a free resource in such scale
Disease	
Compound	
Target	
Patent
What next: Other ideas and use cases
•  Target validation / Druggability
•  For a target, get me all related and relevant diseases
•  Compare with DisGeNET / Open Targets, etc.
•  Any known patented scaffolds for my target?
•  Start a pharmacophore hypothesis for patent busting
•  Find a tool compound for a new assay
•  Novelty checking / Due diligence
•  What do we know about this scaffold / compound?
•  Add ChEMBL pharmacology and pathway information
The Data Are Out There
What’s cooking in ChEMBL22
BindingDB patent set
•  BindingDB manually extracts binding activities from tables
in granted US med. chem. patents
•  Coverage 2013-2016, currently ~1,100 patents
•  On-going effort
•  Includes exact and binned bioactivities
•  Ki, IC50, Kd, EC50 activity types
•  Compounds mapped to example no in patent tables
•  Detailed assay description
•  http://www.bindingdb.org/bind/ByPatent.jsp
BindingDB patent bioactivity integration
•  1,017 US patents (mapped to SureChEMBL IDs)
•  68.6K compounds
•  ~70 unique cmpds per patent
•  Only 5% exist in ChEMBL (exact match) à complementary
•  450 single protein targets
•  ~10% not in ChEMBL, e.g. GPR4 in US-8748435-B2
•  100,329 bioactivity values
•  ~100 activities per patent
•  Compounds and targets curated by ChEMBL
•  Available in ChEMBL22
http://www.bindingdb.org/bind/ByPatent.jsp
Example: BI patent
https://www.surechembl.org/document/US-9120769-B2/
174 activities in Table 2
IC50=94nM against HSD11B1 (P28845)
Manual example number dereference
Conclusion: current patent data landscape
•  ChEMBL
•  Literature SAR data, drug annotations, manually curated
•  Coming soon: bioactivities from patents
•  SureChEMBL
•  Chemical annotations automatically mined from patents
•  Open PHACTS
•  The above plus biological annotations automatically mined
from patents, with relevance scoring
•  Public API
Acknowledgements
•  ChEMBL and SureChEMBL
•  Anna Gaulton
•  Mark Davies
•  Nathan Dedman
•  James Siddle
•  Anne Hersey
•  SciBite
•  Lee Harland
•  Open PHACTS consortium
•  Nick Lynch
•  Daniela Digles
•  Antonis Loizou
•  surechembl-help@ebi.ac.uk
Technology partners
Mining Compounds, Targets and Indications
from the Patent Corpus with SureChEMBL
and Open PHACTS
George Papadatos, PhD
ChEMBL Group, EMBL-EBI
georgep@ebi.ac.uk
#ShefChem16

More Related Content

What's hot

Knowledge is Property- All YOU need to know ABC of Patent Searching
Knowledge is Property- All YOU need to know ABC of Patent SearchingKnowledge is Property- All YOU need to know ABC of Patent Searching
Knowledge is Property- All YOU need to know ABC of Patent Searching
Prity Khastgir IPR Strategic India Patent Attorney Amplify Innovation
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
Sunghwan Kim
 
Digging out Structures for Repurposing: Non-competitive Intelligence ...
Digging out Structures for Repurposing: Non-competitive Intelligence        ...Digging out Structures for Repurposing: Non-competitive Intelligence        ...
Digging out Structures for Repurposing: Non-competitive Intelligence ...
Chris Southan
 
PubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligencePubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligence
Sunghwan Kim
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEs
Chris Southan
 
ICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBL
Dr. Haxel Consult
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem
Sunghwan Kim
 
ChemSpider Overview SLides August 2007
ChemSpider Overview SLides August 2007ChemSpider Overview SLides August 2007
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?
Sunghwan Kim
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
GenomeInABottle
 
Assay Development and Drug Repurposing Core
Assay Development and Drug Repurposing CoreAssay Development and Drug Repurposing Core
Assay Development and Drug Repurposing Core
Michigan State University Research
 
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
Chris Southan
 
Jan2016 horizon GIAB
Jan2016 horizon GIABJan2016 horizon GIAB
Jan2016 horizon GIAB
GenomeInABottle
 
Aug2015 horizon diagnostics
Aug2015 horizon diagnosticsAug2015 horizon diagnostics
Aug2015 horizon diagnostics
GenomeInABottle
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
Dr. Haxel Consult
 
PubChem LCSS
PubChem LCSSPubChem LCSS
PubChem LCSS
Jian Zhang
 
ASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottleASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottle
GenomeInABottle
 
Aug2015 Giab nist integration methods
Aug2015 Giab nist integration methodsAug2015 Giab nist integration methods
Aug2015 Giab nist integration methods
GenomeInABottle
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
Prof. Wim Van Criekinge
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
GenomeInABottle
 

What's hot (20)

Knowledge is Property- All YOU need to know ABC of Patent Searching
Knowledge is Property- All YOU need to know ABC of Patent SearchingKnowledge is Property- All YOU need to know ABC of Patent Searching
Knowledge is Property- All YOU need to know ABC of Patent Searching
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
Digging out Structures for Repurposing: Non-competitive Intelligence ...
Digging out Structures for Repurposing: Non-competitive Intelligence        ...Digging out Structures for Repurposing: Non-competitive Intelligence        ...
Digging out Structures for Repurposing: Non-competitive Intelligence ...
 
PubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligencePubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligence
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEs
 
ICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBLICIC 2014 From SureChem to SureChEMBL
ICIC 2014 From SureChem to SureChEMBL
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem
 
ChemSpider Overview SLides August 2007
ChemSpider Overview SLides August 2007ChemSpider Overview SLides August 2007
ChemSpider Overview SLides August 2007
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?
 
Giab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptxGiab aug2015 intro and update 150821.pptx
Giab aug2015 intro and update 150821.pptx
 
Assay Development and Drug Repurposing Core
Assay Development and Drug Repurposing CoreAssay Development and Drug Repurposing Core
Assay Development and Drug Repurposing Core
 
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
IUPHAR/BPS Guide to Pharmacology: concise mapping of chemistry, data, and tar...
 
Jan2016 horizon GIAB
Jan2016 horizon GIABJan2016 horizon GIAB
Jan2016 horizon GIAB
 
Aug2015 horizon diagnostics
Aug2015 horizon diagnosticsAug2015 horizon diagnostics
Aug2015 horizon diagnostics
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
 
PubChem LCSS
PubChem LCSSPubChem LCSS
PubChem LCSS
 
ASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottleASHG 2015 Genome in a bottle
ASHG 2015 Genome in a bottle
 
Aug2015 Giab nist integration methods
Aug2015 Giab nist integration methodsAug2015 Giab nist integration methods
Aug2015 Giab nist integration methods
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
 

Viewers also liked

Data Visualization
Data VisualizationData Visualization
Data Visualization
George Sloane
 
Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...
Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...
Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...
mikesolomon
 
2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)
2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)
2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)
Takahiro Shinagawa
 
Social media for placement networking
Social media for placement networkingSocial media for placement networking
Social media for placement networking
Sue Beckingham
 
Creating Learning Environments with Communities of Practice
Creating Learning Environments with Communities of PracticeCreating Learning Environments with Communities of Practice
Creating Learning Environments with Communities of Practice
Olivier Serrat
 
6 things every LinkedIn profile needs
6 things every LinkedIn profile needs6 things every LinkedIn profile needs
6 things every LinkedIn profile needs
Jeff Shuey
 
Accelerate you customer Acquisition with Sales Sprints
Accelerate you customer Acquisition with Sales SprintsAccelerate you customer Acquisition with Sales Sprints
Accelerate you customer Acquisition with Sales Sprints
Michael Humblet
 
Ionic 2 como ferramenta para desenvolvimento móvel
Ionic 2 como ferramenta para desenvolvimento móvelIonic 2 como ferramenta para desenvolvimento móvel
Ionic 2 como ferramenta para desenvolvimento móvel
Gustavo Costa
 
Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1
Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1
Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1
contact Elabe
 
Incubation de demandeurs d'emploi au sein d'un espace de Coworking à Lille
Incubation de demandeurs d'emploi au sein d'un espace de Coworking à LilleIncubation de demandeurs d'emploi au sein d'un espace de Coworking à Lille
Incubation de demandeurs d'emploi au sein d'un espace de Coworking à Lille
Emmanuel Duvette
 
Serverless Architectures and Continuous Delivery
Serverless Architectures and Continuous DeliveryServerless Architectures and Continuous Delivery
Serverless Architectures and Continuous Delivery
Robin Weston
 
Congress seeks metrics for the NIST Cybersecurity Framework
Congress seeks metrics for the NIST Cybersecurity FrameworkCongress seeks metrics for the NIST Cybersecurity Framework
Congress seeks metrics for the NIST Cybersecurity Framework
David Sweigert
 
DAR Resposta - Guia para famílias após o diagnóstico de autismo
DAR Resposta - Guia para famílias após o diagnóstico de autismoDAR Resposta - Guia para famílias após o diagnóstico de autismo
DAR Resposta - Guia para famílias após o diagnóstico de autismo
Sara Martins
 
Create responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJSCreate responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJS
Hannes Hapke
 
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までーDeep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
nlab_utokyo
 
¿Qué es inbound marketing?
¿Qué es inbound marketing?¿Qué es inbound marketing?
¿Qué es inbound marketing?
HubSpot Español
 
Getting Radical with Brand
Getting Radical with BrandGetting Radical with Brand
Getting Radical with Brand
Heidi Hackemer
 
IT Workers Share Their Best IT Career Advice
IT Workers Share Their Best IT Career AdviceIT Workers Share Their Best IT Career Advice
IT Workers Share Their Best IT Career Advice
Robert Half
 

Viewers also liked (20)

Data Visualization
Data VisualizationData Visualization
Data Visualization
 
Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...
Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...
Five Incredibly Important, Earthshaking Shifts in Consumer Behavior...in 30 M...
 
2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)
2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)
2016-11-30 BitVisor Summit 5 「BitVisorの現状と今後」(公開版)
 
Social media for placement networking
Social media for placement networkingSocial media for placement networking
Social media for placement networking
 
Creating Learning Environments with Communities of Practice
Creating Learning Environments with Communities of PracticeCreating Learning Environments with Communities of Practice
Creating Learning Environments with Communities of Practice
 
6 things every LinkedIn profile needs
6 things every LinkedIn profile needs6 things every LinkedIn profile needs
6 things every LinkedIn profile needs
 
Accelerate you customer Acquisition with Sales Sprints
Accelerate you customer Acquisition with Sales SprintsAccelerate you customer Acquisition with Sales Sprints
Accelerate you customer Acquisition with Sales Sprints
 
Ionic 2 como ferramenta para desenvolvimento móvel
Ionic 2 como ferramenta para desenvolvimento móvelIonic 2 como ferramenta para desenvolvimento móvel
Ionic 2 como ferramenta para desenvolvimento móvel
 
Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1
Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1
Débat du 12 janvier 2017_Primaires citoyennes_ELABE_1
 
Incubation de demandeurs d'emploi au sein d'un espace de Coworking à Lille
Incubation de demandeurs d'emploi au sein d'un espace de Coworking à LilleIncubation de demandeurs d'emploi au sein d'un espace de Coworking à Lille
Incubation de demandeurs d'emploi au sein d'un espace de Coworking à Lille
 
Serverless Architectures and Continuous Delivery
Serverless Architectures and Continuous DeliveryServerless Architectures and Continuous Delivery
Serverless Architectures and Continuous Delivery
 
Congress seeks metrics for the NIST Cybersecurity Framework
Congress seeks metrics for the NIST Cybersecurity FrameworkCongress seeks metrics for the NIST Cybersecurity Framework
Congress seeks metrics for the NIST Cybersecurity Framework
 
DAR Resposta - Guia para famílias após o diagnóstico de autismo
DAR Resposta - Guia para famílias após o diagnóstico de autismoDAR Resposta - Guia para famílias após o diagnóstico de autismo
DAR Resposta - Guia para famílias após o diagnóstico de autismo
 
Create responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJSCreate responsive websites with Django, REST and AngularJS
Create responsive websites with Django, REST and AngularJS
 
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までーDeep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
 
Ciclo cardiaco
Ciclo cardiacoCiclo cardiaco
Ciclo cardiaco
 
¿Qué es inbound marketing?
¿Qué es inbound marketing?¿Qué es inbound marketing?
¿Qué es inbound marketing?
 
Getting Radical with Brand
Getting Radical with BrandGetting Radical with Brand
Getting Radical with Brand
 
IT Workers Share Their Best IT Career Advice
IT Workers Share Their Best IT Career AdviceIT Workers Share Their Best IT Career Advice
IT Workers Share Their Best IT Career Advice
 
Coordenadas Geograficas
Coordenadas GeograficasCoordenadas Geograficas
Coordenadas Geograficas
 

Similar to SureChEMBL and Open PHACTS

Mining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity DataMining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity Data
Chris Southan
 
An Integrated Approach To Drug Discovery Using Parallel Synthesis
An Integrated Approach To Drug Discovery Using Parallel SynthesisAn Integrated Approach To Drug Discovery Using Parallel Synthesis
An Integrated Approach To Drug Discovery Using Parallel Synthesis
Graham Smith
 
Practical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS projectPractical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS project
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Patent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTSPatent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTS
open_phacts
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug Discovery
SciBite Limited
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome Coordinates
Yasset Perez-Riverol
 
100505 koenig biological_databases
100505 koenig biological_databases100505 koenig biological_databases
100505 koenig biological_databasesMeetika Gupta
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
Prof. Wim Van Criekinge
 
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Prof. Wim Van Criekinge
 
Heptotox biomarker discovery Ontomine
Heptotox biomarker discovery OntomineHeptotox biomarker discovery Ontomine
Heptotox biomarker discovery Ontomine
Rajeev Gangal
 
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance
 
GPU-accelerated Virtual Screening
GPU-accelerated Virtual ScreeningGPU-accelerated Virtual Screening
GPU-accelerated Virtual Screening
Olexandr Isayev
 
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
PhinC Development
 
Quantifying the content of biomedical semantic resources as a core for drug d...
Quantifying the content of biomedical semantic resources as a core for drug d...Quantifying the content of biomedical semantic resources as a core for drug d...
Quantifying the content of biomedical semantic resources as a core for drug d...
Syed Muhammad Ali Hasnain
 
Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1
Jean-Claude Bradley
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Similar to SureChEMBL and Open PHACTS (20)

Mining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity DataMining Drug Targets, Structures and Activity Data
Mining Drug Targets, Structures and Activity Data
 
An Integrated Approach To Drug Discovery Using Parallel Synthesis
An Integrated Approach To Drug Discovery Using Parallel SynthesisAn Integrated Approach To Drug Discovery Using Parallel Synthesis
An Integrated Approach To Drug Discovery Using Parallel Synthesis
 
Practical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS projectPractical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS project
 
Patent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTSPatent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTS
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug Discovery
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome Coordinates
 
100505 koenig biological_databases
100505 koenig biological_databases100505 koenig biological_databases
100505 koenig biological_databases
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
 
GMueller_Barcelona
GMueller_BarcelonaGMueller_Barcelona
GMueller_Barcelona
 
Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
 
Heptotox biomarker discovery Ontomine
Heptotox biomarker discovery OntomineHeptotox biomarker discovery Ontomine
Heptotox biomarker discovery Ontomine
 
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
 
GPU-accelerated Virtual Screening
GPU-accelerated Virtual ScreeningGPU-accelerated Virtual Screening
GPU-accelerated Virtual Screening
 
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
 
Quantifying the content of biomedical semantic resources as a core for drug d...
Quantifying the content of biomedical semantic resources as a core for drug d...Quantifying the content of biomedical semantic resources as a core for drug d...
Quantifying the content of biomedical semantic resources as a core for drug d...
 
Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1Vanderwall cheminformatics Drexel Part 1
Vanderwall cheminformatics Drexel Part 1
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 

Recently uploaded

STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 

Recently uploaded (20)

STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 

SureChEMBL and Open PHACTS

  • 1. Mining Compounds, Targets and Indications from the Patent Corpus with SureChEMBL and Open PHACTS George Papadatos, PhD ChEMBL Group, EMBL-EBI georgep@ebi.ac.uk #ShefChem16
  • 2. Outline •  Overview of chemical annotations in SureChEMBL •  Biological annotations for Open PHACTS •  Relevance scoring •  Open PHACTS patent API •  Use cases and examples
  • 3. Bioactivity data Compound Assay/Target >Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE 3. Insight, tools and resources for translational drug discovery 2. Organization, integration, curation and standardization of pharmacology data 1. Scientific facts Ki = 4.5nM APTT = 11 min. ChEMBL: Data for drug discovery
  • 4. What do people do with ChEMBL? •  Chemical space clustering & visualisation •  (Q)SAR analysis and Virtual Screening •  Data modeling, activity cliffs, Free-Wilson, MMP analysis •  Bioisosteric replacements mining •  De novo drug design •  In silico evolutionary compound optimisation •  Target prediction (target fishing) •  Off-target prediction and ADR analysis •  Polypharmacology networks •  Druggability / Drug-likeness •  Text mining •  Third-party application integration
  • 5. Why looking at patent documents? •  Patent filing and searching •  Legal, financial and commercial incentives & interests •  Prior art, novelty, freedom to operate searches •  Competitive intelligence •  Unprecedented wealth of knowledge •  Most of the knowledge will never be disclosed anywhere else •  Compounds, scaffolds, reactions, formulations •  Biological targets, sequences, diseases, indications •  Average lag of 2-4 years between patent document and journal publication disclosure for chemistry, 4-5 for biological targets Rodriguez-Esteban & Bundschus. Text Mining Patents for Biomedical Knowledge. Drug Discovery Today 2016.
  • 6. From SureChem to SureChEMBL •  2007: SureChem is founded •  2010: Acquired by Macmillan •  2012: Revamped commercial product: SureChemDirect •  Modular, cloud-based architecture •  Dec 2013: Donated to EMBL-EBI •  Sept 2014: SureChEMBL UI is launched •  Jan 2015: Data client and map file dumps •  April 2015: Bio-annotations are added •  May 2016: SureChEMBL data available via Open PHACTS API
  • 7. SureChEMBL data processing WO EP Applica,ons & Granted US Applica,ons & granted JP Abstracts Patent Offices Chemistry Database SureChEMBL System Patent PDFs (service) Applica,on Users API Database En,ty Recogni,on SureChem IP 1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3- propyl-1H-pyrazolo[4,3-d]pyrimidin-5- yl)phenylsulfonyl]-4-methylpiperazine Image to Structure (one method) Molfiles Name to Structure (five methods) OCR Processed patents (service)
  • 13. On-the-fly bio-annotation with mapping to DB IDs and ontologies https://www.surechembl.org/document/US-9359381-B2
  • 16. SureChEMBL Data Content and Growth •  > 18M unique structures •  ~13M are in a drug-like phys-chem space •  > 15M chemically annotated patents  •  ~5M are life-science relevant (based on patent codes) •  ~80,000 novel compounds extracted from ~50,000 new patents monthly •  1–7 days for a published patent to be chemically annotated, indexed and searchable in SureChEMBL •  SureChEMBL provides search access to all patents (not just chemically annotated ones) ~120M patents
  • 17. •  Connectivity match on single components - UniChem ChEMBL-SureChEMBL overlap 21.4% SureChEMBL ChEMBL 1.5M 17M
  • 18. Too granular? Try scaffolds instead
  • 19. Level 1 scaffold overlap 57% SureChEMBLChEMBL 61K 298K
  • 20. Level 1 scaffold overlap 57% SureChEMBLChEMBL 61K 298K
  • 21. Can we have everything? Cost TimeQuality
  • 22. Common sources of errors •  Small, poor quality images •  OCR errors in names (OCR done by IFI). There is an OCR correction step, but cannot fix all errors Becomes: ‘2,6-Difluoro-Λ/-{1 -r(4-iodo-2-methylphenyl)methvn-1 H- pyrazol-3- vDbenzamide’ •  Reliability better for US patents due to inclusion of mol files
  • 24. What about testing? •  Rigorous testing hampered by lack of good test sets •  Akhondi corpus* annotates compounds, genes and diseases •  à But not normalised to identifiers •  à Does not convert to compound structures •  à Identifies all entities, not relevant entities •  BioCreative competition corpus covers only abstracts (21K) •  Comparison with ‘gold standard’/’ground truth’ data sets •  SciFinder for compounds / Other sets for relevance •  GVK BIO and BindingDB for targets •  BUT not necessarily comprehensive, so can’t assess precision * Akhondi et al (2014) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107477
  • 25. Comparison of compound coverage •  Comparison of compounds with SciFinder •  Set of 47 patents from Akhondi* corpus •  SciFinder compounds for these extracted •  65% of SciFinder compounds were found by SureChEMBL •  ‘Missed’ compounds included some cases that were out of scope for SureChEMBL pipeline (e.g., Markush, tables) * Akhondi et al (2014) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107477
  • 26. Accessing SureChEMBL data Web Interface Data client feed Local compound-patent db schema Daily updates Compound-patent map file Quarterly updates UniChem Weekly updates Access to 135M structures http://chembl.blogspot.co.uk/2015/08/accessing-surechembl-data-in-bulk.html
  • 27. Identify key compounds in patent docs •  What are key compounds in med. chem. patents? •  ‘Best’, most promising/important compound •  Drug candidate •  Optimal property profile •  Most active •  Automated identification of key compounds in patents •  Using compound information only •  Main assumption: busy chemical space around key compounds •  Number of NNs, MCS extraction and R-group frequency Hattori, K., Wakabayashi, H., & Tamaki, K. (2008). JCIM, 48(1), 135–142. doi:10.1021/ci7002686 Tyrchan, C., Boström, J., Giordanetto, F., Winter, J., & Muresan, S. (2012). JCIM, 52(6), 1480–1489. doi:10.1021/ci3001293
  • 28. Workflow 1.  Read a file with chemistry extracted from the Levitra family of patents •  US65663601 2.  Filter by different text-mining and chemoinformatics properties to remove noise and enrich the genuinely novel structures 3.  Visualise the chemical space using MDS and dimensionality reduction •  Identify and fix outliers in the chemistry space. 4.  Find the Murcko and MCS scaffolds 5.  Compare the derived MCS core with the actual Markush structure 6.  Identify key compounds using structural information only •  Count number of NNs per compound •  R-Group decomposition and frequency of R-Groups 1Sayle, R. et al. (2012). JCIM, 52(1), 51–62. doi:10.1021/ci200463r
  • 29. Scaffold and chemical space analysis hp://nbviewer.ipython.org/github/rdkit/UGM_2014/blob/master/Notebooks/Vardenafil.ipynb
  • 31. SureChEMBL bio-annotation •  SciBite’s Termite text-mining engine run on 4M life-science patents from SureChEMBL corpus •  Genes (identified by HGNC symbols, e.g. FDFT1) and diseases (identified by MeSH IDs, e.g. D009765) annotated •  Section/frequency information annotated (e.g., in title, abstract, claims, total frequency) •  Relevance score (0-3) to flag important chemical and biological entities and remove noise
  • 32. Relevance scoring •  Patents designed to obfuscate – not all entities are equal Over 50 proteins mentioned! SRC proto-oncogene, non-receptor tyrosine kinase ; aurora kinase A ; insulin ; microtubule-associated protein tau ; catenin (cadherin- associated protein), beta 1, 88kDa ; interleukin 1, alpha ; c-src tyrosine kinase ; phosducin-like 2 ; colony stimulating factor 2 (granulocyte- macrophage) ; tumor necrosis factor ; glycogen synthase kinase 3 beta ; amyloid beta (A4) precursor protein ; asparaginase ; interferon, beta 1, fibroblast ; jun proto-oncogene ; coagulation factor II (thrombin) ; acetylcholinesterase ; FGR proto-oncogene, Src family tyrosine kinase ; BLK proto-oncogene, Src family tyrosine kinase ; angiotensin I converting enzyme ; albumin ; axin 1 ; heat shock transcription factor 1 ; v-myb avian myeloblastosis viral oncogene homolog ; NADH dehydrogenase, subunit 1 (complex I) ; LYN proto-oncogene, Src family tyrosine kinase ; LCK proto-oncogene, Src family tyrosine kinase ; CCAAT/enhancer binding protein (C/EBP), alpha ; HCK proto- oncogene, Src family tyrosine kinase ; ATP citrate lyase ; Glycogen synthase kinase 3 ; beta-catenin ; interferon ; tnf superfamily ; lactate dehydrogenase ; neurotrophic factor ; pyruvate kinase ; fibroblast growth factor ; glycogen synthase ; tumor necrosis factor alpha ; serine threonine kinase ; tau protein ; interleukin ; albumin ; thrombin ; eif2b ; ion channel ; cytokine ; heat shock factor ; axin ; cAMP Response element binding protein ; amyloid beta ; Microtubule Interacting Protein ; c-Jun N-terminal kinases ; src homology ; calcium channel ; atpase ; acetylcholinesterase Some more interesting than others… The method of claim 27, wherein the method comprises inhibiting Aurora-2, GSK-3, or Src activity Slide by Lee Harland
  • 33. Same for compounds! Real scope of patent
  • 34. Same for compounds! Tens of drugs trivially mentioned Real scope of patent
  • 35. Same for compounds! Substituents Tens of drugs trivially mentioned Real scope of patent
  • 36. Relevance scoring – genes/diseases •  Various features used: •  Term frequency •  Position (title, abstract, figure, caption, table) •  Frequency distribution •  Scores range from 0 – 3 •  3 – most important entities in the patent •  2 – important entities in the patent •  1 – mentioned entities in the patent •  0 – ambiguous entity/likely annotation error
  • 37. Relevance scoring - compounds •  Main assumptions for relevance: 1.  Very frequent compounds are usually irrelevant 2.  Compounds with busy chemical space around them are interesting •  Use distribution of close analogues (NNs) among compounds found in the same patent family – FCFP_4 similarity •  Scores range from 0 – 3 •  3 – highest number of NNs: most important entities in the patent •  2 – important entities in the patent •  1 – few NNs: mentioned entities in the patent •  0 – singletons or trivial entities, most likely errors or reagents, solvents, substituents Hatori, K., Wakabayashi, H., & Tamaki, K. (2008). JCIM, 48(1), 135–142. doi:10.1021/ci7002686 Tyrchan, C., Boström, J., Giordaneto, F., Winter, J., & Muresan, S. (2012). JCIM, 52(6), 1480–1489. doi:10.1021/ci3001293
  • 39. Open PHACTS Architecture Drug Discovery Today 2012, 17:21 (doi:10.1016/j.drudis.2012.05.016)
  • 40. Open PHACTS Patent API https://dev.openphacts.org/docs/2.1
  • 41. Open PHACTS Patent API Disease Compound Target Patent extracted links https://dev.openphacts.org/docs/2.1
  • 42. Open PHACTS Patent API Disease Compound Target Patent inferred links extracted links
  • 43. Use case #1: Patent to Entities 1.  From a patent get compounds, genes and diseases 2.  Filter to remove noise •  Frequency and relevance score 3.  Process and visualise Disease Compound Target Patent ? ? ?
  • 46. Does it make sense?
  • 47. Does it make sense? US-7718693-B2
  • 48. Does it make sense? •  Calculate MCS
  • 49. Does it make sense? •  Calculate MCS US-7718693-B2
  • 50. Use case #2: Drug targets & indications for compound 1.  Search patents for a compound (approved drug) 2.  Filter to remove noise •  Frequency, relevance score and classification code 3.  For remaining patents, get disease and target entities 4.  Filter to remove noise •  Frequency and relevance score 5.  Visualise results Disease Compound Target Patent
  • 52. 1) Get patents for Eluxadoline •  UniChem call à SCHEMBL12971682 •  Compound URI: •  http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL12971682 •  API call: •  Relevance score >=1 à 17 patents (patentome):
  • 53. 1) Get patents for Eluxadoline •  17 relevant patents (patentome):
  • 55. Does it make sense? http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
  • 56. Does it make sense? http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/206940s000lbl.pdf
  • 58. From target to scaffolds PARP1 Niraparib, Phase 3 Olaparib, FDA-approved Talazoparib, Phase 3 WO-2008020180-A2 US-20100035883-A1 US-20080167345-A1
  • 59. Take home message •  It is now possible to search and interlink the key structures, scaffolds, targets and diseases from med. chem. patent corpus automatically •  By high-throughput text-mining only •  Thanks to simple heuristics (relevance scores and frequency) •  Using the Open PHACTS API •  For the first time in a free resource in such scale Disease Compound Target Patent
  • 60. What next: Other ideas and use cases •  Target validation / Druggability •  For a target, get me all related and relevant diseases •  Compare with DisGeNET / Open Targets, etc. •  Any known patented scaffolds for my target? •  Start a pharmacophore hypothesis for patent busting •  Find a tool compound for a new assay •  Novelty checking / Due diligence •  What do we know about this scaffold / compound? •  Add ChEMBL pharmacology and pathway information
  • 61. The Data Are Out There
  • 63. BindingDB patent set •  BindingDB manually extracts binding activities from tables in granted US med. chem. patents •  Coverage 2013-2016, currently ~1,100 patents •  On-going effort •  Includes exact and binned bioactivities •  Ki, IC50, Kd, EC50 activity types •  Compounds mapped to example no in patent tables •  Detailed assay description •  http://www.bindingdb.org/bind/ByPatent.jsp
  • 64. BindingDB patent bioactivity integration •  1,017 US patents (mapped to SureChEMBL IDs) •  68.6K compounds •  ~70 unique cmpds per patent •  Only 5% exist in ChEMBL (exact match) à complementary •  450 single protein targets •  ~10% not in ChEMBL, e.g. GPR4 in US-8748435-B2 •  100,329 bioactivity values •  ~100 activities per patent •  Compounds and targets curated by ChEMBL •  Available in ChEMBL22 http://www.bindingdb.org/bind/ByPatent.jsp
  • 65. Example: BI patent https://www.surechembl.org/document/US-9120769-B2/ 174 activities in Table 2 IC50=94nM against HSD11B1 (P28845) Manual example number dereference
  • 66. Conclusion: current patent data landscape •  ChEMBL •  Literature SAR data, drug annotations, manually curated •  Coming soon: bioactivities from patents •  SureChEMBL •  Chemical annotations automatically mined from patents •  Open PHACTS •  The above plus biological annotations automatically mined from patents, with relevance scoring •  Public API
  • 67.
  • 68.
  • 69.
  • 70. Acknowledgements •  ChEMBL and SureChEMBL •  Anna Gaulton •  Mark Davies •  Nathan Dedman •  James Siddle •  Anne Hersey •  SciBite •  Lee Harland •  Open PHACTS consortium •  Nick Lynch •  Daniela Digles •  Antonis Loizou •  surechembl-help@ebi.ac.uk
  • 72. Mining Compounds, Targets and Indications from the Patent Corpus with SureChEMBL and Open PHACTS George Papadatos, PhD ChEMBL Group, EMBL-EBI georgep@ebi.ac.uk #ShefChem16