SlideShare a Scribd company logo
1 of 1
From Famine To Feast: The Patent Chemistry “Big Bang” In Pubchem

 Christopher Southan, TW2Informatics, Göteborg, Sweden,

 Andrew Hinton and Nicko Goncharoff, SureChem, Digital-Science, London, UK



Within only a couple of years the number of patent-extracted structures in PubChem has gone from 2 up to 14.5 million. The major sources, in order of addition, are Thomson Pharma, IBM, SCRIPDB
and SureChemOpen. The latter deposited 8.3 million in December 2012, of which 4.6 million had unique compound identifiers (CIDs). This new public patent chemistry “feast” has a range of
implications and utilities, some of which will be touched upon in this poster. Estimating a potential total for useful structures extractable from the entire patent corpus is a subject of conjecture.
However, those related to medicinal chemistry (i.e. International Patent Classification C07D) are only in the order of 200K patent document families. Thus, at least the major proportion of example
structures specified in drug discovery patents, including those with activity results, can now be found in PubChem. While statistical updates will be presented, currently ~30% of the 47 million CIDs
now have at least one patent extraction (although some of these will be common chemistry) and, significantly, ~15% of these are from patent-only sources. A valuable consequence is that compounds
with structure-activity (SAR) data extracted from medicinal chemistry papers in ChEMBL now have a high probability of either identity or similarity intersects with structures from patents. Two sources,
IBM and SCRIPDB, have patent numbers indexed in the PubChem CID records but for SureChem the substance entries link out to the free SureChemOpen application. This provides not only the
location of each structure, typically as an IUPAC name, in the full-text and/or images but also a view of the entire chemistry extracted from that document. Examples will be shown where SAR tables in
patents are an order of magnitude larger than those in the eventual paper. This information expansion that patents bring to PubChem (and via the use of SurChemOpen extrinsically) is transformative
considering not only that ~70% more data is thereby “unlocked” but also the increased connectivity between papers, abstracts, patents and protein targets (e.g. from PubChem BioAssay) via their
chemical content becomes easier to navigate and exploit. A comparative analysis of Mw distributions between the major patent sources and ChEMBL as the standard for bioactive content will also be
presented. The differences give an insight into complementary for PubChem content between manual expert extraction and the automated name-to-structure conversion pipelines.




1                                                                                                       2
     Despite estimates that ~ 70 of all published medicinal chemistry data are patent-only
     these have hitherto been a “Cinderella “ data source for bioactive chemical structures.
     However, three recent trends have revolutionised public access, 1) availability of full-text
     via the major patent authorities, 2) automated Chemical Named Entity Recognition, or
     CNER technology (used here to also encompass image conversions and Complex Work
     Unit processing) and 3) the submission of structures to PubChem by major patent-
     extraction sources. An example is SureChem, owned by Digital-Science. This uses a
     proprietary pipeline that incorporates four commercial 3rd party CNER tools to
     automatically recognise chemistry. Currently, 14 million unique structures have been
     extracted from 22 million full text-patents, made publically accessible and searchable in
     SureChemOpen.



                                                                                                             In December of 2012 SureChem deposited a filtered set of 8.3 million structures to
                                                                                                             become the 4th major and largest patent chemistry source in PubChem, joining Thomson
                                                                                                             Pharma (2007) IBM (2011) and SCRIPDB (2012). A breakdown of these source counts is
                                                                                                             shown. The figures indicate complementarity where each source contributes some unique
                                                                                                             content to the 14.5 million. Consequently, 30% of all CIDs have at least one patent
                                                                                                             source and 16% are patent-only.




 3                                                                                                       4
     The coverage (by these 14.5 million) of pharmacologically significant sources and
     property cuts is shown below The 7.7 million patent structures passing the Lipinski rule-
     of-five plus an expanded molecular weight (Mw) filtered encompass lead-like examples,
     most of which are associated with bioactivity data in filings. The similarity coverage of
     vendor and bioactivity space of ChEMBL and BioAssay will potentially extend well
     beyond the exact matches intersects. Note also that coverage of drug-like space, in the
     last three rows, is between 65% and 95%.




                                                                                                             Molecular weight is an informative property distribution for comparing compound
                                                                                                             collections and database subsets. The four patent sources, along with total PubChem
                                                                                                             and ChEMBL for comparison, are plotted above. The bi-modal blip in PubChem is
                                                                                                             related to chemical vendors. SureChem (in red) is most similar to the manually curated
                                                                                                             Thomson Pharma. Compounds with SAR manually extracted from papers by ChEMBL
                                                                                                             have a narrower spread because CNER pipelines capture more reagents and
                                                                                                             intermediates. This is evidenced by the “shoulder” around 250 da in both SureChem and
                                                                                                             SCRIPDB, as well as the lower median Mw for IBM.




 5
     As an example of “opening up” access to patent chemistry we can take the new antimalarial candidate MMV390048. Via the image in the press release, the lead structure can be mapped to CID
     53311393. PubChem links this to WO2011086531 and we can find example structures, in the cases below converted from images of compounds 10 and 11. The SureChem conversions can be
     used to populate a table of structures aligned to the SAR data.

                                                                                                                                                                      Conclusions
                                                                                                                                           1.   There has been a “big bang” in our ability to interrogate
                                                                                                                                                and exploit the patent corpus via open sources
                                                                                                                                           2.   Positively impact drug discovery, pharmacology and
                                                                                                                                                chemical biology
                                                                                                                                           3.   The similarity envelope for the 14.5 million patent CIDs
                                                                                                                                                is pre-computed with 6.9 million of these also connect
                                                                                                                                                directly to non-patent sources
                                                                                                                                           4.   PubChem continues to enhance linkages to patents in
                                                                                                                                                parallel with those to targets and publications




 References and Acknowledgments

 •Suriyawongkul I, Southan C, & Muresan S. The Cinderella of Biological Data Integration: Addressing the Challenges of Entity and Relationship Mining from Patent Sources. In) Data
 Integration in the Life Sciences (2010), doi:10.1007/978-3-642-15120-0_9
 •WO2011086531, “New Antimarial Agents” Medicines for Malaria Ventures
 •http://pubchem.ncbi.nlm.nih.gov/
 •http://surechem.com/
 •http://blog.surechem.com/2012/12/surechem-data-goes-live-in-pubchem.html
 •http://cdsouthan.blogspot.se/2012/12/surechem-patent-chemistry-pushes.html

More Related Content

Viewers also liked

Will the real drug targets please stand up ?
Will the real drug targets please stand up ? Will the real drug targets please stand up ?
Will the real drug targets please stand up ? Chris Southan
 
Chemicalize.org: User-Selected PubChem Source of Structures from Text
Chemicalize.org: User-Selected PubChem Source of Structures from TextChemicalize.org: User-Selected PubChem Source of Structures from Text
Chemicalize.org: User-Selected PubChem Source of Structures from TextChris Southan
 
Evolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategiesEvolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategiesChris Southan
 
Curatorial data wrangling for the Guide to PHARMACOLGY
Curatorial data wrangling for the Guide to PHARMACOLGY Curatorial data wrangling for the Guide to PHARMACOLGY
Curatorial data wrangling for the Guide to PHARMACOLGY Chris Southan
 
ELIXIR Database Provider Survey
ELIXIR Database Provider SurveyELIXIR Database Provider Survey
ELIXIR Database Provider SurveyChris Southan
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsChris Southan
 
NIH Rare Disease Day Poster
NIH Rare Disease Day PosterNIH Rare Disease Day Poster
NIH Rare Disease Day PosterChris Southan
 
Guide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and EducationGuide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and EducationChris Southan
 
BIA 10-2474 in GtoPdb
BIA 10-2474 in GtoPdbBIA 10-2474 in GtoPdb
BIA 10-2474 in GtoPdbChris Southan
 
Capturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor dataCapturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor dataChris Southan
 
GtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in contextGtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in contextChris Southan
 
Antimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureAntimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureChris Southan
 

Viewers also liked (12)

Will the real drug targets please stand up ?
Will the real drug targets please stand up ? Will the real drug targets please stand up ?
Will the real drug targets please stand up ?
 
Chemicalize.org: User-Selected PubChem Source of Structures from Text
Chemicalize.org: User-Selected PubChem Source of Structures from TextChemicalize.org: User-Selected PubChem Source of Structures from Text
Chemicalize.org: User-Selected PubChem Source of Structures from Text
 
Evolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategiesEvolving consensus-based curatorial strategies
Evolving consensus-based curatorial strategies
 
Curatorial data wrangling for the Guide to PHARMACOLGY
Curatorial data wrangling for the Guide to PHARMACOLGY Curatorial data wrangling for the Guide to PHARMACOLGY
Curatorial data wrangling for the Guide to PHARMACOLGY
 
ELIXIR Database Provider Survey
ELIXIR Database Provider SurveyELIXIR Database Provider Survey
ELIXIR Database Provider Survey
 
Patent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEsPatent chemisty big bang: utilities for SMEs
Patent chemisty big bang: utilities for SMEs
 
NIH Rare Disease Day Poster
NIH Rare Disease Day PosterNIH Rare Disease Day Poster
NIH Rare Disease Day Poster
 
Guide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and EducationGuide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and Education
 
BIA 10-2474 in GtoPdb
BIA 10-2474 in GtoPdbBIA 10-2474 in GtoPdb
BIA 10-2474 in GtoPdb
 
Capturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor dataCapturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor data
 
GtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in contextGtoPdb and GtoImmuPdb in context
GtoPdb and GtoImmuPdb in context
 
Antimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosureAntimalarial drug dscovery data disclosure
Antimalarial drug dscovery data disclosure
 

Similar to The Patent Chemistry “Big Bang” In Pubchem

The Open Patent Chemistry “Big Bang”: Implications, Opportunities and Caveats
The Open Patent Chemistry “Big Bang”: Implications, Opportunities and CaveatsThe Open Patent Chemistry “Big Bang”: Implications, Opportunities and Caveats
The Open Patent Chemistry “Big Bang”: Implications, Opportunities and CaveatsChris Southan
 
Integrating Patents with Research Data
Integrating Patents with Research DataIntegrating Patents with Research Data
Integrating Patents with Research DataChris Southan
 
Lecture 12 – chemoinformatic
Lecture 12 – chemoinformatic Lecture 12 – chemoinformatic
Lecture 12 – chemoinformatic RAJAN ROLTA
 
ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...
ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...
ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...Dr. Haxel Consult
 
Desperately seeking DARCP
Desperately seeking DARCPDesperately seeking DARCP
Desperately seeking DARCPChris Southan
 
Multiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChemMultiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChemChris Southan
 
Cheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirCheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirKAUSHAL SAHU
 
PubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingPubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingSunghwan Kim
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...Dr. Haxel Consult
 
Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...Chris Southan
 
History of computers in pharmaceutical research & development
History of computers in pharmaceutical research & developmentHistory of computers in pharmaceutical research & development
History of computers in pharmaceutical research & developmentAnkitKumar2596
 
Pros and cons of patent-extracted structures in PubChem
Pros and cons of patent-extracted structures in PubChemPros and cons of patent-extracted structures in PubChem
Pros and cons of patent-extracted structures in PubChemChris Southan
 
Improved Chemical Text mining of Patents using automatic spelling correction ...
Improved Chemical Text mining of Patents using automatic spelling correction ...Improved Chemical Text mining of Patents using automatic spelling correction ...
Improved Chemical Text mining of Patents using automatic spelling correction ...NextMove Software
 
FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCPChris Southan
 
La chemoinformatica: uno strumento computazionale per la chimica farmaceutica
La chemoinformatica: uno strumento computazionale per la chimica farmaceuticaLa chemoinformatica: uno strumento computazionale per la chimica farmaceutica
La chemoinformatica: uno strumento computazionale per la chimica farmaceuticaCRS4 Research Center in Sardinia
 
PubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyPubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyChris Southan
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsSunghwan Kim
 
Chemistry Reserach as a Social Machine
 Chemistry Reserach as a Social Machine Chemistry Reserach as a Social Machine
Chemistry Reserach as a Social MachineJeremy Frey
 

Similar to The Patent Chemistry “Big Bang” In Pubchem (20)

The Open Patent Chemistry “Big Bang”: Implications, Opportunities and Caveats
The Open Patent Chemistry “Big Bang”: Implications, Opportunities and CaveatsThe Open Patent Chemistry “Big Bang”: Implications, Opportunities and Caveats
The Open Patent Chemistry “Big Bang”: Implications, Opportunities and Caveats
 
Integrating Patents with Research Data
Integrating Patents with Research DataIntegrating Patents with Research Data
Integrating Patents with Research Data
 
Lecture 12 – chemoinformatic
Lecture 12 – chemoinformatic Lecture 12 – chemoinformatic
Lecture 12 – chemoinformatic
 
ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...
ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...
ICIC 2017: Looking at the gift horse: pros and cons of over 20 million patent...
 
Desperately seeking DARCP
Desperately seeking DARCPDesperately seeking DARCP
Desperately seeking DARCP
 
Multiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChemMultiplexing analysis of 1000 approved drugs in PubChem
Multiplexing analysis of 1000 approved drugs in PubChem
 
Cheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirCheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sir
 
PubChem as a resource for chemical information training
PubChem as a resource for chemical information trainingPubChem as a resource for chemical information training
PubChem as a resource for chemical information training
 
Precompetitive preclinical ADME/tox data and set it free on the web to facili...
Precompetitive preclinical ADME/tox data and set it free on the web to facili...Precompetitive preclinical ADME/tox data and set it free on the web to facili...
Precompetitive preclinical ADME/tox data and set it free on the web to facili...
 
Pallavi gupta
Pallavi guptaPallavi gupta
Pallavi gupta
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
 
Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...
 
History of computers in pharmaceutical research & development
History of computers in pharmaceutical research & developmentHistory of computers in pharmaceutical research & development
History of computers in pharmaceutical research & development
 
Pros and cons of patent-extracted structures in PubChem
Pros and cons of patent-extracted structures in PubChemPros and cons of patent-extracted structures in PubChem
Pros and cons of patent-extracted structures in PubChem
 
Improved Chemical Text mining of Patents using automatic spelling correction ...
Improved Chemical Text mining of Patents using automatic spelling correction ...Improved Chemical Text mining of Patents using automatic spelling correction ...
Improved Chemical Text mining of Patents using automatic spelling correction ...
 
FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCP
 
La chemoinformatica: uno strumento computazionale per la chimica farmaceutica
La chemoinformatica: uno strumento computazionale per la chimica farmaceuticaLa chemoinformatica: uno strumento computazionale per la chimica farmaceutica
La chemoinformatica: uno strumento computazionale per la chimica farmaceutica
 
PubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyPubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biology
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural products
 
Chemistry Reserach as a Social Machine
 Chemistry Reserach as a Social Machine Chemistry Reserach as a Social Machine
Chemistry Reserach as a Social Machine
 

More from Chris Southan

Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityChris Southan
 
Peptide tribulations
Peptide tribulationsPeptide tribulations
Peptide tribulationsChris Southan
 
Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Chris Southan
 
Guide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeGuide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeChris Southan
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentChris Southan
 
Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Chris Southan
 
Seeking glimmers of light in Pharos “Tdark” proteins
Seeking glimmers of light in  Pharos “Tdark” proteinsSeeking glimmers of light in  Pharos “Tdark” proteins
Seeking glimmers of light in Pharos “Tdark” proteinsChris Southan
 
5HT2A modulators update for SAFER
5HT2A modulators update for SAFER5HT2A modulators update for SAFER
5HT2A modulators update for SAFERChris Southan
 
Quality and noise in big chemistry databases
Quality and noise in big chemistry databasesQuality and noise in big chemistry databases
Quality and noise in big chemistry databasesChris Southan
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology Chris Southan
 
GtoPdb June 2019 poster
GtoPdb June 2019 posterGtoPdb June 2019 poster
GtoPdb June 2019 posterChris Southan
 
PubChem as a source of systems biology perturbagens
PubChem as a source of  systems biology perturbagensPubChem as a source of  systems biology perturbagens
PubChem as a source of systems biology perturbagensChris Southan
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand upChris Southan
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide TribulationsChris Southan
 
Looking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRLooking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRChris Southan
 
Guide to Immunopharmacology update
Guide to Immunopharmacology updateGuide to Immunopharmacology update
Guide to Immunopharmacology updateChris Southan
 
Druggable Proteome sources in UniProt
Druggable Proteome sources in UniProtDruggable Proteome sources in UniProt
Druggable Proteome sources in UniProtChris Southan
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbChris Southan
 
Pub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityPub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityChris Southan
 
The IUPHAR/MMV Guide to Malaria Pharmacology
The  IUPHAR/MMV Guide to Malaria Pharmacology  The  IUPHAR/MMV Guide to Malaria Pharmacology
The IUPHAR/MMV Guide to Malaria Pharmacology Chris Southan
 

More from Chris Southan (20)

Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivity
 
Peptide tribulations
Peptide tribulationsPeptide tribulations
Peptide tribulations
 
Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2
 
Guide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeGuide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updae
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug Development
 
Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?
 
Seeking glimmers of light in Pharos “Tdark” proteins
Seeking glimmers of light in  Pharos “Tdark” proteinsSeeking glimmers of light in  Pharos “Tdark” proteins
Seeking glimmers of light in Pharos “Tdark” proteins
 
5HT2A modulators update for SAFER
5HT2A modulators update for SAFER5HT2A modulators update for SAFER
5HT2A modulators update for SAFER
 
Quality and noise in big chemistry databases
Quality and noise in big chemistry databasesQuality and noise in big chemistry databases
Quality and noise in big chemistry databases
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology
 
GtoPdb June 2019 poster
GtoPdb June 2019 posterGtoPdb June 2019 poster
GtoPdb June 2019 poster
 
PubChem as a source of systems biology perturbagens
PubChem as a source of  systems biology perturbagensPubChem as a source of  systems biology perturbagens
PubChem as a source of systems biology perturbagens
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand up
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide Tribulations
 
Looking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRLooking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIR
 
Guide to Immunopharmacology update
Guide to Immunopharmacology updateGuide to Immunopharmacology update
Guide to Immunopharmacology update
 
Druggable Proteome sources in UniProt
Druggable Proteome sources in UniProtDruggable Proteome sources in UniProt
Druggable Proteome sources in UniProt
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdb
 
Pub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityPub Med to PubChem Connectivity
Pub Med to PubChem Connectivity
 
The IUPHAR/MMV Guide to Malaria Pharmacology
The  IUPHAR/MMV Guide to Malaria Pharmacology  The  IUPHAR/MMV Guide to Malaria Pharmacology
The IUPHAR/MMV Guide to Malaria Pharmacology
 

Recently uploaded

Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...Dipal Arora
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...chandars293
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...astropune
 
Bangalore Call Girls Nelamangala Number 9332606886 Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 9332606886  Meetin With Bangalore Esc...Bangalore Call Girls Nelamangala Number 9332606886  Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 9332606886 Meetin With Bangalore Esc...narwatsonia7
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...tanya dube
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...chandars293
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Genuine Call Girls
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...astropune
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Dipal Arora
 
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Dipal Arora
 
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Call Girls in Nagpur High Profile
 
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Top Rated Bangalore Call Girls Richmond Circle ⟟ 9332606886 ⟟ Call Me For Ge...
Top Rated Bangalore Call Girls Richmond Circle ⟟  9332606886 ⟟ Call Me For Ge...Top Rated Bangalore Call Girls Richmond Circle ⟟  9332606886 ⟟ Call Me For Ge...
Top Rated Bangalore Call Girls Richmond Circle ⟟ 9332606886 ⟟ Call Me For Ge...narwatsonia7
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...narwatsonia7
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 

Recently uploaded (20)

Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Bhubaneswar Just Call 9907093804 Top Class Call Girl Service Avail...
 
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
 
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
 
Bangalore Call Girls Nelamangala Number 9332606886 Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 9332606886  Meetin With Bangalore Esc...Bangalore Call Girls Nelamangala Number 9332606886  Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 9332606886 Meetin With Bangalore Esc...
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 6297143586 ⟟ Call Me For Genuine ...
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
 
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
 
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
 
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
 
Top Rated Bangalore Call Girls Richmond Circle ⟟ 9332606886 ⟟ Call Me For Ge...
Top Rated Bangalore Call Girls Richmond Circle ⟟  9332606886 ⟟ Call Me For Ge...Top Rated Bangalore Call Girls Richmond Circle ⟟  9332606886 ⟟ Call Me For Ge...
Top Rated Bangalore Call Girls Richmond Circle ⟟ 9332606886 ⟟ Call Me For Ge...
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
 
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
 

The Patent Chemistry “Big Bang” In Pubchem

  • 1. From Famine To Feast: The Patent Chemistry “Big Bang” In Pubchem Christopher Southan, TW2Informatics, Göteborg, Sweden, Andrew Hinton and Nicko Goncharoff, SureChem, Digital-Science, London, UK Within only a couple of years the number of patent-extracted structures in PubChem has gone from 2 up to 14.5 million. The major sources, in order of addition, are Thomson Pharma, IBM, SCRIPDB and SureChemOpen. The latter deposited 8.3 million in December 2012, of which 4.6 million had unique compound identifiers (CIDs). This new public patent chemistry “feast” has a range of implications and utilities, some of which will be touched upon in this poster. Estimating a potential total for useful structures extractable from the entire patent corpus is a subject of conjecture. However, those related to medicinal chemistry (i.e. International Patent Classification C07D) are only in the order of 200K patent document families. Thus, at least the major proportion of example structures specified in drug discovery patents, including those with activity results, can now be found in PubChem. While statistical updates will be presented, currently ~30% of the 47 million CIDs now have at least one patent extraction (although some of these will be common chemistry) and, significantly, ~15% of these are from patent-only sources. A valuable consequence is that compounds with structure-activity (SAR) data extracted from medicinal chemistry papers in ChEMBL now have a high probability of either identity or similarity intersects with structures from patents. Two sources, IBM and SCRIPDB, have patent numbers indexed in the PubChem CID records but for SureChem the substance entries link out to the free SureChemOpen application. This provides not only the location of each structure, typically as an IUPAC name, in the full-text and/or images but also a view of the entire chemistry extracted from that document. Examples will be shown where SAR tables in patents are an order of magnitude larger than those in the eventual paper. This information expansion that patents bring to PubChem (and via the use of SurChemOpen extrinsically) is transformative considering not only that ~70% more data is thereby “unlocked” but also the increased connectivity between papers, abstracts, patents and protein targets (e.g. from PubChem BioAssay) via their chemical content becomes easier to navigate and exploit. A comparative analysis of Mw distributions between the major patent sources and ChEMBL as the standard for bioactive content will also be presented. The differences give an insight into complementary for PubChem content between manual expert extraction and the automated name-to-structure conversion pipelines. 1 2 Despite estimates that ~ 70 of all published medicinal chemistry data are patent-only these have hitherto been a “Cinderella “ data source for bioactive chemical structures. However, three recent trends have revolutionised public access, 1) availability of full-text via the major patent authorities, 2) automated Chemical Named Entity Recognition, or CNER technology (used here to also encompass image conversions and Complex Work Unit processing) and 3) the submission of structures to PubChem by major patent- extraction sources. An example is SureChem, owned by Digital-Science. This uses a proprietary pipeline that incorporates four commercial 3rd party CNER tools to automatically recognise chemistry. Currently, 14 million unique structures have been extracted from 22 million full text-patents, made publically accessible and searchable in SureChemOpen. In December of 2012 SureChem deposited a filtered set of 8.3 million structures to become the 4th major and largest patent chemistry source in PubChem, joining Thomson Pharma (2007) IBM (2011) and SCRIPDB (2012). A breakdown of these source counts is shown. The figures indicate complementarity where each source contributes some unique content to the 14.5 million. Consequently, 30% of all CIDs have at least one patent source and 16% are patent-only. 3 4 The coverage (by these 14.5 million) of pharmacologically significant sources and property cuts is shown below The 7.7 million patent structures passing the Lipinski rule- of-five plus an expanded molecular weight (Mw) filtered encompass lead-like examples, most of which are associated with bioactivity data in filings. The similarity coverage of vendor and bioactivity space of ChEMBL and BioAssay will potentially extend well beyond the exact matches intersects. Note also that coverage of drug-like space, in the last three rows, is between 65% and 95%. Molecular weight is an informative property distribution for comparing compound collections and database subsets. The four patent sources, along with total PubChem and ChEMBL for comparison, are plotted above. The bi-modal blip in PubChem is related to chemical vendors. SureChem (in red) is most similar to the manually curated Thomson Pharma. Compounds with SAR manually extracted from papers by ChEMBL have a narrower spread because CNER pipelines capture more reagents and intermediates. This is evidenced by the “shoulder” around 250 da in both SureChem and SCRIPDB, as well as the lower median Mw for IBM. 5 As an example of “opening up” access to patent chemistry we can take the new antimalarial candidate MMV390048. Via the image in the press release, the lead structure can be mapped to CID 53311393. PubChem links this to WO2011086531 and we can find example structures, in the cases below converted from images of compounds 10 and 11. The SureChem conversions can be used to populate a table of structures aligned to the SAR data. Conclusions 1. There has been a “big bang” in our ability to interrogate and exploit the patent corpus via open sources 2. Positively impact drug discovery, pharmacology and chemical biology 3. The similarity envelope for the 14.5 million patent CIDs is pre-computed with 6.9 million of these also connect directly to non-patent sources 4. PubChem continues to enhance linkages to patents in parallel with those to targets and publications References and Acknowledgments •Suriyawongkul I, Southan C, & Muresan S. The Cinderella of Biological Data Integration: Addressing the Challenges of Entity and Relationship Mining from Patent Sources. In) Data Integration in the Life Sciences (2010), doi:10.1007/978-3-642-15120-0_9 •WO2011086531, “New Antimarial Agents” Medicines for Malaria Ventures •http://pubchem.ncbi.nlm.nih.gov/ •http://surechem.com/ •http://blog.surechem.com/2012/12/surechem-data-goes-live-in-pubchem.html •http://cdsouthan.blogspot.se/2012/12/surechem-patent-chemistry-pushes.html