Integrating Patents with Research Data
SureChem ACS 2012. Presented by Nico on behalf of all three authors. The data is searchable at https://open.surechem.com/login. Related information includes recent posts at http://cdsouthan.blogspot.se/

Published in: Technology

  1. Integrating patent chemistry with public and private non-patent research resources. Nicko Goncharoff; Andrew Hinton, PhD; Christopher Southan, PhD. ACS Fall 2012, 19 August.
  2. SureChem Data Collection: database of automatically mined structure data from text and images
     • 20M annotated US, EP, WO full-text records and Japan patent abstracts
     • 12M unique chemical structures
     • MEDLINE: 19M abstracts (coming Q4)
  3. Free resource for researchers; enables linking to public and proprietary content. Professional search needs: data export, alerts, patent family search, chemical relevance filters… API or Data Feed access to chemistry & full text; integrate with internal databases & workflows.
  4. Chemistry Mining Workflow
  5. Public Patent Chemistry Landscape
  6. Current Patent Sources in PubChem. Bar chart of numbers of SIDs per source (EPO (Sling), Chemicalize.org, IBM, Thomson Pharma); bar labels: 3.7 M, 2.3 M, 280 K, 10 K.
  7. Patent & Literature Sources in PubChem: The Big Three (Venn diagram)
     • Thomson Pharma, patents and literature: 3,756,283 (41% lead-like)
     • ChEMBL + PubMed + Journals: 918,077 (45% lead-like)
     • IBM, pre-2000 patents: 2,369,481 (32% lead-like)
     • Region counts shown in the diagram: 3,291,940; 281,920; 515,745; 52,975; 129,448; 67,437; 2,113,169
  8. SureChem to Deposit All Structures* into PubChem, 2012
     • 1976 to present
     • Deposition of structures only
     • View related patents in SureChemOpen
     • *Some filtering of common chemistry likely
  9. SureChem and IBM in PubChem (2 Example Patents)
     • US583593, Inhibitors of squalene synthetase and protein farnesyltransferase, Abbott. SureChem total: 776; IBM total: 527. Venn counts: 478 / 298 / 229 (478 + 298 = 776; 298 + 229 = 527).
     • WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones HIV protease inhibitors, Upjohn. SureChem total: 832; IBM total: 239. Venn counts: 686 / 146 / 93 (686 + 146 = 832; 146 + 93 = 239).
  10. Identifying Relevant Chemistry, IC50: US-20120035195-A1, BACE2, Hoffmann-La Roche
  11. Structures with IC50 Values: US-20120035195-A1 (PDF, SureChemOpen, Excel)
  12. Search IC50 Structures in PubChem
  13. SureChem Unique Contribution. Venn diagram: SureChem 79; PubChem 96 (Thomson Pharma, Chemicalize).
      Stage                             No. of Structures
      Available from SureChem (SC)      1848
      Pre-exist in PubChem              669
      Pre-exist, not from IC50 table    573
      Pre-exist, from IC50 table        96 (12 from TP + 84 via chemicalize.org)
      Unique-SC with IC50               79
      Unique-SC, beyond IC50 table      1100
  14. Identifying Relevant Chemistry: Patent US-20120035195-A1. http://opentox.informatik.uni-freiburg.de/ches-mapper/
  15. SureChem Chemical Relevance Filtering
      • Frequency counts of chemicals within patents
      • Additional molecular property filtering, e.g. Lipinski descriptors
      • Natural Language Processing-based indexing of Exemplified Compounds
      Figure: automated indexing of Exemplified Compounds in text
  16. Conclusion: SureChem deposition into PubChem will
      • Significantly expand public patent chemistry scope
      • Contribute unique and timely MedChem-relevant data
      • Enable open drug discovery and chemical biology
      • Advance progress toward a more open, federated chemical information network
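
The overlap counts on slides 9 and 13 can be cross-checked with a few lines of arithmetic. The sketch below is not part of the deck: the `Overlap` class and the dictionary keys are hypothetical names, and reading each triple of Venn counts on slide 9 as (SureChem only, shared, IBM only) is an assumption, though it is the reading under which the counts add up to the stated totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    """Two-set Venn counts: only in A, in both sets, only in B."""
    only_a: int
    both: int
    only_b: int

    @property
    def total_a(self) -> int:
        return self.only_a + self.both

    @property
    def total_b(self) -> int:
        return self.both + self.only_b

    @property
    def unique_share_a(self) -> float:
        """Fraction of A's structures that B does not contain."""
        return self.only_a / self.total_a


# Slide 9 (assumed reading: SureChem only, shared, IBM only).
abbott = Overlap(only_a=478, both=298, only_b=229)
upjohn = Overlap(only_a=686, both=146, only_b=93)
assert (abbott.total_a, abbott.total_b) == (776, 527)
assert (upjohn.total_a, upjohn.total_b) == (832, 239)

# Slide 13 stage counts for US-20120035195-A1 (keys are hypothetical labels).
stages = {
    "available": 1848,
    "pre_exist": 669,
    "pre_exist_not_ic50": 573,
    "pre_exist_ic50": 96,
    "unique_ic50": 79,
    "unique_beyond_ic50": 1100,
}
# The pre-existing structures split into IC50-table and non-IC50-table parts.
assert stages["pre_exist"] == stages["pre_exist_not_ic50"] + stages["pre_exist_ic50"]
# Everything SureChem extracted is either already in PubChem or unique to SureChem.
unique_total = stages["unique_ic50"] + stages["unique_beyond_ic50"]
assert stages["available"] == stages["pre_exist"] + unique_total

for label, overlap in (("Abbott", abbott), ("Upjohn", upjohn)):
    print(f"{label}: {overlap.unique_share_a:.1%} of SureChem structures not in IBM")
print(f"BACE2 patent: {unique_total / stages['available']:.1%} unique to SureChem")
```

The assertions pass with the figures as given on the slides, so the stage table on slide 13 is internally consistent.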
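
Slide 15 names two of the relevance filters (frequency counts and Lipinski descriptors) without saying how they are applied. The following is a minimal sketch of one plausible combination, not SureChem's actual method: the compound names, descriptor values, and thresholds are all made up for illustration, the descriptors are assumed to be precomputed by a cheminformatics toolkit, and "frequency" is taken here to mean the number of patents a structure occurs in. The rule-of-five cutoffs (molecular weight 500, logP 5, 5 hydrogen-bond donors, 10 acceptors) are the standard Lipinski values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """A structure with precomputed descriptors (values below are invented)."""
    name: str
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five limits the compound exceeds."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def patent_frequency(patents: dict[str, list[str]]) -> Counter:
    """Number of distinct patents in which each compound name occurs."""
    counts: Counter = Counter()
    for names in patents.values():
        counts.update(set(names))  # set(): count each patent at most once
    return counts


def relevant(compounds, patents, max_patents, max_violations=1):
    """Keep compounds that are neither ubiquitous nor far from rule-of-five space."""
    freq = patent_frequency(patents)
    return [
        c.name
        for c in compounds
        if freq[c.name] <= max_patents and lipinski_violations(c) <= max_violations
    ]


# Hypothetical toy corpus: one ubiquitous solvent and two candidate structures.
compounds = [
    Compound("solvent_x", 46.0, -0.3, 1, 1),
    Compound("candidate_a", 412.5, 3.2, 2, 6),
    Compound("candidate_b", 812.0, 7.1, 6, 12),
]
patents = {
    "P1": ["solvent_x", "candidate_a"],
    "P2": ["solvent_x", "candidate_b"],
    "P3": ["solvent_x", "solvent_x"],
}
kept = relevant(compounds, patents, max_patents=2)
print(kept)
```

In this toy corpus `solvent_x` is dropped because it occurs in all three patents, and `candidate_b` is dropped because it breaks all four rule-of-five limits, leaving only `candidate_a`.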