SureChem - Integrating with public and proprietary data sources (ACS Fall 2012)

Presentation at Fall ACS meeting about upcoming deposition of all SureChem chemical structure data into PubChem

  1. Integrating patent chemistry with public and private non-patent research resources. Nicko Goncharoff; Andrew Hinton, PhD; Christopher Southan, PhD. ACS Fall 2012, 19 August.
  2. SureChem Data Collection: database of automatically mined structure data from text and images
     • 20M annotated US, EP, WO full text records and Japan patent abstracts
     • 12M unique chemical structures
     • MEDLINE: 19M abstracts (coming Q4)
  3. ª  Free resource for researchers! ª  Professional search needs!ª  Enables linking to public and ª  Data export, alerts, patent family proprietary content search, chemical relevance filters…! ª  API or Data Feed access to chemistry & full text! ª  Integrate with internal databases & workflows
  4. Chemistry Mining Workflow
  5. Public Patent Chemistry Landscape
  6. Current Patent Sources in PubChem (bar chart of numbers of SIDs per source; sources: EPO (Sling), Chemicalize.org, IBM, Thomson Pharma; bar labels: 3.7 M, 2.3 M, 280 K, 10 K)
  7. Patent & Literature Sources in PubChem: The Big Three
     • Thomson Pharma, patents and literature: 3,756,283 (41% lead-like)
     • ChEMBL + PubMed + Journals: 918,077 (45% lead-like)
     • IBM, pre-2000 patents: 2,369,481 (32% lead-like)
     • Venn diagram region counts, in transcript order: 3,291,940; 281,920; 515,745; 52,975; 129,448; 67,437; 2,113,169
  8. SureChem to Deposit All Structures* into PubChem - 2012
     • 1976 to present
     • Deposition of structures only
     • View related patents in SureChemOpen
     • *Some filtering of common chemistry likely
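  9. SureChem and IBM in PubChem (2 Example Patents)
     • US583593, Inhibitors of squalene synthetase and protein farnesyltransferase, Abbott. SureChem total: 776; IBM total: 527. Venn diagram counts: 478, 298, 229 (consistent with 478 SureChem only, 298 shared, 229 IBM only, since 478 + 298 = 776 and 298 + 229 = 527).
     • WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones HIV protease inhibitors, Upjohn. SureChem total: 832; IBM total: 239. Venn diagram counts: 686, 146, 93 (consistent with 686 SureChem only, 146 shared, 93 IBM only, since 686 + 146 = 832 and 146 + 93 = 239).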
  10. Identifying Relevant Chemistry - IC50: US-20120035195-A1, BACE2, Hoffmann-La Roche
  11. Structures with IC50 Values: US-20120035195-A1 (screenshots labelled PDF, SureChemOpen, Excel)
  12. Search IC50 Structures in PubChem
  13. SureChem Unique Contribution (Venn diagram: SureChem 79; PubChem 96, from Thomson Pharma and Chemicalize)
     Stage: No. of Structures
     • Available from SureChem (SC): 1848
     • Pre-Exist in PubChem: 669
     • Pre-Exist, not from IC50 table: 573
     • Pre-Exist, from IC50 table: 96 (12 from TP + 84 via chemicalize.org)
     • Unique-SC with IC50: 79
     • Unique-SC, beyond IC50 table: 1100
  14. Identifying Relevant Chemistry: Patent US-20120035195-A1 (http://opentox.informatik.uni-freiburg.de/ches-mapper/)
  15. SureChem Chemical Relevance Filtering
     • Frequency counts of chemicals within patents
     • Additional molecular property filtering, i.e. Lipinski descriptors
     • Natural Language Processing-based indexing of Exemplified Compounds
     Figure: Automated indexing of Exemplified Compounds in text
  16. Conclusion: SureChem deposition into PubChem will
     • Significantly expand public patent chemistry scope
     • Contribute unique and timely MedChem-relevant data
     • Enable open drug discovery and chemical biology
     • Advance progress toward a more open, federated chemical information network
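
The counts on slide 13 split the 1,848 SureChem structures for US-20120035195-A1 into those already in PubChem and those unique to SureChem. The short check below reproduces that arithmetic using only the slide's numbers. The variable names are illustrative, and the IC50-table total assumes the two IC50 rows together cover every structure in that table.

```python
# Counts as reported on slide 13 for US-20120035195-A1.
available_from_surechem = 1848
pre_exist_not_ic50 = 573      # already in PubChem, not from the IC50 table
pre_exist_ic50 = 12 + 84      # Thomson Pharma + chemicalize.org
unique_sc_with_ic50 = 79
unique_sc_beyond_ic50 = 1100

# Derived totals (names are illustrative, not from the talk).
pre_exist_in_pubchem = pre_exist_not_ic50 + pre_exist_ic50
unique_to_surechem = unique_sc_with_ic50 + unique_sc_beyond_ic50

# Assumption: the two IC50 rows together make up the whole IC50 table.
ic50_table_total = pre_exist_ic50 + unique_sc_with_ic50
unique_ic50_share = round(100 * unique_sc_with_ic50 / ic50_table_total, 1)

print(pre_exist_in_pubchem)                       # 669
print(unique_to_surechem)                         # 1179
print(pre_exist_in_pubchem + unique_to_surechem)  # 1848
print(ic50_table_total, unique_ic50_share)        # 175 45.1
```

Under that assumption, a little under half of the structures with IC50 values would be new to PubChem, and the breakdown sums back to the 1,848 structures available from SureChem.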
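
The first two relevance filters on slide 15 (frequency counts and Lipinski-style property filtering) can be sketched as follows. This is not SureChem's implementation: the `Compound` record, the thresholds, the patent IDs and the descriptor values are all hypothetical, and the sketch assumes that a structure occurring in many patents is common chemistry to be dropped. Descriptors are taken as precomputed rather than derived from structures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record with precomputed Lipinski descriptors."""
    name: str
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count rule-of-five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def filter_relevant(mentions, compounds, max_patents=2, max_violations=1):
    """Keep compounds seen in 1..max_patents patents with few Lipinski violations.

    mentions: iterable of (patent_id, compound_name) pairs; repeats of the
    same pair are collapsed so each patent counts once per compound.
    """
    patents_per_compound = Counter(name for _patent, name in set(mentions))
    return [
        c.name
        for c in compounds
        if 1 <= patents_per_compound[c.name] <= max_patents
        and lipinski_violations(c) <= max_violations
    ]


# Illustrative data only: patent IDs and descriptor values are made up.
mentions = [
    ("P1", "ethanol"), ("P1", "ethanol"), ("P2", "ethanol"), ("P3", "ethanol"),
    ("P1", "compound_A"),
    ("P2", "large_peptide"),
]
compounds = [
    Compound("ethanol", 46.07, -0.3, 1, 1),
    Compound("compound_A", 420.0, 3.2, 2, 6),
    Compound("large_peptide", 1200.0, 6.0, 12, 20),
]

print(filter_relevant(mentions, compounds))  # ['compound_A']
```

Here the solvent is dropped because it appears in too many patents, and the large peptide is dropped for exceeding the allowed number of rule-of-five violations. The third filter on the slide, NLP-based indexing of exemplified compounds, is not covered by this sketch.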
