SureChem - Integrating with public and proprietary data sources (ACS Fall 2012)


Presentation at Fall ACS meeting about upcoming deposition of all SureChem chemical structure data into PubChem


Transcript

  • 1. Integrating patent chemistry with public and private non-patent research resources
    Nicko Goncharoff; Andrew Hinton, PhD; Christopher Southan, PhD
    ACS Fall 2012, 19 August
  • 2. SureChem Data Collection
    Database of automatically mined structure data from text and images
      • 20M annotated US, EP, WO full-text records and Japan patent abstracts
      • 12M unique chemical structures
      • MEDLINE: 19M abstracts (coming Q4)
  • 3.
      • Free resource for researchers
      • Enables linking to public and proprietary content
      • Professional search needs
      • Data export, alerts, patent family search, chemical relevance filters…
      • API or Data Feed access to chemistry & full text
      • Integrate with internal databases & workflows
  • 4. Chemistry Mining Workflow
  • 5. Public Patent Chemistry Landscape
  • 6. Current Patent Sources In PubChem
    [Bar chart. Y-axis: numbers of SIDs, 0 to 4,000,000. X-axis sources: EPO (Sling), Chemicalize.org, IBM, Thomson Pharma. Bar labels: 3.7 M, 2.3 M, 280 K, 10 K.]
  • 7. Patent & Literature Sources in PubChem: The Big Three
    [Venn diagram]
      • Thomson Pharma, patents and literature: 3,756,283 (41% lead-like)
      • ChEMBL + PubMed + Journals: 918,077 (45% lead-like)
      • IBM, pre-2000 patents: 2,369,481 (32% lead-like)
      • Region counts: 3,291,940; 281,920; 515,745; 52,975; 129,448; 67,437; 2,113,169
  • 8. SureChem to Deposit All Structures* into PubChem - 2012
      • 1976 to present
      • Deposition of structures only
      • View related patents in SureChemOpen
      • *Some filtering of common chemistry likely
  • 9. SureChem and IBM in PubChem (2 Example Patents)
      • US583593, Inhibitors of squalene synthetase and protein farnesyltransferase, Abbott
        SureChem total: 776; IBM total: 527 (SureChem only 478, shared 298, IBM only 229)
      • WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones HIV protease inhibitors, Upjohn
        SureChem total: 832; IBM total: 239 (SureChem only 686, shared 146, IBM only 93)
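The counts on this slide are consistent with a simple two-set comparison per patent: structures found only by SureChem, structures found by both sources, and structures found only by IBM. A minimal Python sketch of that comparison follows, assuming each source's structures for a patent are available as a set of identifiers. The identifiers are made-up placeholders, and `overlap_counts` and `split_from_totals` are hypothetical helpers, not part of any SureChem or PubChem API.

```python
def overlap_counts(source_a, source_b):
    """Return (only_a, shared, only_b) sizes for two collections of structure IDs."""
    a, b = set(source_a), set(source_b)
    shared = a & b
    return len(a - shared), len(shared), len(b - shared)


def split_from_totals(total_a, total_b, shared):
    """Derive (only_a, shared, only_b) from per-source totals and the overlap size."""
    return total_a - shared, shared, total_b - shared


# Placeholder identifiers, for illustration only.
surechem_ids = {"s1", "s2", "s3", "s4"}
ibm_ids = {"s3", "s4", "s5"}
print(overlap_counts(surechem_ids, ibm_ids))  # (2, 2, 1)

# Slide figures: totals 776 / 527 with 298 shared, and 832 / 239 with 146 shared.
print(split_from_totals(776, 527, 298))  # (478, 298, 229)
print(split_from_totals(832, 239, 146))  # (686, 146, 93)
```

The second helper only restates the slide's arithmetic: each source's total is its exclusive count plus the shared count.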
  • 10. Identifying Relevant Chemistry - IC50
    US-20120035195-A1, BACE2, Hoffmann-La Roche
  • 11. Structures with IC50 Values
    US-20120035195-A1: PDF, SureChemOpen, Excel
  • 12. Search IC50 Structures in PubChem
  • 13. SureChem Unique Contribution
    [Venn diagram: SureChem 79; PubChem 96 (Thomson Pharma, Chemicalize)]
    Stage                               No. of Structures
    Available from SureChem (SC)        1848
    Pre-Exist in PubChem                669
    Pre-Exist, not from IC50 table      573
    Pre-Exist, from IC50 table          96 (12 from TP + 84 via chemicalize.org)
    Unique-SC with IC50                 79
    Unique-SC, beyond IC50 table        1100
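The rows of this table add up: the pre-existing structures split into those from the IC50 table and those not from it, and the pre-existing plus SureChem-unique structures give the total available from SureChem. A short Python sketch of that check, using only the counts from the slide; the function name `breakdown` and its parameter names are illustrative assumptions:

```python
def breakdown(available, pre_exist_other, pre_exist_ic50, unique_ic50, unique_other):
    """Check that the table's rows sum to the total and return derived counts."""
    pre_exist = pre_exist_other + pre_exist_ic50
    unique = unique_ic50 + unique_other
    if pre_exist + unique != available:
        raise ValueError("rows do not sum to the number of available structures")
    return {
        "pre_exist": pre_exist,                       # already in PubChem
        "unique": unique,                             # only available from SureChem
        "ic50_total": pre_exist_ic50 + unique_ic50,   # structures in the IC50 table
    }


result = breakdown(
    available=1848,
    pre_exist_other=573,
    pre_exist_ic50=96,   # 12 from TP + 84 via chemicalize.org
    unique_ic50=79,
    unique_other=1100,
)
print(result)  # {'pre_exist': 669, 'unique': 1179, 'ic50_total': 175}
```

On these figures, 79 of the 175 IC50-table structures and 1179 of the 1848 structures overall are available only from SureChem.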
  • 14. Identifying Relevant Chemistry
    Patent US-20120035195-A1
    http://opentox.informatik.uni-freiburg.de/ches-mapper/
  • 15. SureChem Chemical Relevance Filtering
      • Frequency counts of chemicals within patents
      • Additional molecular property filtering, i.e. Lipinski descriptors
      • Natural Language Processing-based indexing of Exemplified Compounds
    Automated indexing of Exemplified Compounds in text
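The first two filters on this slide can be illustrated with a small Python sketch. It is not SureChem's implementation: the slide does not say how frequency counts are used or which property cutoffs apply. The sketch assumes that compounds occurring in many patents are treated as common chemistry, and it uses the standard rule-of-five thresholds with at most one violation allowed. All compound names and descriptor values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float        # Daltons
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def passes_lipinski(c, max_violations=1):
    """Rule-of-five check: allow at most `max_violations` broken criteria."""
    violations = sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])
    return violations <= max_violations


def patent_frequency(patents):
    """Count in how many patents each compound occurs (once per patent)."""
    counts = Counter()
    for names in patents.values():
        counts.update(set(names))
    return counts


def relevant(patents, descriptors, max_patents=2):
    """Keep compounds that are not over-frequent (assumed cutoff) and pass the property filter."""
    freq = patent_frequency(patents)
    return sorted(
        name for name, n in freq.items()
        if n <= max_patents and passes_lipinski(descriptors[name])
    )


# Hypothetical patents and illustrative descriptor values.
patents = {
    "P1": ["solvent-X", "cmpd-A"],
    "P2": ["solvent-X", "cmpd-B"],
    "P3": ["solvent-X", "cmpd-A"],
}
descriptors = {
    "solvent-X": Compound("solvent-X", 46.0, -0.3, 1, 1),
    "cmpd-A": Compound("cmpd-A", 420.0, 3.2, 2, 6),
    "cmpd-B": Compound("cmpd-B", 780.0, 6.5, 6, 12),
}
print(relevant(patents, descriptors))  # ['cmpd-A']
```

Here `solvent-X` is dropped because it appears in all three patents, and `cmpd-B` is dropped because it breaks all four property criteria.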
  • 16. Conclusion
    SureChem deposition into PubChem will:
      • Significantly expand public patent chemistry scope
      • Contribute unique and timely MedChem-relevant data
      • Enable open drug discovery and chemical biology
      • Advance progress toward a more open, federated chemical information network