SureChem PubChem Deposition Preview - ICIC 2012 Conference

Following on from our initial announcement at Fall ACS 2012, we deposited all our data into PubChem on an 'on hold' basis. This presentation gives an initial idea of SureChem's contribution to the novel patent chemistry that will be available on PubChem.

  • Tracing structures associated with SAR data from the PDF to SureChemOpen
  • Structures have been manually exported from the PDF.
  • Structures are then searched in PubChem.
  • Results show that 96 structures from the IC50 table were already in PubChem. SureChem has all of those, plus an additional 79 structures.

    1. Integrating patent chemistry with public research resources. Andrew Hinton, PhD; Christopher Southan, PhD; Evan Bolton, PhD; Nicko Goncharoff. ICIC 2012, 17 October.
    2. SureChem Data Collection: database of automatically mined structure data from text and images
       • 20M annotated US, EP, WO full-text records and Japan patent abstracts
       • 12.8M unique chemical structures
       • MEDLINE: 19M abstracts (upcoming)
    3. Free resource for researchers: enables linking to public and proprietary content. Professional search needs: data export, alerts, patent family search, chemical relevance filters… API or Data Feed access to chemistry & full text: integrate with internal databases & workflows.
    4. Chemistry Mining Workflow
    5. Public Patent Chemistry: A Changing Landscape
    6. SureChem Depositing All* Structures into PubChem, Q4 2012
       • 1976 to present
       • Deposition of structures only
       • Currently 'on hold'
       • Will link to patents in SureChemOpen
       * After filtering of fragments and highly common chemistry
    7. Compounds Derived from Patents and Literature found in PubChem, Banded by Molecular Weight Range (MWT) and Source. [Bar chart: compounds in PubChem (0 to 9,000,000) by source (ChEMBL, IBM, Thomson Pharma, SCRIPDB, SureChem), stacked by MWT band from 100-200 to 600-700. Value labels on the chart: 8.29M* (drug-like 66%), 3.99M and 3.80M (drug-like 60% and 62%), 2.36M (drug-like 51%), 0.76M (drug-like 69%). *Provisional numbers.]
    8. SureChem Deposition Pushes PubChem to 40 Million Compounds
    9. Uniques and Overlaps
       • SC - SCRIPDB: 1.5M
       • SC - IBM: 1.2M
       • SC - TPharma: 0.9M
       • SC - ChEMBL: 0.1M
    10. ChEMBL overlaps with Patent Sources in PubChem
    11. Intersects: Patent Document View (2 Examples, SC & IBM)
       • US583593, Inhibitors of squalene synthetase and protein farnesyltransferase, Abbott. SureChem total: 776; IBM total: 527. Venn counts: 478, 298, 229.
       • WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones HIV protease inhibitors, Upjohn. SureChem total: 832; IBM total: 239. Venn counts: 686, 146, 93.
    12. Identifying Relevant Chemistry: IC50. US-20120035195-A1, BACE2, Hoffmann-La Roche
    13. Structures with IC50 Values: US-20120035195-A1 (PDF, SureChemOpen, Excel)
    14. Search IC50 Structures in PubChem
    15. SureChem Unique Contribution. Venn diagram: SureChem only, 79; shared with PubChem, 96 (ThomsonPharma, Chemicalize). Number of structures by stage:
       • Available from SureChem (SC): 1848
       • Pre-Exist in PubChem: 669
       • Pre-Exist, not from IC50 table: 573
       • Pre-Exist, from IC50 table: 96 (12 from TP + 84 via chemicalize.org)
       • Unique-SC with IC50: 79
       • Unique-SC, beyond IC50 table: 1100
    16. SureChem Chemical Relevance Filtering
       • Frequency counts of chemicals within patents
       • Additional molecular property filtering and structural alerts
       • Structural identification of "Likely Exemplars"
       • Natural Language Processing-based indexing of Exemplified Compounds
       Automated indexing of Exemplified Compounds in text
    17. Conclusions. SureChem deposition into PubChem:
       • Significantly expands public patent chemistry scope
       • Contributes unique and timely MedChem-relevant data
       • Enables open drug discovery and chemical biology
       • Advances progress toward a more open, federated chemical information network
    18. SureChem is a product from Digital Science
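
The overlap counts on slides 11 and 15 can be cross-checked with simple set arithmetic. The following is a minimal illustrative sketch in Python, not something taken from the presentation. The helper name `venn_regions` is hypothetical, and the sketch assumes that the middle number of each triple on slide 11 is the SureChem/IBM intersection (an assumption that the stated totals are consistent with).

```python
from typing import NamedTuple


class VennRegions(NamedTuple):
    only_a: int
    shared: int
    only_b: int


def venn_regions(total_a: int, total_b: int, shared: int) -> VennRegions:
    """Split two overlapping sets into A-only, shared, and B-only counts.

    Hypothetical helper for illustration; not part of any SureChem or PubChem API.
    """
    if shared < 0 or shared > min(total_a, total_b):
        raise ValueError("shared count must lie between 0 and the smaller total")
    return VennRegions(total_a - shared, shared, total_b - shared)


# Slide 11: SureChem (A) vs IBM (B). Totals are from the slide; treating the
# middle Venn value as the intersection is an assumption.
abbott = venn_regions(total_a=776, total_b=527, shared=298)
upjohn = venn_regions(total_a=832, total_b=239, shared=146)

# Slide 15: structure counts for US-20120035195-A1, as listed on the slide.
available_from_surechem = 1848
pre_exist_in_pubchem = 669
pre_exist_not_ic50 = 573
pre_exist_ic50 = 96
unique_sc_ic50 = 79
unique_sc_beyond_ic50 = 1100

# Structures in the IC50 table: those already in PubChem plus SureChem-only ones.
ic50_table_total = pre_exist_ic50 + unique_sc_ic50

# The stage counts should add up to the totals given on the slide.
pre_exist_sum = pre_exist_not_ic50 + pre_exist_ic50
overall_sum = pre_exist_in_pubchem + unique_sc_ic50 + unique_sc_beyond_ic50

if __name__ == "__main__":
    print("Abbott patent regions:", tuple(abbott))
    print("Upjohn patent regions:", tuple(upjohn))
    print("IC50 table structures:", ic50_table_total)
    print("Pre-exist counts consistent:", pre_exist_sum == pre_exist_in_pubchem)
    print("Overall counts consistent:", overall_sum == available_from_surechem)
```

Under that reading, the three Venn values for each patent on slide 11 follow directly from the two totals and the shared count, and the stage counts on slide 15 sum to the 669 pre-existing and 1848 available structures.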
