Sourcing high quality online data resources for computational toxicology

The internet continues to offer increased access to chemistry data that may be of value to scientists interested in populating systems containing reference toxicology data as well as to provide data for the development of predictive models. This presentation will give an overview of some of the various sources of data available via the internet, provide an overview of some of the challenges associated with gathering high-quality data and discuss methods by which to mesh together disparate data sources.

Published in: Technology

  1. 1. Sourcing High-Quality Online Data Resources for Computational Toxicology
       - Antony Williams
       - Bio-IT World, Current Methods for Computational Toxicology and Chemogenomics
  2. 2. The Community Depends On Us
       - “We don’t want another Love Canal!”
       - “What we know about PCBs should warn us all!”
       - The public is “suspicious” of pharma…
       - “Chemicals are dangerous”
  3. 3. Comp Tox Models Depend on DATA
       - Models for Computational Toxicology depend on the quality of the training set
       - There are multiple issues with data quality, including:
         - Experimental
           - The validity of the method, reproducibility, sample quality, data capture, transcription of values
         - Computational
           - Accurately representing the data: correct units, annotations, quality flags, attribution. Are the structures correct?
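The computational issues listed on slide 3 (units, quality flags, attribution) can be checked mechanically before a value enters a training set. The following is a minimal sketch only: the `Measurement` record, the `ALLOWED_UNITS` table and the `check` function are hypothetical names invented for illustration, and the accepted units are an assumption, not a list taken from the presentation.

```python
from dataclasses import dataclass

# Units accepted per property in this sketch (hypothetical, illustrative list).
ALLOWED_UNITS = {
    "melting_point": {"degC", "K"},
    "aqueous_solubility": {"mg/mL", "mol/L"},
}


@dataclass
class Measurement:
    compound: str
    prop: str
    value: float
    unit: str
    source: str = ""  # attribution: where the value came from


def check(m: Measurement) -> list:
    """Return a list of quality problems found in one record."""
    problems = []
    if m.unit not in ALLOWED_UNITS.get(m.prop, set()):
        problems.append("unknown or missing unit")
    if not m.source:
        problems.append("no attribution")
    if m.unit == "K" and m.value < 0:
        problems.append("negative absolute temperature")
    return problems


# Placeholder records, not real data.
good = Measurement("example compound", "melting_point", 45.0, "degC", "lab notebook")
bad = Measurement("example compound", "melting_point", 45.0, "F")
print(check(good))
print(check(bad))
```

Returning a list of problems rather than raising on the first one lets a curator see every flag for a record at once.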
  4. 4. Nothing but the Facts
       - Jean-Claude Bradley, Drexel University
       - “There are no facts, only measurements embedded within assumptions”
  5. 5. Open Notebook Science
       - UsefulChem Blog: http://tinyurl.com/48dyujh
  6. 6. Aqueous Solubility of EGCG
       - Epigallocatechin gallate solubility in water
  7. 7. Melting Point of DMT
  8. 8. Content is King and Quality Costs
       - Chemistry “content” is big money
         - Patent searching
         - Structures and properties
         - Drug databases
         - Literature databases
       - Chemical Abstracts Service (CAS), the “Gold Standard” in chemistry-related information
         - 101 years of content
         - $260 million revenue (2006)
         - >50 million substances
         - >60 million sequences
  9. 9. Where can we find data online?
  10. 10. Where is chemistry online?
       - Encyclopedic articles (Wikipedia)
       - Chemical vendor databases
       - Metabolic pathway databases
       - Property databases
       - Patents with chemical structures
       - Drug discovery data
       - Scientific publications
       - Compound aggregators
       - Blogs/wikis and Open Notebook Science
  11. 11. Lots of “Public Compound” Databases
       - PubChem
       - DrugBank
       - ChEBI/ChEMBL
       - KEGG
       - LIPID MAPS
       - ChemIDPlus
       - eMolecules
       - ZINC
       - Lots of chemical vendors
       - ChemSpider
  12. 12. Toxicology Data
  13. 14. Chemistry on the Internet
       - ChemSpider “links” chemistry on the internet
         - Almost 25 million compounds, 400 data sources
         - Allows community deposition, curation, annotation
         - Integrating properties, publications, patents, media
         - Text, structure and substructure searching
  14. 15. www.chemspider.com
  15. 16. Search for a Chemical
  16. 17. Available Information…
       - Linked to vendors, safety data, toxicity, metabolism
  17. 18. We Have Delivered the Vision
       - “Build a Structure Centric Community to Serve Chemists”
       - Integrate chemical structure data on the web
       - Create a “structure-based hub” to information, data and algorithmic predictions
       - Let chemists contribute their own data
       - Allow the community to curate/correct data
  18. 19. Dialects describing chemicals
  19. 20. What is the Structure of Vitamin K?
  20. 21. MeSH
       - A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione). Vitamin K 3 provitamins, after being alkylated in vivo, exhibit the antifibrinolytic activity of vitamin K. Green leafy vegetables, liver, cheese, butter, and egg yolk are good sources of vitamin K.
  21. 22. What is the Structure of Vitamin K1?
  22. 23. What is the Structure of Vitamin K1?
  23. 24. CAS’s Common Chemistry
  24. 25. Wikipedia
  25. 28. ChEBI – Manual Curation
  26. 31. PubChem
  27. 33. “2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione”
       - Variants of systematic names on PubChem:
         - 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl
         - 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl
         - 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl
         - 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl
         - 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl
         - 2-methyl-3-[(E)-3,7,11,15-tetramethyl
         - 2-methyl-3-(3,7,11,15-tetramethyl
         - 2-methyl-3-[(E)-3,7,11,15-tetramethyl
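The name variants on slide 33 differ only in their stereodescriptors, so they can be grouped as records of one skeleton. The sketch below is an assumption-laden illustration, not a method from the talk: it works on the truncated name fragments exactly as the slide shows them, and `strip_stereo` is a hypothetical text-level helper. Comparing structure identifiers rather than names would likely be more robust in practice.

```python
import re

# Name fragments exactly as they appear (truncated) on the slide.
NAMES = [
    "2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E)-3,7,11,15-tetramethyl",
    "2-methyl-3-(3,7,11,15-tetramethyl",
    "2-methyl-3-[(E)-3,7,11,15-tetramethyl",
]

# Matches a stereodescriptor group such as "(E)-" or "(E,7R,11R)-".
STEREO = re.compile(r"\((?:[EZ]|\d+[RS])(?:,(?:[EZ]|\d+[RS]))*\)-")


def strip_stereo(name: str) -> str:
    """Drop stereodescriptors and unify bracket style (crude, text-level only)."""
    return STEREO.sub("", name).replace("[", "(").replace("]", ")")


# Group the original names by their stereo-free form.
groups = {}
for name in NAMES:
    groups.setdefault(strip_stereo(name), []).append(name)

for skeleton, variants in groups.items():
    print(skeleton, len(variants))
```

Locant lists such as `(3,7,11,15-` are left alone because the pattern only accepts `E`, `Z`, or a number followed by `R` or `S` inside the parentheses.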
  28. 34. Public Domain Chemistry Databases
       - Our databases are a mess…
       - Non-curated databases are proliferating errors
       - We source and deposit data between databases
       - Original sources of errors are hard to determine
       - Curation is time-consuming, challenging and exacting
  29. 35. Vancomycin
       - Who will curate?
       - PubChem is not resourced to clean these errors
       - How would you clean such a large dataset?
  30. 36. The FDA’s DailyMed
  31. 37. Structures on DailyMed
  32. 38. Lack of Stereochemistry
  33. 39. Incorrect Structures
  34. 40. Wow!
  35. 41. We want to model DILI…
       - Drug metabolism in the liver can convert some drugs into highly reactive intermediates, which can affect the structure and functions of the liver.
       - Drug-induced liver injury (DILI) is the #1 reason drugs are not approved or are withdrawn from the market after approval
       - Estimated global annual incidence rate of DILI is 13.9-24.0 per 100,000 inhabitants
       - DILI accounts for an estimated 3-9% of all adverse drug reactions reported to health authorities
       - Herbal components can cause DILI too
       - Thanks to Sean Ekins. Source: https://dilin.dcri.duke.edu/for-researchers/info/
  36. 42. Initial DILI data – Names and Data
       - Griseofulvin
       - Hycanthone
       - Hydrochlorothiazide
       - Hydrocortisone
       - Hydroxyurea
       - Idarubicin HCl
       - Idoxuridine
       - Imipramine HCl
       - Indomethacin
       - Isoniazid
       - Isoproterenol HCl
       - Isotretinoin
       - Isoxsuprine HCl
       - Kanamycin Sulfate
       - Ketorolac Tromethamine
       - Ketotifen
       - Labetalol
  37. 43. So you want data on drugs???
       - Sourcing data based on drug names is difficult!
       - Where would you find the “correct chemical structures”?
       - What databases can you trust?
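One way to approach the question on slide 43 is to look up the same drug name in several sources and see where they agree. The sketch below is an illustration under stated assumptions: `consensus` is a hypothetical helper, and the source names and identifiers are placeholders rather than real database records. Agreement is only a hint, since slide 34 notes that errors are copied between databases, so a majority can still be wrong.

```python
from collections import Counter


def consensus(records):
    """Compare structure identifiers returned by several sources for one name.

    records maps a source name to a structure identifier, or None for no hit.
    Returns (most common identifier, dissenting sources, sources with no hit).
    """
    hits = {src: ident for src, ident in records.items() if ident is not None}
    missing = sorted(src for src, ident in records.items() if ident is None)
    if not hits:
        return None, [], missing
    top_id, _ = Counter(hits.values()).most_common(1)[0]
    dissent = sorted(src for src, ident in hits.items() if ident != top_id)
    return top_id, dissent, missing


# Placeholder lookup results for a single drug name (not real data).
lookup = {
    "source_a": "ID-1",
    "source_b": "ID-1",
    "source_c": "ID-2",
    "source_d": None,
}
result = consensus(lookup)
print(result)
```

The dissenting and missing sources are returned alongside the majority answer so that a curator can inspect the disagreements instead of silently discarding them.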
  38. 44. Vytorin: Ezetimibe/Simvastatin
  39. 45. Vytorin: Ezetimibe/Simvastatin
  40. 46. Vytorin: Ezetimibe/Simvastatin
  41. 47. Vytorin: Ezetimibe/Simvastatin
  42. 48. Vytorin: Ezetimibe/Simvastatin
  43. 49. Symbicort: Budesonide + Formoterol
  44. 50. Symbicort: Budesonide + Formoterol (ChemIDPlus, Wikipedia)
  45. 51. DrugBank: Search Symbicort…
  46. 52. Symbicort: Budesonide + Formoterol
       - PubChem
         - 8 structures called Budesonide. 1 “correct”
         - 6 structures called Formoterol. 1 “correct”
         - Search on “Symbicort” gives 1 structure.
  47. 53. Taxol: Paclitaxel 44 structures
  48. 54. Taxol: Paclitaxel Bioassay Data
  49. 55. Public Domain Chemistry Databases
       - An examination of quality in databases – inter/intra lab comparison of processes for 150 drugs
  50. 57.
       | Drug Name | Generic Name | ChEBI | ChemSpider | CAS Common Chemistry | ChemIDPlus | DailyMed | DrugBank | PubChem | Wikipedia |
       |---|---|---|---|---|---|---|---|---|---|
       | Spiriva | Tiotropium Bromide | No Hits | | No Hits | | | | 4/0 | |
       | Depakote | Valproate semisodium | | | | | | | | No Structure |
       | Basen | Voglibose | | | No Hits | | No Hits | | 2/1 | |
       | Symbicort | 1) Budesonide | | | | | | | 8/1 | |
       | Symbicort | 2) Formoterol | WRONG | | No Hits | | | | 6/1 | |
       | Vytorin | 1) Ezetimibe | | | No Hits | | | | | |
       | Vytorin | 2) Simvastatin | | | | | | | 2/1 | |
       | Taxol | Paclitaxel | | | | | | | 44/1 | |
       | Thalidomid | Thalidomide | No Hits | | | | | | | |
       | Zocor | Simvastatin | | | | | | | 2/1 | |
       | Crestor | Rosuvastatin | | | No Hits | | | | 2/1 | |
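The PubChem entries on slide 57 can be turned into a rough measure of how often a name search lands on the right structure. This sketch assumes each entry reads as "structures found / structures correct", an interpretation taken from slide 52 (8 structures called Budesonide, 1 "correct") and not stated on slide 57 itself. The `fraction_correct` helper is hypothetical.

```python
# PubChem entries from slide 57, keyed by generic name.
# Assumed format: "structures found / structures correct".
PUBCHEM = {
    "Tiotropium Bromide": "4/0",
    "Voglibose": "2/1",
    "Budesonide": "8/1",
    "Formoterol": "6/1",
    "Simvastatin": "2/1",
    "Paclitaxel": "44/1",
    "Rosuvastatin": "2/1",
}


def fraction_correct(entry: str) -> float:
    """Fraction of retrieved structures that were judged correct."""
    found, correct = (int(part) for part in entry.split("/"))
    return correct / found


for name, entry in PUBCHEM.items():
    print(f"{name}: {fraction_correct(entry):.3f}")

# Totals across the seven drugs listed above.
total_found = sum(int(e.split("/")[0]) for e in PUBCHEM.values())
total_correct = sum(int(e.split("/")[1]) for e in PUBCHEM.values())
print(total_found, total_correct)
```

Under that reading, a name search for these drugs returns many more candidate structures than correct ones, which is the point the slide makes about needing curation.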
  51. 58. Personal Experiences
       - Highest Quality Resources: DSSTox (EPA), ChEBI (EBI)
       - High Quality Resources: DrugBank, Human Metabolome Database, ChemIDPlus, ChemSpider, KEGG
       - Are there others you use???
  52. 59. What can be done to help?
       - “Crowdsourcing” – gather the support of members of the community to add, annotate and curate data
       - Wikipedia is the domain success story for crowdsourcing.
         - PubChem is an example of “crowdsourced deposition” of chemistry data
         - ChemSpider is an example of “crowdsourced deposition and curation”
  53. 60. The Future: Open Source and Data
       - Open source software: descriptors and algorithms
       - QSAR should be cheaper and better!
       - Selectively share your models with collaborators
       - Centralized hosting of models / predictions
  54. 61. The Future: Open PHACTS
       - The Open PHACTS project will develop an open access innovation platform, called Open Pharmacological Space (OPS), via a semantic web approach. OPS will be comprised of data, vocabularies and infrastructure needed to accelerate drug-oriented research.
  55. 62. Exposing Data for Semantic Web…
  56. 63. Coming soon…
       - Book chapter:
         - “Accessing, Using and Creating Chemical Property Databases For Computational Toxicology Modeling”
         - Antony J. Williams, Sean Ekins, Ola Spjuth and Egon L. Willighagen
  57. 64. Thank you
       - Email: williamsa@rsc.org
       - Twitter: ChemConnector
       - Blog: www.chemspider.com/blog
       - Personal Blog: www.chemconnector.com
       - Slides: www.slideshare.net/AntonyWilliams