
How One Monkey on a Typewriter Made a Difference to Online Chemistry


On Friday, September 16th, I was honored to receive the North Carolina American Chemical Society Distinguished Speaker Award and got to review the past 20 years of my career. This was my short introductory bio:

"Antony Williams is a Ph.D. NMR spectroscopist and cheminformatician who has worked in academia, government, a Fortune 500 company, and two start-ups. He is co-founder of the free online chemical database ChemSpider, which started as a hobby project, was ultimately acquired by the Royal Society of Chemistry (UK), and is now used by over 50,000 people per day. He is now a computational chemist in the National Center for Computational Toxicology at the Environmental Protection Agency, focused on developing web applications that support data dissemination and advance faster, cheaper approaches to identifying potential toxicological effects of chemicals. He has published more than 180 papers, more than 25 book chapters, and a number of books. He is known as the ChemConnector on social networks."

Published in: Science

  1. How One Monkey on a Typewriter Made a Difference to Online Chemistry. Antony Williams, ORCID ID: 0000-0002-2668-4821. I do know chimpanzees are not monkeys, but the photos are more fun!
  2. Before we begin… • It’s going to be kind of a random walk… • The slides will go online at SlideShare http://www.slideshare.net/AntonyWilliams/. Some slides for tonight are ca. 10 years old! • Any offense is unintentional… I am Welsh!
  3. What type of chemist am I? …sometimes colorful …sometimes a monkey, as you will soon see…
  4. Career to date… • NMR Spectroscopist (PhD) 1985-88 • EPR Spectroscopist (NRC) 1988-90 • NMR Facility Manager (U. of Ottawa) 1990-92 • NMR Leader (Kodak) & cheminformatics 1992-97 • Chief Science Officer (ACD/Labs) 1997-2007 • ChemSpider development (2 years) 2007-09 • VP eScience (RSC) 2009-15 • …and now at NCCT at the EPA
  5. My cheminformatics started here…
  6. Focusing on cheminformatics… NMR prediction, Computer-Assisted Structure Elucidation
  7. Focusing on cheminformatics…
  8. The Kodak Transition
  9. Phew – I left in March 1997…
  10. More than 10 Years at ACD/Labs: analytical data processing, NMR prediction, CASE systems, QSAR modeling, PhysChem prediction, structure drawing, nomenclature
  11. Structure Drawing and Nomenclature: free ChemSketch for home/education. Understanding structure representation and nomenclature became ESSENTIAL for building and curating databases!
  12. The Web is the Way: structure drawing (first drawing applet), NMR prediction ONLINE, PhysChem prediction ONLINE, nomenclature ONLINE
  13. My Greatest Pride – CASE. Anyone struggling with a structure elucidation??
  14. How many isomers for a formula? C10H17Br2ClO2: 50,502,293; C15H22O2: 138,136,211,624; C15H20O1: 37,568,150,635; C12H12O3: 68,930,547,646; C13H20O3: 14,431,269,166; C11H12N2O2: 3⋅10¹¹ < n < 10¹²
  15. How many isomers for a formula?
  16. COSY Correlations: vicinal H-H couplings, geminal H-H couplings [structure diagram with numbered atoms]
  17. HMBC Correlations (8 Hz optimized) [structure diagram with numbered atoms]
  18. Strychnine Non-standard Correlations [structure diagrams showing ²JCH, ³JCH, ⁴JCH and ⁵JCH correlations]
  19. J. Cheminf. 2012, 4:5
  20. Number of Skeletal Atoms (J. Cheminf. 2012, 4:5)
  21. Errors in published structures…
  22. 2007 – A Hobby Project
  23. A hobby gone wild… Year 1 • Hobby project connecting chemistry data on the web • Three servers – one purchased, two hand-built • Software begged and borrowed • Some late nights – 10pm to 2am for over a year • Surviving the naysayers in the community • Taking advantage of a changing world of data availability and crowdsourcing by willing participants • NO funding
  24. But in WEEK 1 of release… “…The Zoo is filled with monkeys. (The same monkeys who are trying to write Shakespeare by hitting typewriter keys at random).”
  25. So, if I’m a monkey…
  26. I really tried to stay quiet…
  27. A long public conversation… but people liked it…
  28. Building a Structure-Centric Community for Chemists. Ability to curate and add to the database: • Add structures and sets • “Clean” structures • Add data (spectra, CIFs, images) • Add links to other pages (URLs) • Add publication details • Year 2 – Will anybody help us?
  29. Will anybody help us??? Daily crowdsourced curation underway • 40 curation emails per day • 100 identifiers per day removed, approved or added
  30. Data Quality 2007
  31. Data Quality 2007
  32. Data quality issues are everywhere
  33. Data quality issues are everywhere
  34. Data Quality Issues: Williams and Ekins, DDT, 16: 747-750 (2011); Science Translational Medicine 2011
  35. Data Quality just LAST NIGHT! Carbon felt, 1.27cm (0.5in) thick Single-walled carbon nanotubes Multi-walled carbon nanotubes (MWNTs), 95+% Graphite rod, 13cm (5.125in) dia x 30.5cm (12in) long Graphite rod, 6.15mm (0.242in) dia x 152mm (6in) long Graphite rod, 6.15mm (0.242in) dia x 305mm (12in) long Carbon powder acetylene carbon acetylene carbon a methyl group Acheson graphite C5M Methylidyne radical Carbon rods, 5N 308068-56-6 Activated Carbon Powder GRAPHITE SYNTHETIC Activated carbon, Graphite Fullerene soot, as produced
  36. What is the Structure of Vitamin K1?
  37. What is the Structure of Vitamin K1?
  38. Wolfram Alpha
  39. DailyMed
  40. Wikipedia – ok that’s not right!!
  41. How I spent Xmas Time in 2007…
  42. More issues than I imagined…
  43. ChemBoxes
  44. Types of Errors Found • Structure drawing errors • Misassociation of names and structures • IUPAC name errors • Links out to databases pointing to the wrong structures • Property errors/validation • CAS Number validation
  45. Monkeys and details… https://en.wikipedia.org/wiki/Talk:Tacrolimus#IUPAC_Name_and_structure
  46. Vitamin K1
  47. Oops… https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Chemistry/CAS_validation
  48. That’s AWESOME!
  49. 7,900 Chemicals Released
  50. Hybrid Man-Machine Curation?
  51. WikiBox Chemicals/ChemBox Validation http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals/Chembox_validation
  52. May 2009 RSC acquisition
  53. ChemSpider Today… >50,000 daily users (I hear!)
  54. Crowdsourcing continues http://www.chemspider.com/feedbackcurated.aspx
  55. Are there more automated ways?
  56. OPEN SOURCE: Chemical Validation and Standardization
  57. Micropublishing Syntheses
  58. Even some famous syntheses!
  59. Newspapers all over the world
  60. ChemSpider SyntheticPages
  61. What we tried to fix… What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?
  62. Semantic Web Chemistry based on “ODOSOS” (Open Data, Open Source, Open Standards): >1.3 BILLION triples
  63. The Open PHACTS community ecosystem
  64. “National Chemical Database Service”
  65. So what did I learn over the years…? • Connecting people, data and systems • Integration of disparate data sources and systems can be so enabling • Data quality is an overlooked imperative • Crowdsourcing, even with a small crowd, shares the load and speeds progress • Embrace ODOSOS for greater benefit • And so… to EPA-NCCT
  66. What am I involved with at EPA? https://comptox.epa.gov/dashboard
  67. The EPA Chemistry Dashboard
  68. The EPA Chemistry Dashboard
  69. The EPA Chemistry Dashboard
  70. And, while I love Wikipedia… …it is not without issues…
  71. What makes a Scientist Notable? Try asking difficult questions…
  72. What is Impact for a scientist?
  73. Ways to make an impact… • Publish, share, validate and curate data • Publish chemicals, syntheses and data • “Publish” – Papers, Blogs, Reports, Tweets, Presentations, Videos • Contribute to Wikipedia • Participate in chemistry communities • Contribute to the Big Data of Chemistry
  74. A separate workshop soon…
  75. An OLD Monkey on a keyboard… What I helped with… • Drawing software on >1,000,000 desktops • ChemSpider for >50,000 users/day • Cleaned a lot of Wikipedia chemicals • Almost fought with the Olympics Committee • …I hope it’s been useful?
  76. Not just 1 monkey on a keyboard! …so many friends and colleagues, known and unknown, who helped…
  77. Thank you. Email: tony27587@gmail.com ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams
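
Slide 44 lists CAS Number validation among the error types found during the Wikipedia ChemBox cleanup. One check that can be automated is the CAS Registry Number check digit: the final digit should equal the sum of the preceding digits, each multiplied by its position counted from the right (starting at 1), modulo 10. The Python sketch below assumes the usual hyphenated layout (2 to 7 digits, then 2 digits, then 1 check digit); the function name `is_valid_cas_number` is hypothetical and is not taken from ChemSpider, Wikipedia, or any other tool mentioned in the talk.

```python
import re

# Hypothetical validator for illustration; not the API of any tool named in the talk.
# Assumed CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas_number(cas: str) -> bool:
    """Return True if `cas` has the expected layout and a matching check digit."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False  # wrong layout, e.g. missing hyphens
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Reverse the body so enumerate() yields weight 1 for the digit just
    # left of the check digit, weight 2 for the next one, and so on.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    for candidate in ["7732-18-5", "308068-56-6", "7732-18-4", "7732185"]:
        print(candidate, is_valid_cas_number(candidate))
```

For example, water's number 7732-18-5 passes, as does 308068-56-6 from slide 35, while 7732-18-4 fails. A passing check digit only shows that a number is well-formed; it cannot catch a valid CAS number attached to the wrong structure, which is the misassociation problem the talk describes.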
