Checking, Curating And Qualifying Chemistry

    Checking, Curating And Qualifying Chemistry - Presentation Transcript

    1. Checking, Curating and Qualifying Chemistry to Build a Structure Centric Community for Chemists. Rutgers University, 12/2/2008. Antony Williams
    2. ChemSpider - A Search Engine for Chemists
      • Questions a chemist might ask…
        • What is the melting point of n-butanol?
        • What is the chemical structure of Xanax?
        • Chemically, what is phenolphthalein?
        • What are the stereocenters of cholesterol?
        • Where can I find publications about xylene?
        • What are the different trade names for Ketoconazole?
        • What is the NMR spectrum of Aspirin?
        • What are the safety handling issues for Thymol Blue?
        • ChemSpider can answer all of these questions
    3. Tell Me About Glutathione
    4. Tell Me About Glutathione
    5. Tell Me About Glutathione
    6. Tell Me About Glutathione
    7. Tell Me About Glutathione
    8. Link outs
    9. Links out to KEGG (Kyoto Encyclopedia of Genes and Genomes)
    10. How many names does a compound have?
    11. ChemSpider Data Content
      • Over 21.5 million unique chemical structures from ca. 150 data sources
        • Online Databases – PubChem, DrugBank, KEGG, Wikipedia
        • Literature – PubMed, J Het Chem, Nature, RSC, Open Access
        • Chemical Vendors – over 40 different vendors and growing
        • Personal Depositions – individual contributions
        • Content database vendors
        • Analytical data collections
        • Patents
        • Web scraping
        • Content is linked back to the original data sources
    12. Complex Search
    13. The Quality of Data Online…
      • Aggregating data opens up quality issues
      • Structure-identifier associations are “dirty”
      • Structures are COMMONLY incorrect
      • Manual curation of small databases is enough work – what about millions of structures?
      • Structures are far from perfect. What is a “correct structure”?
        • Full stereochemistry?
        • Historical timeline of structure?
        • Who is the authority?
    14. Quality is a Major Issue: Search Butanol (old example, now fixed)
    15. Wikipedia Chemistry Curation project
      • Only ca. 5000 organic structures, 7000 total structures
      • Almost a year of work so far for a team of 6 people
      • Many errors removed in the process. Curation process is a daily event for users/depositors
      • Slow and torturous process
      • http://en.wikipedia.org/wiki/Talk:Tacrolimus#IUPAC_Name_and_structure
    16. Wikipedia Curation
      • Looking for self-consistency across a Wikipedia Page
      • Primary key is the article TITLE
      • The chemical shown needs to match the title
      • Cyclic self-consistency – and decisions must get made
    17. Other issues…
    18. Charges
    19. Sugars – Machine Readable vs Aesthetics (Haworth, Stereo, Fischer)
    20. Wikipedia – Crowdsourcing Chemistry
    21. Thymol Blue on ChemSpider
      • Data online includes:
        • UV-vis spectrum
        • Measured experimental properties
        • Link to Wikipedia article
        • Links to chromatography details
        • Multiple identifiers/trade names etc.
        • Links to vendors/suppliers/other databases
        • Safety information
        • http://www.chemspider.com/q/thymol%20blue
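The link on this slide carries the compound name in the path, with the space percent-encoded. A minimal sketch of building such a link for any search term follows. The function name `chemspider_query_url` is made up for this example, and the URL pattern is copied from the slide only; whether the site still serves it is not assumed.

```python
from urllib.parse import quote


def chemspider_query_url(term):
    """Build a query link in the style shown on the slide.

    The base path is taken from the slide; it is not checked against
    the live site.
    """
    # quote() percent-encodes characters such as spaces ("%20")
    return "http://www.chemspider.com/q/" + quote(term)


print(chemspider_query_url("thymol blue"))
```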
    22. Crowd-sourcing Curation
      • How to curate data for millions of structures?
      • Robot processes can clean up depositions
        • Search for Chloride and check molecular formula for Cl
        • Check for stereochemistry and remove names with stereo
      • Provide a simple-to-use platform to curate, annotate and tag data
      • Provide curator administration to prevent vandalism (Veropedia)
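The two robot checks named on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not ChemSpider's actual rule set: the halide word list, the stereo-descriptor pattern, and all function names are hypothetical, and "remove names with stereo" is read here as "drop stereo-specific names from a record whose structure carries no stereochemistry".

```python
import re

# Hypothetical mapping from a halide word in a name to the element symbol
# that the molecular formula should then contain.
HALIDE_WORDS = {"fluoride": "F", "chloride": "Cl", "bromide": "Br", "iodide": "I"}

# Simplified stereo descriptors: (R), (2R,3S), (E), cis-/trans-, (+)/(-), D-/L-.
# Real nomenclature has many more forms.
STEREO_PATTERN = re.compile(r"\((?:\d*[RSEZ],?)+\)|\b(?:cis|trans)-|\([+-]\)|\b[DL]-")


def formula_elements(formula):
    """Return the element symbols in a simple formula such as 'C2H5Cl'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def name_matches_formula(name, formula):
    """False if the name mentions a halide that the formula lacks."""
    elements = formula_elements(formula)
    lowered = name.lower()
    return all(symbol in elements
               for word, symbol in HALIDE_WORDS.items() if word in lowered)


def has_stereo_descriptor(name):
    return STEREO_PATTERN.search(name) is not None


def clean_names(names, formula, structure_has_stereo):
    """Keep only the names that pass both robot checks."""
    kept = []
    for name in names:
        if not name_matches_formula(name, formula):
            continue  # e.g. "chloride" in the name but no Cl in the formula
        if not structure_has_stereo and has_stereo_descriptor(name):
            continue  # stereo-specific name on a structure without stereo
        kept.append(name)
    return kept


names = ["butan-2-ol", "(R)-butan-2-ol", "butyl chloride"]
print(clean_names(names, "C4H10O", structure_has_stereo=False))
```

Checks like these only flag obvious mismatches; anything they cannot decide still needs the human curation described in the remaining bullets.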
    23. Post Comments
      • Anyone can “Post Comments” associated with a structure. To curate data we require login to track
    24. Multi-level Curation and Approval
    25. Crowd-sourcing Chemistry
      • Crowd-sourced curation: identify and tag errors, edit names, synonyms, identify records for deprecation
      • ALSO
      • Crowd-sourced deposition: anyone can deposit data (structures, text, images, analytical data)
    26. Vancomycin
      • Originally 12 structures with vancomycin
        • Incomplete stereochemistry
        • Complete but different stereochemistry
        • Different charge states
      • 1 remains after community collaboration with ChEBI
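One way to surface candidate duplicates like these is to group records that share the same atom connectivity and differ only in stereochemistry or charge state. The slide does not say how the vancomycin records were found; the sketch below is an assumption that uses InChI strings and keeps only the formula, connectivity (`/c`) and hydrogen (`/h`) layers as a grouping key. The record identifiers are invented, and the example InChIs for alanine and acetic acid are quoted from memory and should be verified before reuse.

```python
from collections import defaultdict
from itertools import takewhile


def skeleton_key(inchi):
    """Drop every layer after the leading /c and /h layers.

    This discards charge (/q, /p) and stereo (/b, /t, /m, /s) layers,
    so records that differ only in those collapse to the same key.
    """
    parts = inchi.split("/")
    head = parts[:2]  # version prefix and molecular formula
    core = list(takewhile(lambda layer: layer[:1] in ("c", "h"), parts[2:]))
    return "/".join(head + core)


def group_by_skeleton(records):
    """Map each skeleton key to the record ids that share it."""
    groups = defaultdict(list)
    for record_id, inchi in records.items():
        groups[skeleton_key(inchi)].append(record_id)
    return dict(groups)


# Hypothetical record ids; small molecules stand in for vancomycin.
records = {
    "rec-1": "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1",
    "rec-2": "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1",
    "rec-3": "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)",
    "rec-4": "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)",
    "rec-5": "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1",
}
for key, ids in group_by_skeleton(records).items():
    print(ids, key)
```

Grouping only proposes candidates. Deciding which record in a group is the correct structure remains a curation decision, as in the ChEBI collaboration.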
    27. “Collaboration” with ChEBI
    28. Ginkgolide B
    29. DailyMed
    30. Quality of Structures
    31. Quality of Structures!!!
    32.  
    33. “Entity Extraction”
      • Rule-based recognition of systematic names:
        • Use a lexicon of name fragments
        • Rules for identifying bounds of a name
      • Look-up dictionary:
        • Drug Names
        • Trivial Names
        • Numbers: Registry IDs, EINECS/ELINCS
        • Massive look-up dictionary of validated identifiers on ChemSpider
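The two approaches on this slide can be combined in a toy extractor. The sketch below is illustrative only: the dictionary entries, the placeholder record identifiers, the fragment list, and the function names are all invented for the example, and the real system's rules are not described in the slides.

```python
import re

# Hypothetical look-up dictionary: lower-cased name -> placeholder record id.
DICTIONARY = {
    "dichloromethane": "record-1",
    "ethanol": "record-2",
    "magnesium sulfate": "record-3",
}

# Toy list of name fragments for the rule-based pass.
NAME_FRAGMENTS = ["meth", "eth", "prop", "but", "phenyl", "amino",
                  "chloro", "di", "tri", "an", "ol", "one", "yl", "e"]

_fragments = "|".join(sorted(map(re.escape, NAME_FRAGMENTS), key=len, reverse=True))
# A candidate token consists only of fragments, locants and name punctuation.
SYSTEMATIC_RE = re.compile(rf"(?:{_fragments}|\d+|[,\-()])+")

_keys = sorted(DICTIONARY, key=len, reverse=True)  # prefer the longest match
DICTIONARY_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, _keys)) + r")\b", re.IGNORECASE)


def find_dictionary_names(text):
    """Return (matched text, record id) pairs found by dictionary look-up."""
    return [(m.group(0), DICTIONARY[m.group(0).lower()])
            for m in DICTIONARY_RE.finditer(text)]


def find_systematic_names(text):
    """Return whitespace-delimited tokens that look like systematic names."""
    found = []
    for raw in text.split():
        token = raw.strip(".,;:")
        looks_like_name = (
            len(token) >= 6                      # skip short words such as "one"
            and any(ch.isalpha() for ch in token)  # skip pure numbers
            and SYSTEMATIC_RE.fullmatch(token.lower()) is not None
        )
        if looks_like_name:
            found.append(token)
    return found


sentence = "The mixture was washed with dichloromethane, then recrystallized from ethanol."
print(find_dictionary_names(sentence))
print(find_systematic_names("(3,4-diaminophenyl)phenyl methanone was added"))
```

The rule-based pass here works token by token, so it does not join the two tokens of a name written with a space. Handling that is the job of the "bounds of a name" rules the slide mentions, which this sketch leaves out.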
    34. Name Recognition
      • Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol).
      • The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid, which was recrystallized from 95% ethanol (1.28 g, 91%).
    35. Name Recognition
      • Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) at 0 °C were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol).
      • The resulting mixture was stirred for 6 hours at room temperature [18]. The mixture was filtered and washed with dichloromethane. Then the solvent was evaporated under reduced pressure to give azo Schiff base 3 as a red solid, which was recrystallized from 95% ethanol (1.28 g, 91%).
    36. ChemMantis
      • Chemical Markup And Nomenclature Transformation Integrated System
    37. Document markup
    38. Markup – 3 seconds!
    39. On the fly conversion
    40. Shorthand Formulae Supported
    41. One Click to more Info…
    42. Names and Structures
      • Dichloroacetone
      • Trichloromethylsilane
    43. Ambiguity
    44. Ambiguity in Abbreviations - DPA
    45. IUPAC PAC Articles
    46. Patents
      • Single Configuration File defines entities for markup
      • Algorithms can be built for certain entities but the majority are dictionaries – vendors, Phys Properties, Analytical
      • We can extend our system – should we integrate to PDB somehow?
    47. Nature Publications
    48. Entity Balloons
      • Structures are the language of chemistry
      • Show structures to chemists and search/link from there
      • Link to PDB ?
    49. Other Dictionaries - Species
      • We are considering
        • Bacteria
        • Fungi
        • Enzymes
        • Viruses
        • PDB codes?
    50. Integrations Out to Other Sources
    51. Reactions
    52. Conclusions
      • The quality of structure-based data online should always be questioned – that includes ChemSpider
      • Data on ChemSpider are being added and curated on a daily basis, but we always need more eyeballs helping
      • ChemSpider has a large validated structure-name dictionary
      • Chemical name extraction and document markup are very enabling

    Antony Williams, 11 months ago

    An overview of what we do to curate and annotate sm…
