ChemSpider as a Chemical Term Resolver


US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure

In recent years, in parallel with the broad trend of information proliferation, many tens of public chemical databases have been created and made available using internet technologies. In many cases data have been exchanged freely between these databases as they source information from one another. While this has the advantage of linking together multiple data sources, it has also propagated errors across the databases. The lack of a public authority to resolve such errors significantly affects the quality of freely accessible chemical information. While ChemSpider has previously allowed a crowdsourcing approach to curation, efforts have now migrated to addressing this problem using a "federated resolver" approach. This presentation will report on our work in this area.


ChemSpider as a Chemical
Term Resolver

Antony Williams, Valery Tkachenko,
Sean Ekins and Andy Fant
ACS San Diego March 2012

It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?

Open PHACTS Project
• Develop a set of robust standards…
• Implement the standards in a semantic integration hub
• Deliver services to support drug discovery programs in pharma and public domain
• 22 partners, 8 pharmaceutical companies, 3 biotechs
• 36-month project

Guiding principle is open access, open usage, open source
- Key to standards adoption -

MeSH
• A lipid cofactor that is required for normal blood clotting.
• Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

Create an Online “Resolver” as a path to chemistry
• Search all forms of structure IDs
  • Systematic name(s)
  • Trivial name(s)
  • SMILES
  • InChI strings
  • InChIKeys
  • Database IDs
  • Registry Number
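
A resolver that accepts all of these identifier forms first has to guess which form it has been given. The sketch below is one rough way to do that in Python; it is not ChemSpider's implementation. It assumes that "Registry Number" means a CAS Registry Number, and the function names and the fallback label are made up for illustration.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(text: str) -> bool:
    """Check the CAS Registry Number format and its check digit."""
    match = CAS_RE.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_identifier(text: str) -> str:
    """Return a coarse label for the kind of identifier in `text`."""
    text = text.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    if is_valid_cas(text):
        return "cas_rn"
    # Names, SMILES and database IDs cannot be told apart by shape alone;
    # they need a structure parser or a dictionary lookup.
    return "name_smiles_or_database_id"


print(classify_identifier("7732-18-5"))
```

Only the identifier types with a rigid syntax are recognised here; everything else is passed on for dictionary or parser-based handling.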

Available Information…
• Linked to vendors, safety data, toxicity, metabolism

Resolving Names for QUALITY
• Searching chemical identifiers should resolve to the correct chemical as much as possible

Validated Name-Structure Dictionaries
• Chemical name dictionaries are used for:
  • Text-mining (publications, patents)
    • Used to index PubMed and link to Google Patents
  • Linking to other databases – think Biology!
    • When structures are not available drug names link
  • Searching the web
    • Names link to structures link to InChIs

Top 200 Drugs on Wikipedia
http://en.wikipedia.org/wiki/List_of_bestselling_drugs

The Project Challenge PART ONE
• Agree on the set of chemical names to work with
• Independently create an SDF file in each “lab”
• Compare differences and agree on final structures
• Issue “Gold Standard” SDF file to team
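
The "compare differences" step can be sketched as follows. This is an illustration only, assuming each lab's SDF file has already been reduced to a mapping from drug name to a structure key such as an InChIKey; the lab names, drug names and keys are placeholders, not project data.

```python
def find_disagreements(lab_results: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Map each name to its per-lab keys whenever the labs do not all agree."""
    names = set()
    for records in lab_results.values():
        names.update(records)

    disagreements = {}
    for name in sorted(names):
        # A lab that has no record for a name is reported as "MISSING".
        per_lab = {lab: records.get(name, "MISSING")
                   for lab, records in lab_results.items()}
        if len(set(per_lab.values())) > 1:
            disagreements[name] = per_lab
    return disagreements


# Placeholder data: three hypothetical labs, two hypothetical drugs.
labs = {
    "lab_1": {"drug_a": "KEY-1", "drug_b": "KEY-2"},
    "lab_2": {"drug_a": "KEY-1", "drug_b": "KEY-2x"},
    "lab_3": {"drug_a": "KEY-1"},
}
print(find_disagreements(labs))
```

Names on which every lab agrees drop out, so only the records that need discussion before the gold-standard file is issued are left.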

Relative accuracy of groups against final master list
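
One simple way to express such an accuracy figure is the fraction of master-list names for which a group supplied the agreed structure. The slide does not state the metric actually used, so the sketch below is an assumption, and the names and keys are placeholders.

```python
def accuracy_against_master(group: dict[str, str], master: dict[str, str]) -> float:
    """Fraction of master-list names for which the group's key matches."""
    if not master:
        raise ValueError("master list is empty")
    # A name the group did not submit counts as incorrect.
    correct = sum(1 for name, key in master.items() if group.get(name) == key)
    return correct / len(master)


# Placeholder data, not results from the project.
master = {"drug_a": "KEY-1", "drug_b": "KEY-2", "drug_c": "KEY-3", "drug_d": "KEY-4"}
group = {"drug_a": "KEY-1", "drug_b": "KEY-2", "drug_c": "KEY-3x"}
print(accuracy_against_master(group, master))
```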

The Project Challenge PART TWO
• Use Gold Standard SDF file to investigate data quality on these compounds in Internet databases
• Two checks
  • Search chemical name – does it return the correct compound? If not, how is it different?
  • Search “structure” – SMILES, Molfile, InChIString or InChIKey
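
The name check can be sketched as a comparison of InChIKeys. It relies on the fact that the first block of a standard InChIKey is derived from the connectivity, while the later characters reflect the remaining layers such as stereochemistry and isotopes. The category labels and the example keys below are placeholders chosen for illustration, not the project's actual classification.

```python
from typing import Optional


def compare_to_gold(found_key: Optional[str], gold_key: str) -> str:
    """Classify a database hit against the gold-standard InChIKey."""
    if found_key is None:
        return "not_found"
    if found_key == gold_key:
        return "correct"
    # Same first block: same connectivity, so the difference lies in
    # stereochemistry, isotopes or another non-connectivity layer.
    if found_key.split("-")[0] == gold_key.split("-")[0]:
        return "same_skeleton_different_layers"
    return "different_skeleton"


# Placeholder keys with the InChIKey shape; they are not real compounds.
gold = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
print(compare_to_gold("AAAAAAAAAAAAAA-CCCCCCCCSA-N", gold))
```

Separating "same skeleton" from "different skeleton" is a quick answer to "how is it different?" without opening each structure by hand.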

Standardize
• Use the SRS as a guidance document for standardization
• Adjust as necessary to our needs

One dictionary look up is never enough…
• ChemSpider does not contain all chemistry
• We are not the only ones curating data
• New chemistry expands daily and goes online

One dictionary look up is never enough…
• Federation is key…
• Check ChemSpider first; if not found, then:
  • Check PubChem
  • Check NCI resolver
  • Check ChEBI
  • Check … the “network” of open interfaces
• Each resolver will have its own “quantitative confidence”.
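
The fallback chain described above can be sketched as an ordered list of lookups, each tagged with a confidence value. This is a minimal illustration under stated assumptions: the lookups are in-memory stubs rather than calls to the real services, and the confidence numbers are invented.

```python
from typing import Callable, Optional

# A resolver takes a search term and returns a structure string or None.
Resolver = Callable[[str], Optional[str]]


def resolve_federated(
    term: str, resolvers: list[tuple[str, float, Resolver]]
) -> Optional[tuple[str, float, str]]:
    """Try each resolver in order and return the first hit.

    The result is (source name, source confidence, structure), or None
    when no resolver knows the term.
    """
    for source, confidence, lookup in resolvers:
        try:
            structure = lookup(term)
        except Exception:
            continue  # one failing service should not stop the chain
        if structure is not None:
            return source, confidence, structure
    return None


# Stub dictionaries standing in for the remote services (hypothetical data).
chain = [
    ("ChemSpider", 0.9, {"term_a": "STRUCT-A"}.get),
    ("PubChem", 0.8, {"term_b": "STRUCT-B"}.get),
    ("NCI resolver", 0.7, {"term_c": "STRUCT-C"}.get),
]
print(resolve_federated("term_b", chain))
```

Returning the source together with its confidence lets the caller decide how much to trust a hit that came from further down the chain.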

Chemical Identifier Resolver (CIR)

Converts a given structure identifier into another representation or structure identifier.

Resolve names, identifiers etc.

http://cactus.nci.nih.gov/chemical/structure
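
CIR requests are plain URLs of the form base/identifier/representation. The sketch below only builds such a URL and makes no network call. The base address is the one given on the slide; the helper name is made up, and `smiles` and `stdinchikey` are used as example representation names on the assumption that the service still accepts them.

```python
from urllib.parse import quote

# Base address as given on the slide.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a CIR request URL for converting `identifier`."""
    # Percent-encode the identifier so spaces in names are URL-safe.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


print(cir_url("vitamin K1", "stdinchikey"))
```

Fetching the resulting URL, for example with `urllib.request.urlopen`, would return the converted identifier as plain text when the service recognises the input.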

We are building…
• A central federated resolver utilizing available services
• Dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN)
• “Consensus” decisions and guidance BUT
• Chemicals have timelines!!!
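
A "consensus" decision across several name-to-structure converters could be as simple as a majority vote. The sketch below assumes each tool's output has already been normalised to a comparable structure key; the tool labels and keys are placeholders, and the voting rule is an illustration rather than the method used in the project.

```python
from collections import Counter
from typing import Optional


def consensus(candidates: dict[str, Optional[str]]) -> tuple[Optional[str], float]:
    """Return the most common structure and the share of tools backing it.

    Tools that failed to convert the name are passed as None; they lower
    the agreement score but do not vote. On a tie the structure that was
    seen first wins.
    """
    votes = Counter(s for s in candidates.values() if s is not None)
    if not votes:
        return None, 0.0
    structure, count = votes.most_common(1)[0]
    return structure, count / len(candidates)


# Placeholder outputs from three hypothetical converters.
print(consensus({"tool_1": "KEY-1", "tool_2": "KEY-1", "tool_3": "KEY-2"}))
```

The agreement share can serve as a rough confidence value, with low-agreement names routed to manual curation.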

Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams


1. ChemSpider as a Chemical Term Resolver Antony Williams, Valery Tkachenko, Sean Ekins and Andy Fant ACS San Diego March 2012

2. The Web of Chemistry – VERY BIG!

3. Online Databases are “Linking”

4. It is so difficult to navigate… What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?

5. Open PHACTS Project • Develop a set of robust standards… • Implement the standards in a semantic integration hub • Deliver services to support drug discovery programs in pharma and public domain • 22 partners, 8 pharmaceutical companies, 3 biotechs • 36 months project. Guiding principle is open access, open usage, open source - Key to standards adoption -

7. What is the Structure of Vitamin K?

8. MeSH • A lipid cofactor that is required for normal blood clotting. • Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).

9. What is the Structure of Vitamin K1?

10.

11.

12. Create an Online “Resolver” as a path to chemistry • Search all forms of structure IDs: Systematic name(s), Trivial name(s), SMILES, InChI strings, InChIKeys, Database IDs, Registry Number

13. ChemSpider

14. Available Information… • Linked to vendors, safety data, toxicity, metabolism

15. Available Information….

16. Vitamin K1 Names

17. Vitamin K1 on ChemSpider CORRECT

18. Resolving Names for QUALITY • Searching chemical identifiers should resolve to the correct chemical as much as possible

19. Validated Name-Structure Dictionaries • Chemical name dictionaries are used for: • Text-mining (publications, patents) • Used to index PubMed and link to Google Patents • Linking to other databases – think Biology! • When structures are not available drug names link • Searching the web • Names link to structures link to InChIs

20. I want to know about “Vincristine”

21. Vincristine: Identifiers

22. Vincristine: Patents Linked by Name

23. Many Names, One Structure

24. Top 200 Drugs on Wikipedia http://en.wikipedia.org/wiki/List_of_bestselling_drugs

25. The Project Challenge PART ONE • Agree on the set of chemical names to work with • Independently create an SDF file in each “lab” • Compare differences and agree on final structures • Issue “Gold Standard” SDF file to team

26. RSC Process

27. Relative accuracy of groups against final master list

28. The Project Challenge PART TWO • Use Gold Standard SDF File to investigate data quality on these compounds in Internet Databases • Two checks • Search chemical name – does it return the correct compound? If not, how is it different? • Search “structure” – SMILES, Molfile, InChIString or InChIKey

29. “The First 10”

30. Performance on 150 Drug Names

31.

32. NPC Browser Set

33. Standardize • Use the SRS as a guidance document for standardization • Adjust as necessary to our needs

34. Nitro groups

35. Salt and Ionic Bonds

36. One dictionary look up is never enough… • ChemSpider does not contain all chemistry • We are not the only ones curating data • New chemistry expands daily and goes online

37. One dictionary look up is never enough… • Federation is key… • Check ChemSpider first, if not found then • Check PubChem • Check NCI resolver • Check ChEBI • Check … the “network” of open interfaces • Each resolver will have its own “quantitative confidence”.

38. Chemical Identifier Resolver (CIR) Converts a given structure identifier into another representation or structure identifier. Resolve names, identifiers etc http://cactus.nci.nih.gov/chemical/structure

39. What can become a resolver?

40. We are building… • A central federated resolver utilizing available services • Dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN) • “Consensus” decisions and guidance BUT • Chemicals have timelines!!!

41. ORIGINAL FINAL

42. Thank you Email: williamsa@rsc.org Twitter: ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams
