ChemSpider Reactions:
Delivering a free community
resource of chemical syntheses
Valery Tkachenko, Colin Batchelor, Daniel Lowe, Ken
Karapetyan, David Sharpe and Antony Williams
ACS New Orleans April 2013
Overview
• Motivation
• The RSC and chemical reaction data
• New sources of chemical reaction data
• ChemSpider Reactions: bringing it all together
• Experiments with reaction classification
• The National Chemical Database Service
Who needs another reaction
database?
• Those who cannot afford to license access…
• Those who would like to access data that is
not abstracted
• Those who might like to contribute data to a
database
• Anybody wanting to integrate their systems
and pull data out.
RSC and chemical reaction data 1
Graphical abstracting journals:
Methods in Organic Synthesis (monthly, 1990 to present)
Catalysts and Catalysed Reactions (monthly, 2005 to
present)
These constitute a backfile of over 50,000 novel reactions
RSC and chemical reaction data 2
RSC and chemical reaction data 3
New sources of reaction data
Daniel Lowe’s PhD thesis (Cantab, 2012) was on
extracting reactions from US patent data.
We can apply this technology to the RSC Journal
archive.
ChemSpider Reactions
bringing it all together
http://csr.dev.rsc-us.org/
WORK IN PROGRESS
Reaction classification 1
Project Prospect has text-mined RSC journal
articles for named reactions and molecular
processes, annotated according to Creative
Commons-licensed ontologies:
See http://rxno.googlecode.com/
Reaction classification 2
Classification of Daniel’s US
Patent data
Reaction InChI
To do for reactions what InChI has done for
structures
• Think online searching
• Deduplication and linking
http://www-rinchi.ch.cam.ac.uk/help.html
Reaction InChI
Early work – RInChIs layered on to a few
hundred thousand reactions
• Not generated for a few tens of thousands of
reactions
• Reaction deduplication results differ based on
algorithm – GGA software versus RInChI
• Under investigation
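The point that deduplication results depend on the algorithm can be illustrated with a minimal sketch. It does not generate RInChIs and does not reproduce the GGA software; the key function, the `collapse_repeats` flag, and the `mol-*` and `rec-*` identifiers are all hypothetical. It only shows how one normalisation choice changes which records count as the same reaction.

```python
from collections import defaultdict

def reaction_key(reactants, products, collapse_repeats=True):
    """Build an order-independent key for a reaction (illustrative only)."""
    def side(molecules):
        # Optionally ignore repeated molecules, then sort so order is irrelevant.
        items = set(molecules) if collapse_repeats else molecules
        return tuple(sorted(items))
    return (side(reactants), side(products))

def deduplicate(reactions, collapse_repeats=True):
    """Group record ids by reaction key; each group is one distinct reaction."""
    groups = defaultdict(list)
    for record_id, reactants, products in reactions:
        key = reaction_key(reactants, products, collapse_repeats)
        groups[key].append(record_id)
    return dict(groups)

# Hypothetical records; real entries would carry structure identifiers.
records = [
    ("rec-1", ["mol-A", "mol-B"], ["mol-C"]),
    ("rec-2", ["mol-B", "mol-A"], ["mol-C"]),           # same, different order
    ("rec-3", ["mol-A", "mol-A", "mol-B"], ["mol-C"]),  # repeated reactant
]

merged = deduplicate(records)                          # repeats collapsed
strict = deduplicate(records, collapse_repeats=False)  # repeats kept
```

With repeats collapsed all three records fall into one group; with repeats kept, `rec-3` is treated as a separate reaction.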
Other sources
• ChemSpider SyntheticPages
• Electronic Lab Notebooks
• University repositories
Please send theses
What will ChemSpider Reactions serve?
• Chemical Database Service
• Linking back to original
publications/supplementary data
• Underpinning other tools e.g. retrosynthetic
analysis (depends on data quality and
mapping)
Chemical Database
Service
National Chemical Database Service
for UK academics
Integrates commercial databases and
services
Chemicals, analytical data, prediction
algorithms
Development of data repository
ARChem from SimBioSys 1
Synthesis planning tool which performs rule-
and precedent-based retrosynthetic analysis
back to commercially available starting
materials.
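As a rough illustration of precedent-based retrosynthesis back to purchasable materials, the toy search below walks a lookup table of known disconnections. It is not ARChem's algorithm and contains no reaction rules; the `PRECEDENTS` table, the `COMMERCIAL` set, and all molecule names are hypothetical placeholders.

```python
# Hypothetical precedent table: product -> alternative sets of precursors.
PRECEDENTS = {
    "target": [["intermediate", "reagent-1"]],
    "intermediate": [["start-A", "start-B"]],
}

# Hypothetical catalogue of commercially available starting materials.
COMMERCIAL = {"reagent-1", "start-A", "start-B"}

def plan(molecule, depth=5):
    """Return a nested route back to purchasable materials, or None."""
    if molecule in COMMERCIAL:
        return {"buy": molecule}
    if depth == 0:
        return None  # give up on routes longer than the depth limit
    for precursors in PRECEDENTS.get(molecule, []):
        subroutes = [plan(p, depth - 1) for p in precursors]
        # Accept the first precedent whose precursors can all be resolved.
        if all(route is not None for route in subroutes):
            return {"make": molecule, "from": subroutes}
    return None

route = plan("target")
```

A real planner would rank alternative routes and depends on well-mapped, curated reaction data, which is why data quality matters for this use.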
ARChem from SimBioSys 2
ARChem from SimBioSys 3
But what about data quality?
• Data validation and curation
required
• Encouraging participation with
Rewards and RECOGNITION
Manual curation
• Integrated commenting, curating and validation
platform across ALL eScience and Publishing
platforms
• All integrated to a central RSC profile and
feeding the alt-metrics tools
The other kind of RDF
(made-up example)
Chemical reactions are unusually well-suited to representation. (Donald
Davidson’s event semantics)
_:r1 a obo:RXNO_0000004 ;                          # Diels–Alder
    obo:has_participant_ceasing_to_exist _:m1 ;    # a diene
    obo:has_participant_ceasing_to_exist _:m2 ;    # an olefin
    obo:has_participant_starting_to_exist _:m3 .   # a substituted cyclohexene
_:m1 a <http://rdf.chemspider.com/233000> .
_:m2 a <http://rdf.chemspider.com/233001> .
_:m3 a <http://rdf.chemspider.com/233002> .
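To show how this event-style description can be queried, here is a minimal sketch that holds the same made-up triples in a plain Python list and reads off what is consumed and what is formed. The predicate names and URIs are copied from the slide; the tuple layout and the helper functions are assumptions for illustration, not part of any RSC API.

```python
# The slide's made-up example as (subject, predicate, object) tuples.
TRIPLES = [
    ("_:r1", "a", "obo:RXNO_0000004"),  # Diels–Alder, per the slide
    ("_:r1", "obo:has_participant_ceasing_to_exist", "_:m1"),
    ("_:r1", "obo:has_participant_ceasing_to_exist", "_:m2"),
    ("_:r1", "obo:has_participant_starting_to_exist", "_:m3"),
    ("_:m1", "a", "http://rdf.chemspider.com/233000"),
    ("_:m2", "a", "http://rdf.chemspider.com/233001"),
    ("_:m3", "a", "http://rdf.chemspider.com/233002"),
]

def objects(subject, predicate):
    """Return every object of triples matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def participants(reaction):
    """Split a reaction's participants into consumed and formed compounds."""
    def types_of(nodes):
        return [t for node in nodes for t in objects(node, "a")]
    consumed = objects(reaction, "obo:has_participant_ceasing_to_exist")
    formed = objects(reaction, "obo:has_participant_starting_to_exist")
    return {"consumed": types_of(consumed), "formed": types_of(formed)}

result = participants("_:r1")
```

Here `result["consumed"]` lists the two compound URIs that cease to exist and `result["formed"]` the one that starts to exist.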
Questions?
E-mail: tkachenkov@rsc.org, batchelorc@rsc.org
