Building a Data Repository to 
Manage Chemistry Research 
Data 
Antony Williams 
University of Connecticut 
10/15/2014
Chemistry for the Community 
• The Royal Society of Chemistry as a 
provider of chemistry for the community: 
• As a charity 
• As a scientific publisher 
• As a host of commercial databases 
• As a partner in grant-based projects 
• As the host of ChemSpider 
• And now in development: the RSC Data 
Repository for Chemistry
• ~30 million chemicals and growing 
• Data sourced from hundreds of sources 
• Crowd sourced curation and annotation 
• Ongoing deposition of data from our 
journals and our collaborators 
• Structure centric hub for web-searching 
• …and a really big dictionary!!!
ChemSpider
Experimental/Predicted Properties
Literature references
Patent references
Books
Vendors and data sources
Aspirin on ChemSpider
Many Names, One Structure
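The "many names, one structure" idea can be sketched as a synonym dictionary. This is only an illustration, assuming a hypothetical in-memory `SYNONYMS` table and helper names (`normalise`, `lookup`); it does not describe how ChemSpider stores its data. The identifier used is the standard InChIKey for aspirin.

```python
from typing import Optional

# Standard InChIKey for aspirin; the table and helpers below are hypothetical.
ASPIRIN_INCHIKEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

# Many names map to one structure identifier.
SYNONYMS = {
    "aspirin": ASPIRIN_INCHIKEY,
    "acetylsalicylic acid": ASPIRIN_INCHIKEY,
    "2-acetoxybenzoic acid": ASPIRIN_INCHIKEY,
    "2-(acetyloxy)benzoic acid": ASPIRIN_INCHIKEY,
}


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def lookup(name: str) -> Optional[str]:
    """Return the structure identifier for a name, or None if unknown."""
    return SYNONYMS.get(normalise(name))


if __name__ == "__main__":
    for query in ["Aspirin", "  Acetylsalicylic   ACID ", "paracetamol"]:
        print(query.strip(), "->", lookup(query))
```

Keying on a structure identifier rather than on a name is what lets a search for any synonym land on the same record.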
Crowdsourced “Annotations” 
• Users can add 
• Descriptions, Syntheses and Commentaries 
• Links to PubMed articles 
• Links to articles via DOIs 
• Add spectral data 
• Add Crystallographic Information Files 
• Add photos 
• Add MP3 files 
• Add Videos
Crowdsourced Enhancement 
• The community can clean and enhance the 
database by providing Feedback and direct 
curation 
• Tens of thousands of edits made
Adding data to ChemSpider 
• Users can contribute data and use it as a shared 
resource for their research group
ChemSpider Spectra
ChemSpider SyntheticPages
Micropublishing with Peer Review 
(a chemical synthesis blog?)
Multi-Step Synthesis
Interactive Data
Creating a CSSP Page – ask 
Trevor or Christopher
Creating a CSSP Page
Searching CSSP 
• Reaction name or type 
• Reactants or products 
• Author
Search for “Leadbeater”
Search by structure
Anatomy of a Synthetic Page 
• Reaction type 
• Product 
• Reaction scheme 
• DOI for easy citation 
• Chemicals used, including source (when important) 
• Detailed procedure with no length limit 
• View the structure of a compound, and search for it on other sites
Anatomy of a Synthetic Page 
• Valuable tips and tricks 
• Downloadable spectra and structures 
• Leave feedback, ask questions, contribute your own experiences 
• Analytical data (1H NMR at minimum)
What are we building now? 
• We are building the “RSC Data Repository” 
• Containers for compounds, reactions, analytical 
data, tabular data 
• Algorithms for data validation and standardization 
• Flexible indexing and search technologies 
• A platform for modeling data and hosting existing 
models and predictive algorithms
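One way to picture the "containers" is as a few linked record types. The sketch below is an assumption made for illustration: the class names, fields and example values are hypothetical and are not taken from the RSC Data Repository design.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    name: str
    inchikey: str = ""  # structure identifier, left empty if not yet assigned


@dataclass
class AnalyticalData:
    technique: str       # e.g. "1H NMR"
    compound: Compound   # the compound the data describes
    file_name: str = ""  # hypothetical pointer to the deposited file


@dataclass
class Reaction:
    reactants: list[Compound]
    products: list[Compound]
    analytical_data: list[AnalyticalData] = field(default_factory=list)

    def participants(self) -> list[str]:
        """Names of every compound on either side of the reaction."""
        return [c.name for c in self.reactants + self.products]


if __name__ == "__main__":
    alcohol = Compound("starting alcohol")
    chloride = Compound("product chloride")
    reaction = Reaction(reactants=[alcohol, Compound("thionyl chloride")],
                        products=[chloride])
    reaction.analytical_data.append(AnalyticalData("1H NMR", chloride, "product.jdx"))
    print(reaction.participants())
```

Keeping analytical data attached to a compound, and compounds attached to reactions, is what would let a search started from any one of them reach the others.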
Deposition of Data
Compounds
Reactions
Analytical data
Crystallography data
Deposition of Data 
• Developing systems that provide 
feedback to users regarding data quality 
• Validate/standardize chemical compounds 
• Check for balanced reactions 
• Check spectral data 
• Example future work 
• Properties: compare experimental to predicted 
• Automated structure verification - NMR
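A balanced-reaction check of the kind listed above can be sketched by counting atoms on each side. This is a minimal illustration, not the repository's validator: it assumes reactions are written as plain text with `+` and `->`, and formulas without brackets, charges or hydrates. The function names are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side: str) -> Counter:
    """Sum atom counts over all terms on one side, honouring coefficients."""
    total = Counter()
    for term in side.split("+"):
        match = re.fullmatch(r"\s*(\d*)\s*([A-Z][A-Za-z0-9]*)\s*", term)
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        coefficient = int(match.group(1)) if match.group(1) else 1
        for element, n in parse_formula(match.group(2)).items():
            total[element] += coefficient * n
    return total


def is_balanced(reaction: str) -> bool:
    """True if every element has the same atom count on both sides of '->'."""
    left, right = reaction.split("->")
    return side_totals(left) == side_totals(right)


if __name__ == "__main__":
    # Chlorination of an alcohol with thionyl chloride, using ethanol as a stand-in.
    print(is_balanced("C2H6O + SOCl2 -> C2H5Cl + SO2 + HCl"))
    print(is_balanced("H2 + O2 -> H2O"))
```

A failed check is the sort of result that could be fed back to the depositor as a data-quality warning rather than a hard rejection.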
Can we get historical data? 
• Text and data can be mined 
• Spectra can be extracted and converted 
• SO MUCH Open Source Code available
Text Mining 
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4- 
thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride 
( 5 ml ) and benzene ( 50 ml ) were charged into a glass 
reaction vessel equipped with a mechanical stirrer , 
thermometer and reflux condenser . 
The reaction mixture was heated at reflux with stirring , for a 
period of about one-half hour . 
After this time the benzene and unreacted thionyl chloride 
were stripped from the reaction mixture under reduced 
pressure to yield the desired product N-(β-chloroethyl)-N-methyl- 
N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a 
solid residue
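As a rough sketch of what mining such a paragraph involves, the snippet below pulls reagent names and amounts out of the first sentence with a regular expression. The pattern, the unit list and the function name are assumptions made for this example; real chemical text mining relies on far more robust name recognition than this.

```python
import re

# First sentence of the procedure above, with its tokenised spacing kept.
TEXT = (
    "The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-"
    "thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride "
    "( 5 ml ) and benzene ( 50 ml ) were charged into a glass "
    "reaction vessel"
)

# A name made of letters, spaces and hyphens, followed by "( <number> <unit> )".
# A leading "and" is skipped so it does not become part of the name.
QUANTITY = re.compile(
    r"(?:\band\s+)?([A-Za-z][A-Za-z -]*?)\s*"
    r"\(\s*(\d+(?:\.\d+)?)\s*(ml|g|mg|mmol|mol)\s*\)"
)


def extract_quantities(text: str) -> list[tuple[str, float, str]]:
    """Return (reagent name, amount, unit) for each 'name ( amount unit )' hit."""
    return [(m.group(1), float(m.group(2)), m.group(3))
            for m in QUANTITY.finditer(text)]


if __name__ == "__main__":
    for name, amount, unit in extract_quantities(TEXT):
        print(f"{name}: {amount} {unit}")
```

The systematic name at the start of the sentence is not captured here because it has no quantity attached; recognising it needs a chemical name parser rather than a regular expression.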
Text spectra? 
13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 
30.11 (CH, benzylic methine), 30.77 (CH, 
benzylic methine), 66.12 (CH2), 68.49 (CH2), 
117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 
125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 
123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 
152.62, 154.88 (ArC)
1H NMR (CDCl3, 400 MHz): 
δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, 
C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, 
C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 
8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
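A text spectrum like the 13C list above can be turned into numbers with a small parser. The sketch below is an assumption about one possible approach, with hypothetical function names: it drops the parenthesised assignments, then reads every remaining decimal number after "δ =" as a chemical shift in ppm. The sample string is abridged from the slide. The 1H list would need extra handling for ranges such as 7.18–7.94.

```python
import re

# 13C peak list abridged from the slide above (assignment wording shortened).
SPECTRUM = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH), 30.77 (CH), "
    "66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, "
    "125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, "
    "139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)"
)


def strip_parentheses(text: str) -> str:
    """Remove all parenthesised content, including nested parentheses."""
    depth = 0
    kept = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def parse_shifts(spectrum_text: str) -> list[float]:
    """Return the chemical shifts (ppm) listed after 'δ ='."""
    _, _, peaks = spectrum_text.partition("δ =")
    return [float(value) for value in re.findall(r"\d+\.\d+", strip_parentheses(peaks))]


if __name__ == "__main__":
    shifts = parse_shifts(SPECTRUM)
    print(len(shifts), "peaks, from", min(shifts), "to", max(shifts), "ppm")
```

Once the shifts are numeric they can be plotted as a stick spectrum or compared against predicted values for the proposed structure.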
Turn “Figures” Into Data
Make it interactive
SO MANY reactions!
Extracting our Archive 
• What could we get from our archive? 
• Find chemical names and generate structures 
• Find chemical images and generate structures 
• Find reactions 
• Find data (MP, BP, LogP) and deposit 
• Find figures and database them 
• Find spectra (and link to structures)
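Finding property data in archive text can start with simple patterns. The sketch below looks for melting points only. The regular expression, the function name and the sample sentences are all made up for illustration and are not drawn from the RSC archive.

```python
import re

# Matches "mp", "m.p." or "melting point", then a value or a range, then °C.
MP_PATTERN = re.compile(
    r"\b(?:mp|m\.p\.|melting point)\s*[:=]?\s*"
    r"(\d+(?:\.\d+)?)(?:\s*[-–]\s*(\d+(?:\.\d+)?))?\s*°\s*C",
    re.IGNORECASE,
)


def find_melting_points(text: str) -> list[tuple[float, float]]:
    """Return (low, high) melting points in °C; a single value gives low == high."""
    results = []
    for match in MP_PATTERN.finditer(text):
        low = float(match.group(1))
        high = float(match.group(2)) if match.group(2) else low
        results.append((low, high))
    return results


if __name__ == "__main__":
    # Hypothetical sentences written for this example.
    sample = ("White solid, mp 135-136 °C. "
              "The analogue had a melting point: 98.5 °C.")
    print(find_melting_points(sample))
```

Boiling points and LogP values could be handled with similar patterns, with each hit linked back to the structure it was reported for before deposition.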
Models published from data
Text-mining Data to compare
The Future 
Internet Data 
Commercial Software 
Pre-competitive Data 
Open Science 
Open Data 
Publishers 
Educators 
Open Databases 
Chemical Vendors 
Small organic molecules 
Undefined materials 
Organometallics 
Nanomaterials 
Polymers 
Minerals 
Particle bound 
Links to Biologicals
A Global Chemistry Network 
• The Global Chemistry Network is much bigger 
than just data - scientific networking, 
micro/publishing, integration hub. 
• The data repository as a handler for data, GCN 
as a submission interface, GCN as a profile 
handler, rewards and recognition platform etc. 
• Data repository architecture designed to deliver 
the underpinning data containers and 
visualization widgets etc.
Thank you 
Email: williamsa@rsc.org 
ORCID: 0000-0002-2668-4821 
Twitter: @ChemConnector 
Personal Blog: www.chemconnector.com 
SLIDES: www.slideshare.net/AntonyWilliams

More Related Content

What's hot

Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxyAnn-Marie Roche
 
Building a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistryBuilding a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistryValery Tkachenko
 
Open Access to Knowledge@ Novartis
Open Access to Knowledge@ NovartisOpen Access to Knowledge@ Novartis
Open Access to Knowledge@ NovartisAref Jdey
 
SciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoverySciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoveryAlichy Sowmya
 
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...Crossref
 

What's hot (19)

A chemistry data repository to serve them all
A chemistry data repository to serve them allA chemistry data repository to serve them all
A chemistry data repository to serve them all
 
Data Mining Dissertations and Adventures and Experiences in the World of Chem...
Data Mining Dissertations and Adventures and Experiences in the World of Chem...Data Mining Dissertations and Adventures and Experiences in the World of Chem...
Data Mining Dissertations and Adventures and Experiences in the World of Chem...
 
Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...
 
The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...
 
Data integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientistData integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientist
 
ChemSpider - building an online database of open spectra
ChemSpider - building an online database of open spectra ChemSpider - building an online database of open spectra
ChemSpider - building an online database of open spectra
 
Cheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural ProductsCheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural Products
 
Value of the mediawiki platform for providing content to the chemistry community
Value of the mediawiki platform for providing content to the chemistry communityValue of the mediawiki platform for providing content to the chemistry community
Value of the mediawiki platform for providing content to the chemistry community
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxy
 
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted AnalysisThe US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
 
Building a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistryBuilding a semantic chemistry platform with the royal society of chemistry
Building a semantic chemistry platform with the royal society of chemistry
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
Open Access to Knowledge@ Novartis
Open Access to Knowledge@ NovartisOpen Access to Knowledge@ Novartis
Open Access to Knowledge@ Novartis
 
Web Crawling Chemistry
Web Crawling ChemistryWeb Crawling Chemistry
Web Crawling Chemistry
 
Why Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpiderWhy Chemistry and the Web Will Benefit from a ChemSpider
Why Chemistry and the Web Will Benefit from a ChemSpider
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
 
SciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoverySciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discovery
 
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
2013 CrossRef Annual Meeting, How CrossRef has Accelerated Science and Its Pr...
 

Viewers also liked

Genetics power point
Genetics power pointGenetics power point
Genetics power pointoacore2
 
Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2Brian Covello
 
Sigma Xi Showcase
Sigma Xi ShowcaseSigma Xi Showcase
Sigma Xi Showcasewrhsiang
 
REU Research Poster
REU Research PosterREU Research Poster
REU Research PosterKrista Chew
 
02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein Synthesis02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein SynthesisJaya Kumar
 
Why Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II DiabetesWhy Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II DiabetesBuyOrganicCoffee
 
Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005jdecarli
 
Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)Raju Bishnoi
 
polymerase chain reaction
polymerase chain reactionpolymerase chain reaction
polymerase chain reactionVipin Kannan
 
DNA replication, transcription, and translation
DNA replication, transcription, and translationDNA replication, transcription, and translation
DNA replication, transcription, and translationjun de la Ceruz
 
Obesity & adipokines
Obesity & adipokinesObesity & adipokines
Obesity & adipokinesRazavi Nader
 
Current Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical IndustryCurrent Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical IndustryProf. Dr. Basavaraj Nanjwade
 
BCIs and DNA Nanotechnology
BCIs and DNA NanotechnologyBCIs and DNA Nanotechnology
BCIs and DNA NanotechnologyMelanie Swan
 
Lecture on nucleic acid and proteins
Lecture on nucleic acid and proteinsLecture on nucleic acid and proteins
Lecture on nucleic acid and proteinsMarilen Parungao
 

Viewers also liked (20)

Genetics power point
Genetics power pointGenetics power point
Genetics power point
 
Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2Brian Covello: Diabetes Research Presentation Semester 2
Brian Covello: Diabetes Research Presentation Semester 2
 
ORS Abstract
ORS AbstractORS Abstract
ORS Abstract
 
Sigma Xi Showcase
Sigma Xi ShowcaseSigma Xi Showcase
Sigma Xi Showcase
 
JL Poster
JL PosterJL Poster
JL Poster
 
REU Research Poster
REU Research PosterREU Research Poster
REU Research Poster
 
02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein Synthesis02 The Role of DNA in Protein Synthesis
02 The Role of DNA in Protein Synthesis
 
Why Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II DiabetesWhy Coffee Protects Against Type II Diabetes
Why Coffee Protects Against Type II Diabetes
 
Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005Pnirs Poster Final June 10 2005
Pnirs Poster Final June 10 2005
 
Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)Polymerase chain reaction (pcr)
Polymerase chain reaction (pcr)
 
Genetics
GeneticsGenetics
Genetics
 
polymerase chain reaction
polymerase chain reactionpolymerase chain reaction
polymerase chain reaction
 
DNA replication, transcription, and translation
DNA replication, transcription, and translationDNA replication, transcription, and translation
DNA replication, transcription, and translation
 
Fenugreek final
Fenugreek finalFenugreek final
Fenugreek final
 
Obesity & adipokines
Obesity & adipokinesObesity & adipokines
Obesity & adipokines
 
Current Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical IndustryCurrent Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
Current Pharmacology & Toxicology Guidlines For Pharmaceutical Industry
 
BCIs and DNA Nanotechnology
BCIs and DNA NanotechnologyBCIs and DNA Nanotechnology
BCIs and DNA Nanotechnology
 
Pcr
Pcr Pcr
Pcr
 
Human genome
Human genomeHuman genome
Human genome
 
Lecture on nucleic acid and proteins
Lecture on nucleic acid and proteinsLecture on nucleic acid and proteins
Lecture on nucleic acid and proteins
 

Similar to Building a data repository to manage chemistry research data

How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...Ken Karapetyan
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineKen Karapetyan
 
Data enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveData enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveKen Karapetyan
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...guest01a117
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectKen Karapetyan
 
The application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsThe application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsValery Tkachenko
 

Similar to Building a data repository to manage chemistry research data (20)

Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...Current initiatives in developing research data repositories at the Royal Soc...
Current initiatives in developing research data repositories at the Royal Soc...
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...
 
Using online chemistry databases to facilitate structure identification in ma...
Using online chemistry databases to facilitate structure identification in ma...Using online chemistry databases to facilitate structure identification in ma...
Using online chemistry databases to facilitate structure identification in ma...
 
ChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry dataChemSpider as an integration hub for interlinked chemistry data
ChemSpider as an integration hub for interlinked chemistry data
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...Hosting public domain chemicals data online for the community – the challenge...
Hosting public domain chemicals data online for the community – the challenge...
 
Data enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveData enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archive
 
Data enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archiveData enhancing the royal society of chemistry publication archive
Data enhancing the royal society of chemistry publication archive
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
 
Ontology work at the Royal Society of Chemistry
Ontology work at the Royal Society of ChemistryOntology work at the Royal Society of Chemistry
Ontology work at the Royal Society of Chemistry
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...Our dire need to mandate data standards and expectations for scientific publi...
Our dire need to mandate data standards and expectations for scientific publi...
 
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
A Presentation At Nature Publishing Group Crowdsourcing, Collaborations And T...
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
The application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsThe application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platforms
 
Apps and approaches to mobilizing chemistry from the Royal Society of Chemistry
Apps and approaches to mobilizing chemistry from the Royal Society of ChemistryApps and approaches to mobilizing chemistry from the Royal Society of Chemistry
Apps and approaches to mobilizing chemistry from the Royal Society of Chemistry
 
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
 

Recently uploaded

Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 

Building a data repository to manage chemistry research data

  • 1. Building a Data Repository to Manage Chemistry Research Data Antony Williams University of Connecticut 10/15/2014
  • 2. Chemistry for the Community • The Royal Society of Chemistry as a provider of chemistry for the community: • As a charity • As a scientific publisher • As a host of commercial databases • As a partner in grant-based projects • As the host of ChemSpider • And now in development: the RSC Data Repository for Chemistry
  • 3. • ~30 million chemicals and growing • Data sourced from hundreds of sources • Crowdsourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • Structure-centric hub for web-searching • …and a really big dictionary!!!
  • 10. Vendors and data sources
  • 12. Many Names, One Structure
  • 13. Crowdsourced “Annotations” • Users can add • Descriptions, Syntheses and Commentaries • Links to PubMed articles • Links to articles via DOIs • Add spectral data • Add Crystallographic Information Files • Add photos • Add MP3 files • Add Videos
  • 14. Crowdsourced Enhancement • The community can clean and enhance the database by providing Feedback and direct curation • Tens of thousands of edits made
  • 15. Adding data to ChemSpider • Users can contribute data and use it as a shared resource for their research group
  • 18. Micropublishing with Peer Review (a chemical synthesis blog?)
  • 21. Creating a CSSP Page – ask Trevor or Christopher
  • 24. Searching CSSP • Reaction name or type • Reactants or products • Author
  • 27. Anatomy of a Synthetic Page • Reaction type • Product • Reaction scheme • DOI for easy citation • Chemicals used, including source (when important) • Detailed procedure with no length limit • View the structure of a compound – and search for it on other sites
  • 28. Anatomy of a Synthetic Page • Valuable tips and tricks • Downloadable spectra and structures • Leave feedback, ask questions, contribute your own experiences • Analytical data (1H NMR at minimum)
  • 29. What are we building now? • We are building the “RSC Data Repository” • Containers for compounds, reactions, analytical data, tabular data • Algorithms for data validation and standardization • Flexible indexing and search technologies • A platform for modeling data and hosting existing models and predictive algorithms
  • 35. Deposition of Data • Developing systems that provide feedback to users regarding data quality • Validate/standardize chemical compounds • Check for balanced reactions • Check spectral data • EXAMPLE Future work • Properties – compare experimental to predicted • Automated structure verification - NMR
  • 36. Can we get historical data? • Text and data can be mined • Spectra can be extracted and converted • SO MUCH Open Source Code available
  • 37. Text Mining The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser. The reaction mixture was heated at reflux with stirring, for a period of about one-half hour. After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue
  • 39. Text spectra? 13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), 30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, 120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)
  • 40. 1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
  • 44. Extracting our Archive • What could we get from our archive? • Find chemical names and generate structures • Find chemical images and generate structures • Find reactions • Find data (MP, BP, LogP) and deposit • Find figures and database them • Find spectra (and link to structures)
  • 47. The Future • Internet Data • Commercial Software • Pre-competitive Data • Open Science • Open Data • Publishers • Educators • Open Databases • Chemical Vendors • Small organic molecules • Undefined materials • Organometallics • Nanomaterials • Polymers • Minerals • Particle bound • Links to Biologicals
  • 48. A Global Chemistry Network • The Global Chemistry Network is much bigger than just data - scientific networking, micro/publishing, integration hub. • The data repository as a handler for data, GCN as a submission interface, GCN as a profile handler, rewards and recognition platform etc. • Data repository architecture designed to deliver the underpinning data containers and visualization widgets etc.
  • 49. Thank you Email: williamsa@rsc.org ORCID: 0000-0002-2668-4821 Twitter: @ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams
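
The "Text spectra?" example on slide 39 suggests how a spectrum reported as text might be converted into structured data before deposition. The Python sketch below is only an illustration under stated assumptions, not the RSC's actual extraction code: the function name `parse_13c_text_spectrum`, the returned dictionary keys, and the regular expressions are all hypothetical. It reads the nucleus, solvent and spectrometer frequency from the header and collects every decimal number after "δ =" as a chemical shift.

```python
import re

# Text spectrum copied from slide 39 of the transcript.
SLIDE_39 = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), "
    "30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, 118.19, "
    "120.29, 122.67, 123.37, 125.69, 125.84, 129.03, 130.00, 130.53 (ArCH), "
    "99.42, 123.60, 134.69, 139.23, 147.21, 147.61, 149.41, 152.62, 154.88 (ArC)"
)


def parse_13c_text_spectrum(text):
    """Hypothetical helper: extract header fields and shifts from a 13C text spectrum."""
    # Header of the form "<nucleus> NMR (<solvent>, <frequency> MHz)".
    header = re.search(r"(\d+[A-Z][a-z]?) NMR \((\w+), (\d+) MHz\)", text)
    if header is None:
        raise ValueError("no NMR header found")
    nucleus, solvent, mhz = header.groups()

    # Everything after "δ =" is the peak list; each decimal number is taken as a shift.
    _, _, body = text.partition("δ =")
    shifts = [float(s) for s in re.findall(r"\d+\.\d+", body)]

    return {
        "nucleus": nucleus,
        "solvent": solvent,
        "frequency_mhz": int(mhz),
        "shifts_ppm": shifts,
    }


result = parse_13c_text_spectrum(SLIDE_39)
print(result["nucleus"], result["solvent"], result["frequency_mhz"])
print(len(result["shifts_ppm"]), "shifts found")
```

Treating every decimal number as a shift is adequate for a 13C peak list like this one, but it would not work unchanged on the 1H example from slide 40, where coupling constants such as "J = 4.8 Hz" are also decimals and would need to be separated from the shifts.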

Editor's Notes

  1. Search with a chemical name, common name, tradename or other identifier.
  2. There are several structure drawing applets available.