A Standard Data Format for Computational Chemistry: CSX

Stuart Chalk, Associate Professor of Chemistry at University of North Florida
Stuart J. Chalk (1, 2), Neil Ostlund (1), Mirek Sopek (1), Bing Wang (1)
1) Chemical Semantics Inc., Gainesville FL
2) Department of Chemistry, University of North Florida
schalk@unf.edu
249th ACS Meeting, Denver, CO – March 2015

Outline
- Semantic Annotation of Data
- Current DOE Project
- Data Transformations
- Common Standard for eXchange (CSX)
- CSX as a Standard Data Format
- The CSX Schema
- CSX - Publication Information
- CSX - Molecular System Information
- CSX - Calculated Result Information
- Future Plans
- Conclusion

Semantic Annotation of Data
- Create a way to ‘teach’ computers what information means: contextualize the data
- Example
  - What is this? 904-620-1938
  - A computer just sees it as… a string
- By using an appropriate semantic definition in RDF (the Resource Description Framework), we can tell the computer that the text is a phone number (using the Friend of a Friend (FOAF) specification), i.e.
<foaf:phone rdf:datatype="#string">904-620-1938</foaf:phone>
RDF Specification http://www.w3.org/RDF/
FOAF Specification http://xmlns.com/foaf/spec/
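To show what the annotation buys us, here is a minimal Python sketch (standard library only) that parses the annotated element and recovers the full property IRI a program can act on, instead of a bare string. Two assumptions are made to get a well-formed, self-contained document: the fragment is wrapped in an rdf:RDF element that declares the rdf and foaf prefixes, and the slide's "#string" is expanded to the XML Schema string datatype IRI.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"  # assumed expansion of "#string"

# The slide's one-line fragment, wrapped in rdf:RDF so the prefixes are declared.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="{FOAF_NS}">
  <foaf:phone rdf:datatype="{XSD_STRING}">904-620-1938</foaf:phone>
</rdf:RDF>"""


def describe(xml_text):
    """Return (property IRI, datatype IRI, value) for each child of rdf:RDF."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root:
        # ElementTree reports tags as "{namespace}localname"; joining the
        # two parts gives the property's full IRI.
        ns, local = elem.tag[1:].split("}")
        datatype = elem.get(f"{{{RDF_NS}}}datatype")
        results.append((ns + local, datatype, elem.text))
    return results


for prop, dtype, value in describe(SNIPPET):
    print(f"{value!r}: property <{prop}>, datatype <{dtype}>")
```

The string itself is unchanged; what the program gains is the namespace-qualified property (http://xmlns.com/foaf/0.1/phone), which it can look up in the FOAF specification.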

Semantic Annotation of Data
- RDF can be used to relate information as well as annotate it
- The following RDF/XML shows how some information is related (XML is the eXtensible Markup Language)
<rdf:Description rdf:about="http://example.org/StuartChalk">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  <foaf:knows rdf:resource="http://example.org/NeilOstlund"/>
  <foaf:phone rdf:datatype="…#string">904-620-1938</foaf:phone>
</rdf:Description>
- Applying this technology to computational chemistry calculations will allow the calculations and their results to be integrated with data about chemicals from other sources
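An RDF-aware program reads the description above as a set of (subject, predicate, object) triples. The Python sketch below is a deliberately tiny stand-in for a real RDF/XML parser: it only handles rdf:Description elements whose children carry either an rdf:resource link or literal text, and ignores nesting and blank nodes. The wrapping rdf:RDF element and the expansion of "…#string" to the XML Schema string IRI are assumptions added to make the example self-contained.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/StuartChalk">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:knows rdf:resource="http://example.org/NeilOstlund"/>
    <foaf:phone rdf:datatype="http://www.w3.org/2001/XMLSchema#string">904-620-1938</foaf:phone>
  </rdf:Description>
</rdf:RDF>"""


def to_triples(xml_text):
    """Turn each property element of each rdf:Description into a triple.

    Objects are ("iri", value) for rdf:resource links and ("literal", text)
    otherwise. Nested descriptions, blank nodes, etc. are not handled.
    """
    triples = []
    for desc in ET.fromstring(xml_text).iter(RDF + "Description"):
        subject = desc.get(RDF + "about")
        for prop in desc:
            # "{namespace}local" -> "namespacelocal", i.e. the predicate IRI
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(RDF + "resource")
            if resource is not None:
                obj = ("iri", resource)
            else:
                obj = ("literal", prop.text)
            triples.append((subject, predicate, obj))
    return triples


for triple in to_triples(DOC):
    print(triple)
```

The rdf:type and foaf:knows lines become links to other resources, which is what lets data from different sources be joined.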

DOE SBIR Grant
- Chemical Semantics is funded by DOE to create a web portal to collect, organize, and make searchable the results output from computational chemistry (CC) calculations
- The portal will be freely available and will accept output from all CC software packages
- The intent is to capture calculation results along with:
  - Software used to calculate the results
  - Input parameters used in the calculation
  - Methodology by which the calculation was done
  - Details of the molecular system studied
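One way to picture the record the portal aims to capture for each calculation is a simple data structure with one field per item in the list above. This is a hypothetical Python sketch: the class name, field names, and example values are illustrations only and are not taken from CSX or the portal.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class CalculationRecord:
    """Hypothetical container mirroring what the portal intends to capture."""
    software: str            # package (and version) that produced the results
    input_parameters: dict   # keywords/settings passed to the package
    methodology: str         # e.g. level of theory and basis set
    molecular_system: dict   # composition, charge, multiplicity, geometry, ...
    results: dict = field(default_factory=dict)  # the calculated values


record = CalculationRecord(
    software="ExampleChem 1.0",          # hypothetical package name
    input_parameters={"task": "energy"},
    methodology="RHF/STO-3G",
    molecular_system={"formula": "H2O", "charge": 0, "multiplicity": 1},
)
record.results["total_energy_hartree"] = None  # to be filled from the output file
print(asdict(record))
```

Keeping the context (software, inputs, method, system) alongside the results is what makes the results searchable and comparable later.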

Data Transformations
- The approach Chemical Semantics is taking is to:
  1. Add code to software packages to generate an XML file alongside the normal output file, OR parse an existing output file (using a free application) and generate the XML file
  2. Send the XML file to the web portal
  3. Convert the XML file into RDF in Turtle format (TTL)
  4. Finally, ingest the TTL into a triplestore (Virtuoso)
- All the data in Virtuoso can then be searched using SPARQL (SPARQL Protocol and RDF Query Language)
Virtuoso http://virtuoso.openlinksw.com/
SPARQL http://www.w3.org/TR/sparql11-query/
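Step 3 can be pictured with a small Python sketch that walks an XML file and emits Turtle. Everything about the input is an assumption for illustration: the element names, the http://example.org/csx# namespace, and the mapping rule "each leaf child becomes a property of the calculation" are made up and are not the actual CSX-to-RDF conversion, which is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified calculation file; not the real CSX vocabulary.
XML_IN = """<calculation id="calc1">
  <software>ExampleChem</software>
  <method>RHF</method>
  <basisSet>STO-3G</basisSet>
</calculation>"""

PREFIX = "ex"
NAMESPACE = "http://example.org/csx#"  # placeholder namespace


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def xml_to_turtle(xml_text):
    root = ET.fromstring(xml_text)
    subject = f"{PREFIX}:{root.get('id')}"
    lines = [f"@prefix {PREFIX}: <{NAMESPACE}> ."]
    # One predicate-object pair per leaf child that has text.
    pairs = [f"{PREFIX}:{child.tag} {turtle_literal(child.text.strip())}"
             for child in root if len(child) == 0 and child.text]
    lines.append(f"{subject} a {PREFIX}:{root.tag} ;")
    # Turtle separates pairs for the same subject with ";" and ends with ".".
    lines.append(" ;\n".join("    " + p for p in pairs) + " .")
    return "\n".join(lines)


print(xml_to_turtle(XML_IN))
```

Loading the resulting Turtle into Virtuoso and querying it with SPARQL (step 4) depends on the triplestore and is not attempted here.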

Intermediate XML File
- Why XML?
  - Human readable (plain text, UTF-8)
  - Platform neutral
  - Archivable
  - Validatable
- Why not use CML (Chemical Markup Language)?
  - Inability to represent complex structures, e.g. residues
  - No standard way to add CC results
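Two of the properties listed for XML, plain-text UTF-8 and checkability, take very little code to exercise. This Python sketch uses only the standard library, which checks well-formedness but not schema validity; validating against an XML Schema needs a validating parser such as Xerces, which this sketch does not attempt. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_xml(data: bytes):
    """Return (ok, message): is the input UTF-8 text and well-formed XML?"""
    try:
        data.decode("utf-8")       # plain text in UTF-8
    except UnicodeDecodeError as exc:
        return False, f"not UTF-8: {exc}"
    try:
        ET.fromstring(data)        # raises ParseError on mismatched tags, etc.
    except ET.ParseError as exc:
        return False, f"not well-formed: {exc}"
    return True, "well-formed UTF-8 XML"


good = '<?xml version="1.0" encoding="UTF-8"?><molecule name="water"/>'.encode("utf-8")
bad = b"<molecule><atom></molecule>"
print(check_xml(good))
print(check_xml(bad))
```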

Common Standard for eXchange (CSX)
- A CSX file is a text-based file written in XML
- It is a structured data container designed to hold CC result data and additional metadata
- Version 0.x was developed by Neil Ostlund
- Version 1.0 is the current stable release, developed as part of Phase 1 of the SBIR grant (limited scope)
- Version 2.0 is currently under development as part of Phase 2 of the SBIR grant

CSX as a Standard Data Format
- It is well known that the formats in which data are reported in CC output files are:
  - Highly variable (software specific)
  - Sometimes difficult to interpret
- Standardization would:
  - Allow data from different packages to be more easily compared
  - Open up opportunities for software development to display and reuse data for different applications
- This mirrors a movement in the CC community toward a common driver base for CC software packages
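To make the comparison point concrete: once each package's output labels are mapped onto one shared vocabulary, results can be compared key by key. In this Python sketch the package names, their output labels, the shared key names, and the numbers are all hypothetical placeholders; CSX defines its own element names, which are not reproduced here.

```python
# Hypothetical mappings from each package's own output labels to shared names.
MAPPINGS = {
    "PackageA": {"E(SCF)": "scf_energy", "Dipole": "dipole_moment"},
    "PackageB": {"Total SCF energy": "scf_energy", "Dipole (Debye)": "dipole_moment"},
}


def normalize(package, raw):
    """Rename a package's raw labels to shared names, dropping unmapped labels."""
    mapping = MAPPINGS[package]
    return {mapping[label]: value for label, value in raw.items() if label in mapping}


def compare(a, b, tol=1e-6):
    """List shared quantities whose values differ by more than tol.

    Assumes both inputs already use the same units.
    """
    return sorted(key for key in a.keys() & b.keys() if abs(a[key] - b[key]) > tol)


# Placeholder numbers, not real results.
a = normalize("PackageA", {"E(SCF)": -1.0, "Dipole": 2.0})
b = normalize("PackageB", {"Total SCF energy": -1.0, "Dipole (Debye)": 2.5})
print(compare(a, b))  # -> ['dipole_moment']
```

Without the shared names, every pair of packages would need its own comparison code; with them, one `compare` covers all packages that can be normalized.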

The CSX Schema
- A schema document is available for the CSX specification; it describes the layout and allowed names of elements and attributes, and the allowed values for both
- The schema can be used to:
  - help new users write valid CSX files (using XML editing applications such as XMLSpy and oXygen XML Editor)
  - validate existing CSX files using any of a number of XML validators (e.g. Xerces)
  - understand the structure of the data, especially for less frequently calculated results
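Full validation against the CSX schema needs an XSD-aware validator such as Xerces; Python's standard library does not include one. As a rough stand-in for the idea, this sketch checks only that a document's root contains certain required child elements in a given order. The element names are hypothetical, and the check covers a tiny fraction of what an XML Schema expresses (no datatypes, attributes, or cardinalities).

```python
import xml.etree.ElementTree as ET

# Hypothetical required top-level sections, in the order they must appear.
REQUIRED = ["publication", "molecularSystem", "results"]


def check_sections(xml_text, required=REQUIRED):
    """Return a list of problems; an empty list means the check passed."""
    children = [child.tag for child in ET.fromstring(xml_text)]
    problems = [f"missing <{tag}>" for tag in required if tag not in children]
    present = [tag for tag in children if tag in required]
    expected = [tag for tag in required if tag in children]
    if present != expected:
        problems.append(f"unexpected sequence: {present}")
    return problems


print(check_sections("<csx><publication/><molecularSystem/><results/></csx>"))
print(check_sections("<csx><results/><publication/></csx>"))
```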

CSX Schema v1.0 (four slides; content not reproduced)
CSX – Publication Information (slide content not reproduced)
CSX – Molecular System Information (slide content not reproduced)
CSX – Calculated Result Information (slide content not reproduced)

Future Plans
- Work on CSX 2.0 is ongoing: expand to multiple systems and sets of calculated results
- Develop a CSX-focused website with converter functionality, libraries, and documentation
- Engage CC software users and programmers to get involved with the project
- Organize a community developer workshop over summer 2015
- Publish version 2.0 of CSX in fall 2015

Conclusion
- CSX started out as a stepping stone to transfer information to the Chemical Semantics (CS) portal
- Having a data standard for CC is an important development in and of itself
- The CC community can do more with their data:
  - Leverage XML tools to visualize, process, etc.
  - Compare results across CC packages
  - Validate results
  - Reference basis sets (https://bse.pnl.gov/)

Questions?
- schalk@unf.edu
- Phone: 904-620-1938
- Skype: stuartchalk
- LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
- ORCID: http://orcid.org/0000-0002-0703-7776
- ResearcherID: http://www.researcherid.com/rid/D-8577-2013
1 of 20

Recommended

RDF-Gen: Generating RDF from streaming and archival data by
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
70 views22 slides
Technical Background by
Technical BackgroundTechnical Background
Technical BackgroundNikolaos Konstantinou
2.5K views120 slides
An Approach for the Incremental Export of Relational Databases into RDF Graphs by
An Approach for the Incremental Export of Relational Databases into RDF GraphsAn Approach for the Incremental Export of Relational Databases into RDF Graphs
An Approach for the Incremental Export of Relational Databases into RDF GraphsNikolaos Konstantinou
845 views34 slides
Bionimbus - An Overview (2010-v6) by
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Robert Grossman
872 views25 slides
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert... by
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Robert Grossman
991 views21 slides
A Look into the Apache OODT Ecosystem by
A Look into the Apache OODT EcosystemA Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemChris Mattmann
4.3K views41 slides

More Related Content

What's hot

Efficient Record De-Duplication Identifying Using Febrl Framework by
Efficient Record De-Duplication Identifying Using Febrl FrameworkEfficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl FrameworkIOSR Journals
368 views6 slides
Thesis presentation by
Thesis presentationThesis presentation
Thesis presentationConcordia university
1.2K views47 slides
STAT Requirement Analysis by
STAT Requirement AnalysisSTAT Requirement Analysis
STAT Requirement Analysisstat
391 views28 slides
Towards an Incremental Schema-level Index for Distributed Linked Open Data G... by
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...Till Blume
185 views17 slides
Grid1 by
Grid1Grid1
Grid1Sonia Sharma
653 views58 slides
Effiziente Verarbeitung von grossen Datenmengen by
Effiziente Verarbeitung von grossen DatenmengenEffiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen DatenmengenFlorian Stegmaier
848 views39 slides

What's hot(20)

Efficient Record De-Duplication Identifying Using Febrl Framework by IOSR Journals
Efficient Record De-Duplication Identifying Using Febrl FrameworkEfficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl Framework
IOSR Journals368 views
STAT Requirement Analysis by stat
STAT Requirement AnalysisSTAT Requirement Analysis
STAT Requirement Analysis
stat391 views
Towards an Incremental Schema-level Index for Distributed Linked Open Data G... by Till Blume
Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...Towards an Incremental Schema-level Index  for Distributed Linked Open Data G...
Towards an Incremental Schema-level Index for Distributed Linked Open Data G...
Till Blume185 views
Effiziente Verarbeitung von grossen Datenmengen by Florian Stegmaier
Effiziente Verarbeitung von grossen DatenmengenEffiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen Datenmengen
Florian Stegmaier848 views
Linked Open Data and DANS by vty
Linked Open Data and DANSLinked Open Data and DANS
Linked Open Data and DANS
vty559 views
DataverseNL as structured data hub by vty
DataverseNL as structured data hubDataverseNL as structured data hub
DataverseNL as structured data hub
vty199 views
Linked Data Notifications Distributed Update Notification and Propagation on ... by Aksw Group
Linked Data Notifications Distributed Update Notification and Propagation on ...Linked Data Notifications Distributed Update Notification and Propagation on ...
Linked Data Notifications Distributed Update Notification and Propagation on ...
Aksw Group214 views
Effective Data Retrieval in XML using TreeMatch Algorithm by IRJET Journal
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch Algorithm
IRJET Journal118 views
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud by MOVING Project
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD CloudAnalyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud
Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud
MOVING Project254 views
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web... by Till Blume
The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...The FLuID Meta Model: Incrementally Compute  Schema-level Indices for the Web...
The FLuID Meta Model: Incrementally Compute Schema-level Indices for the Web...
Till Blume141 views
ASP.NET Session 7 by Sisir Ghosh
ASP.NET Session 7ASP.NET Session 7
ASP.NET Session 7
Sisir Ghosh564 views
Using Free and Open Source GIS to Automatically Create Standards-Based Spatia... by Patrick Rickles
Using Free and Open Source GIS to Automatically Create Standards-Based Spatia...Using Free and Open Source GIS to Automatically Create Standards-Based Spatia...
Using Free and Open Source GIS to Automatically Create Standards-Based Spatia...
Patrick Rickles646 views

Viewers also liked

An Ontology for K-12 Education and the NIEM by
An Ontology for K-12 Education and the NIEMAn Ontology for K-12 Education and the NIEM
An Ontology for K-12 Education and the NIEMOptum
2.4K views48 slides
Representation of molecular structures and related computations on the Sema... by
Representation of molecular structures and related computations on the Sema...Representation of molecular structures and related computations on the Sema...
Representation of molecular structures and related computations on the Sema...sopekmir
1.5K views34 slides
How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O... by
How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O...How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O...
How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O...sopekmir
3.1K views33 slides
Drug dna interaction by
Drug dna interactionDrug dna interaction
Drug dna interactionAnamika Banerjee
8.6K views18 slides
Project Ppt by
Project PptProject Ppt
Project PptPousali Mukherjee
3.6K views24 slides
Б.И. Нигматулин в РНЦ КИ 14.05.2010 by
Б.И. Нигматулин в РНЦ КИ 14.05.2010Б.И. Нигматулин в РНЦ КИ 14.05.2010
Б.И. Нигматулин в РНЦ КИ 14.05.2010myatom
670 views47 slides

Viewers also liked(20)

An Ontology for K-12 Education and the NIEM by Optum
An Ontology for K-12 Education and the NIEMAn Ontology for K-12 Education and the NIEM
An Ontology for K-12 Education and the NIEM
Optum2.4K views
Representation of molecular structures and related computations on the Sema... by sopekmir
Representation of molecular structures and related computations on the Sema...Representation of molecular structures and related computations on the Sema...
Representation of molecular structures and related computations on the Sema...
sopekmir1.5K views
How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O... by sopekmir
How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O...How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O...
How Can Blockchain amplify Digital Identifiers? Improving Data Persistence, O...
sopekmir3.1K views
Б.И. Нигматулин в РНЦ КИ 14.05.2010 by myatom
Б.И. Нигматулин в РНЦ КИ 14.05.2010Б.И. Нигматулин в РНЦ КИ 14.05.2010
Б.И. Нигматулин в РНЦ КИ 14.05.2010
myatom670 views
JANTI Fukushima report part 4 5 6 by myatom
JANTI Fukushima report part 4 5 6JANTI Fukushima report part 4 5 6
JANTI Fukushima report part 4 5 6
myatom1.6K views
Advanced Computational Materials Science: Application to Fusion and Generatio... by myatom
Advanced Computational Materials Science: Application to Fusion and Generatio...Advanced Computational Materials Science: Application to Fusion and Generatio...
Advanced Computational Materials Science: Application to Fusion and Generatio...
myatom1.2K views
Density Functional Theory by Wesley Chen
Density Functional TheoryDensity Functional Theory
Density Functional Theory
Wesley Chen2.4K views
B sc_I_General chemistry U-IV Ligands and chelates by Rai University
B sc_I_General chemistry U-IV Ligands and chelates  B sc_I_General chemistry U-IV Ligands and chelates
B sc_I_General chemistry U-IV Ligands and chelates
Rai University5.2K views
10.637 Lecture 1: Introduction by Heather Kulik
10.637 Lecture 1: Introduction10.637 Lecture 1: Introduction
10.637 Lecture 1: Introduction
Heather Kulik4.9K views
Python for Scientific Computing by Albert DeFusco
Python for Scientific ComputingPython for Scientific Computing
Python for Scientific Computing
Albert DeFusco2.9K views
Introduction to the phenomenology of HiTc superconductors. by ABDERRAHMANE REGGAD
Introduction to  the phenomenology of HiTc superconductors.Introduction to  the phenomenology of HiTc superconductors.
Introduction to the phenomenology of HiTc superconductors.
ABDERRAHMANE REGGAD1.3K views
Application of density functional theory (dft), by Katerina Makarova
Application of density functional theory (dft),Application of density functional theory (dft),
Application of density functional theory (dft),
Katerina Makarova1.1K views
The all-electron GW method based on WIEN2k: Implementation and applications. by ABDERRAHMANE REGGAD
The all-electron GW method based on WIEN2k: Implementation and applications.The all-electron GW method based on WIEN2k: Implementation and applications.
The all-electron GW method based on WIEN2k: Implementation and applications.
ABDERRAHMANE REGGAD5.1K views

Similar to A Standard Data Format for Computational Chemistry: CSX

Environment Canada's Data Management Service by
Environment Canada's Data Management ServiceEnvironment Canada's Data Management Service
Environment Canada's Data Management ServiceSafe Software
1.3K views28 slides
ACS 248th Paper 108 NIST-IUPAC Solubility Data by
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility DataStuart Chalk
665 views20 slides
Source-to-source transformations: Supporting tools and infrastructure by
Source-to-source transformations: Supporting tools and infrastructureSource-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructurekaveirious
347 views15 slides
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using... by
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Stuart Chalk
362 views23 slides
Combining and easing the access of the eswc semantic web data 0 by
Combining and easing the access of the eswc semantic web data 0Combining and easing the access of the eswc semantic web data 0
Combining and easing the access of the eswc semantic web data 0STIinnsbruck
168 views7 slides
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy by
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easyIEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easyIEEEFINALYEARSTUDENTPROJECTS
579 views7 slides

Similar to A Standard Data Format for Computational Chemistry: CSX(20)

Environment Canada's Data Management Service by Safe Software
Environment Canada's Data Management ServiceEnvironment Canada's Data Management Service
Environment Canada's Data Management Service
Safe Software1.3K views
ACS 248th Paper 108 NIST-IUPAC Solubility Data by Stuart Chalk
ACS 248th Paper 108 NIST-IUPAC Solubility DataACS 248th Paper 108 NIST-IUPAC Solubility Data
ACS 248th Paper 108 NIST-IUPAC Solubility Data
Stuart Chalk665 views
Source-to-source transformations: Supporting tools and infrastructure by kaveirious
Source-to-source transformations: Supporting tools and infrastructureSource-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructure
kaveirious347 views
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using... by Stuart Chalk
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Stuart Chalk362 views
Combining and easing the access of the eswc semantic web data 0 by STIinnsbruck
Combining and easing the access of the eswc semantic web data 0Combining and easing the access of the eswc semantic web data 0
Combining and easing the access of the eswc semantic web data 0
STIinnsbruck168 views
Supercharging your Apache OODT deployments with the Process Control System by Chris Mattmann
Supercharging your Apache OODT deployments with the Process Control SystemSupercharging your Apache OODT deployments with the Process Control System
Supercharging your Apache OODT deployments with the Process Control System
Chris Mattmann1.6K views
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics by Flurry, Inc.
A General Purpose Extensible Scanning Query Architecture for Ad Hoc AnalyticsA General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
Flurry, Inc.724 views
Distributed Systems: How to connect your real-time applications by Jaime Martin Losa
Distributed Systems: How to connect your real-time applicationsDistributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applications
Jaime Martin Losa1.3K views
Real time data-pipeline from inception to production by Shreya Mukhopadhyay
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to production
Euclid Data Model 101 - Episode 01: Overview by euc-dm-test
Euclid Data Model 101 - Episode 01: OverviewEuclid Data Model 101 - Episode 01: Overview
Euclid Data Model 101 - Episode 01: Overview
euc-dm-test540 views
XML, XML Databases and MPEG-7 by Deniz Kılınç
XML, XML Databases and MPEG-7XML, XML Databases and MPEG-7
XML, XML Databases and MPEG-7
Deniz Kılınç3.3K views
A Gen3 Perspective of Disparate Data by Robert Grossman
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
Robert Grossman1.2K views

More from Stuart Chalk

Semantic properties and units by
Semantic properties and unitsSemantic properties and units
Semantic properties and unitsStuart Chalk
265 views25 slides
Open semantic chemical structures by
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structuresStuart Chalk
109 views29 slides
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr... by
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...Stuart Chalk
215 views27 slides
AnIML: A New Analytical Data Standard by
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardStuart Chalk
673 views25 slides
A Generic Scientific Data Model and Ontology for Representation of Chemical Data by
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataStuart Chalk
614 views21 slides
Scientific Units in the Electronic Age by
Scientific Units in the Electronic AgeScientific Units in the Electronic Age
Scientific Units in the Electronic AgeStuart Chalk
465 views32 slides

More from Stuart Chalk(20)

Semantic properties and units by Stuart Chalk
Semantic properties and unitsSemantic properties and units
Semantic properties and units
Stuart Chalk265 views
Open semantic chemical structures by Stuart Chalk
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structures
Stuart Chalk109 views
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr... by Stuart Chalk
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
ChemExtractor: Enhanced Rule-Based Capture and Identification of PDF Based Pr...
Stuart Chalk215 views
AnIML: A New Analytical Data Standard by Stuart Chalk
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data Standard
Stuart Chalk673 views
A Generic Scientific Data Model and Ontology for Representation of Chemical Data by Stuart Chalk
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
Stuart Chalk614 views
Scientific Units in the Electronic Age by Stuart Chalk
Scientific Units in the Electronic AgeScientific Units in the Electronic Age
Scientific Units in the Electronic Age
Stuart Chalk465 views
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ... by Stuart Chalk
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Stuart Chalk387 views
The Electronic Notebook Ontology by Stuart Chalk
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook Ontology
Stuart Chalk277 views
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data by Stuart Chalk
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Stuart Chalk252 views
Bringing Flow injection Analysis to the Semantic Web by Stuart Chalk
Bringing Flow injection Analysis to the Semantic WebBringing Flow injection Analysis to the Semantic Web
Bringing Flow injection Analysis to the Semantic Web
Stuart Chalk651 views
Reactions to the Open Spectral Database by Stuart Chalk
Reactions to the Open Spectral DatabaseReactions to the Open Spectral Database
Reactions to the Open Spectral Database
Stuart Chalk287 views
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015 by Stuart Chalk
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Integrating AnIML Files in Electronic Laboratory Notebooks - PittCon 2015
Stuart Chalk457 views
Building a Standard for Standards: The ChAMP Project by Stuart Chalk
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
Stuart Chalk1.1K views
Overview of the Analytical Information Markup Language (AnIML) by Stuart Chalk
Overview of the Analytical Information Markup Language (AnIML)Overview of the Analytical Information Markup Language (AnIML)
Overview of the Analytical Information Markup Language (AnIML)
Stuart Chalk1.4K views
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka by Stuart Chalk
ACS 248th Paper 146 VIVO/ScientistsDB Integration into EurekaACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
Stuart Chalk388 views
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration by Stuart Chalk
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
Stuart Chalk970 views
ACS 248th Paper 104 ChemData Project by Stuart Chalk
ACS 248th Paper 104 ChemData ProjectACS 248th Paper 104 ChemData Project
ACS 248th Paper 104 ChemData Project
Stuart Chalk453 views
ACS 248th Paper 71 ChAMP Project by Stuart Chalk
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
Stuart Chalk1.2K views
ACS 248th Paper 67 Eureka Collaboration by Stuart Chalk
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
Stuart Chalk394 views
247th ACS Meeting: The Eureka Research Workbench by Stuart Chalk
247th ACS Meeting: The Eureka Research Workbench247th ACS Meeting: The Eureka Research Workbench
247th ACS Meeting: The Eureka Research Workbench
Stuart Chalk1K views

Recently uploaded

Presentation on experimental laboratory animal- Hamster by
Presentation on experimental laboratory animal- HamsterPresentation on experimental laboratory animal- Hamster
Presentation on experimental laboratory animal- HamsterKanika13641
6 views8 slides
IMMUNODIAGNOSTICS KITS.pdf by
IMMUNODIAGNOSTICS KITS.pdfIMMUNODIAGNOSTICS KITS.pdf
IMMUNODIAGNOSTICS KITS.pdfvetrivel303632
31 views10 slides
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor... by
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...Trustlife
184 views17 slides
Note on the Riemann Hypothesis by
Note on the Riemann HypothesisNote on the Riemann Hypothesis
Note on the Riemann Hypothesisvegafrank2
9 views20 slides
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... by
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...SwagatBehera9
6 views36 slides
Worldviews and their (im)plausibility: Science and Holism by
Worldviews and their (im)plausibility: Science and HolismWorldviews and their (im)plausibility: Science and Holism
Worldviews and their (im)plausibility: Science and HolismJohnWilkins48
44 views19 slides

Recently uploaded(20)

Presentation on experimental laboratory animal- Hamster by Kanika13641
Presentation on experimental laboratory animal- HamsterPresentation on experimental laboratory animal- Hamster
Presentation on experimental laboratory animal- Hamster
Kanika136416 views
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor... by Trustlife
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...
Ellagic Acid and Its Metabolites as Potent and Selective Allosteric Inhibitor...
Trustlife184 views
Note on the Riemann Hypothesis by vegafrank2
Note on the Riemann HypothesisNote on the Riemann Hypothesis
Note on the Riemann Hypothesis
vegafrank29 views
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... by SwagatBehera9
A Standard Data Format for Computational Chemistry: CSX

  • 1. A Standard Data Format for Computational Chemistry: CSX
    Stuart J. Chalk (1, 2), Neil Ostlund (1), Mirek Sopek (1), Bing Wang (1)
    (1) Chemical Semantics Inc., Gainesville FL
    (2) Department of Chemistry, University of North Florida
    schalk@unf.edu
    249th ACS Meeting, Denver, CO – March 2015
  • 2. Outline
    - Semantic Annotation of Data
    - Current DOE Project
    - Data Transformations
    - Common Standard for eXchange (CSX)
    - CSX as a Standard Data Format
    - The CSX Schema
    - CSX - Publishing Information
    - CSX - Molecular System Information
    - CSX - Calculated Result Information
    - Future Plans
    - Conclusion
  • 3. Semantic Annotation of Data
    - Create a way to ‘teach’ computers what information means, i.e., contextualize the data
    - Example
      - What is this? 904-620-1938
      - A computer just sees it as… a string
    - By using an appropriate semantic definition in RDF (the Resource Description Framework), we can tell the computer that the text is a phone number (using the Friend of a Friend (FOAF) specification), i.e.:
      <foaf:phone rdf:datatype="#string">904-620-1938</foaf:phone>
    - RDF Specification: http://www.w3.org/RDF/
    - FOAF Specification: http://xmlns.com/foaf/spec/
  • 4. Semantic Annotation of Data
    - RDF can be used to relate information as well as annotate it
    - The following RDF/XML (XML is the eXtensible Markup Language) shows how some pieces of information are related:
      <rdf:Description rdf:about="http://example.org/StuartChalk">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <foaf:knows rdf:resource="http://example.org/NeilOstlund"/>
        <foaf:phone rdf:datatype="…#string">904-620-1938</foaf:phone>
      </rdf:Description>
    - Applying this technology to computational chemistry calculations will allow the calculations and their results to be integrated with data about chemicals from other sources
  • 5. DOE SBIR Grant
    - Chemical Semantics is funded by the DOE to create a web portal to collect, organize, and make searchable the results output from computational chemistry (CC) calculations
    - The portal will be freely available and will accept output from all CC software packages
    - The intent is to capture calculation results and:
      - the software used to calculate the results
      - the input parameters used in the calculation
      - the methodology by which the calculation was done
      - details of the molecular system studied
  • 6. Data Transformations
    - The approach Chemical Semantics is taking is to:
      1. Add code to software packages to generate an XML file alongside the normal output file, OR parse an existing output file (using a free application) and generate the XML file
      2. Send the XML file to the web portal
      3. Convert the XML file into RDF in Turtle format (TTL)
      4. Finally, ingest the TTL into a triplestore (Virtuoso)
    - All the data in Virtuoso can then be searched using SPARQL (SPARQL Protocol and RDF Query Language)
    - Virtuoso: http://virtuoso.openlinksw.com/
    - SPARQL: http://www.w3.org/TR/sparql11-query/
  • 7. Intermediate XML File
    - Why XML?
      - Human readable (plain text, UTF-8)
      - Platform neutral
      - Archivable
      - Validatable
    - Why not use CML (Chemical Markup Language)?
      - Inability to represent complex structures, e.g., residues
      - No standard way to add CC results
  • 8. Common Standard for eXchange (CSX)
    - A CSX file is a text-based file written in XML
    - It is a structured data container designed to hold CC result data and additional metadata
    - Version 0.x was developed by Neil Ostlund
    - Version 1.0 is the current stable release, developed as part of Phase 1 of the SBIR grant (limited scope)
    - Version 2.0 is currently under development as part of Phase 2 of the SBIR grant
  • 9. CSX as a Standard Data Format
    - It is well known that the formats in which data are reported in CC output files are:
      - Highly variable (software specific)
      - Sometimes difficult to interpret
    - Standardization would:
      - Allow data from different packages to be compared more easily
      - Open up opportunities for software development to display and reuse data for different applications
    - This mirrors a movement in the CC community toward a common driver base for CC software packages
  • 10. The CSX Schema
    - A schema document is available for the CSX specification; it describes the layout and allowed names of elements and attributes, and the allowed values for both
    - The schema can be used to help new users write valid CSX files (using XML editing applications such as XML Spy and oxygenXML) and…
    - … validate existing CSX files using any of a number of XML validators (e.g., Xerces) …
    - … and understand the structure of the data, especially for less frequently calculated results
  • 15. CSX – Publication Information
  • 16. CSX – Molecular System Information
  • 17. CSX – Calculated Result Information
  • 18. Future Plans
    - Work on CSX 2.0 is ongoing: expand it to multiple systems and sets of calculated results
    - Develop a CSX-focused website with converter functionality, libraries, and documentation
    - Engage CC software users/programmers to get involved with the project
    - Organize a community developer workshop in summer 2015
    - Publish version 2.0 of CSX in fall 2015
  • 19. Conclusion
    - CSX started out as a stepping stone to transfer information to the Chemical Semantics (CS) portal
    - Having a data standard for CC is an important development in and of itself
    - The CC community can do more with their data:
      - Leverage XML tools to visualize, process, etc.
      - Compare results across CC packages
      - Validate results
      - Reference basis sets (https://bse.pnl.gov/)
  • 20. Questions?
    - schalk@unf.edu
    - Phone: 904-620-1938
    - Skype: stuartchalk
    - LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
    - ORCID: http://orcid.org/0000-0002-0703-7776
    - ResearcherID: http://www.researcherid.com/rid/D-8577-2013
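To make the semantic annotation on slides 3 and 4 concrete, here is a minimal sketch (Python standard library only). It reads the slide-4 RDF/XML and lists the statements it encodes as (subject, predicate, object) triples. Because the number sits inside a `foaf:phone` element, a program can recognize it as a phone number from the FOAF property IRI rather than from the shape of the string. The `rdf:RDF` wrapper and namespace declarations are assumptions needed to make the slide's fragment parseable. The parser handles only the simple patterns on the slide, not full RDF/XML. For real work, a dedicated RDF library (for example, rdflib in Python) would be the safer choice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Slide 4's example wrapped in rdf:RDF with namespace declarations (an
# assumption; the slide shows only the rdf:Description). The elided
# "…#string" datatype is omitted rather than guessed.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <rdf:Description rdf:about="http://example.org/StuartChalk">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:knows rdf:resource="http://example.org/NeilOstlund"/>
    <foaf:phone>904-620-1938</foaf:phone>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag into a full property IRI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def description_triples(xml_text):
    """Yield (subject, predicate, object) tuples from each rdf:Description.

    Handles only the patterns used on the slide: rdf:about subjects,
    rdf:resource objects (IRIs) and plain-text objects (literals).
    """
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            resource = prop.get(f"{{{RDF}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            yield subject, expand(prop.tag), obj


if __name__ == "__main__":
    for triple in description_triples(DOC):
        print(triple)
```

Running it prints three triples. The last one pairs `http://example.org/StuartChalk` with the property `http://xmlns.com/foaf/0.1/phone` and the literal `904-620-1938`.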
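Step 3 on slide 6 converts the XML into RDF in Turtle format. The sketch below illustrates that step using triples like those in the slide-4 example. It writes them one statement per line, which is the N-Triples subset of Turtle, so the output should be loadable by a triplestore such as Virtuoso; the loading step itself is not shown. Treating every `http(s)://` object as an IRI is a simplifying assumption. The helper names `term` and `to_turtle` are hypothetical and are not part of any Chemical Semantics tool.

```python
# Minimal sketch of step 3 on slide 6: writing RDF triples as Turtle text.
# Assumption: objects that start with http:// or https:// are IRIs and all
# other objects are plain string literals.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# The three statements encoded by the RDF/XML example on slide 4.
TRIPLES = [
    ("http://example.org/StuartChalk", RDF + "type", FOAF + "Person"),
    ("http://example.org/StuartChalk", FOAF + "knows", "http://example.org/NeilOstlund"),
    ("http://example.org/StuartChalk", FOAF + "phone", "904-620-1938"),
]


def term(value, is_iri):
    """Format one RDF term: <IRI> or a quoted, escaped string literal."""
    if is_iri:
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_turtle(triples):
    """Write one statement per line, i.e. the N-Triples subset of Turtle."""
    lines = []
    for subject, predicate, obj in triples:
        is_iri = obj.startswith(("http://", "https://"))
        lines.append(f"{term(subject, True)} {term(predicate, True)} {term(obj, is_iri)} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle(TRIPLES), end="")
```

A production converter would also declare prefixes (e.g. `@prefix foaf: <http://xmlns.com/foaf/0.1/> .`) and handle datatypes and language tags. Here these are left out to keep the mapping from triples to text easy to follow.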
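The validation idea on slide 10 can be illustrated with a deliberately simplified stand-in. Python's standard library has no XML Schema (XSD) validator. The sketch below therefore checks only well-formedness and which child elements may appear inside which parents. A real CSX check would run the actual schema through a validator such as Xerces. The element names in `ALLOWED_CHILDREN` are hypothetical placeholders loosely based on the publication, molecular-system, and calculated-result sections in this talk. They are not the names used in the CSX schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical content model: parent element -> allowed child elements.
# Placeholder names only; NOT taken from the real CSX schema.
ALLOWED_CHILDREN = {
    "csx": {"publication", "molecular_system", "calculated_results"},
    "publication": {"title", "author"},
    "molecular_system": {"atom"},
    "calculated_results": {"result"},
}


def check_structure(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the check passed.

    Unlike XSD validation, this ignores attributes, data types, element
    order and how many times an element occurs.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    if root.tag != "csx":
        problems.append(f"unexpected root element <{root.tag}>")
    for parent in root.iter():
        # Elements missing from the table (e.g. <title>) may not have children.
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return problems


if __name__ == "__main__":
    print(check_structure("<csx><publication><title>Example</title></publication></csx>"))
    print(check_structure("<csx><atom/></csx>"))
```

The first call returns an empty list. The second reports that `<atom>` is not allowed inside `<csx>`. A schema-aware validator would also catch wrong attribute values, missing required elements, and out-of-order content, which this stand-in cannot.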