Overview of the Analytical Information Markup Language
Stuart J. Chalk, Department of Chemistry, University of North Florida
schalk@unf.edu
ACS Meeting Denver 2015
 Data Formats
 Goals for Data Handling
 Introduction to AnIML
 Sections of an AnIML file
 AnIML Schemas and Files
 AnIML Technique Definitions
 Publishing Instrument Data
 Referencing Data Elements
 Calculations on Data
 Future Developments
 Conclusion
Overview
 Native Data Formats
 Proprietary formats
 "Metadata" separated from result data
 Metadata and data in multiple files
 Metadata not available electronically
 No way to link metadata with result data
 Interchange Data Formats
 Available for only a few techniques
 ANDI — GC, LC, MS
 JCAMP-DX — IR/FTIR, NMR, UV/Vis, IMS
 Fixed order, fixed syntax, immutable formats
 Content limitations
 Inconsistent implementations
Data Formats
 Extensible
 Easy to add new elements without breaking existing
applications
 Flexible
 Useful for diverse needs: Interchange, Interconversion,
Archiving...
 Useable & Maintainable
 Easy to create, use, adapt, maintain...
 Readily available tools
 Acceptable
 Use standard mechanisms accepted by mainstream
computing
 Human readable
 eXtensible Markup Language
Goals for Data Handling
 Extensible Markup Language (XML) specification
 Development under ASTM E13.15 ‘AnIML Task Group’
 Data standard to:
“Develop an analytical data standard that can
be used to store data from any analytical instrument”
Introduction to AnIML
http://animl.sourceforge.net
 JCAMP-DX
 http://www.jcamp-dx.org/
 ANDI (netCDF)
 ThermoML (NIST)
 SpectroML
 Nguyen, A. D. T., Arslan, A., Travis, J., Smith, M., Schafer, R., &
Kramer, G. W. (2004) ‘Molecular Spectrometry Data
Interchange Applications for NIST's SpectroML’, JALA 9 (6),
346-354. doi:10.1016/j.jala.2004.09.001
 Generalized Analytical Markup Language (GAML)
 http://www.gaml.org/
 First official meeting March 23, 2003 @ ASTM
Brief History of AnIML
 Broad scope
 Different types of data
 Size of data sets
 Everyone calls a ‘widget’ something different
 Need for metadata dictionaries
 One size does not fit all
 Getting broad community involvement
 Domain experts
 User communities
 What format?
Challenges for AnIML
 AnIML XML elements are ‘pigeon holes’ for metadata
 Minimal ‘required’ information
 If it’s not required you don’t have to include the element
 Extensible
 Store raw data not processed data
(except for FT techniques)
 Support for legacy data
 Record of changes
 Validatable
 Signable (digital sense)
AnIML Design Philosophy
AnIML Schemas and Files
Sections of an AnIML File
AnIML Technique Definitions
AnIML - Sample
AnIML - Sample
AnIML - Experiment
AnIML - Result
 Access
 Reference
 Search
 Visualize
 Export
 Manipulate
 Process
 Contextualize
 Leverage XML
tools/formats
AnIML in an ELN
 AnIML Viewer -> Jmol/JSpecView (http://jmol.sourceforge.net)
Viewing Instrument Data
 Conversion of AnIML data to SVG using XSLT
Publishing Instrument Data
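
The slide converts AnIML data to SVG with XSLT. As a rough illustration of the same data-to-graphics mapping, the sketch below swaps in plain Python (not XSLT) and turns a list of (x, y) points into an SVG polyline. The function name `to_svg_polyline`, the viewport size, and the sample values are all hypothetical; how the points are extracted from an AnIML file is left out.

```python
import xml.etree.ElementTree as ET

def to_svg_polyline(xs, ys, width=400, height=200):
    """Scale (x, y) pairs into an SVG viewport and return the SVG as text."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more matching x and y values")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0  # avoid division by zero for flat data
    y_span = (y_max - y_min) or 1.0
    points = []
    for x, y in zip(xs, ys):
        px = (x - x_min) / x_span * width
        # SVG's y axis points down, so flip it to plot larger values higher.
        py = height - (y - y_min) / y_span * height
        points.append(f"{px:.1f},{py:.1f}")
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    ET.SubElement(svg, "polyline", {
        "points": " ".join(points),
        "fill": "none",
        "stroke": "black",
    })
    return ET.tostring(svg, encoding="unicode")

# Hypothetical UV/Vis trace: wavelength (nm) vs. absorbance.
wavelengths = [400.0, 450.0, 500.0, 550.0, 600.0]
absorbances = [0.10, 0.35, 0.80, 0.40, 0.12]
svg_text = to_svg_polyline(wavelengths, absorbances)
print(svg_text)
```

An XSLT stylesheet would express the same scaling declaratively and run directly against the XML, which is presumably why the slide prefers it.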
 Expose an AnIML file at a URL
 Optional: Define a DOI for that URL
 Use XPath to reference a specific data point in an AnIML file
 //ExperimentStepSet[1]/ExperimentStep[1]/Method[1]/Author[1]/Name[1]
 Encode the XPath expression so it can be part of the URL
Referencing Instrument Data
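
A minimal sketch of the referencing steps above, under stated assumptions: the slides do not say how the encoded XPath is attached to the URL, so the `xpath` query parameter, the `example.org` address, and the helper names are hypothetical. The XML is a heavily simplified stand-in for an AnIML file (no namespace, no attributes), used only to show that the decoded expression resolves to one element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlsplit

# The XPath expression from the slide.
XPATH = "//ExperimentStepSet[1]/ExperimentStep[1]/Method[1]/Author[1]/Name[1]"

# Hypothetical location of a published AnIML file.
BASE_URL = "https://example.org/data/sample.animl"

def build_reference(base_url, xpath):
    """Percent-encode the XPath and attach it as an 'xpath' query parameter."""
    return base_url + "?" + urlencode({"xpath": xpath})

def split_reference(url):
    """Recover the file URL and the decoded XPath from a reference URL."""
    parts = urlsplit(url)
    xpath = parse_qs(parts.query)["xpath"][0]
    return f"{parts.scheme}://{parts.netloc}{parts.path}", xpath

# Simplified stand-in document; a real AnIML file carries far more structure.
DOC = """<AnIML>
  <ExperimentStepSet>
    <ExperimentStep>
      <Method>
        <Author><Name>A. Analyst</Name></Author>
      </Method>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>"""

reference = build_reference(BASE_URL, XPATH)
file_url, xpath = split_reference(reference)

# ElementTree supports only a subset of XPath and needs a relative path,
# so the leading "//" becomes ".//".
root = ET.fromstring(DOC)
name = root.find("." + xpath)

print(reference)
print(name.text)
```

Encoding matters because `/`, `[` and `]` have their own meaning in URLs; percent-encoding keeps the expression intact as a single parameter value.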
Calculations with Instrument Data
 Extract data from files using XPath
 XML data to JSON conversion using XSLT*
 Browser based JavaScript functions to
 Smooth: moving window, Savitzky-Golay
 Integrate: summation
 Conversion: Absorbance <-> %T
 Linear regression
*http://www.bramstein.com/projects/xsltjson/
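
The slide describes browser-based JavaScript functions; the sketch below shows the same calculations in Python for illustration only. It assumes the data have already been extracted into plain lists of numbers. The function names are hypothetical, smoothing is limited to a simple moving window (Savitzky-Golay is not shown), and integration is a plain rectangle sum.

```python
import math

def moving_average(values, window=3):
    """Smooth with a centered moving window (odd size); edge points that
    lack a full window are dropped."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def integrate_by_summation(xs, ys):
    """Approximate the area under y(x) as a sum of y * (x step) rectangles."""
    return sum(y * (x1 - x0) for x0, x1, y in zip(xs, xs[1:], ys))

def transmittance_to_absorbance(percent_t):
    """A = 2 - log10(%T)."""
    return 2.0 - math.log10(percent_t)

def absorbance_to_transmittance(absorbance):
    """%T = 100 * 10**(-A)."""
    return 100.0 * 10.0 ** (-absorbance)

def linear_regression(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration data: concentration vs. absorbance.
concentrations = [0.0, 1.0, 2.0, 3.0]
readings = [1.0, 3.0, 5.0, 7.0]
slope, intercept = linear_regression(concentrations, readings)
print(slope, intercept)
print(moving_average([1, 2, 3, 4, 5], window=3))
```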
 AnIML 1.0 Deliverables
 Core Schema - Fundamental framework for AnIML documents
 Technique Schema - Fundamental framework for technique definition and
extension documents
 AnIML Technique Definition Documents (ATDD) - Rules for content of
specific technique file
 AnIML Naming and Design Rules - Specifies rules about data element
structure for interoperability
 Standard Practice for AnIML Files - Describes how the specification is
supposed to work
 How to Create a Technique Definition Document - Guidelines for creating
new technique definition documents
 Other documents
 Draft Requirements Specification for AnIML Version 1.0
 Requirements and Goals of the Analytical Information Markup Language
AnIML Specification
http://animl.sourceforge.net
 Documentation
 Core specification
 Technique and extension specification
 Naming and design rules
 Annotated technique definitions
(UV/Vis, IR, 1D NMR, MS, Chromatography)
 Balloting through ASTM (end of 2015)
 Vendor, User, Developer extensions
 Semantic extension
 Ontological reference to AnIML metadata items
Future Developments
Conclusion
 AnIML is a great solution for storing instrument data
 Human readable (plain text - UTF-8)
 Platform neutral
 Archivable
 Validatable
 Being XML-based, AnIML leverages the extensive, mostly free
ecosystem of XML tools
 Software designers are familiar with XML
because of its well-defined and stable architecture
 schalk@unf.edu
 Phone: 904-620-1938
 Skype: stuartchalk
 LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
 ORCID: http://orcid.org/0000-0002-0703-7776
 ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?
