Eureka Research Workbench:
A Semantic Approach
to an Open Source
Electronic Laboratory Notebook
Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2013 Fall ACS Meeting – CINF Paper 116
Big Data
Electronic Notebooks
The Eureka Research Workbench
Experiment Markup Language
ExptML Schema and Files
Semantic Data and Ontologies
File Storage
Eureka Interface
Web Interface
Conclusion
Outline
Current buzzword for “bring together lots of data and
build tools on top to extract knowledge”
This is great, except…
How do we do that for science?
Platform, data structures, and exchange protocols to
capture, identify, and disseminate scientific information
Research Data Alliance (https://rd-alliance.org/)
http://www.nytimes.com/2013/08/13/science/how-to-share-scientific-data.html
Big Data
Electronic Notebooks (ELNs) very common in industry
Not appropriate for academics doing science
Expensive
Overly complicated (regulations)
Data sharing not easy
We need an electronic notebook for faculty/students
LabArchives http://www.labarchives.com
eCAT http://www.researchspace.com/electronic-lab-notebook/
LabTrove http://www.labtrove.org/
Dryad data publishing http://datadryad.org/
Electronic Notebooks
Started in 2006 as an offshoot of getting involved in the
Analytical Information Markup Language (AnIML) project
through ASTM
No way to store all research notes in a digital format
No way to capture the workflow of scientists
Realized writing in a lab notebook is equivalent to
“multi-type” blogging in the digital world
How to capture information? Many datatypes -> ExptML
How to store files and make them available through web
interface? (Fedora-Commons)
How to link data together? RDF (in Fedora-Commons)
Eureka Research Workbench
A specification (written in XML) that describes
different types of information recorded during the
scientific process (http://exptml.sourceforge.net)
Many datatypes (will expand…)
Experiment Markup Language (ExptML)
Annotation
Api
Calculation
Chemical
Citation
Communication
Customer
Data
Dataset
Definition
Element
Equipment
Event
Experiment
Group
Project
Protocol
Quote
Report
Result
Sample
Solution
Space
Specimen
Substance
Task
Template
Timeline
User
Vendor
ExptML Chemical Schema
ExptML Chemical Instance
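The schema and instance slides were images and are not reproduced in this text. As a rough illustration only, the sketch below parses a chemical-like XML record with Python's standard library. The element and attribute names (chemical, name, formula, vendor, ref) and the identifier exptml:ven1 are hypothetical and are not taken from the published ExptML schema; only the identifier exptml:chm1 appears elsewhere in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: element and attribute names are illustrative only,
# not taken from the published ExptML schema.
CHEMICAL_XML = """
<chemical id="exptml:chm1">
  <name>Sodium chloride</name>
  <formula>NaCl</formula>
  <vendor ref="exptml:ven1"/>
</chemical>
"""


def summarize_chemical(xml_text):
    """Return the id, name, and formula from a chemical-like XML record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "formula": root.findtext("formula"),
    }


summary = summarize_chemical(CHEMICAL_XML)
print(summary)
```

A real ExptML file should be validated against the schema at http://exptml.sourceforge.net rather than read with assumed element names.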
Files that represent the data need to be ‘linked’ together to
allow the user to see the context of the data
The ‘Semantic Web’ is a big push to contextualize data
The proposed way to store ‘relationships’ between data is the
Resource Description Framework (RDF - http://www.w3.org/RDF/)
Semantic Data
From http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
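An RDF relationship is a subject-predicate-object statement (a "triple"). A minimal sketch of how a link between two files could be written as N-Triples text follows. The namespace http://example.org/eureka/, the identifiers dat1 and exp1, and the predicate isDataOf are assumptions made for illustration, not names from Eureka or ExptML.

```python
# Assumed namespace and identifiers, for illustration only.
BASE = "http://example.org/eureka/"


def triple(subject, predicate, obj):
    """Format one subject-predicate-object statement as an N-Triples line."""
    return "<{0}{1}> <{0}{2}> <{0}{3}> .".format(BASE, subject, predicate, obj)


# Hypothetical statement: dataset dat1 belongs to experiment exp1.
statements = [("dat1", "isDataOf", "exp1")]
lines = [triple(*s) for s in statements]
print("\n".join(lines))
```

Storing the link as its own statement, separate from either file, is what lets a search follow relationships without opening the files themselves.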
In computer science, an ontology
“formally represents knowledge as a set of concepts within
a domain, and the relationships between those concepts. It
can be used to model a domain and support reasoning about
concepts.”*
In essence, an ontology allows us to define the
relationships and assertions about concepts
For substances represented in ExptML we define
isSubstance (assertion)
hasSubstance
isSubstanceOf
ExptML Ontology
*https://en.wikipedia.org/wiki/Ontology_(information_science)
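One thing an ontology makes possible is deriving statements that were never entered by hand. The sketch below assumes hasSubstance and isSubstanceOf are intended as inverses of each other, which their names suggest but the slide does not state. The helper add_inverses and the identifiers exptml:sol1 and exptml:sub1 are hypothetical.

```python
# Property names come from the slide; treating them as inverses is an assumption.
INVERSES = {"hasSubstance": "isSubstanceOf", "isSubstanceOf": "hasSubstance"}


def add_inverses(triples):
    """Return the triples plus the inverse statement implied by each one."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSES:
            result.add((obj, INVERSES[predicate], subject))
    return result


# Hypothetical identifiers: a solution that contains one substance.
asserted = {("exptml:sol1", "hasSubstance", "exptml:sub1")}
inferred = add_inverses(asserted)
for statement in sorted(inferred):
    print(statement)
```

With a rule like this, recording that a solution has a substance would be enough for the substance's page to list the solution as well.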
Digital repository software for creating and managing
online digital libraries
Stores the ExptML files
Stores any other files (PDFs, Images, Word etc.)
Stores relationships as RDF
Version control
Checksumming
Built-in search of content and relationships
Fedora Commons
Fedora-Commons treats each ExptML file as an object
In the definition of a Fedora object the file is just one
stream of many. By default each object also has a “DC”
stream of metadata and a “RELS-EXT” stream of
relationships
Each Fedora object can have any number of additional
streams for
Paper PDFs, product/sample pictures, original file formats (if a
conversion has been done)
Video, audio, anything
You can export individual streams or the whole Fedora
object with streams binary encoded (Sharing/archiving)
File Storage
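The object-with-streams idea can be pictured with a toy data model. This is not the Fedora-Commons API: the class names, the stream IDs EXPTML and PAPER, the placeholder contents, and the choice of SHA-256 for the checksum are all assumptions for illustration. Only the DC and RELS-EXT stream names and the identifier exptml:chm1 come from the slides.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One named stream of content inside a repository object."""
    dsid: str
    mime_type: str
    content: bytes

    def checksum(self):
        # SHA-256 chosen for illustration; the slides do not name an algorithm.
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class RepositoryObject:
    """Toy model of an object holding several datastreams keyed by ID."""
    pid: str
    streams: dict = field(default_factory=dict)

    def add(self, stream):
        self.streams[stream.dsid] = stream


obj = RepositoryObject(pid="exptml:chm1")
obj.add(Datastream("DC", "text/xml", b"<dc/>"))                    # metadata
obj.add(Datastream("RELS-EXT", "application/rdf+xml", b"<rdf/>"))  # relationships
obj.add(Datastream("EXPTML", "text/xml", b"<chemical/>"))          # hypothetical ID
obj.add(Datastream("PAPER", "application/pdf", b"%PDF-"))          # hypothetical ID
stream_ids = sorted(obj.streams)
print(stream_ids)
```

Keeping the ExptML file, its metadata, its relationships, and any attachments under one identifier is what allows the whole object to be exported as a single unit for sharing or archiving.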
So, finally to the Eureka Research Workbench!
Web interface written in PHP using the CakePHP Framework
Communicates with Fedora-Commons API to create,
retrieve, update and delete (CRUD) ExptML and other files
Representational State Transfer (REST) format for URLs
E.g. http://web.server/chemicals/view/exptml:chm1
Allows for searching of all files in Fedora
Can also search based on relationships
Can extract data out of XML files
Can gather data from other websites (via API controller) and
add it to ExptML files
Eureka Interface
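The example URL above reads as /controller/action/identifier, which is the usual CakePHP routing convention. A minimal sketch of splitting such a URL into its parts follows. The function parse_route is hypothetical and is not part of Eureka, which is written in PHP.

```python
from urllib.parse import urlparse


def parse_route(url):
    """Split a /controller/action/id path into its three parts."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected /controller/action/id, got: " + url)
    controller, action, identifier = parts
    return {"controller": controller, "action": action, "id": identifier}


route = parse_route("http://web.server/chemicals/view/exptml:chm1")
print(route)
```

Under this reading, the controller names the datatype (chemicals), the action names the operation (view), and the last segment is the identifier of the stored object.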
Eureka Website - Group
Only datatypes related to the
research group show up on left
Eureka Website – Lab Bench
Types of information that are things you
would have on your lab bench are on left
Clicking on the “Add” menu on the right
allows you to add a comment to this solution
Eureka Website – Notebook
Typical things we record
in our notebook
Eureka Website - Laboratory
Information about resources that
you use in your laboratory
The “Rel” menu shows you the information related to this instrument
Eureka Website - Library
Papers and protocols
related to your work
You can add the PDF
of the paper to the
citation.
The contents of the
PDF are searchable in
the system
Eureka Website - Stockroom
Chemical and Substance
information is related together
Robust markup language for representing science data
(ExptML)
Reliable storage system for ExptML files (Fedora)
Method for storage of relationships (RDF in Fedora)
Web application to create ExptML files (Eureka)
TODO
Provide web functionality to process data
Provide mechanism for sharing of data (different levels)
Integration into the RDA model for sharing research data
Get the word out and test system with many users
Conclusion
References
Eureka – http://sourceforge.net/projects/eureka
Fedora-Commons – http://fedora-commons.org
XML – http://www.w3.org/standards/xml
ExptML – http://exptml.sourceforge.net/
JSON – http://www.json.org/
UnitsML – http://unitsml.nist.gov/
RDF – http://www.w3.org/RDF/
CIR – http://cactus.nci.nih.gov/chemical/structure
RDA – http://rd-alliance.org
