
247th ACS Meeting: The Eureka Research Workbench

Academic scientists need a tool to capture the science they do so that it can be shared as open science, integrated with linked data, and searched. Eureka is an evolving platform for doing this.


  1. Eureka Research Workbench: An Open Source eScience Laboratory Notebook
     Stuart J. Chalk, Department of Chemistry, University of North Florida (schalk@unf.edu)
     2014 Spring ACS Meeting – CINF Paper 38
  2. Outline
     • Big Data
     • Electronic Notebooks
     • The Eureka Research Workbench
     • Experiment Markup Language
     • ExptML Schema and Files
     • Semantic Data and Ontologies
     • File Storage
     • Eureka Interface
     • Web Interface
     • Conclusion
  3. Big Data
     • Current buzzword for “bring together lots of data and build tools on top to extract knowledge”
     • This is great, except…
     • …how do we do that for science?
     • Platform, data structures, and exchange protocols to capture, identify, and disseminate scientific information
     • Research Data Alliance (https://rd-alliance.org/)
     • “Research Data Sharing without barriers”
     • Fran Berman at RPI is the NSF-funded co-chair of RDA
  4. Electronic Notebooks
     • Scientists need to move to digital notebooks…
     • …and record not just the data but the flow and context
     • How science is done is important for searching, aggregation, meta-analysis
     • We need more than an electronic version of a notebook
     • We need a science version of “Second Life” (SciLife?)
  5. Eureka Research Workbench
     • Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project
     • Store all research notes/data in a digital format
     • Capture the workflow of scientists
     • Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world
     • How to capture information? Many data types! (ExptML)
     • How to store files “online”? (Fedora-Commons)
     • How to access files in the browser? (CakePHP)
     • How to represent laboratory resources? (ExptML)
     • How to link data together? RDF (in Fedora-Commons)
  6. Experiment Markup Language (ExptML)
     • A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)
     • Information types: Annotation, Api, Calculation, Chemical, Citation, Customer, Data, Dataset, Definition, Element, Equipment, Event, Experiment, Group, Message, Project, Protocol, Quote, Report, Result, Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor
  7. ExptML Chemical Schema
  8. ExptML Chemical Schema
  9. ExptML Chemical (Instance)
  10. Semantic Data
     • Data are connected to other data: ‘Linked Data’ (http://www.w3.org/standards/semanticweb/data)
     • The ‘Semantic Web’ approach to contextualizing data
     • The proposed format for storing ‘relationships’ between data is the Resource Description Framework (RDF, http://www.w3.org/RDF/)
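
The linked-data idea on this slide can be pictured as plain subject, predicate, object triples. The following is a minimal sketch in Python, not Eureka's code: the `example.com` base URL and the resource paths are hypothetical, and the predicates are taken from the Dublin Core terms namespace, which the slides do not prescribe. It serializes the triples in N-Triples form.

```python
# Minimal sketch: relationships between data items as RDF-style triples.
# All identifiers below (BASE, exp1, chm1, prj1) are hypothetical examples.

BASE = "http://example.com/"           # assumed base URL, not from the source
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def triple(subject, predicate, obj):
    """Return one (subject, predicate, object) triple of full URIs."""
    return (BASE + subject, DCTERMS + predicate, BASE + obj)


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def related_to(triples, subject):
    """List every object linked from the given subject URI."""
    return [o for s, _, o in triples if s == subject]


graph = [
    triple("experiments/exp1", "references", "chemicals/chm1"),
    triple("experiments/exp1", "isPartOf", "projects/prj1"),
]

print(to_ntriples(graph))
print(related_to(graph, BASE + "experiments/exp1"))
```

Keeping relationships as separate statements, rather than embedding them inside each data file, is what lets a repository search across objects by relationship.
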
  11. Fedora Commons
     • Digital repository software http://fedora-commons.org/
     • Creation and management of online digital libraries
     • Fedora ‘Digital Object’ consists of metadata + streams
     • Metadata stored as Dublin Core (DC stream)
     • ExptML file stored as EXPTML stream
     • Other files (PDFs, images, Word etc.) stored as streams
     • Relationships stored as RDF (RELS-EXT stream)
     • Features: version control, checksumming, archiving
     • Built-in search of objects and relationships
     • Add-on for file content search (Fedora GSearch)
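
The object layout on this slide (metadata plus named streams, with checksumming) can be pictured with a small data model. This is a sketch only and assumes nothing about Fedora's real API: the `DigitalObject` and `Stream` classes and the placeholder contents are hypothetical, and SHA-256 is chosen purely for illustration. Only the stream names DC, EXPTML, and RELS-EXT come from the slide.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One named stream of a digital object (hypothetical model)."""
    name: str
    mime_type: str
    content: bytes

    def checksum(self) -> str:
        # SHA-256 is an assumption made for this sketch.
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class DigitalObject:
    """A digital object: an identifier plus any number of streams."""
    pid: str
    streams: dict = field(default_factory=dict)

    def add_stream(self, stream: Stream) -> None:
        self.streams[stream.name] = stream

    def verify(self, name: str, expected: str) -> bool:
        """Check a stream's content against a previously recorded checksum."""
        return self.streams[name].checksum() == expected


obj = DigitalObject(pid="exptml:chm1")
obj.add_stream(Stream("DC", "text/xml", b"<dc>placeholder metadata</dc>"))
obj.add_stream(Stream("EXPTML", "text/xml", b"<chemical>placeholder</chemical>"))
obj.add_stream(Stream("RELS-EXT", "application/rdf+xml", b"<rdf>placeholder</rdf>"))

recorded = obj.streams["EXPTML"].checksum()
print(sorted(obj.streams))
print(obj.verify("EXPTML", recorded))
```

Recording a checksum when a stream is stored and comparing it later is one simple way to detect silent corruption in an archive.
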
  12. Fedora for File Storage
     • Fedora-Commons defines and works on digital objects
     • In the definition of a Fedora object, an ExptML file is just one stream of many. By default, each object also has a “DC” stream of metadata and a “RELS-EXT” stream of relationships
     • Each Fedora object can have any number of additional streams for:
       • Paper PDFs, product/sample pictures, binary file formats (if a conversion has been done)
       • Video, audio, RDF, anything…
     • You can export individual streams or the whole Fedora object with streams binary encoded (sharing/archiving)
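
Exporting a whole object with its streams binary encoded, as the last bullet describes, amounts to packing every stream into one text-safe document. The sketch below uses Base64 inside JSON purely for illustration: the slides do not give the export format, and the `export_object` and `import_object` helpers and the sample streams are hypothetical.

```python
import base64
import json


def export_object(pid, streams):
    """Pack an object's streams (name -> bytes) into one JSON string."""
    payload = {
        "pid": pid,
        "streams": {
            name: base64.b64encode(data).decode("ascii")
            for name, data in streams.items()
        },
    }
    return json.dumps(payload, sort_keys=True)


def import_object(text):
    """Reverse export_object: return (pid, name -> bytes)."""
    payload = json.loads(text)
    streams = {
        name: base64.b64decode(encoded)
        for name, encoded in payload["streams"].items()
    }
    return payload["pid"], streams


streams = {
    "DC": b"<dc>placeholder</dc>",
    "PDF": b"%PDF-1.4 placeholder bytes \x00\xff",
}
archive = export_object("exptml:chm1", streams)
pid, restored = import_object(archive)
print(pid, restored == streams)
```

Because the archive is plain text, it can be shared or stored anywhere and still round-trips arbitrary binary content such as PDFs or images.
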
  13. Fedora Object Storage
  14. Eureka Web Application
     • Web interface written in PHP using the CakePHP framework
     • Communicates with the Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files
     • Representational State Transfer (REST) format for URLs
       • E.g. http://example.com/chemicals/view/exptml:chm1
     • Creation of ExptML via interface
     • Provides search via Fedora and GSearch
     • Can extract data out of XML files
     • Can gather data from other websites (via API controller) and integrate it into ExptML files
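
The example URL on this slide follows a controller/action/identifier pattern. As a rough sketch of how such a URL breaks down (this is not the CakePHP router), the hypothetical `parse_route` helper below splits the path into those three parts. Treating the path as exactly three segments is an assumption of this sketch.

```python
from urllib.parse import urlparse


def parse_route(url):
    """Split a /controller/action/id URL into its three parts.

    Assumes exactly three path segments, as in the slide's example.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) != 3:
        raise ValueError(f"expected /controller/action/id, got {url!r}")
    controller, action, identifier = segments
    return {"controller": controller, "action": action, "id": identifier}


route = parse_route("http://example.com/chemicals/view/exptml:chm1")
print(route)
```

Read this way, the URL names the data type (`chemicals`), the operation (`view`), and the stored object (`exptml:chm1`), which is what makes each record directly addressable.
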
  15. Eureka Website – Group View
     Only data types related to the research group show up on the left
  16. Eureka Website – Bench View
     Clicking on the “Add” menu on the right allows you to add a comment or link to data
  17. Eureka Website – Notebook View
  18. Eureka Website – Laboratory View
     The “Rel” menu shows you the information related to this instrument
  19. Eureka Website – Library View
     You can add the PDF of the paper to the citation. The contents of the PDF are searchable in the system
  20. Eureka Website – Stockroom View
  21. Eureka Technology Stack
     • Web Application
       • Server: Fedora 4, JSON-LD, ElasticSearch
       • Client: CakePHP 3/HTML5, Recline.js, Annotator, jQuery
     • Standards
       • Linked Data Platform (http://www.w3.org/TR/ldp/)
       • Datapackage/Simple Data Format (http://dataprotocols.org/)
       • Markup languages: AnIML, UnitsML, CML
       • Other molecular file formats: MOL/SDF/CDX/CIF/PDB etc.
       • Open Framework for Laboratory Data (Allotrope Foundation)
     • Data sources
       • ChemSpider, CIR, PubChem, Google Scholar, CrossRef, VIVO
       • ExchangeNetwork (EPA), NIST, SDBS (no APIs yet)
     • Tools
       • Marvin for JS, JSXGraph, JSpecView, Chemicalize.org
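
To give a feel for the JSON-LD format listed under Server, here is a small sketch of a metadata record built with Python's standard `json` module. The record, its identifiers, and the choice of Dublin Core terms in the `@context` are illustrative assumptions; the slides do not show the vocabulary Eureka actually uses. The `linked_ids` helper is likewise hypothetical.

```python
import json

# Hypothetical JSON-LD record; the vocabulary choice is an assumption.
record = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",
        "creator": "http://purl.org/dc/terms/creator",
        "references": {
            "@id": "http://purl.org/dc/terms/references",
            "@type": "@id",  # the value is a link to another resource
        },
    },
    "@id": "http://example.com/experiments/exp1",
    "title": "Placeholder experiment",
    "creator": "Placeholder researcher",
    "references": "http://example.com/chemicals/chm1",
}


def linked_ids(doc):
    """Return values of terms the context declares as links (@type: @id)."""
    context = doc["@context"]
    return [
        doc[term]
        for term, definition in context.items()
        if isinstance(definition, dict)
        and definition.get("@type") == "@id"
        and term in doc
    ]


print(json.dumps(record, indent=2))
print(linked_ids(record))
```

The `@context` is what turns ordinary JSON keys into globally identified terms, so the same record can be consumed as plain JSON by a browser client and as linked data by a server.
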
  22. Eureka Roadmap
     • Implement ingest of all data types, file-based (if appropriate) and web-based
     • In-browser processing of data -> dataset -> result, report writing
     • Extraction of file-based legacy data -> ExptML format data
     • Open access to data/spectra, ‘available data’ page (browser only)
     • Access to data/spectra via linked data server (discovery/indexing)
     • Publishing of packaged datasets with authenticated download option
     • Automated ingestion of data from instruments/sensors
     • Collaborative research: authentication and data exchange
     • Timeframe? Depends on securing funding
  23. Conclusion
     • Eureka: Web application to create ExptML files
     • Built on ExptML to capture data/resources/workflows
     • Reliable storage/archiving system for ExptML files (Fedora)
     • Storage of relationships between data (RDF)
     • TODO
       • Provide mechanism for sharing of data (different levels)
       • Add tools to find, visualize and work on science data
       • Integration into the RDA model for sharing research data
       • Get the word out and test system with many users
  24. References
     • Eureka – http://sourceforge.net/projects/eureka
     • Fedora-Commons – http://fedora-commons.org
     • XML – http://www.w3.org/standards/xml
     • AnIML – http://animl.sourceforge.net
     • ExptML – http://exptml.sourceforge.net/
     • UnitsML – http://unitsml.nist.gov/
     • CML – http://www.xml-cml.org/
     • JSON-LD – http://www.w3.org/TR/json-ld/
     • RDF – http://www.w3.org/RDF/
     • CIR – http://cactus.nci.nih.gov/chemical/structure
     • RDA – http://rd-alliance.org
     • ChemSpider – http://www.chemspider.com/
     • Allotrope Foundation – http://allotrope.org