ALEX HENDERSON & PETER GARDNER MANCHESTER INSTITUTE OF BIOTECHNOLOGY UNIVERSITY OF MANCHESTER, UK HTTP://GARDNER-LAB.COM & HTTP://CLIRSPEC.ORG 
SPEC 2014 Shedding New Light on Disease 
Kraków, Poland. 17-22 August 2014 
WHAT’S MINE IS YOURS (AND VICE VERSA): DATA SHARING IN VIBRATIONAL SPECTROSCOPY
Sharing…
Why share? 
Technique validation 
Round-robins 
Standard spectra for unknown identification 
Standard operating procedure validation 
Test visualisation schemes 
Remote location of special samples 
Remote location of special equipment
What to share? 
Raw data files 
e.g. for testing data processing procedures 
Metadata for sample preparation 
Sample SOP 
Metadata for experimental procedure/protocol 
Acquisition SOP 
Processed data (saves the recipient doing it themselves)
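One way to stop raw spectra being separated from their sample-preparation and acquisition SOPs is to serialise them together. The sketch below assumes JSON as the container; the class name `SharedSpectrum` and its field names are hypothetical and are not a CLIRSPEC or vendor schema, and the example values are purely illustrative.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SharedSpectrum:
    """Hypothetical container: field names are illustrative, not a CLIRSPEC schema."""
    wavenumbers: list[float]                             # abscissa values, e.g. in cm^-1
    intensities: list[float]                             # raw, unprocessed ordinate values
    sample_sop: dict = field(default_factory=dict)       # sample preparation metadata
    acquisition_sop: dict = field(default_factory=dict)  # acquisition protocol metadata

    def to_json(self) -> str:
        # Data and metadata are written together, so neither can travel alone
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SharedSpectrum":
        return cls(**json.loads(text))


# Illustrative usage: the record survives a write/read cycle unchanged
record = SharedSpectrum([1000.0, 1002.0], [0.12, 0.15],
                        sample_sop={"substrate": "CaF2"},
                        acquisition_sop={"co_added_scans": 64})
assert SharedSpectrum.from_json(record.to_json()) == record
```

JSON keeps the example dependency-free; it would not scale to hyperspectral images, which is why the later slides look at binary formats.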
How to give? 
Pen drive 
CD 
Email 
Dropbox 
FTP server 
Data repository 
One-to-one 
One-to-few 
One-to-more 
One-to-all 
Best solution
How to receive? 
Data in many different file formats present a barrier to the end user 
Disconnect between analysis software and file format 
Incorrectly/poorly coded formats require additional information 
(Hyper)spectral data become disconnected from sample treatments or acquisition protocols
Third-party data analysis suites 
Package      | Author                   | Platform
CytoSpec     | Peter Lasch              | MATLAB
hyperSpec    | Claudia Beleites         | R
ProSpect     | Paul Bassan              | MATLAB
SpecToolbox  | Matt Baker (and friends) | MATLAB
… 
Not an exhaustive list; email me your package details 
Each author must write an import filter for each version of each vendor’s format
Writing import filters 
Slow 
Laborious 
Steep learning curve 
Potential for error 
Filters remain incomplete without sufficient test data 
No access to file format specification/detail 
IP issues with proprietary formats (NDA) 
Some are limited to (32-bit) Windows (e.g. via DLL or DDE)
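Why "one filter per version of each vendor's format" multiplies the work can be seen in a minimal registry sketch. The format name `vendorX`, its two versions and their byte layouts are entirely hypothetical; the point is only that a change in layout between versions forces a separate reader, and an unknown version has nothing to fall back on.

```python
from typing import Callable

# Hypothetical registry: one reader per (vendor format, version) pair
Reader = Callable[[bytes], list[float]]
_READERS: dict[tuple[str, int], Reader] = {}


def register(fmt: str, version: int):
    """Decorator recording a reader for one specific format version."""
    def wrap(func: Reader) -> Reader:
        _READERS[(fmt, version)] = func
        return func
    return wrap


@register("vendorX", 1)
def read_vendorx_v1(raw: bytes) -> list[float]:
    # Illustrative only: version 1 stores comma-separated ASCII values
    return [float(v) for v in raw.decode("ascii").split(",")]


@register("vendorX", 2)
def read_vendorx_v2(raw: bytes) -> list[float]:
    # Illustrative only: version 2 switched to semicolons, so the v1 reader breaks
    return [float(v) for v in raw.decode("ascii").split(";")]


def load(fmt: str, version: int, raw: bytes) -> list[float]:
    """Dispatch to the reader for this exact format version, if one exists."""
    try:
        reader = _READERS[(fmt, version)]
    except KeyError:
        raise ValueError(f"no import filter for {fmt} version {version}") from None
    return reader(raw)
```

Every package in the table above has to maintain something equivalent for every vendor, which is the motivation for a single common target format.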
Objectives 2014 – 2017 
Developing 
Understanding of the interaction of light with clinical samples 
Strategies for pre-processing and statistical analysis in clinical spectroscopy 
Protocols 
Preparation of cells, tissue and biofluids for clinical spectroscopy 
Inter-group data sharing 
Evidence 
Power of spectroscopy for use in the clinical arena 
Requirements for instrumentation suitable for use in the clinic 
Clinical Infrared and Raman Spectroscopy for Medical Diagnosis 
PARTNERS 
ACADEMIC 
Peter Gardner 
Matthew J Baker 
Nicholas Stone 
Julian Moger 
Josep Sulé-Suso 
Francis Martin 
Sergei G Kazarian 
Hugh J Byrne 
Roy Goodacre 
John M Chalmers 
Alex Henderson 
Peter Lasch 
Ganesh Sockalingum 
Bayden Wood 
Peter Weightman 
Gianfelice Cinque 
Peter Rich 
CLINICAL 
Noel Clarke 
Jonathan Shanks 
Timothy Dawson 
Charles Davis 
Pierre Martin-Hirsch 
Hugh Barr 
Neil Shepherd 
John McGrath 
Jim Brown 
Sam Janes 
INDUSTRIAL 
Agilent 
Bruker 
Cobalt Light Systems 
Coherent UK 
Perkin Elmer 
Renishaw 
@clirspec 
http://clirspec.org/
CLIRSPEC Work Package 6 
Assess current spectral and image data attributes from the range of currently employed network instrumentation 
Develop a standard data transfer format to allow free and easy dissemination of data between network members, enhancing collaboration and the efficiency of research funding 
Provide a single software target, easing the development of third-party software and its uptake within the clinical arena 
Investigate the utility of standard spectra for specific diseases 
Investigate the technological, cultural, ethical and IP issues in order to enable data sharing and reuse
Data format requirements 
Operating system neutral 
Scalable to large file sizes (futureproof) 
Random access (no need to unzip before reading) 
File format description openly available (no NDA) 
Other software available that can read it 
Quick to write and, more importantly, quick to read 
Able to hold (encrypted) instrumental parameters 
Enables round-tripping with no information loss 
…
Open data formats – Spectra 
JCAMP-DX 
More than four compression schemes 
Some code available 
Grams SPC 
Records spectroscopy types and units 
Some import filters available 
CSV/text 
Simple to read 
Not scalable 
Not suitable for images 
Loss of metadata
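The JCAMP-DX compression schemes are compact but fiddly to decode, which is part of why import code is scarce. Below is a minimal sketch decoding the ASDF pseudo-digit forms SQZ (`@`, `A`–`I`, `a`–`i`), DIF (`%`, `J`–`R`, `j`–`r`) and DUP (`S`–`Z`, `s`) for the ordinate part of a single data line. It assumes the leading abscissa value has already been stripped, ignores the y-value check carried across DIF line ends, does not handle the AFFN or PAC forms, and returns integers that would still need scaling by the file's `##YFACTOR=` value.

```python
# Pseudo-digit tables: each character encodes the sign and the first digit
SQZ = {"@": 0, **{c: i for i, c in enumerate("ABCDEFGHI", 1)},
       **{c: -i for i, c in enumerate("abcdefghi", 1)}}
DIF = {"%": 0, **{c: i for i, c in enumerate("JKLMNOPQR", 1)},
       **{c: -i for i, c in enumerate("jklmnopqr", 1)}}
DUP = {**{c: i for i, c in enumerate("STUVWXYZ", 1)}, "s": 9}


def _value(lead: int, digits: str) -> int:
    # The pseudo-digit supplies the sign and first digit; plain digits follow it
    magnitude = int(str(abs(lead)) + digits)
    return -magnitude if lead < 0 else magnitude


def _tokens(line: str):
    """Yield (kind, value) pairs from one compressed line."""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1
            continue
        j = i + 1
        while j < len(line) and line[j].isdigit():
            j += 1
        for kind, table in (("sqz", SQZ), ("dif", DIF), ("dup", DUP)):
            if ch in table:
                yield kind, _value(table[ch], line[i + 1:j])
                break
        else:
            raise ValueError(f"unsupported character {ch!r}")
        i = j


def decode_ordinates(line: str) -> list[int]:
    values: list[int] = []
    last_kind, last_value = None, 0
    for kind, value in _tokens(line):
        if kind == "sqz":            # absolute value
            values.append(value)
        elif kind == "dif":          # difference from the previous value
            if not values:
                raise ValueError("DIF token before any absolute value")
            values.append(values[-1] + value)
        else:                        # DUP: previous token occurs `value` times in total
            if last_kind is None:
                raise ValueError("DUP token with nothing to repeat")
            for _ in range(value - 1):
                values.append(last_value if last_kind == "sqz" else values[-1] + last_value)
            continue
        last_kind, last_value = kind, value
    return values
```

For example, `AJT` decodes as 1, then a difference of +1 repeated twice, giving `[1, 2, 3]`.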
Hyperspectral images 
Grams SPC 
Pixel indexing issues, needs help 
ENVI 
Manual spectrum-centric or image-centric access 
May require IDL library 
NetCDF-4 
Self-describing, accessed via libraries 
Compression and streaming available
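ENVI stores a cube as flat binary in one of three interleaves (BSQ, BIL or BIP), and the reader has to work out where each value lives. The offset arithmetic below, a sketch that leaves out header parsing and data types, shows why access is either spectrum-centric or image-centric: in BIP a pixel's spectrum is contiguous, while in BSQ each band image is contiguous and a spectrum is scattered across the file. Offsets are in elements; multiply by the data type's byte size to get file positions.

```python
def element_offset(interleave: str, band: int, row: int, col: int,
                   bands: int, rows: int, cols: int) -> int:
    """Position (in elements, not bytes) of one value in a flat ENVI-style cube."""
    if interleave == "bsq":   # band sequential: each band image is contiguous
        return (band * rows + row) * cols + col
    if interleave == "bil":   # band interleaved by line: one line per band in turn
        return (row * bands + band) * cols + col
    if interleave == "bip":   # band interleaved by pixel: each spectrum is contiguous
        return (row * cols + col) * bands + band
    raise ValueError(f"unknown interleave {interleave!r}")


def spectrum_offsets(interleave: str, row: int, col: int,
                     bands: int, rows: int, cols: int) -> list[int]:
    """Element offsets needed to read the full spectrum at one pixel."""
    return [element_offset(interleave, b, row, col, bands, rows, cols)
            for b in range(bands)]
```

For a 2 × 4 pixel, 3-band cube, the spectrum at row 1, column 2 sits at offsets 18, 19, 20 in BIP but at 6, 14, 22 in BSQ.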
3D confocal and tomographic 
NetCDF-4 
Unlimited dimensionality 
Optimised spectrum-centric or image-centric access through ‘chunking’ 
Supported
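NetCDF-4 (stored via HDF5) splits each variable into fixed-size chunks, and a read typically has to fetch every chunk it overlaps. The counting model below is not the netCDF API; it is a sketch that shows how the chunk shape trades spectrum-centric against image-centric access for a hypothetical 256 × 256 pixel, 1000-band cube.

```python
def chunks_touched(region: list[tuple[int, int]], chunk: tuple[int, ...]) -> int:
    """Number of chunks overlapped by a region given as [start, stop) per axis."""
    count = 1
    for (start, stop), size in zip(region, chunk):
        # Index of the last chunk minus index of the first chunk, inclusive
        count *= (stop - 1) // size - start // size + 1
    return count


# Hypothetical cube of shape (x=256, y=256, bands=1000)
SPECTRUM_READ = [(10, 11), (20, 21), (0, 1000)]   # one full spectrum at pixel (10, 20)
IMAGE_READ = [(0, 256), (0, 256), (500, 501)]     # one full band image

if __name__ == "__main__":
    for chunk in [(1, 1, 1000), (256, 256, 1), (16, 16, 100)]:
        print(chunk, chunks_touched(SPECTRUM_READ, chunk), chunks_touched(IMAGE_READ, chunk))
```

Spectrum-shaped chunks make a spectrum read cost one chunk but an image read cost 65,536; image-shaped chunks reverse that, while a compromise such as 16 × 16 × 100 needs 10 chunks for a spectrum and 256 for an image.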
Community input required 
Data types that need to be supported 
Irregularly shaped images 
Collections of spectra 
Discrete wavelength data (multispectral not hyperspectral) 
Time course (multiple dependent variables) 
Software 
Filter writing, format testing, etc. 
THINKING and PLANNING!!
Registration at http://clirspec.org
Groups at http://clirspec.org
Updates at http://clirspec.org
Remember…

filosofia boliviana introducción jsjdjd.pptx
IvanMallco1
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 

Recently uploaded (20)

Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 

What's mine is yours (and vice versa): Data sharing in vibrational spectroscopy

  • 1. What's mine is yours (and vice versa): data sharing in vibrational spectroscopy
    ◦ Alex Henderson & Peter Gardner, Manchester Institute of Biotechnology, University of Manchester, UK
    ◦ http://gardner-lab.com & http://clirspec.org
    ◦ SPEC 2014: Shedding New Light on Disease, Kraków, Poland, 17-22 August 2014
  • 3. Why share?
    ◦ Technique validation
    ◦ Round-robins
    ◦ Standard spectra for identifying unknowns
    ◦ Validation of standard operating procedures
    ◦ Testing visualisation schemes
    ◦ Remote location of special samples
    ◦ Remote location of special equipment
  • 4. What to share?
    ◦ Raw data files, e.g. for testing data-processing procedures
    ◦ Metadata for sample preparation (sample SOP)
    ◦ Metadata for the experimental procedure/protocol (acquisition SOP)
    ◦ Processed data, to save you doing it yourself
  • 6. How to give?
    ◦ Pen drive, CD, email, Dropbox, ftp server, data repository
    ◦ One-to-one, one-to-few, one-to-more, one-to-all
    ◦ Best solution
  • 7. How to receive?
    ◦ Data in different file formats introduce a barrier to the end user
    ◦ Disconnect between analysis software and file format
    ◦ Incorrectly or poorly coded formats require additional information
    ◦ (Hyper)spectral data disconnected from sample treatments or acquisition protocols
  • 8. Third-party data analysis suites (package: author, platform)
    ◦ CytoSpec: Peter Lasch, MATLAB
    ◦ hyperSpec: Claudia Beleites, R
    ◦ ProSpect: Paul Bassan, MATLAB
    ◦ SpecToolbox: Matt Baker (and friends), MATLAB
    ◦ …
    ◦ Not an exhaustive list; email me your package info
    ◦ The author must write an import filter for each version of each vendor's format
  • 9. Writing import filters
    ◦ Slow
    ◦ Laborious
    ◦ Steep learning curve
    ◦ Potential for error
    ◦ Filters remain incomplete without sufficient test data
    ◦ No access to the file format specification/details
    ◦ IP issues with proprietary formats (NDA)
    ◦ Some are limited to (32-bit) Windows (e.g. DLL or DDE)
  • 10. Objectives 2014-2017
    ◦ Developing: understanding of the interaction of light with clinical samples; strategies for pre-processing and statistical analysis in clinical spectroscopy
    ◦ Protocols: preparation of cells, tissue and biofluids for clinical spectroscopy; inter-group data sharing
    ◦ Evidence: the power of spectroscopy for use in the clinical arena; requirements of instrumentation suitable for use in the clinic
    ◦ Clinical Infrared and Raman Spectroscopy for Medical Diagnosis (@clirspec, http://clirspec.org/)
    ◦ Academic partners: Peter Gardner, Matthew J Baker, Nicholas Stone, Julian Moger, Josep Sulé-Suso, Francis Martin, Sergei G Kazarian, Hugh J Byrne, Roy Goodacre, John M Chalmers, Alex Henderson, Peter Lasch, Ganesh Sockalingum, Bayden Wood, Peter Weightman, Gianfelice Cinque, Peter Rich
    ◦ Clinical partners: Noel Clarke, Jonathan Shanks, Timothy Dawson, Charles Davis, Pierre Martin-Hirsch, Hugh Barr, Neil Shepherd, John McGrath, Jim Brown, Sam Janes
    ◦ Industrial partners: Agilent, Bruker, Cobalt Light Systems, Coherent UK, Perkin Elmer, Renishaw
  • 11. CLIRSPEC Work Package 6
    ◦ Assess the attributes of spectral and image data from the range of instrumentation currently employed across the network
    ◦ Develop a standard data transfer format to allow free and easy dissemination of data between network members, enhancing collaboration and the efficiency of research funding
    ◦ Provide a single software target, easing the development of third-party software and its uptake within the clinical arena
    ◦ Investigate the utility of standard spectra for specific diseases
    ◦ Investigate the technological, cultural, ethical and IP issues in order to enable data sharing and reuse
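One way to read the "single software target" objective is that every vendor-specific import filter converts into one shared in-memory model, so analysis code only ever sees that model. The Python sketch below assumes a hypothetical `SpectralDataset` class, a registry keyed on each format's leading bytes, and an invented `TOYCSV` format; none of these come from CLIRSPEC or from any vendor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpectralDataset:
    """Hypothetical common target that every import filter produces."""
    wavenumbers: List[float]        # shared x-axis, e.g. in cm^-1
    spectra: List[List[float]]      # one list of intensities per spectrum
    metadata: Dict[str, str] = field(default_factory=dict)  # sample/acquisition SOP details


# Maps a file signature (its leading bytes) to the filter that understands it.
_IMPORTERS: Dict[bytes, Callable[[bytes], SpectralDataset]] = {}


def register_importer(signature: bytes):
    """Decorator that registers an import filter for files starting with `signature`."""
    def decorator(func: Callable[[bytes], SpectralDataset]):
        _IMPORTERS[signature] = func
        return func
    return decorator


def load(raw: bytes) -> SpectralDataset:
    """Dispatch raw file contents to the first filter whose signature matches."""
    for signature, importer in _IMPORTERS.items():
        if raw.startswith(signature):
            return importer(raw)
    raise ValueError("no import filter registered for this file signature")


@register_importer(b"TOYCSV\n")
def _import_toy_csv(raw: bytes) -> SpectralDataset:
    # Invented toy format: a signature line, a line of x values,
    # then one comma-separated line of intensities per spectrum.
    lines = raw.decode("ascii").splitlines()[1:]
    x = [float(v) for v in lines[0].split(",")]
    ys = [[float(v) for v in line.split(",")] for line in lines[1:]]
    return SpectralDataset(x, ys, {"source_format": "TOYCSV"})
```

Supporting another format then means registering one more filter, while analysis code keeps calling `load`; carrying metadata alongside the spectra also keeps the data connected to its sample and acquisition protocols.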
  • 13. Data format requirements
    ◦ Operating-system neutral
    ◦ Scalable to large file sizes (future-proof)
    ◦ Random access (no need to unzip before reading)
    ◦ File format description openly available (no NDA)
    ◦ Other software available that can read it
    ◦ Quick to write and, more importantly, quick to read
    ◦ Able to hold (encrypted) instrumental parameters
    ◦ Enables round-tripping with no information loss
    ◦ …
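The "random access" requirement means a reader can jump straight to one spectrum instead of decoding (or unzipping) the whole file. A minimal sketch, assuming a hypothetical uncompressed layout (a small header followed by fixed-length float32 rows), shows how a computed offset plus `seek` gives that access using only Python's standard `struct` module.

```python
import struct

# Hypothetical header: number of spectra and points per spectrum, little-endian uint32.
HEADER = struct.Struct("<II")
FLOAT_SIZE = 4  # bytes per little-endian float32 value


def write_spectra(path, spectra):
    """Write equal-length spectra as a header followed by contiguous float32 rows."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    n_points = len(spectra[0])
    row = struct.Struct(f"<{n_points}f")
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(len(spectra), n_points))
        for spectrum in spectra:
            if len(spectrum) != n_points:
                raise ValueError("all spectra must have the same length")
            fh.write(row.pack(*spectrum))


def read_spectrum(path, index):
    """Read spectrum `index` by seeking straight to it; nothing else is decoded."""
    with open(path, "rb") as fh:
        n_spectra, n_points = HEADER.unpack(fh.read(HEADER.size))
        if not 0 <= index < n_spectra:
            raise IndexError(index)
        fh.seek(HEADER.size + index * n_points * FLOAT_SIZE)
        return list(struct.unpack(f"<{n_points}f", fh.read(n_points * FLOAT_SIZE)))
```

Compressing the whole file as one stream would break this offset arithmetic, which is why formats such as NetCDF-4 compress data in independently addressable blocks rather than all at once.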
  • 14. Open data formats: spectra
    ◦ JCAMP-DX: over four compression systems; some code available
    ◦ Grams SPC: understands spectroscopy types and units; some import filters available
    ◦ CSV/text: simple to read; not scalable; not suitable for images; loss of metadata
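JCAMP-DX's compression is the ASCII Squeezed Difference Form (ASDF), in which a pseudo-digit character encodes the sign and leading digit of an ordinate (SQZ), of the difference from the previous ordinate (DIF), or the leading digit of a repeat count (DUP). The decoder below is a simplified sketch: it assumes the leading X value of each data line has already been stripped, does not handle plain AFFN numbers, and ignores the Y-value check that DIF lines carry into the next line, so it is not a complete JCAMP-DX reader.

```python
# Pseudo-digit tables from the ASDF scheme.
SQZ = {"@": 0, **{c: i for i, c in enumerate("ABCDEFGHI", 1)},
       **{c: -i for i, c in enumerate("abcdefghi", 1)}}
DIF = {"%": 0, **{c: i for i, c in enumerate("JKLMNOPQR", 1)},
       **{c: -i for i, c in enumerate("jklmnopqr", 1)}}
DUP = {**{c: i for i, c in enumerate("STUVWXYZ", 1)}, "s": 9}


def _tokens(text):
    """Split an ASDF string into (kind, integer) tokens."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch in SQZ:
            kind, table = "sqz", SQZ
        elif ch in DIF:
            kind, table = "dif", DIF
        elif ch in DUP:
            kind, table = "dup", DUP
        else:
            raise ValueError(f"unsupported character {ch!r}")
        lead = table[ch]
        j = i + 1
        while j < len(text) and text[j].isdigit():
            j += 1  # ordinary digits continue the number
        magnitude = int(str(abs(lead)) + text[i + 1:j])
        yield kind, -magnitude if lead < 0 else magnitude
        i = j


def decode_asdf(text):
    """Decode SQZ/DIF/DUP ordinates into a list of integers."""
    values = []
    last = None  # most recent SQZ or DIF token, which a DUP repeats
    for kind, number in _tokens(text):
        if kind == "dup":
            if last is None:
                raise ValueError("DUP token with nothing to repeat")
            todo = [last] * (number - 1)  # the count includes the original occurrence
        else:
            last = (kind, number)
            todo = [last]
        for k, v in todo:
            if k == "sqz":
                values.append(v)  # absolute ordinate
            else:
                if not values:
                    raise ValueError("DIF token before any absolute value")
                values.append(values[-1] + v)  # difference from previous ordinate
    return values
```

For example, `decode_asdf("A10J5j2")` reads an absolute 110, then differences of +15 and -12, giving 110, 125 and 113.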
  • 15. Hyperspectral images
    ◦ Grams SPC: pixel indexing issues, needs help
    ◦ ENVI: manual spectrum-centric or image-centric access; may require the IDL library
    ◦ NetCDF-4: self-describing, accessed via libraries; compression and streaming available
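ENVI's spectrum-centric versus image-centric trade-off comes from the interleave declared in its header: band sequential (`bsq`) stores one whole image per band, band interleaved by line (`bil`) stores every band for each image row, and band interleaved by pixel (`bip`) stores each pixel's spectrum contiguously. The sketch below computes byte offsets for each layout; the function names, the row/column/band argument order and the 4-byte default value size are assumptions for illustration, and header offsets and byte order are ignored.

```python
def element_offset(interleave, row, col, band, n_rows, n_cols, n_bands,
                   bytes_per_value=4):
    """Byte offset of one value in a raw cube stored with the given interleave."""
    if not (0 <= row < n_rows and 0 <= col < n_cols and 0 <= band < n_bands):
        raise IndexError("row, column or band outside the cube")
    if interleave == "bsq":    # [band][row][col]: one complete image per band
        index = (band * n_rows + row) * n_cols + col
    elif interleave == "bil":  # [row][band][col]: each image line holds every band
        index = (row * n_bands + band) * n_cols + col
    elif interleave == "bip":  # [row][col][band]: each pixel's spectrum is contiguous
        index = (row * n_cols + col) * n_bands + band
    else:
        raise ValueError(f"unknown interleave {interleave!r}")
    return index * bytes_per_value


def spectrum_offsets(interleave, row, col, n_rows, n_cols, n_bands, bytes_per_value=4):
    """Offsets of every band for one pixel, i.e. where its spectrum lives on disk."""
    return [element_offset(interleave, row, col, band, n_rows, n_cols, n_bands,
                           bytes_per_value)
            for band in range(n_bands)]
```

A single spectrum is one contiguous read in `bip` but `n_bands` widely spaced reads in `bsq`; the reverse holds for a single-band image, which is why the layout has to be chosen for the expected access pattern.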
  • 16. 3D confocal and tomographic data
    ◦ NetCDF-4: unlimited dimensionality; optimised spectrum-centric or image-centric access through 'chunking'; supported
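Chunking stores an array as fixed-shape tiles, each read (and compressed) as a unit, so the chunk shape decides whether spectra or image planes are cheap to fetch. The sketch counts how many chunks a rectangular selection touches, a rough proxy for I/O cost that ignores caching and compression ratios; the (y, x, wavenumber) dimension order and the 512 × 512 × 1000 cube used below are hypothetical.

```python
def chunks_touched(shape, chunk_shape, selection):
    """Count the chunks overlapping a rectangular selection.

    shape, chunk_shape: sizes per dimension, e.g. (y, x, wavenumber).
    selection: one half-open (start, stop) range per dimension.
    """
    if not len(shape) == len(chunk_shape) == len(selection):
        raise ValueError("shape, chunk_shape and selection must have equal length")
    total = 1
    for size, chunk, (start, stop) in zip(shape, chunk_shape, selection):
        if not 0 <= start < stop <= size:
            raise ValueError("selection falls outside the array")
        first_chunk = start // chunk
        last_chunk = (stop - 1) // chunk
        total *= last_chunk - first_chunk + 1
    return total
```

For a 512 × 512 × 1000 cube, chunks of (1, 1, 1000) serve one spectrum from a single chunk but need 262,144 chunks for one image plane; (512, 512, 1) reverses this (1,000 chunks per spectrum, one per plane); a compromise such as (32, 32, 100) needs 10 and 256 respectively. With the netCDF4 Python package the chunk shape is set through the `chunksizes` argument of `Dataset.createVariable`.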
  • 17. Community input required
    ◦ Data types that need to be supported: irregularly shaped images; collections of spectra; discrete-wavelength data (multispectral, not hyperspectral); time course (multiple dependent variables)
    ◦ Software: filters written, format testing, etc.
    ◦ THINKING and PLANNING!!
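Irregularly shaped images can be held as a collection of spectra, each tagged with its pixel coordinates, and placed onto a grid only when an image view is needed. The sketch below assumes a hypothetical dictionary keyed by (row, column); grid positions with no spectrum come back as `None`, which acts as a mask.

```python
from typing import Dict, List, Optional, Tuple


def to_masked_grid(spectra: Dict[Tuple[int, int], List[float]],
                   band: int) -> List[List[Optional[float]]]:
    """Place one band of irregularly located spectra onto their bounding-box grid.

    Keys are hypothetical (row, column) pixel coordinates; grid cells with no
    spectrum are left as None, which acts as the mask.
    """
    if not spectra:
        raise ValueError("no spectra supplied")
    rows = [r for r, _ in spectra]
    cols = [c for _, c in spectra]
    row0, col0 = min(rows), min(cols)
    height = max(rows) - row0 + 1
    width = max(cols) - col0 + 1
    grid: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for (r, c), spectrum in spectra.items():
        grid[r - row0][c - col0] = spectrum[band]
    return grid
```

Only measured pixels are stored, so no space is spent on padding outside the region of interest.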