2020 VISION
ALEX HENDERSON
UNIVERSITY OF MANCHESTER
THE INTERNATIONAL SOCIETY FOR CLINICAL SPECTROSCOPY
SURFACESPECTRA LIMITED alexhenderson.info
@AlexHenderson00
@ChiToolbox
Kick-off Meeting
and Hackathon
3-6 Feb, 2020
Ruhr-University Bochum
DUBIOUS DESIGN DECISIONS
http://clirspec.org
@clirspec
34 members on steering council
Travel bursaries available
http://spec2020.com
17–22 May, 2020
Summer School
7–10 July, 2020
https://springscix.org/
6–9 April, 2020
Travel bursaries available
22–27 August, 2021
CLIRSPEC DATA
Online community for us to share
algorithms, code and ideas
Hosted on Slack
Request an invitation to join
Any member can add anyone else
http://tiny.cc/clirspec-data
CHITOOLBOX
• https://bitbucket.org/AlexHenderson/chitoolbox
• Open source MATLAB toolbox
• Infrared, Raman, secondary ion mass spectrometry (SIMS)
• Spectra and hyperspectral images
• Library, not a GUI (only 1-2 dialog boxes)
• Object oriented design
@ChiToolbox
DESIGN DECISIONS
• What worked?
• What did not work?
• What were the compromises?
• What would I do differently?
OBJECT ORIENTED PROGRAMMING (OOP)
• Abstract base classes for spectra, spectral collections and images
• Concrete classes for above
• ‘Interface’ classes for ‘Raman character’, ‘IR character’ etc.
• Multiple inheritance to define technique specific classes
• eg. IRSpectrum, RamanImage, (ToF)MSSpectralCollection
• Separate classes for pictures, RMieS options, PCA or RF models etc.
• Model using classes where possible
• Provides type-identification and bespoke functionality
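The class layout on this slide can be sketched as follows. This is a minimal illustration in Python (the language the closing slide wishes for), not ChiToolbox's MATLAB code. IRSpectrum and RamanImage are named on the slide; SpectrumBase, ImageBase, IRCharacter, RamanCharacter and all methods are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod


class SpectrumBase(ABC):
    """Abstract base class: one intensity vector on an x-axis."""

    def __init__(self, xvals, data):
        self.xvals = list(xvals)
        self.data = list(data)

    @abstractmethod
    def xlabel(self):
        """Axis label; supplied by a technique 'character' class."""

    def describe(self):
        return f"{type(self).__name__}: x = {self.xlabel()}"


class ImageBase(SpectrumBase):
    """Abstract base class: one spectrum per pixel, stored row by row."""

    def __init__(self, xvals, data, width, height):
        super().__init__(xvals, data)
        self.width = width
        self.height = height


class IRCharacter:
    """'Interface' class carrying infrared-specific behaviour."""

    def xlabel(self):
        return "wavenumber (cm^-1)"


class RamanCharacter:
    """'Interface' class carrying Raman-specific behaviour."""

    def xlabel(self):
        return "Raman shift (cm^-1)"


# Multiple inheritance combines a data shape with a technique.
# The character class is listed first so its xlabel() satisfies
# the abstract method declared in the base class.
class IRSpectrum(IRCharacter, SpectrumBase):
    pass


class RamanImage(RamanCharacter, ImageBase):
    pass
```

With this layout, `isinstance(obj, IRCharacter)` answers "is this infrared data?" regardless of whether `obj` is a spectrum, a collection or an image, which is the type-identification benefit the slide refers to.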
FILE FORMATS
• Agilent (FTIR)
• Single FTIR images and mosaicked FTIR images
• Biotof (ToFSIMS)
• Spectra, hyperspectral image files
• Bruker (FTIR)
• Opus files and multiple spectra exported as a MAT file
• Ionoptika (ToFSIMS)
• Hyperspectral image files exported in HDF5 format
• Mettler Toledo (FTIR)
• Spectra exported in ASCII
• Renishaw (Raman)
• WiRE Version 4, spectral and hyperspectral images
• Thermo Fisher Scientific GRAMS SPC (Generic)
• Data stored in spc files
Single files can be read using ChiFile. This works out the file format automatically.
In addition, readable but unreleased:
• Photothermal (FTIR and Raman)
• mIRage spectra and hyperspectral images
• IONTOF (ToFSIMS)
• Hyperspectral images in grd format
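The slide says ChiFile works out the file format automatically but does not say how. One common approach is to dispatch on the file extension, sketched below in Python. The reader functions, the `READERS` table and `read_any` are all hypothetical, and the two extensions are only illustrative; this is not ChiFile's implementation.

```python
from pathlib import Path


# Hypothetical readers: each would parse one file format.
def read_spc(path):
    return {"format": "GRAMS SPC", "path": str(path)}


def read_hdf5(path):
    return {"format": "HDF5", "path": str(path)}


# Extension -> reader. Illustrative mapping only.
READERS = {
    ".spc": read_spc,
    ".h5": read_hdf5,
}


def read_any(filename):
    """Choose a reader from the file extension and call it."""
    path = Path(filename)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unrecognised file format: {filename}")
    return reader(path)
```

An extension identifies the container, not necessarily whether it holds a spectrum, a line scan or an image, so a real reader may still need to inspect the file contents before deciding what kind of object to return.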
FILE FORMAT ISSUES
• Some formats were reverse-engineered ('hacked'), e.g. Agilent
• What if example files were specific to certain instrumentation?
• Some formats are multi-purpose
• Some formats hold only one data type (spectrum, line scan, image etc)
• Eg. Agilent single tile format
• Some formats can contain any of these data types
• Eg. Renishaw
• If we read multiple files, what should we do if their contents are of different types?
REUSE OF EXTERNAL CODE (CREDITED)
• Perceptually uniform colormaps
• error_ellipse
• GSTools
• shadedErrorBar
• mksqlite
• sql_object
• cividis
• sgolayfilt
• RMieS
• cluster-toolbox
• getSubclasses
• GUI Layout Toolbox
• DataHash
• GetFullPath
• m2html
• ImportOpus
• ME-EMSC
• Thresholding Tool
• ENVI file reader/writer
• read_envihdr
SOFTWARE LICENCE
• ChiToolbox released under GNU General Public License 3.0 (GPL)
• External code is GPL, or more liberal (eg. MIT)
• GPL ‘infects’ the codebase
• Anyone distributing code that intrinsically links to this code must release it under the GPL too
• Prefer GNU Lesser General Public License (LGPL)
• Your codebase is not affected, but changes must be shared
• Unfortunately, code under the GPL cannot be relicensed as LGPL, so the GPL dependencies rule this out
MATLAB ISSUES
• Tried to make it backward compatible with R2009a
• Too painful
• Roughly compatible with R2016a
• Trying to reduce toolbox dependencies (eg. Statistics toolbox)
• MATLAB OOP not great
• Variables are passed by value, but handle-class objects are passed by reference, which makes copying difficult
• Rolled my own deep copy mechanism (clone)
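The copying problem can be illustrated with a short Python sketch, since Python objects behave roughly like MATLAB handle classes: assignment creates a second name for the same object rather than a copy. The `Spectrum` class and its `clone` method here are hypothetical and stand in for the toolbox's own mechanism, which the slide does not show.

```python
import copy


class Spectrum:
    """Hypothetical container used only to demonstrate copy semantics."""

    def __init__(self, xvals, data):
        self.xvals = list(xvals)
        self.data = list(data)

    def clone(self):
        """Return an independent deep copy of this object."""
        return copy.deepcopy(self)


original = Spectrum([1, 2, 3], [10.0, 20.0, 30.0])
alias = original              # second name for the same object
duplicate = original.clone()  # independent copy

alias.data[0] = 0.0           # also changes original.data
duplicate.data[1] = -1.0      # leaves original.data untouched
```

After these two assignments `original.data` has changed in its first element only, which is why an explicit deep copy is needed whenever an operation should not modify its input.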
DATA TYPES
• Single spectrum, spectral collection, hyperspectral image
• Continuous data
• Did not consider multispectral data (discrete wavenumber)
• Discontinuous, cannot take first derivative etc.
• Data is a property of the object, not a pointer/function to a data storage type
METADATA
• Separate class from data type
• Automatically label plots (eg PCA scores)
• Build lists of labels manually
labels = ChiClassMembership('mylabels','beta',1, 'gamma',2, 'beta',3, 'alpha',2);
• Automatically read from specially designed Excel spreadsheet
• Handles logical, category and numeric types
• Need to remove label from metadata if removing spectrum from collection
• Users not sure of difference between numeric and category types, when using numbered samples
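Two of the points above, keeping labels in step with the spectra and the numeric versus category confusion, can be sketched in Python. `LabelledCollection` and its methods are hypothetical names, and the rule used to tell the two label types apart is an assumption made for the example, not the toolbox's behaviour.

```python
class LabelledCollection:
    """Hypothetical collection keeping spectra and labels aligned by index."""

    def __init__(self, spectra, labels):
        if len(spectra) != len(labels):
            raise ValueError("Need exactly one label per spectrum")
        self.spectra = list(spectra)
        self.labels = list(labels)

    def remove(self, index):
        """Remove a spectrum and its label in one step."""
        del self.spectra[index]
        del self.labels[index]

    def label_kind(self):
        """Report 'numeric' only if every label is a number."""
        numeric = all(
            isinstance(value, (int, float)) and not isinstance(value, bool)
            for value in self.labels
        )
        return "numeric" if numeric else "category"
```

Under this rule, sample numbers entered as 1, 2, 3 are treated as a numeric scale, while the same numbers entered as the text "1", "2", "3" are treated as categories. That distinction is easy to miss when samples are simply numbered.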
DEFAULTS
• Try to provide ‘reasonable’ default values
• PCA denoising defaults to 30% of PCs retained
• Random Forest defaults to 80% training and 20% test sets
• Should default to 5-fold cross validation, but takes time
• Random Forest defaults to using parallel processing if data set is large
• MATLAB is slow to initialise worker pool
• All parameters are user-configurable
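The idea of reasonable defaults that remain user-configurable maps naturally onto keyword arguments. The Python sketch below uses the 80/20 split and the 30% figure from the slide; the function names, the `seed` parameter and the rounding rule are assumptions made for illustration.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=None):
    """Shuffle and split into training and test sets (80/20 by default)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def components_to_retain(n_components, retain_fraction=0.3):
    """Number of principal components kept for denoising (30% by default).

    Rounds to the nearest integer and always keeps at least one component.
    """
    return max(1, round(n_components * retain_fraction))
```

A caller who accepts the defaults writes `split_train_test(spectra)`, while one who disagrees writes `split_train_test(spectra, train_fraction=0.7)`; neither needs to know about the other parameters.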
VISUALISATION
• Graphics use perceptually neutral colormaps
• Caters for colour vision deficiency (colour blindness)
• Colour-mapped PCA image scores and loadings plots
• Dialog box for Raman baseline removal
• Asymmetric least squares baseline modelling requires user input
• Confidence limits on PCA/CVA* scores plots
• Default = 95%, but user variable
• RMieS iteration change plot
*Canonical variates analysis
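As an illustration of what a 95% confidence limit on a two-dimensional scores plot involves, the Python sketch below computes the semi-axes and orientation of a confidence ellipse from a 2×2 covariance matrix. It assumes the scores are bivariate normal and treats the covariance as known; the function name and signature are hypothetical, and this is not the error_ellipse code the toolbox reuses.

```python
import math


def confidence_ellipse(cov, level=0.95):
    """Semi-axes and orientation of a confidence ellipse for 2-D scores.

    cov is ((sxx, sxy), (sxy, syy)). Returns (major, minor, angle),
    with angle in radians measured from the x-axis.
    """
    (sxx, sxy), (_, syy) = cov
    # Chi-squared quantile with 2 degrees of freedom has a closed form.
    scale = -2.0 * math.log(1.0 - level)
    # Eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(scale * (mean + spread))
    minor = math.sqrt(scale * max(mean - spread, 0.0))
    # Direction of the eigenvector belonging to the larger eigenvalue.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return major, minor, angle
```

Changing `level` rescales the ellipse without changing its shape or orientation, which is all a user-variable confidence limit needs to do.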
PERCEPTUALLY NEUTRAL COLORMAPS
[Three figure slides: original, full colour versions of the MATLAB jet, MATLAB parula and Python viridis colormaps]
Python’s Inferno, Magma, Plasma and Viridis colormaps implemented in MATLAB
2020 VISION
(IF I HAD A TIME MACHINE)
• Developed more tests
• Added support for discrete wavenumber data
• Separated data storage from data manipulation
• Used database (SQLite) to manage metadata
• Considered OOP for data storage, but functional programming for operations
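A sketch of what managing metadata in SQLite might look like, using Python's standard `sqlite3` module. The table layout, column names and rows are invented for the example; the three column types simply mirror the logical, category and numeric types mentioned on the metadata slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    "CREATE TABLE metadata ("
    " spectrum_id INTEGER PRIMARY KEY,"
    " sample TEXT NOT NULL,"  # category: kept as text
    " dose REAL,"             # numeric
    " treated INTEGER)"       # logical, stored as 0 or 1
)

# Made-up rows, one per spectrum.
rows = [(1, "alpha", 0.0, 0), (2, "beta", 1.5, 1), (3, "beta", 3.0, 1)]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", rows)


def ids_for_sample(connection, sample):
    """Return the ids of all spectra belonging to one sample."""
    cursor = connection.execute(
        "SELECT spectrum_id FROM metadata WHERE sample = ? ORDER BY spectrum_id",
        (sample,),
    )
    return [row[0] for row in cursor]


beta_ids = ids_for_sample(conn, "beta")

# Removing a spectrum's metadata is a single DELETE keyed on its id.
conn.execute("DELETE FROM metadata WHERE spectrum_id = ?", (2,))
```

Declaring the column type up front also settles the numeric versus category question at the point the table is designed, rather than leaving it to be inferred from the values.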
2020 VISION
(IF I HAD A TIME MACHINE)
Write it all in Python!
…or C++

2020 Vision (Dubious Design Decisions)

  • 1. 2020 VISION ALEX HENDERSON UNIVERSITY OF MANCHESTER THE INTERNATIONAL SOCIETY FOR CLINICAL SPECTROSCOPY SURFACESPECTRA LIMITED alexhenderson.info @AlexHenderson00 @ChiToolbox Kick-off Meeting and Hackathon 3-6 Feb, 2020 Ruhr-University Bochum
  • 2. DUBIOUS DESIGN DECISIONS ALEX HENDERSON UNIVERSITY OF MANCHESTER THE INTERNATIONAL SOCIETY FOR CLINICAL SPECTROSCOPY SURFACESPECTRA LIMITED alexhenderson.info @AlexHenderson00 @ChiToolbox Kick-off Meeting and Hackathon 3-6 Feb, 2020 Ruhr-University Bochum
  • 9. CLIRSPEC DATA Online community for us to share algorithms, code and ideas Hosted on Slack Request an invitation to join Any member can add anyone else http:// tiny.cc / clirspec-data
  • 10. CHITOOLBOX •https://bitbucket.org/AlexHenderson/chitoolbox • Open source MATLAB toolbox • Infrared, Raman, secondary ion mass spectrometry (SIMS) • Spectra and hyperspectral images • Library, not a GUI (only 1-2 dialog boxes) • Object oriented design @ChiToolbox
  • 11. DESIGN DECISIONS • What worked? • What did not work? • What were the compromises? • What would I do differently?
  • 12. OBJECT ORIENTED PROGRAMMING (OOP) • Abstract base classes for spectra, spectral collections and images • Concrete classes for above • ‘Interface’ classes for ‘Raman character’, ‘IR character’ etc. • Multiple inheritance to define technique specific classes • eg. IRSpectrum, RamanImage, (ToF)MSSpectralCollection • Separate classes for pictures, RMieS options, PCA or RF models etc. • Model using classes where possible • Provides type-identification and bespoke functionality
  • 13. FILE FORMATS
      • Agilent (FTIR)
          • Single FTIR images and mosaicked FTIR images
      • Biotof (ToFSIMS)
          • Spectra, hyperspectral image files
      • Bruker (FTIR)
          • Opus files and multiple spectra exported as a MAT file
      • Ionoptika (ToFSIMS)
          • Hyperspectral image files exported in HDF5 format
      • Mettler Toledo (FTIR)
          • Spectra exported in ASCII
      • Renishaw (Raman)
          • WiRE Version 4, spectral and hyperspectral images
      • Thermo Fisher Scientific GRAMS SPC (Generic)
          • Data stored in spc files
      • Single files can be read using ChiFile. This works out the file format automatically.
      • In addition: readable, but unreleased
          • Photothermal (FTIR and Raman)
              • mIRage spectra and hyperspectral images
          • IONTOF (ToFSIMS)
              • Hyperspectral images in grd format
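One way a reader could work out a file's format automatically is to dispatch on the filename extension. The sketch below is a Python illustration under that assumption. It does not show how ChiFile actually identifies formats, and the registry, the function name and the extension mapping are all hypothetical.

```python
from pathlib import Path

# Hypothetical registry mapping a lowercase extension to a format name.
# The extensions are assumptions for illustration, not taken from ChiToolbox.
FORMAT_BY_EXTENSION = {
    ".spc": "Thermo Fisher Scientific GRAMS SPC",
    ".wdf": "Renishaw WiRE",
    ".h5": "Ionoptika HDF5",
    ".grd": "IONTOF grd",
}


def detect_format(filename):
    """Return the format name for a filename, judged by its extension only."""
    extension = Path(filename).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"Unrecognised file format: {filename!r}") from None


detected = detect_format("tissue_map.WDF")
```

An extension only identifies the container. For multi-purpose formats the reader would still need to inspect the contents to learn whether the file holds a spectrum, a line scan or an image.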
  • 14. FILE FORMAT ISSUES
      • Some formats were hacked, e.g. Agilent
          • What if example files were specific to certain instrumentation?
      • Some formats are multi-purpose
      • Some formats hold only one data type (spectrum, line scan, image etc.)
          • e.g. Agilent single tile format
      • Some formats can contain any of these data types
          • e.g. Renishaw
      • If we read multiple files, what should we do if their contents are of different types?
  • 15. REUSE OF EXTERNAL CODE (CREDITED)
      • Perceptually uniform colormaps
      • error_ellipse
      • GSTools
      • shadedErrorBar
      • mksqlite
      • sql_object
      • cividis
      • sgolayfilt
      • RMieS
      • cluster-toolbox
      • getSubclasses
      • GUI Layout Toolbox
      • DataHash
      • GetFullPath
      • m2html
      • ImportOpus
      • ME-EMSC
      • Thresholding Tool
      • ENVI file reader/writer
      • read_envihdr
  • 16. SOFTWARE LICENCE
      • ChiToolbox released under GNU General Public License 3.0 (GPL)
      • External code is GPL, or more liberal (e.g. MIT)
      • GPL ‘infects’ the codebase
          • User must release any code that intrinsically links to this code
      • Prefer GNU Lesser General Public License (LGPL)
          • Your codebase is not affected, but changes must be shared
      • Unfortunately, GPL-licensed external code cannot be included in an LGPL-licensed codebase
  • 17. MATLAB ISSUES
      • Tried to make it backward compatible with R2009a
          • Too painful
          • Roughly compatible with R2016a
      • Trying to reduce toolbox dependencies (e.g. Statistics toolbox)
      • MATLAB OOP not great
          • Variables pass by value, but handle classes pass by reference, which makes copying difficult
          • Rolled my own deep copy mechanism (clone)
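The copying problem with reference semantics can be shown with a Python analogue, since Python objects behave much like MATLAB handle classes: assignment creates a second name for the same object rather than a copy. This is a minimal sketch, not the ChiToolbox `clone` implementation; the class `SpectralCollection` and its fields are hypothetical.

```python
import copy


class SpectralCollection:
    """Reference-semantics object, comparable to a MATLAB handle class."""

    def __init__(self, data, history=None):
        self.data = data                      # list of spectra (lists of floats)
        self.history = history or []

    def clone(self):
        # Deep copy: nested lists are duplicated, so the clone shares nothing.
        return copy.deepcopy(self)


original = SpectralCollection([[1.0, 2.0], [3.0, 4.0]])
alias = original                # same object, no copy is made
duplicate = original.clone()    # independent deep copy

alias.data[0][0] = 99.0         # also visible through `original`
duplicate.data[1][1] = -1.0     # leaves `original` untouched
```

A shallow copy would not be enough here, because the outer list would be duplicated while the inner spectra stayed shared.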
  • 18. DATA TYPES
      • Single spectrum, spectral collection, hyperspectral image
      • Continuous data
      • Did not consider multispectral data (discrete wavenumber)
          • Discontinuous, cannot take first derivative etc.
      • Data is a property of the object, not a pointer/function to a data storage type
  • 19. METADATA
      • Separate class from data type
      • Automatically label plots (e.g. PCA scores)
      • Build lists of labels manually
          labels = ChiClassMembership('mylabels','beta',1, 'gamma',2, 'beta',3, 'alpha',2);
      • Automatically read from specially designed Excel spreadsheet
      • Handles logical, category and numeric types
      • Need to remove label from metadata if removing spectrum from collection
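The last point on this slide, keeping labels aligned with spectra when one is removed, can be sketched as follows. This is a Python illustration with a hypothetical class `LabelledCollection`. It is not how ChiToolbox stores metadata (there the metadata is a separate class); it only shows the invariant that has to be maintained.

```python
class LabelledCollection:
    """Spectra stored alongside one class label per spectrum (hypothetical)."""

    def __init__(self, spectra, labels):
        if len(spectra) != len(labels):
            raise ValueError("need exactly one label per spectrum")
        self.spectra = list(spectra)
        self.labels = list(labels)

    def remove(self, index):
        # Delete the spectrum and its label together so they stay aligned.
        del self.spectra[index]
        del self.labels[index]


collection = LabelledCollection(
    spectra=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    labels=["beta", "gamma", "alpha"],
)
collection.remove(1)    # drops the second spectrum and the label "gamma"
```

Routing every removal through one method means no caller can delete a spectrum and forget its label.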
  • 20. Users are not sure of the difference between numeric and category types when samples are numbered
  • 21. DEFAULTS
      • Try to provide ‘reasonable’ default values
      • PCA denoising defaults to 30% of PCs retained
      • Random Forest defaults to 80% training and 20% test sets
          • Should default to 5-fold cross validation, but takes time
      • Random Forest defaults to using parallel processing if data set is large
          • MATLAB is slow to initialise worker pool
      • All parameters are user-configurable
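A default that remains user-configurable can be expressed as a keyword argument. The sketch below uses the 80%/20% split from this slide. The function name `split_train_test`, the shuffling and the rounding rule are assumptions for illustration, not the ChiToolbox behaviour.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=None):
    """Shuffle items and split them; the default is 80% training, 20% test."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_train_test(range(10))                       # default split
train_half, test_half = split_train_test(range(10), 0.5, seed=1)  # user override
```

Passing a `seed` makes the split repeatable, which helps when comparing models trained on the same data.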
  • 22. VISUALISATION
      • Graphics use perceptually neutral colormaps
          • Caters for colour vision deficiency (colour blindness)
      • Colour-mapped PCA image scores and loadings plots
      • Dialog box for Raman baseline removal
          • Asymmetric least squares baseline modelling requires user input
      • Confidence limits on PCA/CVA (canonical variates analysis) scores plots
          • Default = 95%, but user-configurable
      • RMieS iteration change plot
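A confidence limit on a two-dimensional scores plot is commonly drawn as an ellipse sized from the chi-squared distribution with two degrees of freedom. The slide does not say how ChiToolbox computes its limits, so the Python sketch below assumes that construction, with approximately bivariate normal scores; the function names are hypothetical.

```python
import math


def ellipse_scale(confidence=0.95):
    """Number of standard deviations spanned by a 2-D confidence ellipse.

    For two degrees of freedom the chi-squared CDF is 1 - exp(-x / 2),
    so the quantile is x = -2 * ln(1 - confidence).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.sqrt(-2.0 * math.log(1.0 - confidence))


def ellipse_semi_axes(variance_major, variance_minor, confidence=0.95):
    """Semi-axis lengths, given the variances along the principal directions."""
    scale = ellipse_scale(confidence)
    return scale * math.sqrt(variance_major), scale * math.sqrt(variance_minor)


default_scale = ellipse_scale()          # 95% default, about 2.45
semi_axes = ellipse_semi_axes(4.0, 1.0)  # variances of 4 and 1
```

The two variances are the eigenvalues of the scores' covariance matrix, and the ellipse is oriented along the matching eigenvectors.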
  • 23. PERCEPTUALLY NEUTRAL COLORMAPS [Figure labels: original, full colour; MATLAB jet colormap]
  • 24. PERCEPTUALLY NEUTRAL COLORMAPS [Figure labels: original, full colour; MATLAB parula colormap]
  • 25. PERCEPTUALLY NEUTRAL COLORMAPS [Figure labels: original, full colour; Python viridis colormap]
      • Python’s Inferno, Magma, Plasma and Viridis colormaps implemented in MATLAB
  • 26. 2020 VISION (IF I HAD A TIME MACHINE)
      • Developed more tests
      • Added support for discrete wavenumber data
      • Separated data storage from data manipulation
      • Used database (SQLite) to manage metadata
      • Considered OOP for data storage, but functional programming for operations
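Managing metadata in SQLite, as wished for on this slide, might look like the sketch below. It uses Python's standard `sqlite3` module with an in-memory database. The table layout, the column names (`label`, `dose`, `treated`) and the helper function are hypothetical, chosen only to show one category, one numeric and one logical field.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    """
    CREATE TABLE metadata (
        spectrum_id INTEGER PRIMARY KEY,
        label       TEXT NOT NULL,   -- category type
        dose        REAL,            -- numeric type
        treated     INTEGER          -- logical type, stored as 0 or 1
    )
    """
)
connection.executemany(
    "INSERT INTO metadata (spectrum_id, label, dose, treated) VALUES (?, ?, ?, ?)",
    [(1, "beta", 0.5, 1), (2, "gamma", 1.0, 0), (3, "beta", 2.0, 1), (4, "alpha", 1.0, 0)],
)


def spectra_with_label(conn, label):
    """Return the ids of all spectra carrying the given class label."""
    rows = conn.execute(
        "SELECT spectrum_id FROM metadata WHERE label = ? ORDER BY spectrum_id",
        (label,),
    )
    return [row[0] for row in rows]


beta_ids = spectra_with_label(connection, "beta")

# Removing a spectrum's metadata is a single statement keyed on its id.
connection.execute("DELETE FROM metadata WHERE spectrum_id = ?", (1,))
beta_ids_after_removal = spectra_with_label(connection, "beta")
```

Declaring a column type for each field also makes the numeric versus category distinction explicit in the schema.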
  • 27. 2020 VISION (IF I HAD A TIME MACHINE) Write it all in Python! …or C++