RISE OF THE MACHINES
THE USE OF MACHINE LEARNING
IN SIMS DATA ANALYSIS
Alex Henderson
University of Manchester
SurfaceSpectra Ltd
http://about.me/henderson.alex
Twitter: @AlexHenderson00
LOOK OUT!
THE MACHINES ARE COMING!!
NO NEED TO BE AFRAID…
Result!
QUESTIONS WE MIGHT ASK
• Exploratory data analysis
• What can we find out about these samples?
• No prior knowledge required
• Differences in chemical or physical state between groups of samples
• Highlights spectral changes as function of group membership
• Need to know which group each spectrum belongs to
• Trend analysis
• Spectral changes as a function of dependent variable:
time, concentration, disease state etc.
• Classification of samples
• Spectral characteristics of groups
• Prediction of unseen samples into known groups
DATA ANALYSIS APPROACHES
CLASSICAL ANALYSIS
Hypothesis driven
Assumes a distribution
of spectral response
MACHINE LEARNING
Data driven
Interrogation of data
leads to hypothesis
Validation always required when
building a predictive model
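One common way to validate a predictive model is k-fold cross-validation. The sketch below is illustrative only: the data are synthetic, and the sizes, the marker channel and the choice of five folds are assumptions, not values from the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: 60 "spectra" with 20 channels, in two groups.
# Group 1 has extra intensity in channel 5 so there is something to learn.
X = rng.random((60, 20))
y = np.repeat([0, 1], 30)
X[y == 1, 5] += 1.0

model = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross-validation: each fold is held out once and predicted by a
# model trained on the other four folds.
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```

The spread of the per-fold scores gives a rough idea of how much the result depends on which spectra were used for training.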
CLASSICAL ANALYSIS
Assumes data obey the Central Limit Theorem
Data is Normally distributed
(Gaussian or Bell shaped curve)
Mathematically we can derive 4 ‘moments’
• Mean (average)
• Variance (square of the standard deviation)
• Skewness (asymmetry)
• Kurtosis (pointedness)
Other descriptions lead from these parameters
eg Student’s t-test, ANOVA, MANOVA
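As a worked illustration of the four moments, the sketch below computes them for a small sample in plain Python. The intensity values are made up, and the population (divide by n) form of each moment is an assumption; statistics packages often apply small-sample corrections.

```python
import math

def moments(values):
    """Return mean, variance, skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments: average of (x - mean)^k for k = 2, 3, 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    variance = m2                      # standard deviation is sqrt(variance)
    skewness = m3 / m2 ** 1.5          # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2 - 3.0      # excess kurtosis, 0 for a Normal
    return mean, variance, skewness, kurtosis

# Hypothetical intensities of one peak measured in eight spectra
intensities = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, skewness, kurtosis = moments(intensities)
print(mean, math.sqrt(variance), skewness, kurtosis)
```

For this sample the mean is 5 and the standard deviation is 2; the positive skewness reflects the longer tail towards high intensity.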
MACHINE LEARNING
No underlying assumption
Need to generate a description of data
HISTORY OF MVA
Classical multivariate analysis dates from 1930s
Harold Hotelling, Ronald Fisher, Herman Wold and others
• Principal components analysis (PCA)
• Partial least squares (PLS)
• Fisher’s discriminant analysis
• Linear discriminant analysis (LDA), etc.
Slide rule is King!
HISTORY OF MVA CONTINUED
Computers become generally available in the 1950s
Calculations become faster and more reproducible
New approaches are developed
Term ‘Machine Learning’ coined in 1959 (Arthur Samuel); by the 1990s it
describes an established branch of computer science
BRAVE NEW WORLD!
Mechanical Turk plays chess in 1770
NEW?
Mechanical Turk CHEATS at chess in 1770
WELL…
WHAT FITS WHERE?
Question                                       | Classical Analysis                   | Machine Learning
Exploratory analysis                           | PCA                                  | K-means, HCA
Differences in state between groups of samples | Discriminant analysis: LDA, QDA, CVA | Random Forest Classification
Trend analysis                                 | Regression analysis, MCR             | Random Forest Regression
Classification of samples                      | LDA, QDA                             | Random Forest Classification, SVM
MACHINE LEARNING
CHEAT SHEET
RANDOM FOREST
Ensemble method
Collect lots of weak classifiers to build one strong one
Collection of Decision Trees
Computationally intensive
Developed 1995 – 2001
MATLAB: TreeBagger
Python: sklearn.ensemble.RandomForestClassifier (scikit-learn)
DECISION TREE
An expression of an algorithm
Weak classifier
Move through each step in turn
[Figure: example decision tree. Decision nodes: "Boss around?" (Yes / No), "Weather?" (Sunny / Rainy / Windy), "Pay day?" (Recent / Long ago). Leaf outcomes: "Beer" or "Work".]
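A decision tree can be written as a chain of questions answered one at a time. The sketch below uses the questions and outcomes named on the slide, but the way the branches connect is an assumption, because the figure layout is not recoverable from the text; the function name is hypothetical.

```python
def decide(boss_around, weather, pay_day):
    """Walk a hypothetical decision tree one question at a time."""
    if boss_around:                # first question
        return "Work"
    if weather == "Rainy":         # second question
        return "Work"
    if weather == "Sunny":
        return "Beer"
    # Remaining case (e.g. "Windy"): move on to a third question
    if pay_day == "Recent":
        return "Beer"
    return "Work"

print(decide(False, "Sunny", "Long ago"))
print(decide(True, "Sunny", "Recent"))
```

Each call follows exactly one path from the first question to a leaf, which is what "move through each step in turn" means in practice.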
ENSEMBLE OF TREES
Randomly select subsets of variables (m/z intensities)
Train multiple (few hundred) decision trees, each with
different variables
Each tree does the
best it can with only
a portion of the data
See which trees are
best and weight them
higher
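The idea of giving each tree a different subset of variables can be sketched with scikit-learn's DecisionTreeClassifier. The data, the sizes and the marker channels are all hypothetical. The sketch also simplifies the description above: every tree gets an equal vote rather than a weight. scikit-learn's own RandomForestClassifier differs again, drawing a fresh variable subset at every split and averaging the trees equally.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_channels = 30
marker_channels = [3, 12, 25]      # hypothetical: one raised channel per class

def make_spectra(n_per_class):
    """Synthetic spectra: noise plus one marker channel per class."""
    X = rng.random((3 * n_per_class, n_channels))
    y = np.repeat([0, 1, 2], n_per_class)
    for label, channel in enumerate(marker_channels):
        X[y == label, channel] += 1.0
    return X, y

X_train, y_train = make_spectra(30)
X_test, y_test = make_spectra(30)

n_trees, n_vars = 200, 6
trees = []
for _ in range(n_trees):
    # Each tree sees only a random subset of the variables
    cols = rng.choice(n_channels, size=n_vars, replace=False)
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[:, cols], y_train)
    trees.append((cols, tree))

def predict(spectra):
    """Majority vote over all trees (every tree has an equal vote)."""
    votes = np.stack([tree.predict(spectra[:, cols]) for cols, tree in trees])
    counts = np.stack([(votes == c).sum(axis=0) for c in range(3)])
    return counts.argmax(axis=0)

accuracy = (predict(X_test) == y_test).mean()
print("test accuracy:", accuracy)
```

Most individual trees never see all three marker channels, yet the combined vote can still separate the classes.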
VARIABLE ZOO
Ratio measurements taken
for many animals
For example:
• Length of leg
• Number of legs
• Number of wings
• Has horns/antlers?
• Length of neck
• Length of tail
Many examples of each
animal used
No tree gets all measurements
TOO EASY?
The giraffe is easily recognised by the number of legs and
length of neck. Oh, and it’s not a bird…
If any tree has those variables it would always identify the
animal as a giraffe. No need for anything else.
A Gerenuk is a four-legged
mammal with a long neck.
The decision tree was good,
but not good enough
It needed to be tamed by
other trees
The Random Forest model
prevents some trees from
dominating the overall result
WRONG!
GERENUKS. WHO KNEW!?
Polystyrene beads
Each bead coated with a
different amino acid
SIMS image using Biotof
256 × 256 pixels
1000 amu
bin-summed to 1 amu
Data courtesy of Nick
Winograd, Penn State
University, USA, ~1999
CLASSIFICATION
EXAMPLE
Two regions on each
bead and also the
substrate selected
One region to train, the
other to test
Each region 400 pixels
Square root taken
Vector normalised
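The two preprocessing steps can be sketched with NumPy. "Vector normalised" is assumed here to mean scaling each spectrum to unit Euclidean length; the function name and the example counts are hypothetical.

```python
import numpy as np

def preprocess(spectra):
    """Square-root transform, then scale each spectrum to unit length."""
    rooted = np.sqrt(spectra)
    norms = np.linalg.norm(rooted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # leave empty spectra as all zeros
    return rooted / norms

# Hypothetical counts: 2 pixels x 3 mass channels
counts = np.array([[9.0, 16.0, 0.0],
                   [0.0, 0.0, 0.0]])
print(preprocess(counts))
```

The first pixel becomes (0.6, 0.8, 0.0): the square roots are 3, 4 and 0, and the vector length is 5.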
TRAINING AND TEST
REGIONS
[Figure: image with the three test regions labelled "Test"]
RANDOM FOREST MODEL
Training data (3 × 400 spectra)
Each spectrum labelled: bead 1, bead 2, or substrate
Random Forest model constructed using
sklearn.ensemble.RandomForestClassifier (scikit-learn) in Python 3.5
300 trees selected.
Other parameters left as default
Code executed in PyCharm 2017.1.2
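A model of this shape might be set up as in the sketch below. This is not the original code: the spectra are synthetic stand-ins, the number of channels and the marker channels are assumptions, and random_state is fixed only so that the run is repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
labels = np.array(["bead 1", "bead 2", "substrate"])

def make_region(n_pixels=400, n_channels=50):
    """Hypothetical stand-in for the spectra of one region per class."""
    X = rng.random((3 * n_pixels, n_channels))
    y = np.repeat(labels, n_pixels)
    for i, label in enumerate(labels):
        X[y == label, 10 * i + 5] += 1.0   # one marker channel per class
    return X, y

X_train, y_train = make_region()   # 3 x 400 training spectra
X_test, y_test = make_region()     # 3 x 400 test spectra

# 300 trees; other model parameters left at their defaults
model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Keeping the training and test regions physically separate, as here, means the score is measured on spectra the model has never seen.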
            Bead 1   Bead 2   Substrate
Bead 1        97.5      1.5         1.0
Bead 2         3.0     96.0         1.0
Substrate      8.3      3.3        88.5
Percentage of test spectra assigned to each class
Diagonal indicates > 88% of test spectra in each class correctly classified
Caution: Result should be verified by cross-validation or bootstrap
CONFUSION MATRIX
Truth
Prediction
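A confusion matrix expressed in percent can be computed as in the sketch below. It assumes that rows hold the true class and columns the predicted class, so each row sums to 100%. The labels are made up, and the function name is hypothetical.

```python
import numpy as np

def confusion_percent(truth, prediction, classes):
    """Rows are true classes, columns are predicted classes, in percent."""
    index = {name: i for i, name in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for t, p in zip(truth, prediction):
        counts[index[t], index[p]] += 1
    # Assumes every class occurs at least once in `truth`
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

classes = ["bead 1", "bead 2", "substrate"]
# Hypothetical labels for eight test spectra
truth      = ["bead 1", "bead 1", "bead 2", "bead 2",
              "substrate", "substrate", "substrate", "substrate"]
prediction = ["bead 1", "bead 2", "bead 2", "bead 2",
              "substrate", "bead 1", "substrate", "substrate"]

matrix = confusion_percent(truth, prediction, classes)
print(matrix)
```

In this toy case one of the four substrate spectra is predicted as bead 1, so the substrate row reads 25, 0, 75.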
Each decision tree
uses different
combination of mass
values
Determine which m/z
values were used by
the most accurate
trees
This is a measure of
the importance of
those variables: m/z
VARIABLE IMPORTANCE
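In scikit-learn a fitted forest exposes one importance value per variable through feature_importances_. Note that this attribute is based on how much each variable reduces impurity across the trees, which is related to, but not the same as, the accuracy-based description above. The data and the marker channel in this sketch are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical data: 200 spectra, 40 m/z channels, two classes that
# differ only in channel 17.
X = rng.random((200, 40))
y = np.repeat([0, 1], 100)
X[y == 1, 17] += 1.0

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# One importance value per variable; the values sum to 1
importance = model.feature_importances_
top = np.argsort(importance)[::-1][:5]
for channel in top:
    print(f"channel {channel}: importance {importance[channel]:.3f}")
```

Plotting the importance against m/z gives a spectrum-like view of which masses drive the classification.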
Trained Random
Forest model used
to predict class for
each pixel in
original image
Render the image
using result of
classification
Total time 15 sec
PREDICTION OF
ENTIRE IMAGE
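Predicting a whole image amounts to unfolding the data cube into one spectrum per pixel, classifying every row, and folding the labels back into the image shape. The sketch below uses a small synthetic cube; the sizes, the marker channel and the training data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical image cube: 32 x 32 pixels, 20 mass channels.
# The left half of the image has extra intensity in channel 4.
rows, cols, channels = 32, 32, 20
cube = rng.random((rows, cols, channels))
cube[:, : cols // 2, 4] += 1.0

# Hypothetical training spectra built with the same recipe
X_train = rng.random((200, channels))
y_train = np.repeat([0, 1], 100)
X_train[y_train == 1, 4] += 1.0
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Unfold the cube to one row per pixel, predict, then fold back
pixels = cube.reshape(rows * cols, channels)
class_map = model.predict(pixels).reshape(rows, cols)

# Fraction of the image assigned to each class
print(np.bincount(class_map.ravel(), minlength=2) / class_map.size)
```

The resulting class_map is a 2-D array of class labels that can be displayed as a false-colour image, and the class fractions give the area covered by each class.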
FTIR – CANCER TISSUE
Epithelium 24.3%
Smooth muscle 50.7%
Lymphocytes 2.5%
Blood 0.2%
Concretion 0.0%
Fibrous stroma 12.3%
ECM 10.0%
Random forest classifier
Trained on exemplars chosen with a pathologist
6 hour data acquisition
2.5 million spectra classified in < 60 sec
No staining or de-waxing of the sample required
Proc. SPIE 9041, Medical Imaging 2014: Digital Pathology, 90410D doi:10.1117/12.2043290
SUMMARY
Machine Learning
methods appear to be
useful tools that we
should consider for
adoption
Unsupervised,
supervised classification
and supervised
regression options are
all available
Increased computer
power may be required,
but Moore’s Law is on
our side here
IMAGE CREDITS
Mechagodzilla: http://list25.com/25-famous-fictional-robots-history/
Simply explained: http://geekandpoke.typepad.com/geekandpoke/2012/01/simply-explained-dp.html
Slide rule: https://commons.wikimedia.org/wiki/File:Slide_rule_scales_back.jpg
Brave new world: https://commons.wikimedia.org/wiki/File:IBM_150_Extra_Engineers_1951.jpg
Mechanical Turk 1: https://commons.wikimedia.org/wiki/File:Tuerkischer_schachspieler_windisch4.jpg
Mechanical Turk 2: https://commons.wikimedia.org/wiki/File:Tuerkischer_schachspieler_racknitz3.jpg
Scikit-learn cheat sheet: http://scikit-learn.org/stable/tutorial/machine_learning_map/
Forest: https://commons.wikimedia.org/wiki/File:Forest_Osaka_Japan.jpg
Animal silhouettes: https://clipartfest.com/categories/view/4c03d8ea8a4bc1ffca947c8b8dab48af25908403/african-animal-silhouettes-clipart.html
Giraffe: https://img.clipartfest.com/4294d3fb2739e14cec3845ef668dcdc0_life-size-african-animal-wall-african-animal-silhouettes-clipart_221-203.gif
Gerenuk 1: https://500px.com/cindy_wheeler
Gerenuk 2: http://wordwomanpartialellipsisofthesun.blogspot.co.uk/2015/01/gerenuk-giraffe-necked-gazelle-with.html
Many Gerenuks: http://wordwomanpartialellipsisofthesun.blogspot.co.uk/2015/01/gerenuk-giraffe-necked-gazelle-with.html
Beads: Nick Winograd, Penn State University, USA
Cancer tissue: Proc. SPIE 9041, Medical Imaging 2014: Digital Pathology, 90410D doi:10.1117/12.2043290
Tetsujin 28: http://goldenani.blogspot.co.uk/2013/01/1963-part-1-on-outside-looking-in.html
