Processing Large ToF-SIMS Datasets
Wednesday, 20 September 2017 1
Gustavo Ferraz Trindade, Marie-Laure Abel and John F. Watts
The Surface Analysis Laboratory, University of Surrey, UK
Outline
Introduction, Experimental, Processing, Results, Alternative, Conclusions
ToF-SIMS data is “growing up”
Introduction
In dual-beam depth-profile mode, the ToF-SIMS spectrometers found in most surface analysis
laboratories typically generate hyperspectral image datasets: a 3D cube of more than
256 x 256 x 500 voxels, with each voxel containing from 20,000 to 2,000,000 spectral channels.
ToF-SIMS data analysis is “growing up”
Keywords “SIMS” and “PCA” on Web of Science
Standard approaches: binning voxels and channels, peak picking
Surrey Matlab GUI developed by Gustavo Ferraz Trindade
simsMVA
www.mvatools.com
New trend in the surface analysis community: processing full datasets
- Random vectors algorithm + GPU
- Focus on PCA only
Surf. Interface Anal. 2016, DOI: 10.1002/sia.6042
Surf. Interface Anal. 2015, DOI: 10.1002/sia.5800
My contribution/objective: perform non-negative matrix
factorisation (NMF, a.k.a. MCR) on unbinned datasets
Example dataset
Surface segregation of polymer additives X
Large area scan of chemically contaminated fingerprint on silicon wafer
- Great interest from forensics
- Surrey has experience in it
Experimental
Analyst, 2015, 140, 6254
Analyst, 2013, 138, 6246
Surf. Interface Anal. 2010, 42, 826–829
Each patch has 100 x 100 pixels (500 x 500 µm²)
20 x 20 patches were acquired over a total area of 1 x 1 cm² (pixel density 0.06 px/µm²)
Each spectrum has 2,000,000 channels
The resulting dataset has 4M x 2M = 8 x 10^12 data points!
Extremely sparse (< 1% non-zero elements)
A great challenge for multivariate analysis
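The dataset dimensions quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 20 x 20 patch grid is an assumption inferred from the 1 x 1 cm² area and the 500 µm patch size, the 1% figure is the upper bound given on the slide, and the 16 bytes per stored entry is an assumed cost (one 8-byte value plus one 8-byte index) rather than a property of any particular sparse format.

```python
# Back-of-the-envelope size of the fingerprint dataset.
# Assumption: the patches tile the 1 cm x 1 cm area as a 20 x 20 grid.
patch_side_um = 500
area_side_um = 10_000                        # 1 cm
patches_per_side = area_side_um // patch_side_um
n_patches = patches_per_side ** 2

pixels_per_patch = 100 * 100
n_pixels = n_patches * pixels_per_patch      # rows of A
n_channels = 2_000_000                       # columns of A
n_elements = n_pixels * n_channels

# "< 1% non-zero elements" gives an upper bound on stored entries.
max_nonzeros = n_elements // 100

# Assumed storage cost: 8-byte value + 8-byte index per non-zero entry.
bytes_per_entry = 16
dense_tb = n_elements * 8 / 1e12
sparse_gb_upper = max_nonzeros * bytes_per_entry / 1e9

print(f"pixels: {n_pixels:,}")
print(f"elements: {n_elements:.1e}")
print(f"dense storage (8-byte values): {dense_tb:.0f} TB")
print(f"sparse storage, upper bound: {sparse_gb_upper:.0f} GB")
```

Even the 1% upper bound is far smaller than the dense size, which is why sparse storage is the natural choice here.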
Iontof TOF.SIMS 5
High current bunched mode
25 keV Bi3+
0.3 pA, 10 kHz
Negative secondary ions
10 scans per patch
General Raw Data (.GRD): event list with columns scan, x, y, tof
Loaded directly into pre-allocated sparse arrays in Matlab R2016a
The resulting data are arranged in a matrix A of size 4M x 2M:
one row per pixel (4M spectra), each with 2M spectral channels.
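The idea of accumulating raw events straight into sparse storage can be sketched as follows. This is an illustration in Python, not the Matlab code used for the talk: the event layout `(scan, x, y, tof)`, the row-major pixel index and the function name `events_to_sparse` are all assumptions made for the example and do not describe the actual .GRD file structure.

```python
from collections import defaultdict


def events_to_sparse(events, width):
    """Accumulate (scan, x, y, tof) ion events into a sparse count matrix.

    Returns a dict mapping pixel row -> {tof channel: counts}.
    Counts from different scans of the same pixel are summed.
    """
    rows = defaultdict(lambda: defaultdict(int))
    for _scan, x, y, tof in events:
        pixel = y * width + x            # assumed row-major pixel index
        rows[pixel][tof] += 1
    return {pixel: dict(channels) for pixel, channels in rows.items()}


# Hypothetical events on a 2 x 2 pixel image.
events = [(0, 0, 0, 5), (0, 1, 0, 7), (1, 0, 0, 5), (1, 1, 1, 9)]
A = events_to_sparse(events, width=2)
print(A)
```

Only pixel/channel pairs that received at least one count are stored, so memory scales with the number of detected ions rather than with the 4M x 2M matrix size.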
Processing
The method of choice was non-negative matrix factorisation (NMF), a.k.a. MCR,
using multiplicative update algorithms (Lee & Seung, 2001)
Nature, vol. 401, 1999
A (N x M) = W*H (+ error)
[Figure: A written as a sum of 2, 3, 4, 5, ... “pure components”, each one the product of its weights (a column of W) and its pure spectrum (a row of H)]
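The multiplicative updates for the model A ≈ W*H can be sketched on a tiny dense matrix. This is a minimal, pure-Python illustration of the standard Frobenius-norm update rules, H <- H * (WᵀA) / (WᵀWH) and W <- W * (AHᵀ) / (WHHᵀ); the function names, the random initialisation and the small `eps` added to the denominators are assumptions of this example, not details taken from the talk.

```python
import random


def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_error(A, W, H):
    """Squared Frobenius norm of A - W*H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))


def nmf(A, k, iterations=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF with k components (illustrative only)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Toy data: the third "pixel" is a mixture of the first two spectra.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 1.0],
     [2.0, 3.0, 4.0, 1.0]]
W, H = nmf(A, k=2)
print("residual:", frobenius_error(A, W, H))
```

Because every update multiplies by a ratio of non-negative quantities, W and H stay non-negative throughout, which is what allows the rows of H to be read as pure spectra and the columns of W as their weights.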
To overcome time and memory limitations:
Sub-sampling using Sobol sequences
Surf. Interface Anal. 2016
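Low-discrepancy sub-sampling of pixels can be illustrated with the base-2 van der Corput sequence, which is the first dimension of a Sobol sequence. This is a simplified, one-dimensional stand-in and an assumption of this example: the sampling scheme of the cited paper is not reproduced here, and the function names are hypothetical.

```python
def van_der_corput(i):
    """Radical inverse of the integer i in base 2, a value in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= 2.0
        x += (i & 1) / denom
        i >>= 1
    return x


def subsample_rows(n_rows, n_samples):
    """Pick n_samples distinct row indices spread evenly over range(n_rows)."""
    if n_samples > n_rows:
        raise ValueError("cannot draw more samples than rows")
    chosen, seen, i = [], set(), 0
    while len(chosen) < n_samples:
        idx = int(van_der_corput(i) * n_rows)
        i += 1
        if idx not in seen:              # skip indices already drawn
            seen.add(idx)
            chosen.append(idx)
    return chosen


print(subsample_rows(16, 4))
```

Unlike pseudo-random sampling, each new index falls in the largest remaining gap, so any prefix of the sequence covers the pixel range evenly. NMF is then run on the selected rows only.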
Results
[Figure: images and spectra of NMF components 1, 2 and 3. Labelled peaks: SO2-, SO3-, SO4H-, C29H28O4-, NaS2O7-, OH-, SiO2-. 265 u: sodium dodecyl sulphate]
Data size: 4M x 1.3M
Subsample size: 15,000 x 1.3M
Iterations: 500
Time/iteration: 36s
FOV: 1 cm x 1 cm
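As a rough check of what these run parameters imply, the total run time and the fraction of pixels used can be computed directly from the numbers above. The variable names are illustrative only.

```python
iterations = 500
seconds_per_iteration = 36
total_hours = iterations * seconds_per_iteration / 3600

subsample_rows = 15_000
total_rows = 4_000_000
fraction = subsample_rows / total_rows

print(f"total NMF time: {total_hours:.1f} h")
print(f"pixels used: {fraction:.3%}")
```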
Spectrum of a single pixel
Although the dataset has very few counts per pixel, NMF was
successfully achieved.
This shows the advantage of performing multivariate
analysis on noisy, very large datasets:
a pixel-by-pixel view does not contain
relevant information, but the dataset as a whole
still has latent structure and can be
factorised without binning.
Since the secondary ions analysed were
negatively charged, the Si- and SiO- peaks have
very low intensity.
Even so, NMF managed to separate them
perfectly from the fingerprint signal.
This reinforces the advantage of using unbinned
datasets when it comes to finding hidden
features.
Si-
SiO-
Systematic misalignment of ALL peaks for
components 1 and 2
- Topography of deposited fingerprint?
- Imperfect primary-ion ToF correction?
Image zoomed in on 9 patches
SO3-
To overcome the misalignment problem:
- Better sample preparation
- Review the primary-ion ToF correction
- Purely data-based methods:
  - Align channel by channel to a reference pixel (warping):
    time consuming; the quickest method found takes minutes per spectrum
  - Apply a fixed shift (misalignment due to height differences):
    only a few counts per pixel, so peak positions cannot be identified
Third approach for alignment (that would not need good statistics per pixel)
- Perform alignment on NMF components (matrix H) and reconstruct back
NMF: $A = WH + \mathrm{error}$
Alignment: $H_A = H + S$
Reconstruction: $A_A = W H_A + \mathrm{error} = WH + WS + \mathrm{error} = A + WS$
Result: $A_A = A + WS$
The correction matrix for A is therefore the shift matrix S (obtained from matrix H) weighted by
the relative concentrations of the pure components (matrix W)
- It seems to work with simulated data, but we are still not sure whether it is mathematically
correct
- Small problem: this requires processing the entire matrix A (no subsampling)
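The reconstruction step can be sketched on a toy example. This is an illustration under stated assumptions, not a validation of the approach: the per-component channel shifts are supplied by hand (in practice they would come from aligning the rows of H), the toy data have zero NMF error, and the function names are hypothetical.

```python
def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def shift_row(row, shift):
    """Shift a spectrum by `shift` channels, padding with zeros."""
    n = len(row)
    out = [0.0] * n
    for j, value in enumerate(row):
        if 0 <= j + shift < n:
            out[j + shift] = value
    return out


def align_and_reconstruct(A, W, H, shifts):
    """Return (H_A, S, A_A) with H_A = H + S and A_A = A + W*S."""
    H_A = [shift_row(h, s) for h, s in zip(H, shifts)]
    S = [[ha - h for ha, h in zip(ra, r)] for ra, r in zip(H_A, H)]
    WS = matmul(W, S)
    A_A = [[a + c for a, c in zip(ra, rc)] for ra, rc in zip(A, WS)]
    return H_A, S, A_A


# Two components whose peaks sit one channel apart; three pixels.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
H = [[0.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 6.0, 0.0]]
A = matmul(W, H)                           # toy data with zero NMF error

# Hypothetical shifts: move component 2 down by one channel.
H_A, S, A_A = align_and_reconstruct(A, W, H, shifts=[0, -1])
for row in A_A:
    print(row)
```

In this error-free case A_A equals W*H_A exactly, so every pixel ends up with its peak in channel 1. Note that S contains negative entries where a peak is moved away, so with real data A_A is only guaranteed to stay non-negative where the NMF residual is small.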
[Diagram: A -> NMF -> W, H -> Alignment -> HA -> Reconstruction -> AA (aligned) -> NMF (again)]
- It seems to work with “simulated data” but we are still not sure whether it is appropriate
[Figure: H matrix before and after alignment. W matrix (overlay of 3 components) before alignment, and after alignment + reconstruction + NMF]
[Figure: H matrix and W matrix (overlay of 3 components), before and after alignment]
A good approach for NMF of giant sparse matrices: Map/Reduce
- Introduced by Google in 2004
- Added to Matlab in release R2014b
- Still used in several Big Data applications
Map/Reduce
Analysing the full dataset: the data will not fit in a PC's memory, so a different method is required
OSDI 2004
Map/Reduce NMF
- Multiplicative update algorithm in the map/reduce framework
- Implementation in Matlab R2016a: challenging due to lack of documentation
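One way to see why the multiplicative update fits the map/reduce pattern: the update of H only needs WᵀA and WᵀW, and both are sums over pixels, so each chunk of rows can be processed independently (map) and the partial results added together (reduce). The sketch below is a pure-Python illustration with hypothetical function names and toy data; it is not the Matlab implementation described in the talk, nor necessarily the exact scheme of the cited paper.

```python
from functools import reduce


def map_chunk(chunk):
    """Map: partial W^T A (sparse) and W^T W (dense) for one chunk of rows.

    Each chunk item is (w_row, a_row) with a_row a {channel: count} dict.
    """
    k = len(chunk[0][0])
    WtA = [dict() for _ in range(k)]
    WtW = [[0.0] * k for _ in range(k)]
    for w, a in chunk:
        for p in range(k):
            for channel, count in a.items():
                WtA[p][channel] = WtA[p].get(channel, 0.0) + w[p] * count
            for q in range(k):
                WtW[p][q] += w[p] * w[q]
    return WtA, WtW


def reduce_pair(left, right):
    """Reduce: add two partial (W^T A, W^T W) results."""
    WtA = []
    for row_l, row_r in zip(left[0], right[0]):
        merged = dict(row_l)
        for channel, value in row_r.items():
            merged[channel] = merged.get(channel, 0.0) + value
        WtA.append(merged)
    WtW = [[x + y for x, y in zip(rl, rr)] for rl, rr in zip(left[1], right[1])]
    return WtA, WtW


def update_H(H, WtA, WtW, eps=1e-9):
    """H <- H * (W^T A) / (W^T W H), with H stored densely."""
    k, m = len(H), len(H[0])
    return [[H[p][j] * WtA[p].get(j, 0.0)
             / (sum(WtW[p][q] * H[q][j] for q in range(k)) + eps)
             for j in range(m)] for p in range(k)]


# Toy data: 4 pixels, 2 components, 3 channels, processed in 2 chunks.
rows = [([1.0, 0.0], {0: 2}), ([0.5, 0.5], {1: 1, 2: 3}),
        ([0.0, 2.0], {2: 4}), ([1.0, 1.0], {0: 1, 1: 1})]
chunks = [rows[:2], rows[2:]]
WtA, WtW = reduce(reduce_pair, map(map_chunk, chunks))
H = update_H([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], WtA, WtW)
print(WtW)
```

Only one chunk of A has to be in memory at a time, and the reduced quantities are small (k x M and k x k), which is what makes the scheme attractive for matrices that do not fit in RAM. The update of W can be handled row by row in the same spirit.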
Proceedings of the 19th International Conference on World Wide Web (WWW '10)
History of implementations in Matlab
Time per iteration (4 workers) vs. number of elements and sparsity
Same dataset: ~10x faster
There is room for improvement!
Comparison between map/reduce and standard NMF
Adhesive sample
Data: 32 x 32 x 20,000; 150 iterations; same IC
[Figure panels: Map/Reduce | Standard]
Conclusions!?
- ToF-SIMS data will not stop growing and we have to
  consider ways to go about processing it
- NMF of a large ToF-SIMS dataset has been achieved with
  sparse allocation and subsampling
- Hidden features and weak signals can be identified
  when unbinned datasets are processed
- For even larger datasets or to align peaks via reconstruction:
  MapReduce may be a way to go
  - Deals with data in chunks
  - Well-defined framework
  - Easily scalable up to large computer clusters
NPL 3D Nano SIMS
Thank you
Structures and textures of metamorphic rocks
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 

Processing Large ToF-SIMS Datasets

  • 1. Processing Large ToF-SIMS Datasets. Gustavo Ferraz Trindade, Marie-Laure Abel and John F. Watts, The Surface Analysis Laboratory, University of Surrey, UK. Wednesday, 20 September 2017.
  • 2. Outline: Introduction, Experimental, Processing, Results, Alternative, Conclusions.
  • 3. Introduction: ToF-SIMS data is “growing up”. In most surface analysis laboratories, ToF-SIMS spectrometers in dual-beam depth-profile mode will typically generate hyperspectral image datasets distributed throughout a 3D cube containing more than 256 × 256 × 500 voxels, with each voxel containing from 20,000 to 2,000,000 spectral channels.
  • 4. Introduction: ToF-SIMS data analysis is “growing up”. Keywords “SIMS” and “PCA” on Web of Science.
  • 5. Introduction: Standard approaches: binning voxels and channels, peak picking. simsMVA: Surrey Matlab GUI developed by Gustavo Ferraz Trindade (www.mvatools.com).
  • 6. Introduction: New trend in the surface analysis community of processing full datasets: random vectors algorithm + GPU; focus on PCA only. References: Surf. Interface Anal. 2016, 10.1002/sia.6042; Surf. Interface Anal. 2015, 10.1002/sia.5800.
  • 7. Introduction: My contribution/objective: perform non-negative matrix factorisation (NMF, a.k.a. MCR) on unbinned datasets.
  • 8. Experimental: Example dataset. Surface segregation of polymer additives (marked “X” on the slide). Large-area scan of a chemically contaminated fingerprint on a silicon wafer: great interest from forensics; Surrey has experience in it. References: Analyst, 2015, 140, 6254; Analyst, 2013, 138, 6246; Surf. Interface Anal. 2010, 42, 826–829.
  • 9. Experimental: Each patch has 100 × 100 pixels (500 × 500 µm²). 20 patches were done in a total area of 1 × 1 cm² (pixel density 0.06 px/µm²). Each spectrum has 2,000,000 channels. The resulting dataset has 4M × 2M = 8 × 10¹² data points! Extremely sparse (< 1% non-zero elements). Great challenge for multivariate analysis. Instrument: Iontof TOF.SIMS 5, high-current bunched mode, 25 keV Bi₃⁺, 0.3 pA, 10 kHz, negative secondary ions, 10 scans per patch.
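The size figures on slide 9 can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted on the slide; the 8-byte value size and the 24-bytes-per-entry coordinate storage are illustrative assumptions, not details from the talk.

```python
# Figures quoted on the slide.
n_spectra = 4_000_000      # rows of the data matrix (one spectrum per pixel)
n_channels = 2_000_000     # columns (time-of-flight channels)
max_fill_percent = 1       # the slide states fewer than 1% non-zero elements

n_elements = n_spectra * n_channels
# Assumption: a dense array would store every element as an 8-byte double.
dense_bytes = n_elements * 8
# Upper bound on the number of stored entries in a sparse representation.
max_nonzeros = n_elements * max_fill_percent // 100
# Assumption: coordinate-style storage, 8-byte value plus two 8-byte indices.
sparse_bytes_upper = max_nonzeros * 24

print(f"elements: {n_elements:.1e}")
print(f"dense storage: {dense_bytes / 1e12:.0f} TB")
print(f"non-zeros (upper bound): {max_nonzeros:.1e}")
print(f"sparse storage (upper bound): {sparse_bytes_upper / 1e12:.2f} TB")
```

Since the slide only gives "< 1%" as a bound, the sparse figure is an upper limit; the real footprint depends on the actual fill fraction.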
  • 10. Processing: General Raw Data (.GRD) files (fields: scan, x, y, tof) are loaded directly into pre-allocated sparse arrays in Matlab 2016a. The resulting data is arranged in a matrix A of size 4M × 2M containing the spectra of all 4M pixels, with 2M spectral channels each.
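The loading step on slide 10 can be illustrated with a toy event list. This is a sketch under stated assumptions: the events are taken to be already decoded into `(scan, x, y, tof)` tuples (the binary layout of .GRD files is not parsed here), the row index is assumed to be `y * width + x`, and the function name `events_to_sparse` is hypothetical. It uses a Python dictionary of counts instead of Matlab sparse arrays.

```python
from collections import Counter

def events_to_sparse(events, width):
    """Accumulate (scan, x, y, tof) events into a sparse count matrix.

    Returns a Counter mapping (pixel_row, tof_channel) -> counts, where
    pixel_row = y * width + x. Counts from different scans of the same
    pixel are summed, and only non-zero entries are ever stored.
    """
    counts = Counter()
    for _scan, x, y, tof in events:
        counts[(y * width + x, tof)] += 1
    return counts

# Hypothetical toy event list for a 2 x 2 pixel image.
events = [
    (0, 0, 0, 105),
    (0, 1, 0, 105),
    (1, 0, 0, 105),   # same pixel and channel as the first event, later scan
    (1, 1, 1, 2040),
]
A = events_to_sparse(events, width=2)
print(dict(A))
```

In Matlab, a comparable accumulation can be done with `sparse(i, j, v, m, n)`, which sums the values of repeated `(i, j)` pairs.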
  • 11. Processing: The method of choice was non-negative matrix factorisation (NMF), a.k.a. MCR, using multiplicative update algorithms (Lee & Seung, 2001). Reference: Nature, vol. 401, 1999.
  • 12. Processing: A (N × M) = W*H (+ error), where W holds the weights and H the pure spectra. The factorisation can be done with 2 “pure components”, 3 “pure components”, 4 “pure components”, 5 “pure components” and so on.
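The factorisation A ≈ W*H from slides 11 and 12 can be sketched with the Lee & Seung multiplicative updates. The slides do not say which cost function was used, so this example assumes the squared-error (Frobenius) variant; the helper names, the toy matrix and the small `eps` guard against division by zero are all illustrative. It is written with plain lists so that it is self-contained, which would be far too slow for real ToF-SIMS data.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of lists."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, k, iterations=200, eps=1e-9, seed=0):
    """Factorise non-negative A (N x M) into W (N x k) and H (k x M)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, pr, qr)]
             for hr, pr, qr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, pr, qr)]
             for wr, pr, qr in zip(W, num, den)]
    return W, H

def frobenius_error(A, W, H):
    """Square root of the summed squared residuals of A - W*H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

# Toy data: 4 "pixels" mixing 2 "pure spectra" over 4 channels.
W_true = [[1, 0], [0, 1], [1, 1], [2, 0]]
H_true = [[5, 0, 1, 0], [0, 3, 0, 2]]
A = matmul(W_true, H_true)

W, H = nmf(A, k=2)
print("residual:", frobenius_error(A, W, H))
```

Rows of `H` play the role of the pure spectra and columns of `W` the per-pixel weights; repeating the call with `k=3`, `k=4` and so on corresponds to the series of factorisations on the slide.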
  • 13. Processing: To overcome time and memory limitations: sub-sampling using Sobol sequences (Surf. Interface Anal. 2016).
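How a low-discrepancy sequence can pick an evenly spread subset of spectra is sketched below. The slides do not state how the Sobol points were mapped to pixels, so this example assumes a one-dimensional mapping onto the row index of A. It relies on the fact that the first coordinate of an unscrambled Sobol sequence is the base-2 van der Corput sequence (some implementations visit the same points in a different order); the function names are hypothetical.

```python
def radical_inverse_base2(i):
    """Mirror the binary digits of i around the binary point.

    Gives 0, 0.5, 0.25, 0.75, 0.125, ... for i = 0, 1, 2, 3, 4, ...
    """
    result, weight = 0.0, 0.5
    while i:
        if i & 1:
            result += weight
        i >>= 1
        weight /= 2
    return result

def sobol_row_subsample(n_rows, n_samples):
    """Pick n_samples distinct row indices spread evenly over range(n_rows)."""
    if n_samples > n_rows:
        raise ValueError("cannot draw more distinct rows than exist")
    chosen, seen, i = [], set(), 0
    while len(chosen) < n_samples:
        row = int(radical_inverse_base2(i) * n_rows)
        if row not in seen:          # skip indices that were already drawn
            seen.add(row)
            chosen.append(row)
        i += 1
    return chosen

# Small demonstration, then the sizes quoted in the talk (15,000 of 4M spectra).
print(sobol_row_subsample(16, 4))
rows = sobol_row_subsample(4_000_000, 15_000)
print(len(rows), min(rows), max(rows))
```

Unlike a purely random draw, each additional point fills the largest remaining gap, so any prefix of the selection covers the whole image roughly uniformly.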
  • 14. Results: Component 1, Component 2, Component 3. Peak labels: 265 u: sodium dodecyl sulphate; SO₂⁻, SO₃⁻, SO₄H⁻, C₂₉H₂₈O₄⁻, NaS₂O₇⁻, OH⁻, SiO₂⁻. Data size: 4M × 1.3M. Subsample size: 15,000 × 1.3M. Iterations: 500. Time/iteration: 36 s. FOV: 1 cm × 1 cm.
  • 15. Results: Spectrum of a single pixel. Although the dataset has very few counts per pixel, NMF was achieved successfully. This shows an advantage of performing multivariate analysis on noisy, very large datasets: a pixel-by-pixel view does not contain relevant information, but the whole dataset still has latent structure and can be factorised without binning.
  • 16. Results: Since the secondary ions analysed were negatively charged, the Si⁻ and SiO⁻ peaks have very low intensity. Even so, NMF managed to separate them perfectly from the fingerprint signal. This reinforces the advantage of using unbinned datasets when it comes to finding hidden features.
  • 17. Results: Systematic misalignment of ALL peaks for components 1 and 2 (shown for SO₃⁻; image zoomed in on 9 patches). Possible causes: topography of the deposited fingerprint? Imperfect primary-ion TOF correction?
  • 18. Results: To overcome the misalignment problem: better sample preparation; review the primary-ion TOF correction; data-based-only methods. Data-based method 1: align channel by channel to a reference pixel (warping). Time consuming: the quickest method found takes minutes per spectrum. Data-based method 2: apply a fixed shift (misalignment due to height differences). Only a few counts per pixel, so it is impossible to identify peak positions.
  • 19. Results: Third approach for alignment (that would not need good statistics per pixel): perform alignment on the NMF components (matrix H) and reconstruct back. NMF: $A = WH + \mathrm{error}$. Alignment: $H_A = H + S$. Reconstruction: $A_A = W H_A + \mathrm{error} = WH + WS + \mathrm{error} = A + WS$. The correction matrix for A is therefore the shift matrix S (obtained from matrix H) weighted by the relative concentrations of the pure components (matrix W). Workflow: A → NMF → W, H → alignment → H_A → reconstruction → A_A (aligned) → NMF (again). It seems to work with “simulated data” but we are still not sure whether it is mathematically correct. Small problem: this would require processing the entire matrix A (no subsampling).
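The reconstruction identity on slide 19, A_A = A + W*S, can be checked numerically on a toy example. This sketch only verifies the algebra for made-up matrices; it says nothing about whether the approach is physically appropriate, which the slide itself leaves open. The matrices, the one-channel shift and the helper names are all hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of lists."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def shift_row(row, s):
    """Shift a spectrum by s channels (negative = towards channel 0), zero padded."""
    if s >= 0:
        return [0.0] * s + row[:len(row) - s]
    return row[-s:] + [0.0] * (-s)

# Toy factors: 3 pixels, 2 components, 6 channels.
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
H = [[0.0, 0.0, 4.0, 0.0, 0.0, 0.0],     # component 1, peak one channel too far right
     [0.0, 0.0, 0.0, 0.0, 3.0, 0.0]]     # component 2, taken as already aligned
E = [[0.25, 0.0, 0.0, 0.0, 0.0, 0.0],    # residual "error" not captured by W*H
     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.25, 0.0, 0.0]]
A = add(matmul(W, H), E)                 # A = W*H + error

H_A = [shift_row(H[0], -1), H[1]]        # aligned components
S = sub(H_A, H)                          # shift matrix, so that H_A = H + S
A_A = add(A, matmul(W, S))               # corrected data, A_A = A + W*S

for row in A_A:
    print(row)
```

The corrected matrix equals W*H_A plus the original residual, so the residual is carried over unchanged rather than being aligned.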
  • 20. Results: It seems to work with “simulated data” but we are still not sure whether it is appropriate. Small problem: this would require processing the entire matrix A (no subsampling). Figures: H matrix before and after alignment; W matrix (overlay of 3 components) before alignment and after alignment + reconstruction + NMF.
  • 21. Results: H matrix and W matrix (overlay of 3 components), before alignment and after alignment.
  • 22. Map/Reduce: To analyse the full dataset, the data won’t fit in a PC’s memory, so a different method is required. A good approach for NMF of giant sparse matrices is Map/Reduce: introduced by Google in 2004 (OSDI 2004); added to Matlab in version 2014b; still used in several Big Data applications.
  • 23. Map/Reduce
  • 24. Map/Reduce: Map/Reduce NMF: multiplicative update algorithm in the map/reduce framework (Proceedings of the 19th International Conference on World Wide Web, WWW ’10). Implementation in Matlab R2016a: a challenge due to lack of documentation.
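One way a multiplicative update fits the map/reduce pattern is sketched below. It assumes that A and W are split into matching row chunks, that each map call returns the partial products W_i^T A_i and W_i^T W_i, and that the reduce step sums them before H is updated. This shows the general row-partitioned idea only; it is not necessarily the scheme of the cited WWW ’10 paper or of the Matlab implementation from the talk, and all names are hypothetical.

```python
from functools import reduce

def matmul(X, Y):
    """Plain-Python matrix product of two lists of lists."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def map_chunk(chunk):
    """Map step: partial products for one block of rows (A_i, W_i)."""
    A_i, W_i = chunk
    Wt = transpose(W_i)
    return matmul(Wt, A_i), matmul(Wt, W_i)

def reduce_pair(x, y):
    """Reduce step: sum the partial products of two blocks."""
    return add(x[0], y[0]), add(x[1], y[1])

def update_H(chunks, H, eps=1e-9):
    """H <- H * (W^T A) / (W^T W H), with W^T A and W^T W built chunk by chunk."""
    WtA, WtW = reduce(reduce_pair, map(map_chunk, chunks))
    den = matmul(WtW, H)
    return [[h * p / (q + eps) for h, p, q in zip(hr, pr, qr)]
            for hr, pr, qr in zip(H, WtA, den)]

def update_W_chunk(chunk, H, eps=1e-9):
    """W_i <- W_i * (A_i H^T) / (W_i H H^T); needs only this chunk and H."""
    A_i, W_i = chunk
    Ht = transpose(H)
    num, den = matmul(A_i, Ht), matmul(W_i, matmul(H, Ht))
    W_new = [[w * p / (q + eps) for w, p, q in zip(wr, pr, qr)]
             for wr, pr, qr in zip(W_i, num, den)]
    return A_i, W_new

# Toy data: 4 spectra split into two chunks of 2 rows each.
A = [[5.0, 0.0, 1.0], [0.0, 3.0, 0.0], [5.0, 3.0, 1.0], [10.0, 0.0, 2.0]]
W = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.6], [1.5, 0.3]]
H = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5]]
chunks = [(A[:2], W[:2]), (A[2:], W[2:])]

H = update_H(chunks, H)                              # one global H update
chunks = [update_W_chunk(c, H) for c in chunks]      # independent per-chunk W updates
print(H)
```

Only the small k × M and k × k sums cross chunk boundaries, which is why the full matrix A never has to sit in memory at once.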
  • 25. Map/Reduce: History of implementations in Matlab: time per iteration (4 workers) x number of elements x sparsity. Same dataset: ~10x faster. There is room for improvement!
  • 26. Map/Reduce: Comparison between map/reduce and standard NMF. Adhesive sample; data 32 × 32 × 20000; 150 iterations; same IC. Panels: Standard, Map/Reduce.
  • 27. Conclusions!? ToF-SIMS data will not stop growing and we have to consider ways to go about processing it. NMF of a large ToF-SIMS dataset has been achieved with sparse allocation and subsampling. Hidden features and weak signals can be identified when unbinned datasets are processed. For even larger datasets, or to align peaks via reconstruction, MapReduce may be a way to go: it deals with data in chunks, is a well-defined framework, and is easily scalable up to large computer clusters. NPL 3D Nano SIMS

Editor's Notes

  1. Ethane dioic (carboxylic)
  2. - Give a quick overview of the presentation. When talking about results, mention that SEM, EDX, XPS and SIMS analyses were carried out.