This document discusses processing large ToF-SIMS datasets to study surface segregation of polymer additives. It describes:
1) Performing an experiment to study surface changes in a polymer under heating conditions using ToF-SIMS surface mapping.
2) The challenges of analyzing the large sparse dataset produced, and using non-negative matrix factorization (NMF) to identify components.
3) Developing a MapReduce approach in MATLAB to enable NMF on datasets too large to fit in memory, showing it is faster than standard NMF.
Processing Large ToF-SIMS Datasets For The Study Of Surface Segregation Of Polymer Additives
Wednesday, 20 September 2017
Gustavo Ferraz Trindade, Marie-Laure Abel and John F. Watts
The Surface Analysis Laboratory, Surrey, UK
CAC 2016
Barcelona
4. ToF-SIMS
Our equipment: TOF.SIMS 5 (IONTOF GmbH)
Liquid metal primary ion source (Bi_n^+)
Electron impact sputter source (C_60^+)
Single stage reflectron ToF analyser
Nominal mass resolution @ 29 u: 10,000
Data acquired as spectra, ion maps (imaging) or dual beam depth profiles
5. ToF-SIMS
Our equipment: TOF.SIMS 5 (IONTOF GmbH)
[Figure: combined spectra of a 500 x 500 µm² beam-rastered region; scale marker 0.1 u]
6. Samples
Automotive grade polypropylene (PP copolymer + carbon black)
Ultimately undergo flame treatment prior to paint application
- surface segregation of additives might hinder the adhesion process
7. Experiment
Proposed experiment to study surface change under heating conditions:
- Special heating sample holder with temperature control
- Bring surface to high temperatures (150 °C)
- Acquire surface ToF-SIMS maps periodically
8. Experiment
Every image has 128 x 128 pixels (500 x 500 µm)
200 scans were acquired and each spectrum has 2,000,000 channels
Resulting dataset has ~3M voxels (128 x 128 x 200) x 2M channels = 6 x 10^12 data points!
Extremely sparse (< 1% non-zero elements)
Great challenge for multivariate analysis
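The size estimate above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: the 8-byte element size and the (row, column, value) triplet storage are assumptions, not figures from the slides, and 1% is used as the upper bound quoted for the non-zero fraction.

```python
# Dataset dimensions quoted on the slide
pixels_x, pixels_y = 128, 128
scans = 200
channels = 2_000_000

voxels = pixels_x * pixels_y * scans   # rows of the data matrix
elements = voxels * channels           # total number of matrix entries

# Assumption: a dense array stores one 8-byte double per entry
dense_bytes = elements * 8

# Assumption: at most 1% non-zeros, each stored as a
# (row index, column index, value) triplet of 8 bytes per field
nonzeros = elements // 100
sparse_bytes = nonzeros * 3 * 8

print(f"voxels:        {voxels:,}")
print(f"elements:      {elements:.2e}")
print(f"dense storage: {dense_bytes / 1e12:.1f} TB")
print(f"sparse bound:  {sparse_bytes / 1e12:.2f} TB")
```

Under these assumptions even the sparse upper bound is far larger than the memory of a desktop PC, which is consistent with the need for either binning or an out-of-memory method.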
9. MVA of ToF-SIMS
Due to the "profile" characteristic of the data set, the method of choice was non-negative matrix factorisation (NMF), a.k.a. MCR
Two approaches for multiplicative update algorithms (Lee & Seung, 2001):
- Binning voxels and channels: reduced dataset, classical method
- Analyse full dataset: data will not fit in a PC's memory, requires different method
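For reference, the multiplicative update rules of Lee & Seung for the Frobenius-norm objective can be sketched in a few lines of NumPy. This is a minimal illustration on a small synthetic matrix, not the code used for the results in this talk; the function name, the random initialisation and the small constant `eps` are assumptions made for the example. Rows of `X` play the role of voxels and columns the role of mass channels.

```python
import numpy as np

def nmf_multiplicative(X, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative X (voxels x channels) as W @ H."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    # Random non-negative start (an assumption; other initialisations exist)
    W = rng.random((n_rows, n_components)) + eps
    H = rng.random((n_components, n_cols)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Small synthetic rank-3 example
rng = np.random.default_rng(1)
X = rng.random((60, 3)) @ rng.random((3, 40))
W, H = nmf_multiplicative(X, n_components=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Because both updates only multiply by non-negative ratios, `W` and `H` stay non-negative throughout, which is what makes the factors readable as maps or profiles (`W`) and spectra (`H`).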
10. MVA of ToF-SIMS
Binning voxels and channels: reduced dataset, classical method
Surrey Matlab GUI: simsMVA
Developed by G.F. Trindade
www.mvatools.com
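The binning route can be illustrated with a short NumPy sketch that sums neighbouring scans, pixels and channels. It is only an illustration of the idea on a tiny random array, not the simsMVA code; the function name, the (scan, y, x, channel) axis order and the binning factors are assumptions.

```python
import numpy as np

def sum_bin(data, factors):
    """Sum-bin an N-d count array by an integer factor along each axis."""
    # Trim each axis so that it divides evenly by its factor
    trimmed = data[tuple(slice(0, (n // f) * f)
                         for n, f in zip(data.shape, factors))]
    # Split every axis into (n_bins, factor) and sum over the factor axes
    new_shape = []
    for n, f in zip(trimmed.shape, factors):
        new_shape += [n // f, f]
    factor_axes = tuple(range(1, 2 * len(factors), 2))
    return trimmed.reshape(new_shape).sum(axis=factor_axes)

# Tiny stand-in for a (scan, y, x, channel) count array
rng = np.random.default_rng(0)
data = rng.poisson(0.05, size=(4, 8, 8, 20))
binned = sum_bin(data, factors=(2, 2, 2, 4))
print(data.shape, "->", binned.shape)
```

Summing rather than averaging keeps the values as ion counts, so the total number of counts is preserved whenever the axes divide evenly.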
11. Pre-processing
Issues prior to any MVA with ToF-SIMS raw data
- Export and read binary RAW data file: 1 full spectrum per pixel or voxel
- Alignment of spectra from different pixels
12. Pre-processing
Issues prior to any MVA with ToF-SIMS raw data
Export and read binary RAW data file
- 1 full spectrum per pixel or voxel
- Data in the form Scan | x | y | ToF for every secondary ion detected
13. Pre-processing
Multiple reads from disk
Sparse allocation (Scan | x | y | ToF)
> 20 times faster
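One way to turn a list of Scan | x | y | ToF events into sparse storage is to map each event to a (voxel, channel) pair and count repeats, which yields the triplets a sparse matrix is built from. The sketch below uses a four-event toy list; the function name, the voxel ordering and the array sizes are assumptions for illustration, and the slides do not state how the actual implementation does this.

```python
import numpy as np

def events_to_triplets(scan, x, y, tof, nx, ny, n_channels):
    """Convert per-ion events into (row, col, count) sparse triplets."""
    # Row index = voxel number, ordered scan-major, then y, then x (assumed)
    row = ((scan * ny + y) * nx + x).astype(np.int64)
    # One flat key per (voxel, channel) pair so repeated ions can be counted
    flat = row * n_channels + tof
    keys, counts = np.unique(flat, return_counts=True)
    rows, cols = np.divmod(keys, n_channels)
    return rows, cols, counts

# Toy event list: two ions hit the same voxel and channel
scan = np.array([0, 0, 0, 1])
x = np.array([1, 1, 0, 1])
y = np.array([0, 0, 1, 0])
tof = np.array([5, 5, 7, 5])
rows, cols, counts = events_to_triplets(scan, x, y, tof,
                                        nx=2, ny=2, n_channels=10)
print(rows, cols, counts)
```

The three arrays can be passed directly to a sparse constructor such as `scipy.sparse.coo_matrix((counts, (rows, cols)), shape=...)`, so memory is only spent on channels where ions were actually detected.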
14. Pre-processing
Issues prior to any MVA with ToF-SIMS raw data
Alignment of spectra from different pixels
- Ions with the same mass will travel shorter or longer paths depending on where they are formed on the surface
- Each spectrum has ~2,000,000 channels
- Quick method needed (a Fourier transform based method is much quicker than correlation matrix based ones)
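A common Fourier-based way to find the channel offset between two spectra is circular cross-correlation computed with the FFT, which costs O(n log n) instead of testing every lag directly. The sketch below shows that general technique on a synthetic two-peak spectrum; it is an assumption that it resembles the method used here, and the function name and test data are hypothetical.

```python
import numpy as np

def estimate_shift(reference, spectrum):
    """Integer channel shift of `spectrum` relative to `reference`."""
    n = len(reference)
    # Circular cross-correlation via the correlation theorem
    corr = np.fft.irfft(np.conj(np.fft.rfft(reference)) * np.fft.rfft(spectrum), n=n)
    lag = int(np.argmax(corr))
    # Lags above n/2 correspond to negative shifts
    if lag > n // 2:
        lag -= n
    return lag

# Synthetic spectrum with two Gaussian peaks, then a shifted copy
channels = np.arange(4096)
reference = (np.exp(-0.5 * ((channels - 1000) / 3.0) ** 2)
             + 0.5 * np.exp(-0.5 * ((channels - 2500) / 3.0) ** 2))
shifted = np.roll(reference, 7)

shift = estimate_shift(reference, shifted)
aligned = np.roll(shifted, -shift)   # undo the estimated shift
print("estimated shift:", shift)
```

This recovers whole-channel shifts only; sub-channel offsets or a shift that varies along the mass axis would need interpolation or a piecewise version.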
16. Results
NMF results (2000 iterations, 3 components) | Original matrix size (64 x 64 x 10) x (140001)
Resulting W matrix upscaled for visualisation
18. Results
NMF results (2000 iterations, 3 components) | Original matrix size (64 x 64 x 10) x (140001)
Resulting W matrix upscaled for visualisation
Labelled components: (PDMS) + (Irganox 1010 Antioxidant)
19. Alternative
Analyse full dataset: data will not fit in a PC's memory, requires different method
New trend in the surface analysis community of processing full datasets
- Random vector algorithm + GPU
- Focus on PCA only
20. Map/Reduce
Analyse full dataset: data will not fit in a PC's memory, requires different method
Good approach for NMF of giant sparse matrices: Map/Reduce
- Introduced by Google in 2004
- Added to Matlab in version R2014b
- Still used in several Big Data applications
22. Map/Reduce
- Map/Reduce NMF
- Multiplicative update method in map/reduce framework
- Implementation in Matlab R2016a: challenge due to lack of documentation
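The reason the multiplicative update fits a map/reduce framework is that the two products needed for the H update, WᵀX and WᵀW, are sums over row blocks of the data, while the W update only needs the rows of one block at a time. The sketch below shows that structure in plain Python and NumPy on in-memory blocks; it is not the Matlab implementation described in this talk, and all function names, the block layout and `eps` are assumptions made for illustration.

```python
import numpy as np
from functools import reduce

def map_block(block):
    """Map step: partial products computed from one row block."""
    X_b, W_b = block
    return W_b.T @ X_b, W_b.T @ W_b

def reduce_pair(a, b):
    """Reduce step: add the partial products of two blocks."""
    return a[0] + b[0], a[1] + b[1]

def mapreduce_nmf(X_blocks, n_components, n_iter=100, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n_cols = X_blocks[0].shape[1]
    W_blocks = [rng.random((X_b.shape[0], n_components)) + eps
                for X_b in X_blocks]
    H = rng.random((n_components, n_cols)) + eps
    for _ in range(n_iter):
        # H update: global sums assembled block by block
        WtX, WtW = reduce(reduce_pair, map(map_block, zip(X_blocks, W_blocks)))
        H *= WtX / (WtW @ H + eps)
        # W update: independent for every block, so it parallelises trivially
        HHt = H @ H.T
        for X_b, W_b in zip(X_blocks, W_blocks):
            W_b *= (X_b @ H.T) / (W_b @ HHt + eps)
    return W_blocks, H

# Synthetic rank-3 matrix split into three row blocks
rng = np.random.default_rng(1)
X = rng.random((90, 3)) @ rng.random((3, 50))
X_blocks = np.array_split(X, 3)
W_blocks, H = mapreduce_nmf(X_blocks, n_components=3)
W = np.vstack(W_blocks)
print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

In a real out-of-memory setting each block would be read from disk inside the map step rather than held in a list, and only the small k x channels and k x k partial products would be kept in memory between blocks.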
23. Map/Reduce
History of implementations in Matlab
[Figure: time per iteration (4 workers) x number of elements x sparsity]
Same dataset: ~10x faster
There is room for improvement!
25. Map/Reduce
Comparison between map/reduce and standard NMF
Adhesive sample
Data 32 x 32 x 20000, 150 iterations, same IC
[Figure panels: Map/Reduce | Standard]
26. Conclusions
I - Polypropylene dataset
- Surface contaminant and/or release agents rapidly leave the surface at high temperatures
- Anti-oxidant additive segregates to the surface
- More in-depth analysis is required (possible due to full-spec NMF comps.)
II - Map/Reduce as an alternative
- Works on single machines (parallel)
- Easily scalable to clusters (parallel and distributed)
- Tests to be done on Surrey HPC + Matlab MDCS
!? Add constraints; use training subsets