SECONDARY ION
MASS SPECTROMETRY
DATA ANALYSIS IN 1D, 2D AND 3D

ALEX HENDERSON

     Surface Analysis Research Centre
     University of Manchester, UK
SIMS technique

   Fire energetic ions at a
    solid – primary ions
   Solid emits (sputters)
    positive ions
    negative ions
    electrons and neutrals
   These secondary ions
    produce a mass spectrum

      Secondary Ion
     Mass Spectrometry
Comparison with other MS
   No (chromatographic) separation step so spectra are
    overlapping
   Limited amount of sample available
   Relatively low throughput so no large databases of spectra
   Highly surface sensitive so detects contamination easily (pro
    and con)
   Need to transport analyte to gas phase; proteins decompose,
    polymers generally OK
   MS/MS being developed (here)
   Negative ion spectra available
1D analysis
Collections of spectra
O
                                                                                                           H 2C      O      C        C 15H 31
                                                                                                                            O



              Typical SIMS spectrum
                                                                                                            HC       O      C        C 15H 31
                                                                                                                             O
                                                                                                                                                         +
                                                                                                           H 2C         O    P       O     CH2   CH2         N(CH 3) 3

                                                                                                                             OH




                                  1,2-Dipalm itoyl-sn-glycero-3-phosphatidylcholine (DPPC) (positive ion)


            50,000

            45,000
                                                                         CH3

            40,000                                              CH2 CH   N
                                                                             +
                                                                                 CH3                 O                       CH3
                                                                                                                                 +
            35,000                                                       CH3                  HO     P    O       CH2 CH2    N       CH3

                                                                                                     OH                      CH3
            30,000
Intensity




            25,000

            20,000

            15,000

            10,000

             5,000

                0
                     0       20        40         60          80         100           120     140                160                180               200
                                                                         m/z
                                                                                                          The Static SIMS Library, SurfaceSpectra Ltd


                            100 – 250k data points
                            1 – 3000 u mass range


                                                                                                                                     www.surfacespectra.com
Spectra of bacteria

                  Enterococcus spp.                                              Proteus mirabilis

        4                                                              4
     x 10                      En                                  x 10                         Pm
 6                                                            5

                                                             4.5
 5
                                                              4

                                                             3.5
 4
                                                              3

 3                                                           2.5

                                                              2
 2
                                                             1.5

                                                              1
 1
                                                             0.5

 0                                                            0
  0         100    200   300   400   500   600   700   800         0       100   200      300   400   500   600   700   800
                               m/z                                                              m/z




                                                                                       Applied Surface Science 252 (2006) 6719-6722
Spectra of bacteria

                    Enterococcus spp.                                             Proteus mirabilis


                                 En                                                           Pm
 8000                                                          2500


 7000
                                                               2000
 6000


 5000
                                                               1500

 4000

                                                               1000
 3000


 2000
                                                               500

 1000


   0                                                             0
        100   150    200   250   300   350   400   450   500          100   150   200   250   300   350   400   450   500
                                 m/z                                                          m/z




                                                                                    Applied Surface Science 252 (2006) 6719-6722
PCA of bacteria

5 species of bacteria

No a priori
knowledge used
in PCA

Positive ion spectra
1-800 u, rebinned to
1 u steps

Square root of
intensity

Sum-normalised




                        Applied Surface Science 252 (2006) 6719-6722
Canonical Variates Analysis
(Discriminant Function Analysis)
   Principal components are pure (orthogonal) aspects
    of the data
   Canonical variates are combinations of PCs that
    best describe an a priori class structure
   Based on Fisher’s Linear Discriminant, subtly
    different from Linear Discriminant Analysis (LDA): no
    assumption of normally distributed classes
   CVA is also referred to as DFA
Canonical Variates Analysis
(Discriminant Function Analysis)




   Maximise ratio of between-group variance to within-
    group variance
PC-CVA bacteria classification

5 species of bacteria

Class structure used

9 PCs selected using
PRESS test

Cross-validation
indicates percent
correctly classified:
  Cf 75% CC
  Ec 92% CC
  En 100% CC
  Kp 25% CC
  Pm 50% CC



                        Applied Surface Science 252 (2006) 6869-6874
2D analysis
Spatially organised collections of spectra
Hyperspectral images
Imaging SIMS
   Focussed ion beam stepped
    from point to point
   2D data generated
   Full mass spectrum at each
    point/pixel



   Beam diameter down to
    ~1-2 µm
    (50 nm possible, but can
    damage sample)
SIMS image - ‘chemical photograph’

            4500

            4000

            3500

            3000
intensity




            2500

            2000

            1500

            1000

            500

             0
                  0   100     200   300    400   500   600
                                    m/z

                            Total ion spectrum                        Total ion image
                                                                      Bead diameter 50 μm
                                                             Data courtesy of Penn State University, USA
Selected ion images
            4500

            4000

            3500

            3000
intensity




            2500

            2000

            1500

            1000

            500

             0
                  0   100   200   300   400   500   600
                                  m/z




                                                          Data courtesy of Penn State University, USA
Maximum Autocorrelation Factor Analysis
(MAF)
   Incorporates spatial information with spectral
    information
   Determine the correlation with the same image
    offset diagonally by 1 pixel
   Autocorrelation means scaling invariant
   Computation time ~4× PCA
   Possible issue when image feature size ≈ pixel size
MAF versus PCA of HeLa cells
                    a                 b                          c




a)  Total ion image, scaled and smoothed
b)  Score on PC 6, scaled and smoothed
c)  Score on MAF 4, scaled and smoothed
MAF captures the distinction between inner and outer cellular regions
more clearly than PCA. Comparison with the total ion image shows the
additional information available following multivariate analysis.


                                               Surface and Interface Analysis 41 (2009) 666-674
MAF versus PCA of HeLa cells




          Loading on PC 6             Loading on MAF 4



   MAF loadings more ‘informative’



                                       Surface and Interface Analysis 41 (2009) 666-674
High mass resolution required
Nominally all at m/z 86
 Lipid (DPPC)
  m/z = 86.0969692
 Unknown

 Silicon substrate [Si3H2]+

   m/z = 85.9464332
Separation = 0.15 u

                                                m/z 86


                               Analytical chemistry 80 (2008) 9058-9064
Rat kidney tissue
                           Cryo-microtomed slices
                           Sequential layers
                             Unwashed
                             Washed   in methanol
                           Data is
                             10x10    tiles
                             32x32 pixels per tile
                             24000 channels per pixel
                             2.4 billion datapoints
                             19.2 GB (as doubles in MATLAB)

 Almost all zeros, so use sparse matricies to handle the data
Summation of all tissue pixels
Clear changes in chemical distribution over small mass range




                            Individual images from nominal mass m/z 197
                            Images generated at 0.05 u increments
                            from 196.75 to 197.30
                            Images are 9 x 9 mm2 comprising
                            10 x 10 tiles of 32 x 32 pixels



John S Fletcher               Ionoptika J105                                      April 2009
                                               Analytical and Bioanalytical Chemistry 396 (2010) 85-104
Rat kidney section - overlay




320 x 320 pixels
9 x 9 mm

                                              Full mass spectrum
                                              available at each pixel
  Lipid (DPPC)     DAG   Salt related
                                        Analytical and Bioanalytical Chemistry 396 (2010) 85-104
SIMS and AFM
          AFM                                    SIMS (total ion)
                          Identify points of
                         reference between                                    (Human cheek cells)
                               images




                                               Register images
                                                and overlay
    AFM following
‘projective’ transform
   and cropping in
       MATLAB


                                                             Applied Surface Science 255 (2008) 1264–1270
SIMS and AFM

      nm




           µm
                                                       µm




   Use AFM z-height to offset SIMS data
   Use AFM x- and y-position to accurately scale SIMS
    x- and y-positions

                                      Applied Surface Science 255 (2008) 1264–1270
3D analysis
Three-dimensional spatially organised collections
of spectra
‘Hyper-’hyperspectral images
HeLa-M cells
   Immortalised cervical
    cancer cell line commonly
    used as a model system
    for biology
   Cultured on poly-L-lysine
    coated substrates
   AFM after freeze drying:
    30 – 50 µm diameter
    0.6 – 1µm thick
Formalin fixed, freeze-dried HeLa cells
 Lipid (DPPC) increasing ion dose
                                                                                   Around the nucleus


                                                                                         184




                                                                          152
                                                                                   Inside the nucleus




                                                    1st layer: evidence of chemistry having spread
                                                     outside cell area
                                                    3rd layer: spectra show chemistry still mixed up
                                                     between nucleus and membrane
                                                     4th layer: one cell exhibits bimodal nucleus
Rapid Commun. Mass Spectrom. 25 (2011) 925-932       indicating the dividing stage of cell cycle – mitosis
3D visualisation – issues
   Problem with point of view of ‘observer’
   Mass analyser is normal to sample
   Different sample heights appear to be at same
    level

   Data size:
     256  x 256 pixels x 10 layers x 11400 channels
     7.5 billion datapoints = 58 GB
     >99% of data is zero so only ~5 GB in RAM
3D visualisation – our approach
   PCA of whole dataset (sparse, MATLAB)
   Identify PC that separates substrate from organic
    overlayer
   Assume substrate is flat
   Slide columns of pixels to align with crossover in
    score of that PC
   Discard substrate-rich pixels
   Repeat PCA to identify peaks or components
   Render results (AVS Express)
                                     Rapid Commun. Mass Spectrom. 25 (2011) 925-932
Two sample preparation methods
Aim is to maintain the biological integrity of cells in vacuum system

 Formalin fixed, freeze-dried cells   Frozen hydrated cells

    Hela-M cells cultured on poly-      Hela-M cells cultured on one
     L-lysine coated silicon wafer        plate of hinged two-plate steel
    Formalin fixed (4%)                  substrate coated in poly-L-lysine
    Cells are washed with               Cells are washed with ammonium
     ammonium formate (0.15 M)            formate (0.15M) to remove salts
     prior to analysis to remove         Cells trapped between plates
     salts                                and rapidly frozen in liquid
    Freeze-dried in vacuum               propane
                                         Sandwich fractured in vacuum at
                                          160 K to reveal cells

                                                 Rapid Commun. Mass Spectrom. 25 (2011) 925-932
Frozen hydrated HeLa cells




 Lipid (DPPC headgroup) @ m/z 184.1   Nucleic acid (adenine) @ m/z 136.1
                                               Rapid Commun. Mass Spectrom. 25 (2011) 925-932
Formalin fixed, freeze-dried HeLa cells




                         Lipid (DPPC) @ m/z 184.1   Principal component indicating
Rapid Commun. Mass Spectrom. 25 (2011) 925-932           guanine & tryptophan
Movies available online

256 x 256 pix        Frozen hydrated HeLa cells
x 10 layers              http://www.youtube.com/watch?v=sPuMYUQPvHQ
x 11400 chans
                         Visualisation of the membrane (green, m/z 184.1,
                          phosphocholine) and nucleus (red, m/z 136.1, adenine)
7.5 billion pts      Formalin fixed, freeze-dried HeLa cells
= 58 GB
                         http://www.youtube.com/watch?v=2phMkYo7yhg
                         Visualisation of the membrane (green, m/z 184.1,
>99% of data
                          phosphocholine) and nucleus (purple, negative scores
is zero so only           on principal component 5)
~5 GB in RAM             Possible mitosis visible at lower right
Summary
   What SIMS lacks in ultimate species identification it
    makes up for in spatial location
   New ionisation sources produce data nearer to
    ‘traditional’ MS, opening up resources
   Tandem MS approaches are breaking through
 SARC
 SARC                                   Validation algorithms
   
    John Vickerman
        John Vickerman                      Roy Goodacre
    Nick Lockyer
    Nick Lockyer
    John Fletcher                           Yun Xu
    John Fletcher
    Sadia Sheraz
         (nee Rabbani)
                                            Elon Correa
    ... Sadia Rabbani (now Sheraz)
    and the rest!                      Visualisation group in
    ... and the rest!                   Research Computer Services




Acknowledgements


           www.sarc.manchester.ac.uk
www.sarc.manchester.ac.uk
alex.henderson@manchester.ac.uk

Secondary Ion Mass Spectrometry

  • 1.
    SECONDARY ION MASS SPECTROMETRY DATAANALYSIS IN 1D, 2D AND 3D ALEX HENDERSON Surface Analysis Research Centre University of Manchester, UK
  • 2.
    SIMS technique  Fire energetic ions at a solid – primary ions  Solid emits (sputters) positive ions negative ions electrons and neutrals  These secondary ions produce a mass spectrum Secondary Ion Mass Spectrometry
  • 3.
    Comparison with otherMS  No (chromatographic) separation step so spectra are overlapping  Limited amount of sample available  Relatively low throughput so no large databases of spectra  Highly surface sensitive so detects contamination easily (pro and con)  Need to transport analyte to gas phase; proteins decompose, polymers generally OK  MS/MS being developed (here)  Negative ion spectra available
  • 4.
  • 5.
    O H 2C O C C 15H 31 O Typical SIMS spectrum HC O C C 15H 31 O + H 2C O P O CH2 CH2 N(CH 3) 3 OH 1,2-Dipalm itoyl-sn-glycero-3-phosphatidylcholine (DPPC) (positive ion) 50,000 45,000 CH3 40,000 CH2 CH N + CH3 O CH3 + 35,000 CH3 HO P O CH2 CH2 N CH3 OH CH3 30,000 Intensity 25,000 20,000 15,000 10,000 5,000 0 0 20 40 60 80 100 120 140 160 180 200 m/z The Static SIMS Library, SurfaceSpectra Ltd  100 – 250k data points  1 – 3000 u mass range www.surfacespectra.com
  • 6.
    Spectra of bacteria Enterococcus spp. Proteus mirabilis 4 4 x 10 En x 10 Pm 6 5 4.5 5 4 3.5 4 3 3 2.5 2 2 1.5 1 1 0.5 0 0 0 100 200 300 400 500 600 700 800 0 100 200 300 400 500 600 700 800 m/z m/z Applied Surface Science 252 (2006) 6719-6722
  • 7.
    Spectra of bacteria Enterococcus spp. Proteus mirabilis En Pm 8000 2500 7000 2000 6000 5000 1500 4000 1000 3000 2000 500 1000 0 0 100 150 200 250 300 350 400 450 500 100 150 200 250 300 350 400 450 500 m/z m/z Applied Surface Science 252 (2006) 6719-6722
  • 8.
    PCA of bacteria 5species of bacteria No a priori knowledge used in PCA Positive ion spectra 1-800 u, rebinned to 1 u steps Square root of intensity Sum-normalised Applied Surface Science 252 (2006) 6719-6722
  • 9.
    Canonical Variates Analysis (DiscriminantFunction Analysis)  Principal components are pure (orthogonal) aspects of the data  Canonical variates are combinations of PCs that best describe an a priori class structure  Based on Fisher’s Linear Discriminant, subtly different from Linear Discriminant Analysis (LDA): no assumption of normally distributed classes  CVA is also referred to as DFA
  • 10.
    Canonical Variates Analysis (DiscriminantFunction Analysis)  Maximise ratio of between-group variance to within- group variance
  • 11.
    PC-CVA bacteria classification 5species of bacteria Class structure used 9 PCs selected using PRESS test Cross-validation indicates percent correctly classified: Cf 75% CC Ec 92% CC En 100% CC Kp 25% CC Pm 50% CC Applied Surface Science 252 (2006) 6869-6874
  • 12.
    2D analysis Spatially organisedcollections of spectra Hyperspectral images
  • 13.
    Imaging SIMS  Focussed ion beam stepped from point to point  2D data generated  Full mass spectrum at each point/pixel  Beam diameter down to ~1-2 µm (50 nm possible, but can damage sample)
  • 14.
    SIMS image -‘chemical photograph’ 4500 4000 3500 3000 intensity 2500 2000 1500 1000 500 0 0 100 200 300 400 500 600 m/z Total ion spectrum Total ion image Bead diameter 50 μm Data courtesy of Penn State University, USA
  • 15.
    Selected ion images 4500 4000 3500 3000 intensity 2500 2000 1500 1000 500 0 0 100 200 300 400 500 600 m/z Data courtesy of Penn State University, USA
  • 16.
    Maximum Autocorrelation FactorAnalysis (MAF)  Incorporates spatial information with spectral information  Determine the correlation with the same image offset diagonally by 1 pixel  Autocorrelation means scaling invariant  Computation time ~4× PCA  Possible issue when image feature size ≈ pixel size
  • 17.
    MAF versus PCAof HeLa cells a b c a) Total ion image, scaled and smoothed b) Score on PC 6, scaled and smoothed c) Score on MAF 4, scaled and smoothed MAF captures the distinction between inner and outer cellular regions more clearly than PCA. Comparison with the total ion image shows the additional information available following multivariate analysis. Surface and Interface Analysis 41 (2009) 666-674
  • 18.
    MAF versus PCAof HeLa cells Loading on PC 6 Loading on MAF 4  MAF loadings more ‘informative’ Surface and Interface Analysis 41 (2009) 666-674
  • 19.
    High mass resolutionrequired Nominally all at m/z 86  Lipid (DPPC) m/z = 86.0969692  Unknown  Silicon substrate [Si3H2]+ m/z = 85.9464332 Separation = 0.15 u m/z 86 Analytical chemistry 80 (2008) 9058-9064
  • 20.
    Rat kidney tissue  Cryo-microtomed slices  Sequential layers  Unwashed  Washed in methanol  Data is  10x10 tiles  32x32 pixels per tile  24000 channels per pixel  2.4 billion datapoints  19.2 GB (as doubles in MATLAB) Almost all zeros, so use sparse matricies to handle the data
  • 21.
    Summation of alltissue pixels
  • 22.
    Clear changes inchemical distribution over small mass range Individual images from nominal mass m/z 197 Images generated at 0.05 u increments from 196.75 to 197.30 Images are 9 x 9 mm2 comprising 10 x 10 tiles of 32 x 32 pixels John S Fletcher Ionoptika J105 April 2009 Analytical and Bioanalytical Chemistry 396 (2010) 85-104
  • 23.
    Rat kidney section- overlay 320 x 320 pixels 9 x 9 mm Full mass spectrum available at each pixel Lipid (DPPC) DAG Salt related Analytical and Bioanalytical Chemistry 396 (2010) 85-104
  • 24.
    SIMS and AFM AFM SIMS (total ion) Identify points of reference between (Human cheek cells) images Register images and overlay AFM following ‘projective’ transform and cropping in MATLAB Applied Surface Science 255 (2008) 1264–1270
  • 25.
    SIMS and AFM nm µm µm  Use AFM z-height to offset SIMS data  Use AFM x- and y-position to accurately scale SIMS x- and y-positions Applied Surface Science 255 (2008) 1264–1270
  • 26.
    3D analysis Three-dimensional spatiallyorganised collections of spectra ‘Hyper-’hyperspectral images
  • 27.
    HeLa-M cells  Immortalised cervical cancer cell line commonly used as a model system for biology  Cultured on poly-L-lysine coated substrates  AFM after freeze drying: 30 – 50 µm diameter 0.6 – 1µm thick
  • 28.
    Formalin fixed, freeze-driedHeLa cells Lipid (DPPC) increasing ion dose Around the nucleus 184 152 Inside the nucleus  1st layer: evidence of chemistry having spread outside cell area  3rd layer: spectra show chemistry still mixed up between nucleus and membrane  4th layer: one cell exhibits bimodal nucleus Rapid Commun. Mass Spectrom. 25 (2011) 925-932 indicating the dividing stage of cell cycle – mitosis
  • 29.
    3D visualisation –issues  Problem with point of view of ‘observer’  Mass analyser is normal to sample  Different sample heights appear to be at same level  Data size:  256 x 256 pixels x 10 layers x 11400 channels  7.5 billion datapoints = 58 GB  >99% of data is zero so only ~5 GB in RAM
  • 30.
    3D visualisation –our approach  PCA of whole dataset (sparse, MATLAB)  Identify PC that separates substrate from organic overlayer  Assume substrate is flat  Slide columns of pixels to align with crossover in score of that PC  Discard substrate-rich pixels  Repeat PCA to identify peaks or components  Render results (AVS Express) Rapid Commun. Mass Spectrom. 25 (2011) 925-932
  • 31.
    Two sample preparationmethods Aim is to maintain the biological integrity of cells in vacuum system Formalin fixed, freeze-dried cells Frozen hydrated cells  Hela-M cells cultured on poly-  Hela-M cells cultured on one L-lysine coated silicon wafer plate of hinged two-plate steel  Formalin fixed (4%) substrate coated in poly-L-lysine  Cells are washed with  Cells are washed with ammonium ammonium formate (0.15 M) formate (0.15M) to remove salts prior to analysis to remove  Cells trapped between plates salts and rapidly frozen in liquid  Freeze-dried in vacuum propane  Sandwich fractured in vacuum at 160 K to reveal cells Rapid Commun. Mass Spectrom. 25 (2011) 925-932
  • 32.
    Frozen hydrated HeLacells Lipid (DPPC headgroup) @ m/z 184.1 Nucleic acid (adenine) @ m/z 136.1 Rapid Commun. Mass Spectrom. 25 (2011) 925-932
  • 33.
    Formalin fixed, freeze-driedHeLa cells Lipid (DPPC) @ m/z 184.1 Principal component indicating Rapid Commun. Mass Spectrom. 25 (2011) 925-932 guanine & tryptophan
  • 34.
    Movies available online 256x 256 pix  Frozen hydrated HeLa cells x 10 layers  http://www.youtube.com/watch?v=sPuMYUQPvHQ x 11400 chans  Visualisation of the membrane (green, m/z 184.1, phosphocholine) and nucleus (red, m/z 136.1, adenine) 7.5 billion pts  Formalin fixed, freeze-dried HeLa cells = 58 GB  http://www.youtube.com/watch?v=2phMkYo7yhg  Visualisation of the membrane (green, m/z 184.1, >99% of data phosphocholine) and nucleus (purple, negative scores is zero so only on principal component 5) ~5 GB in RAM  Possible mitosis visible at lower right
  • 35.
    Summary  What SIMS lacks in ultimate species identification it makes up for in spatial location  New ionisation sources produce data nearer to ‘traditional’ MS, opening up resources  Tandem MS approaches are breaking through
  • 36.
     SARC SARC  Validation algorithms  John Vickerman John Vickerman  Roy Goodacre Nick Lockyer  Nick Lockyer John Fletcher  Yun Xu  John Fletcher Sadia Sheraz (nee Rabbani)  Elon Correa ... Sadia Rabbani (now Sheraz)  and the rest!  Visualisation group in  ... and the rest! Research Computer Services Acknowledgements www.sarc.manchester.ac.uk
  • 37.