IVFC Signal Denoising

IVFC slides for Honwai Leong.

  1. Cell Counting on In Vivo Flow Cytometry Time Series Data. Chaofeng Wang, March 25, 2011. Outline: Introduction; Methods and Results; Summary.
  2. Abstract. In this presentation I introduce three methods for IVFC data analysis. The Line-Separating Method is the conventional and earliest method. Wavelet-based peak picking is an adaptive method inspired by audio processing. The statistical thresholding method uses a Gaussian Mixture Model to count cells automatically and consistently.
  3. In Vivo Flow Cytometry (IVFC): excitation and detection take place at the same confocal plane. Output: time series data. For IVFC settings, refer to [9].
  4. IVFC capabilities [9]: real-time cell counting (vs. hemocytometer); suitability for fast-moving cells and low-SNR signals (vs. confocal and two-photon imaging), with a 5 to 100 kHz sampling rate; monitoring of cell kinetics in vivo, without blood extraction. Most applications are in metastasis research [7, 13].
  5. Reasons for low SNR: (1) autofluorescence; (2) nonspecific labeling from incomplete cleansing; (3) labeled cells deviating from the confocal plane; (4) non-uniform staining; (5) instability of fluorescent dyes during long-term assays; (6) aggregation of labeled cells (2 out of 119 images of labeled cells show potentially clustered cells [8]); (7) instrumental noise and white noise.
  6. Conventional gating: Line Separating Method (LSM). Line Separating GUI v2.0, with adjustable thresholds. The FWHM is computed in whole samples (discrete FWHM), so its values are discrete. MATLAB scripts by Chaofeng Wang.
  7. Discrete FWHM [figure].
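A minimal Python sketch of a discrete FWHM, assuming the baseline has already been removed so that "half maximum" is half the peak value. The name `discrete_fwhm` is hypothetical and this is not the author's MATLAB code; it only illustrates why a width counted in whole samples can take integer values only.

```python
from typing import Sequence


def discrete_fwhm(signal: Sequence[float], peak: int) -> int:
    """Width, in whole samples, of the region around `peak` at or above half its height.

    Assumes a zero baseline, so half maximum is signal[peak] / 2.
    The result is always an integer, so FWHM values fall on a discrete grid.
    """
    half = signal[peak] / 2.0
    left = right = peak
    # Walk outwards while neighbouring samples stay at or above half maximum.
    while left > 0 and signal[left - 1] >= half:
        left -= 1
    while right < len(signal) - 1 and signal[right + 1] >= half:
        right += 1
    return right - left + 1


print(discrete_fwhm([0, 1, 3, 6, 4, 2, 0], peak=3))  # 3 (samples with values 3, 6, 4)
```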
  8. LSM gating strategies: background assaying on control data; manual pickup of noise segments from experiment data; expert adjustment (subjective). Peaks are separated in the peak height vs. FWHM feature space by a straight line, which underfits; whether a hyperbola such as $y = x^{-1} + a$ fits better is an open question. LSM was proposed along with the invention of IVFC by Novak et al. [10].
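A sketch of the two gating rules mentioned above, assuming peak height on the y axis and discrete FWHM on the x axis. The functions and the example slope, intercept and offset are hypothetical stand-ins for the expert-tuned thresholds, not values from the study.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, int]  # (peak height, discrete FWHM in samples)


def line_gate(peaks: Sequence[Peak], slope: float, intercept: float) -> List[bool]:
    """LSM-style gate: a peak counts as a cell if it lies above a straight line.

    The line is height = slope * fwhm + intercept; slope and intercept are
    hypothetical placeholders for expert-tuned thresholds.
    """
    return [h > slope * w + intercept for h, w in peaks]


def hyperbola_gate(peaks: Sequence[Peak], a: float) -> List[bool]:
    """The hyperbolic alternative y = 1/x + a: a cell if height > 1 / fwhm + a."""
    return [h > 1.0 / w + a for h, w in peaks]


peaks = [(0.5, 3), (2.0, 6)]
print(line_gate(peaks, slope=-0.1, intercept=1.0))  # [False, True]
print(hyperbola_gate(peaks, a=0.2))                 # [False, True]
```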
  9. Wavelet-based peak picking, in two steps: (1) wavelet denoising; (2) adaptive peak picking. This work is credited to David Damm and was presented at the BMEI 2009 conference [4].
  10. Wavelet denoising. Noise model: recover an unknown function $f$ on $[0, 1]$ from noisy data $d_i = f(t_i) + \sigma z_i$, $i = 0, \dots, n-1$, where $t_i = i/n$, $z_i$ is standard Gaussian white noise ($z_i \sim N(0, 1)$, i.i.d.), and $\sigma$ is the noise level. Aim of denoising: minimize the mean squared error subject to the condition that $\hat{f}$ is at least as smooth as $f$ with high probability [6, 5].
  11. Soft thresholding: apply the soft-thresholding nonlinearity coordinatewise to the empirical wavelet coefficients, $\eta_t(y) = \operatorname{sgn}(y)\,(|y| - t)_+$, where $(x)_+ = 0$ if $x < 0$ and $(x)_+ = x$ if $x \ge 0$, and $t$ is a specially chosen threshold, $t_n = \gamma_1 \sigma \sqrt{2 \log(n) / n}$. $\gamma_1$ is a constant, set to 1 in simpler situations. In practice $\sigma$ is unknown, and the estimate $\hat{\sigma} = \mathrm{MAD}/0.6745$ is used [5].
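The following Python sketch applies soft thresholding to a plain multilevel Haar transform. Assumptions: the Haar wavelet (the slides do not name the wavelet), the MAD taken as the median absolute finest-scale detail coefficient as in Donoho and Johnstone, the natural log, and orthonormal coefficients that are not rescaled by $1/\sqrt{n}$, so the threshold becomes $\gamma_1 \sigma \sqrt{2 \ln n}$, i.e. the slide's $t_n$ multiplied by $\sqrt{n}$. It stands in for, and is not, the MATLAB Wavelet Toolbox routine used in the work.

```python
import math
import random
from statistics import median
from typing import List, Tuple


def soft_threshold(y: float, t: float) -> float:
    """eta_t(y) = sgn(y) * (|y| - t)_+"""
    return math.copysign(max(abs(y) - t, 0.0), y)


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    r = math.sqrt(2.0)
    pairs = list(zip(x[0::2], x[1::2]))
    return [(a + b) / r for a, b in pairs], [(a - b) / r for a, b in pairs]


def inverse_haar_step(s: List[float], d: List[float]) -> List[float]:
    """Invert one Haar level."""
    r = math.sqrt(2.0)
    out: List[float] = []
    for a, b in zip(s, d):
        out.extend([(a + b) / r, (a - b) / r])
    return out


def haar_denoise(data: List[float], levels: int, gamma: float = 1.0) -> List[float]:
    """Soft-threshold every Haar detail coefficient and reconstruct."""
    n = len(data)
    if n % (2 ** levels):
        raise ValueError("len(data) must be divisible by 2**levels")
    approx: List[float] = list(data)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest scale
    # Noise level from the finest scale: sigma_hat = median(|d|) / 0.6745.
    sigma = median([abs(c) for c in details[0]]) / 0.6745
    # Threshold for unnormalised orthonormal coefficients: gamma * sigma * sqrt(2 ln n).
    t = gamma * sigma * math.sqrt(2.0 * math.log(n))
    for d in reversed(details):  # coarsest level first
        approx = inverse_haar_step(approx, [soft_threshold(c, t) for c in d])
    return approx


rng = random.Random(1)
noisy = [rng.gauss(0.0, 1.0) for _ in range(1024)]
denoised = haar_denoise(noisy, levels=4)
```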
  12. Adaptive peak picking: finite state automaton [diagram]. In states A1 and P1 the accumulated discrete derivative is reset to 0. A peak is reported whenever state D2 is reached.
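The automaton's diagram did not survive extraction, so the transitions between A1, P1 and D2 are not recoverable. The Python sketch below is a hypothetical three-state automaton (idle, rise, fall) in the same spirit: it accumulates the discrete derivative while climbing and reports a peak on entering the falling state. The state names and the `min_rise` parameter are assumptions.

```python
from typing import List, Sequence


def pick_peaks(signal: Sequence[float], baseline: Sequence[float], min_rise: float) -> List[int]:
    """Hypothetical three-state peak picker (idle -> rise -> fall).

    The accumulated discrete derivative `rise` is reset whenever a climb starts
    or ends; a peak is reported when the automaton enters 'fall' after climbing
    by at least `min_rise` while above the baseline.
    """
    peaks: List[int] = []
    state, rise = "idle", 0.0
    for i in range(1, len(signal)):
        diff = signal[i] - signal[i - 1]
        above = signal[i] > baseline[i]
        if state == "idle":
            if above and diff > 0:
                state, rise = "rise", diff
        elif state == "rise":
            if diff >= 0:
                rise += diff  # keep accumulating the discrete derivative
            elif rise >= min_rise:
                peaks.append(i - 1)  # previous sample was the local maximum
                state, rise = "fall", 0.0
            else:
                state, rise = "idle", 0.0  # bump too small: discard it
        else:  # state == "fall"
            if diff > 0:
                state, rise = ("rise", diff) if above else ("idle", 0.0)
            elif not above:
                state = "idle"
    return peaks


sig = [0, 0, 1, 3, 5, 3, 1, 0, 0, 2, 4, 2, 0]
print(pick_peaks(sig, [0.5] * len(sig), min_rise=2.0))  # [4, 10]
```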
  13. Adaptive peak picking: the threshold baseline is computed in a rolling window $[t - l/2, t + l/2]$ of fixed, even integer size $l$: $B(t) = \mathrm{Median}_w + \mathrm{Std}_w$, where the median and standard deviation are taken over the window $w$.
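A direct Python sketch of the rolling baseline $B(t)$. Truncating the window at the ends of the trace and using the population standard deviation are assumptions; the slide specifies neither. Note that $[t - l/2, t + l/2]$ contains $l + 1$ samples.

```python
import statistics
from typing import List, Sequence


def rolling_baseline(signal: Sequence[float], window: int) -> List[float]:
    """B(t) = median + standard deviation over samples in [t - l/2, t + l/2].

    The window is truncated near the ends and the population standard deviation
    is used (both assumptions). Cost is O(n * l log l), fine for a sketch.
    """
    if window <= 0 or window % 2:
        raise ValueError("window size l must be a positive even integer")
    half = window // 2
    out: List[float] = []
    for t in range(len(signal)):
        w = list(signal[max(0, t - half): t + half + 1])
        out.append(statistics.median(w) + statistics.pstdev(w))
    return out


b = rolling_baseline([0, 0, 0, 10, 0, 0, 0], window=2)
print([round(v, 3) for v in b])  # [0.0, 0.0, 4.714, 4.714, 4.714, 0.0, 0.0]
```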
  14. Wavelet-based peak picking: the MATLAB Wavelet Toolbox was used for this research.
  15. Wavelet method compared with LSM.
      Table: Comparison of cell counts by the wavelet method and LSM
      Dataset    LSM   Wavelet   Consensus
      1-1.dcf     80       162          79
      1-2.dcf     71       153          70
      2-1.dcf     30        42          13
      2-2.dcf     41        59          20
      3-1.dcf    175       175         135
      3-2.dcf     81       157          77
      5-1.dcf     36        67          34
      5-6.dcf     59        69          46
  16. Statistical modeling for IVFC data peaks. Disadvantages of LSM: it is subjective and labor-intensive, since a control measurement is always required; it is susceptible to outliers in the control; the control loses thresholding power when assays last for days; and experts may give inconsistent thresholds. We propose a thresholding method based on statistical modeling that achieves consistency and robustness, providing a kind of ground truth for other fast cell-counting methods.
  17. Histogram of IVFC data [figure]: skewed to the right.
  18. Histogram of log(IVFC data) [figure]: all values ≤ 0 are discarded.
  19. Automatic classifiers for flow cytometry. Pyne et al.: robust skew-t mixture models, FLAME [12]. Chan et al.: extracted biologically meaningful cell subsets by defining putative cell subsets as groups of mixture components [2]. In the machine-learning category, vector quantization methods are used [3, 11].
  20. Statistical Thresholding Method (STM). Assumptions: noise peaks are the majority and cluster well; cell peaks are the minority and appear as outliers; all peaks can be modeled by two or more Gaussian mixture components.
  21. Gaussian Mixture Model (GMM): assume there are K groups in the data, and accordingly K components in the GMM: $p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$, where $\pi_k$ is the proportion of component $k$ in the whole data [1].
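For one-dimensional peak heights, $\Sigma_k$ reduces to a variance $\sigma_k^2$. A small Python sketch of the mixture density under that assumption; the example weights, means and standard deviations are arbitrary illustration values.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x (one-dimensional case)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gmm_pdf(x: float, weights: Sequence[float], means: Sequence[float],
            sigmas: Sequence[float]) -> float:
    """p(x) = sum_k pi_k N(x | mu_k, sigma_k^2); the weights pi_k must sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))


# Two equally weighted components; a Riemann sum over [-10, 10] should be close to 1.
w, m, s = [0.5, 0.5], [-1.0, 2.0], [1.0, 0.5]
step = 0.001
print(sum(gmm_pdf(-10 + i * step, w, m, s) for i in range(20000)) * step)
```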
  22. Expectation maximization (EM) for GMM.
      1. Expectation step: $\gamma(i, k) = \dfrac{\pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$, where $\gamma(i, k)$ is the probability that $x_i$ comes from component $k$.
      2. Likelihood maximization step: $\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, x_i$ and $\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, (x_i - \mu_k)(x_i - \mu_k)^T$, where $N_k = \sum_{i=1}^{N} \gamma(i, k)$; $\pi_k$ can be estimated as $N_k / N$.
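A compact Python sketch of these EM updates for the one-dimensional case, where $\Sigma_k$ is a variance. The small floors on the density sum, on $N_k$ and on the variances, the stopping tolerance, and the toy data are assumptions added to keep the sketch numerically safe; they are not from the slides.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(x: Sequence[float], means: Sequence[float], variances: Sequence[float],
              weights: Sequence[float], max_iter: int = 200, tol: float = 1e-9
              ) -> Tuple[List[float], List[float], List[float], float]:
    """Fit a 1-D Gaussian mixture by EM from the given initial guess.

    Returns (means, variances, weights, log_likelihood); the log-likelihood is
    evaluated at the parameters entering the last E-step.
    """
    mu, var, pi = list(means), list(variances), list(weights)
    n, k = len(x), len(mu)
    prev_ll = ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities gamma(i, k).
        gamma: List[List[float]] = []
        ll = 0.0
        for xi in x:
            p = [pi[j] * normal_pdf(xi, mu[j], var[j]) for j in range(k)]
            total = max(sum(p), 1e-300)  # guard against underflow
            ll += math.log(total)
            gamma.append([pj / total for pj in p])
        # M-step: N_k, then mu_k, sigma_k^2 (using the new mu_k) and pi_k = N_k / N.
        for j in range(k):
            nk = max(sum(g[j] for g in gamma), 1e-12)
            mu[j] = sum(g[j] * xi for g, xi in zip(gamma, x)) / nk
            var[j] = max(sum(g[j] * (xi - mu[j]) ** 2 for g, xi in zip(gamma, x)) / nk, 1e-12)
            pi[j] = nk / n
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, var, pi, ll


data = [-5 + 0.1 * i for i in range(-5, 6)] + [5 + 0.1 * i for i in range(-5, 6)]
mu, var, pi, ll = em_gmm_1d(data, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print([round(m, 3) for m in mu], [round(p, 3) for p in pi])  # [-5.0, 5.0] [0.5, 0.5]
```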
  23. Bayesian and Akaike Information Criteria (BIC, AIC): criteria for selecting the model that best avoids both overfitting and underfitting: $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k \ln n - 2\ln L$, where $k$ is the number of parameters, $L$ is the maximized likelihood, and $n$ is the sample size. The model with the lowest value is preferred.
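A sketch of how the criteria could be compared across component counts for a one-dimensional GMM, assuming $k = 3K - 1$ free parameters (K means, K variances and K - 1 independent weights). The log-likelihoods would come from EM fits; `best_k_by_bic` is a hypothetical helper.

```python
import math
from typing import Dict


def gmm_param_count(n_components: int) -> int:
    """Free parameters of a 1-D GMM: K means, K variances and K - 1 mixing weights."""
    return 3 * n_components - 1


def aic(log_likelihood: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    return n_params * math.log(n_samples) - 2 * log_likelihood


def best_k_by_bic(loglik_by_k: Dict[int, float], n_samples: int) -> int:
    """Return the component count K whose fitted GMM has the lowest BIC."""
    return min(loglik_by_k,
               key=lambda k: bic(loglik_by_k[k], gmm_param_count(k), n_samples))
```

BIC penalizes each extra parameter by $\ln n$ rather than AIC's constant 2, so for large peak counts it favours fewer components.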
  24. BIC and AIC versus the number of components k in the GMM [figure].
  25. Thresholding strategy: a 3-component GMM is chosen for IVFC data. The cell-peak component is too small and is treated as outliers, so the threshold is set on the noise component with the largest mean, the second component: $\mu_2 + a\sigma_2$, where $\mu_2$ and $\sigma_2$ are the mean and standard deviation of the second component and $a$ is called the sigma factor.
  26. Sigma Factor Picker: the picker aims to keep the false positive number (FPN) as small as possible. $\Phi(\mu + a\sigma)$ is the CDF of the fitted noise component evaluated at the threshold, which equals $\Phi(a)$ for the standard normal CDF; the FPN column matches $N(1 - \Phi(a))$ at the largest N of each row, rounded to the nearest integer.
      Table: Sigma Factor Picker
      Sample number N   Threshold µ + aσ   Φ(µ + aσ)        FPN for cell peaks
      ≤ 1               a = 1              0.841344746069   N.A.
      ≤ 100             a = 2              0.977249868052   ≤ 2
      ≤ 1000            a = 3              0.998650101968   ≤ 1
      ≤ 10^5            a = 4              0.999968328758   ≤ 3
      ≤ 10^7            a = 5              0.999999713348   ≤ 3
      ≤ 10^9            a = 6              0.999999999013   ≤ 1
      ≤ 10^12           a = 7              0.999999999999   ≤ 1
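The table can be encoded directly. In this Python sketch the function names are hypothetical; the rows are copied from the table, and `expected_false_positives` evaluates $N(1 - \Phi(a))$, which is one reading of the FPN column.

```python
import math

# (largest sample number N, sigma factor a) for each row of the Sigma Factor Picker table.
SIGMA_FACTOR_TABLE = [(1, 1), (100, 2), (1000, 3), (10**5, 4), (10**7, 5), (10**9, 6), (10**12, 7)]


def normal_cdf(a: float) -> float:
    """Phi(a), the standard normal CDF."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))


def pick_sigma_factor(n: int) -> int:
    """Smallest tabulated sigma factor whose row covers n samples."""
    for n_max, a in SIGMA_FACTOR_TABLE:
        if n <= n_max:
            return a
    raise ValueError("sample number exceeds the table")


def expected_false_positives(n: int, a: float) -> float:
    """Expected count of Gaussian noise peaks above mu + a*sigma: N * (1 - Phi(a))."""
    return n * 0.5 * math.erfc(a / math.sqrt(2.0))  # erfc avoids cancellation in 1 - Phi(a)


def stm_threshold(mu2: float, sigma2: float, n2: int) -> float:
    """Threshold mu_2 + a * sigma_2, with a chosen from the sample number of component 2."""
    return mu2 + pick_sigma_factor(n2) * sigma2


for n_max, a in SIGMA_FACTOR_TABLE[1:]:
    print(n_max, a, round(expected_false_positives(n_max, a), 2))
```

The loop prints about 2.28, 1.35, 3.17, 2.87, 0.99 and 1.28, which round to the FPN column. Sample numbers above $10^{12}$ are outside the table, so the sketch raises an error there rather than guessing a factor.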
  27. Keeping the FPN low [figure].
  28. STM procedure:
      1. Bring the baseline down to 0 and smooth: $v = v - b$, where $v$ is the input data and $b$ is the estimated baseline.
      2. Filter without shifting the signal: $v_s = \mathrm{Convolve}(v, \mathrm{GKern}(l_{gk}))$, where $\mathrm{GKern}(l_{gk})$ is a Gaussian kernel of length $l_{gk}$.
      3. Find all peaks (local maxima) of $v_s$, denoted $p$. These are the cell peak candidates.
      4. Use the [0.75, 0.95] quantiles as bounds to generate an initial guess, and use it to fit a 3-component Gaussian mixture model to $p$. In descending order, the components are $D_1$, $D_2$, $D_3$.
      5. $t = D_2.\mu + sf \times D_2.\sigma$. The sigma factor $sf$ is determined by the sample number of $D_2$ according to the Sigma Factor Picker table.
      6. All peaks in $p$ higher than $t$ are picked as cell peaks.
      A MATLAB script providing a graphical user interface for STM is available.
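A Python sketch of steps 1, 2, 3 and 6, taking the threshold t from steps 4 and 5 as an input. Assumptions: the baseline b is supplied by the caller (its estimation method is not given), the kernel's standard deviation `kernel_sigma` is a free parameter (the slide fixes only the length $l_{gk}$), the ends are zero-padded, and a local maximum is a sample strictly greater than its left neighbour and not smaller than its right one. This is not the MATLAB STM script.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernel(length: int, sigma: float) -> List[float]:
    """Normalised Gaussian kernel of odd length, centred on its middle tap."""
    if length % 2 == 0:
        raise ValueError("use an odd kernel length so the kernel has a centre tap")
    c = length // 2
    k = [math.exp(-0.5 * ((i - c) / sigma) ** 2) for i in range(length)]
    total = sum(k)
    return [v / total for v in k]


def centred_convolve(v: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Apply the kernel centred on each sample, zero-padding the ends.

    For a symmetric kernel this equals convolution and does not shift peak positions.
    """
    c, n = len(kernel) // 2, len(v)
    return [sum(w * v[t + j - c] for j, w in enumerate(kernel) if 0 <= t + j - c < n)
            for t in range(n)]


def local_maxima(vs: Sequence[float]) -> List[int]:
    return [i for i in range(1, len(vs) - 1) if vs[i - 1] < vs[i] >= vs[i + 1]]


def stm_pick(v: Sequence[float], b: Sequence[float], kernel_len: int,
             kernel_sigma: float, threshold: float) -> Tuple[List[int], List[int]]:
    """Return (candidate peak indices p, indices picked as cell peaks)."""
    v0 = [vi - bi for vi, bi in zip(v, b)]                                 # step 1
    vs = centred_convolve(v0, gaussian_kernel(kernel_len, kernel_sigma))   # step 2
    p = local_maxima(vs)                                                   # step 3
    cells = [i for i in p if vs[i] > threshold]                            # step 6
    return p, cells


spike = [0.0] * 20
spike[10] = 5.0
print(stm_pick(spike, [0.0] * 20, kernel_len=5, kernel_sigma=1.0, threshold=0.1))  # ([10], [10])
```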
  29. Simulated data: 100 Gaussian-shaped peaks (in blue) with heights from 1 to 2 and FWHM from 5 to 9, evenly distributed over 10000 samples; additive white Gaussian noise with SNR = 1; baseline increasing from 0 to 1.
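A Python sketch that generates data of this kind. Assumptions not stated on the slide: peak centres are evenly spaced, heights and FWHMs are drawn uniformly, SNR is the mean power of the noise-free peak signal divided by the noise variance, and the baseline rises linearly. `simulate_ivfc` is a hypothetical name.

```python
import math
import random
from typing import List, Tuple


def simulate_ivfc(n_samples: int = 10000, n_peaks: int = 100, snr: float = 1.0,
                  seed: int = 0) -> Tuple[List[float], List[float], List[int]]:
    """Return (noisy trace, noise-free peaks, peak centres)."""
    rng = random.Random(seed)
    spacing = n_samples / n_peaks
    centres = [int((i + 0.5) * spacing) for i in range(n_peaks)]
    clean = [0.0] * n_samples
    for c in centres:
        height = rng.uniform(1.0, 2.0)
        fwhm = rng.uniform(5.0, 9.0)
        sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM = 2 sqrt(2 ln 2) sigma
        for t in range(max(0, int(c - 5 * sigma)), min(n_samples, int(c + 5 * sigma) + 1)):
            clean[t] += height * math.exp(-0.5 * ((t - c) / sigma) ** 2)
    # Assumed SNR definition: mean power of the peak signal divided by noise variance.
    signal_power = sum(x * x for x in clean) / n_samples
    noise_sd = math.sqrt(signal_power / snr)
    # Linear baseline from 0 to 1 plus additive white Gaussian noise.
    noisy = [clean[t] + t / (n_samples - 1) + rng.gauss(0.0, noise_sd)
             for t in range(n_samples)]
    return noisy, clean, centres


noisy, clean, centres = simulate_ivfc()
```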
  30. SNR pressure tests on simulated data [figure].
  31. Cell peak proportion tests on simulated data [figure].
  32. STM on control data [figure].
  33. STM on experiment data [figure].
  34. Real-time test on experiment data.
      Table: Real-time test
      Used data    Threshold   Cell counts
      [0 100]        0.04727           572
      [0 200]        0.04663           590
      [0 300]        0.04558           615
      [0 400]        0.04552           617
      [0 500]        0.04510           626
      [0 600]        0.04507           626
      [0 700]        0.04522           620
      [0 800]        0.04473           635
      [0 900]        0.04450           642
      whole data     0.04450           642
  35. Consistency test on experiment data: counts are summed over 100-second segments and compared with the result of integral counting.
      Table: Consistency test
      Used data    Summed   Integral   LSM
      0-15 m1         652        641   295
      15-30 m1        415        395   208
      1h m1           229        225   NAN
      72h m1          225        221   NAN
      0-15 m2         621        614    68
      45-60 m2        309        304    55
      1h m2           196        200    41
      0-15 m3         267        268    N.
      30-45 m3        198        197    N.
      1h m3           107        106    N.
  36. LSM, LSMsd, STM [figure].
  37. Summary: for non-stationary IVFC time-series data, GMM-based thresholding provides a consistent method for cell counting. Other statistical models and pattern-recognition methods might also be useful.
  38. Acknowledgements. To collaborators, for their hard work and inspiration: Jin Guo (IPS), Guangda Liu (IPS), Xiaoying Tan (IPS), Prof. Xunbin Wei (IPS). To visitors, for guidance on signal processing and statistics: David Damm and Keli Huang, both formerly at the University of Bonn. To Prof. Axel Mosig and all members of the group, for all kinds of support.
  39. Bibliography
      [1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
      [2] C. Chan, F. Feng, J. Ottinger, D. Foster, M. West, and T. B. Kepler. Statistical mixture modeling for cell subtype identification in flow cytometry. Cytometry, 73A(8):693–701, 2008.
      [3] E. S. Costa, M. E. Arroyo, C. E. Pedreira, M. A. Garcia-Marcos, M. D. Tabernero, J. Almeida, and A. Orfao. A new automated flow cytometry data analysis approach for the diagnostic screening of neoplastic B-cell disorders in peripheral blood samples with absolute lymphocytosis. Leukemia, 20(7):1221–1230, 2006.
      [4] D. Damm, C. Wang, X. Wei, and A. Mosig. Cell counting for in vivo flow cytometer signals using wavelet-based dynamic peak picking. In 2nd International Conference on Biomedical Engineering and Informatics (BMEI '09), pages 1–4. IEEE, 2009.
      [5] D. L. Donoho. De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3):613–627, May 1995.
      [6] D. L. Donoho and I. M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.
      [7] I. Georgakoudi, N. Solban, J. Novak, W. L. Rice, X. Wei, T. Hasan, and C. P. Lin. In vivo flow cytometry. Cancer Research, 64(15):5044–5047, 2004.
      [8] H. Lee, C. Alt, C. M. Pitsillides, M. Puoris’haag, and C. P. Lin. In vivo imaging flow cytometer. Optics Express, 14(17):7789–7800, Aug 2006.
      [9] J. Novak, I. Georgakoudi, X. Wei, A. Prossin, and C. P. Lin. In vivo flow cytometer for real-time detection and quantification of circulating cells. Optics Letters, 29(1):77–79, 2004.
      [10] J. Novak. Development of the in vivo flow cytometer. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 2004.
      [11] C. E. Pedreira, E. S. Costa, M. E. Arroyo, J. Almeida, and A. Orfao. A multidimensional classification approach for the automated analysis of flow cytometry data. IEEE Transactions on Biomedical Engineering, 55(3):1155–1162, 2008.
      [12] S. Pyne, X. Hu, K. Wang, E. Rossin, T.-I. Lin, L. M. Maier, C. Baecher-Allan, G. J. McLachlan, P. Tamayo, D. A. Hafler, P. L. De Jager, and J. P. Mesirov. Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences, 106(21):8519–8524, May 2009.
      [13] X. Wei, D. A. Sipkins, C. M. Pitsillides, J. Novak, I. Georgakoudi, and C. P. Lin. Real-time detection of circulating apoptotic cells by in vivo flow cytometry. Molecular Imaging, 4(4):415, 2005.
