Using Principal Component Analysis to Remove Correlated Signal from Astronomical Images
Kim Scott
National Radio Astronomy Observatory

  1. Using Principal Component Analysis to Remove Correlated Signal from Astronomical Images
      Kim Scott, National Radio Astronomy Observatory
      Data Science Meet-up, February 18, 2014
  4. Galaxy Evolution in One Slide... ?
  6. Galaxy Surveys – What Are We Missing?
      • Optical surveys miss ~50% of star formation in galaxies
      • Optical surveys are biased
      • Dust re-emits stellar radiation at infrared to millimeter wavelengths (λ ~ 20 – 2000 μm)
  7. Galaxy Surveys at (Sub)mm Wavelengths
      • Atmospheric emission: 1000× stronger than signal from galaxies
      [Figure labels: Extragalactic emission: Transmitted; Absorbed]
  9. Removing the Atmosphere by Modulating the Signal in Time
      [Figure labels: Detector array; Galaxy; i=1, i=2, i=3]
      • x_ij: power measured for time sample i on detector j
  10. Surveys at λ=1.1 mm with AzTEC
      [Image labels: ASTE Telescope; AzTEC Dewar; AzTEC Array (117 detectors)]
  12. Raw Time-stream Data
      • Sample rate = 1∕(15.625 ms)
      • 20 s = 1280 samples
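The sampling figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the slides (15.625 ms per sample, 117 detectors, 16640 samples per detector for the PKS J1127-1857 observation); the variable names and the empty placeholder array are illustrative assumptions.

```python
import numpy as np

sample_interval_s = 15.625e-3                  # time between samples, from the slide
sample_rate_hz = 1.0 / sample_interval_s       # samples per second

n_samples_20s = round(20.0 * sample_rate_hz)   # samples in a 20 s stretch of data
n_detectors = 117                              # AzTEC array size quoted in the slides

# Placeholder for a 20 s chunk laid out as the PCA slides describe:
# rows = detectors (n), columns = time samples (m).
X = np.zeros((n_detectors, n_samples_20s))

# Duration implied by the 16640 samples/detector quoted for the bright-source map.
duration_s = 16640 * sample_interval_s

print(sample_rate_hz, n_samples_20s, X.shape, duration_s)
```

The implied duration is a little over four minutes, in line with the "ttot = 4 min" quoted for that observation.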
  13. Principal Component Analysis (PCA)
      [Used in supervised learning to compress data: fit to a smaller number of features]
      • x_ij: power measured for time sample i on detector j
      • n = number of detectors; m = number of time samples
      • X = [ x_1 x_2 ... x_m ] → n × m matrix (column x_i holds the n detector readings at time sample i)
      *Only input needed for PCA*
  16. Principal Component Analysis (PCA)
      Step 1: Mean normalization (and feature scaling)
      • Compute μ_j = (1∕m) Σ_{i=1}^{m} x_ij for each detector
      • Compute σ_j^2 = (1∕(m−1)) Σ_{i=1}^{m} (x_ij − μ_j)^2 for each detector
      • Set x_ij ← (x_ij − μ_j) ∕ σ_j
      • X = [ x_1 x_2 ... x_m ] → n × m matrix
      [Figure scale label: 1 mV]
      *PCA can identify lower level correlations among subsets of the detectors*
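Step 1 can be sketched in NumPy as follows. The input here is made-up random data with a different offset and scale per detector (an assumption for illustration, not AzTEC data); the array layout follows the slides, with rows as detectors and columns as time samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 117, 1280                      # detectors, time samples (values from the slides)

# Hypothetical raw time streams: each detector has its own offset and scale.
offsets = rng.uniform(-5.0, 5.0, size=(n, 1))
scales = rng.uniform(0.5, 2.0, size=(n, 1))
X_raw = offsets + scales * rng.normal(size=(n, m))

# mu_j: mean over time for each detector (one value per row).
mu = X_raw.mean(axis=1, keepdims=True)

# sigma_j: standard deviation over time, using the 1/(m-1) normalization (ddof=1).
sigma = X_raw.std(axis=1, ddof=1, keepdims=True)

# x_ij <- (x_ij - mu_j) / sigma_j
X = (X_raw - mu) / sigma
```

After this step every detector's time stream has zero mean and unit variance, so no single detector dominates the covariance matrix purely because of its gain.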
  17. Principal Component Analysis (PCA)
      Step 2: Calculate covariance matrix
      • C = (1∕m) X X^T (recall m = # time samples)
      • C → n × n symmetric matrix (recall n = 117 detectors)
      Step 3: Eigendecomposition
      • C = Q Λ Q^{-1} (*solve using SVD*)
      • Q = [ q_1 q_2 ... q_n ] → n × n matrix containing eigenvectors q_i
      • Λ → n × n diagonal matrix containing eigenvalues λ_i = Λ_ii
      • Principal components = uncorrelated variables
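Steps 2 and 3 might look like the following in NumPy. The helper `make_timestreams` is a hypothetical stand-in that fabricates two array-wide correlated modes plus white noise; it is not the AzTEC data model. Because C is symmetric and positive semi-definite, its SVD coincides with its eigendecomposition, which is the property the "solve using SVD" note relies on.

```python
import numpy as np

def make_timestreams(n=117, m=1280, seed=0):
    """Hypothetical test data: two correlated modes shared by all detectors, plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(m) / m
    mode1 = np.sin(2 * np.pi * 3 * t)              # common drift seen by every detector
    mode2 = np.cos(2 * np.pi * 7 * t)              # second mode, sign varies across the array
    a = rng.uniform(0.8, 1.2, size=(n, 1))
    b = rng.uniform(-0.5, 0.5, size=(n, 1))
    return a * mode1 + b * mode2 + 0.02 * rng.normal(size=(n, m))

X_raw = make_timestreams()
n, m = X_raw.shape

# Step 1 (standardize each detector's time stream).
X = (X_raw - X_raw.mean(axis=1, keepdims=True)) / X_raw.std(axis=1, ddof=1, keepdims=True)

# Step 2: n x n covariance matrix between detectors.
C = (X @ X.T) / m

# Step 3: SVD of the symmetric matrix C. Columns of Q are the eigenvectors,
# eigvals are the eigenvalues, returned in descending order.
Q, eigvals, _ = np.linalg.svd(C)

# Projecting onto Q gives the principal-component time streams; their
# covariance is diagonal, i.e. they are mutually uncorrelated.
P = Q.T @ X
pc_cov = (P @ P.T) / m
```

Since Q is orthogonal, Q^{-1} equals Q^T, so the decomposition on the slide can be written as C = Q Λ Q^T.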
  18. Principal Component Analysis (PCA)
      Step 4: Choose number of components to remove
      • Goal: choose the fewest components (k) to REMOVE most of the observed variance in the data
      • Q_R = [ q_{k+1} q_{k+2} ... q_n ] → n × (n−k) matrix, k < n
      • Z = [ z_1 z_2 ... z_m ] = Q_R^T X → (n−k) × m matrix
      • To derive model of galaxy intensities on sky, use Z instead of X (but...)
      Choosing k: (Variance after PCA, given k) ∕ (Variance with average subtraction only) < 0.05
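One way to apply the "choosing k" criterion is to increase k until the variance ratio drops below 0.05. The sketch below makes several assumptions not stated in the slides: the data come from the hypothetical `make_timestreams` helper, "average subtraction" is taken to mean subtracting the array-average value at each time sample, and variances are computed over all detectors and samples of the standardized data.

```python
import numpy as np

def make_timestreams(n=117, m=1280, seed=0):
    """Hypothetical test data: two correlated modes shared by all detectors, plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(m) / m
    mode1 = np.sin(2 * np.pi * 3 * t)
    mode2 = np.cos(2 * np.pi * 7 * t)
    a = rng.uniform(0.8, 1.2, size=(n, 1))
    b = rng.uniform(-0.5, 0.5, size=(n, 1))
    return a * mode1 + b * mode2 + 0.02 * rng.normal(size=(n, m))

X_raw = make_timestreams()
n, m = X_raw.shape
X = (X_raw - X_raw.mean(axis=1, keepdims=True)) / X_raw.std(axis=1, ddof=1, keepdims=True)

C = (X @ X.T) / m
Q, eigvals, _ = np.linalg.svd(C)          # eigenvectors sorted by decreasing eigenvalue

# Baseline: subtract the array-average signal at each time sample.
X_avg = X - X.mean(axis=0, keepdims=True)
var_avg = X_avg.var()

def residual_variance(k):
    """Variance left after removing the k largest principal components."""
    Q_R = Q[:, k:]                        # n x (n - k): the components that are kept
    Z = Q_R.T @ X                         # (n - k) x m
    return (Q_R @ Z).var()

# Smallest k whose residual variance is below 5% of the baseline.
k = next(k for k in range(1, n) if residual_variance(k) / var_avg < 0.05)
ratio = residual_variance(k) / var_avg
print(k, ratio)
```

Keeping k as small as possible matters because every removed component also takes some of the astronomical signal with it.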
  19. Principal Component Analysis (PCA)
      Step 5: Reconstruct data without correlated signal
      • Know RA/Dec for each detector: need to reconstruct approximation for data to make image
      • X_R = Q_R Z → n × m matrix with correlated signal removed!
      [Figure scale label: 1 mV]
  20. Principal Component Analysis (PCA)
      Step 5: Reconstruct data without correlated signal
      [Figure scale label: 20 μV]
      *Variance reduced by factor of 50*
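The reconstruction in Step 5 is a projection back into detector space after dropping the k largest components. A minimal sketch, assuming the same hypothetical `make_timestreams` test data and a fixed k = 2 chosen only for illustration:

```python
import numpy as np

def make_timestreams(n=117, m=1280, seed=0):
    """Hypothetical test data: two correlated modes shared by all detectors, plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(m) / m
    mode1 = np.sin(2 * np.pi * 3 * t)
    mode2 = np.cos(2 * np.pi * 7 * t)
    a = rng.uniform(0.8, 1.2, size=(n, 1))
    b = rng.uniform(-0.5, 0.5, size=(n, 1))
    return a * mode1 + b * mode2 + 0.02 * rng.normal(size=(n, m))

X_raw = make_timestreams()
n, m = X_raw.shape
X = (X_raw - X_raw.mean(axis=1, keepdims=True)) / X_raw.std(axis=1, ddof=1, keepdims=True)

C = (X @ X.T) / m
Q, eigvals, _ = np.linalg.svd(C)

k = 2                                  # assumed number of components to remove
Q_R = Q[:, k:]                         # eigenvectors k+1 ... n
Z = Q_R.T @ X                          # data expressed in the retained components
X_R = Q_R @ Z                          # back in detector space: n x m, one row per detector

# Factor by which the overall variance dropped.
variance_reduction = X.var() / X_R.var()
print(X_R.shape, variance_reduction)
```

Returning to the n × m detector layout is what allows each cleaned sample to be paired with that detector's sky position when the map is made.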
  21. Image of PKS J1127-1857
      Make the map:
      • Use information on sky position for each detector at each time sample (RA_ij, Dec_ij) and bin data onto image grid
      • Set the intensity of each image pixel to the average of the x^R_ij values that fall into that bin
      • Smooth image by telescope point-spread response function (Gaussian with FWHM = 30″)
      [Image panels: Average Subtraction; PCA Cleaned]
      • raw data = 30 MB
      • t_tot = 4 min
      • 16640 samples/detector
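The three map-making bullets can be sketched with NumPy alone. Everything about the inputs is assumed for illustration: sky positions are random offsets in arcseconds rather than a real scan pattern, the source is a single Gaussian at the map center, and the 6″ pixel size and the function names `make_map` and `gaussian_smooth` are hypothetical. Only the 30″ FWHM comes from the slide.

```python
import numpy as np

def make_map(ra, dec, values, pixel_arcsec, extent_arcsec):
    """Average all samples that fall into each sky pixel (offsets in arcsec)."""
    n_pix = int(round(2 * extent_arcsec / pixel_arcsec))
    edges = np.linspace(-extent_arcsec, extent_arcsec, n_pix + 1)
    total, _, _ = np.histogram2d(dec.ravel(), ra.ravel(), bins=[edges, edges],
                                 weights=values.ravel())
    hits, _, _ = np.histogram2d(dec.ravel(), ra.ravel(), bins=[edges, edges])
    # Pixels with no samples are left at zero.
    image = np.divide(total, hits, out=np.zeros_like(total), where=hits > 0)
    return image, hits

def gaussian_smooth(image, fwhm_pix):
    """Separable Gaussian smoothing; the image must be larger than the kernel."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = int(np.ceil(4 * sigma))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

# Hypothetical cleaned data: n detectors x m samples with a sky position for each.
rng = np.random.default_rng(1)
n, m = 20, 5000
ra = rng.uniform(-300.0, 300.0, size=(n, m))
dec = rng.uniform(-300.0, 300.0, size=(n, m))
beam_sigma = 30.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
x_r = np.exp(-0.5 * (ra**2 + dec**2) / beam_sigma**2) + 0.05 * rng.normal(size=(n, m))

image, hits = make_map(ra, dec, x_r, pixel_arcsec=6.0, extent_arcsec=300.0)
smoothed = gaussian_smooth(image, fwhm_pix=30.0 / 6.0)
peak = np.unravel_index(np.argmax(smoothed), smoothed.shape)
print(image.shape, peak)
```

Smoothing with a kernel matched to the telescope response suppresses pixel-to-pixel noise while preserving point sources, which is why the simulated source stands out near the map center.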
  22. An Extragalactic Survey at λ=1.1 mm
      • Most galaxies are 100× fainter than PKS J1127-1857
      • raw data ~ 25 GB
      • t_tot ~ 80 hrs
      • ~ 2×10^7 samples/detector
      • AzTEC/COSMOS survey
      • 0.7 deg^2
      • 500× area of HUDF
      • 160 hrs versus 11 days for HUDF
      • 130 mm-bright galaxies
      Aretxaga et al. 2011
  25. An Extragalactic Survey at λ=1.1 mm
      • AzTEC-3
      • Observed 1 Gyr after Big Bang
      • Starburst galaxy (SFR ~ 1000 Msun/yr)
      Capak et al. 2011
      • AzTEC/COSMOS survey
      • 0.7 deg^2
      • 500× area of HUDF
      • 160 hrs versus 270 hrs for HUDF
      • 130 mm-bright galaxies
      Aretxaga et al. 2011
