Talk about T.C.P. for CDI inter-departmental workshop at UC Berkeley. 20090911.

  1. Josh Bloom (PI), Justin Higgins, Adam Morgan
  2. Transients Classification Pipeline
       [Diagram labels: “Object” Datastream; Transients Classification Pipeline; Classify; Database; Broadcast]
  3. Transients Classification Pipeline: survey inputs and outputs
       [Diagram, reconstructed from its labels]
       • Inputs:
            • SDSS: stripe-82 archived data
            • PTF / LBL: subtraction pipeline
            • SASIR (future)
            • LSST (future)
       • Survey X (real-time survey telescope); Survey Y (static survey repository)
       • Pipeline stages: Classify, Database, Broadcast
       • Database containing “sources”:
            • features for a source
            • data epochs associated with a source
       • Broadcast “sources”:
            • interesting or transient source
            • include classifications
            • include features, context
  4. SDSS Stripe 82
       • A deep field from the Sloan Digital Sky Survey
       • 750 million observation epochs
       • ~20 million “sources” clustered from epochs
       • 5 colors / filters, 4 years of observations
       • We used Stripe-82 for testing and development
       [Diagram labels: SDSS stripe-82 archived data; Transients Classification Pipeline; Database containing “sources” (features for a source; data epochs associated with a source)]
  5. Palomar Transient Factory
       • Palomar 48” telescope
       • 100 Mpix, 7.8 sq-deg detector
       • ~120s cadence : ~200MB : <100GB/night
       • Post subtraction: ~1M difference objects / night
       • Post filtering: ~10k difference objects / night
       • ~100s of transient and variable stars
       [Diagram labels: LBL subtraction pipeline; TCP; PTF consortium; PAIRITEL 1.3m; Palomar 60”; MDM 1.3m & 2.4m]
  6. Next Generation Survey: LSST
       Large Synoptic Survey Telescope (LSST):
       • 1 Gb every 2 seconds
       • 10^6 supernovae/yr
       • 10^5 eclipsing systems
       • 10^7 asteroids...
       • light curves of 800 million sources every 3 days
  7. Transients Classification Pipeline
       [Diagram labels: “Object” Datastream; TCP stages: source generation, feature generation, source classification; Database; Broadcast; Follow-up telescope observations]
  8. Parallelized source correlation and classification
       • Retrieve difference objects
       • Each difference-object is passed to an IPython client
       • Each parallel IPython client performs:
            • Source creation or correlation with existing sources
            • “Feature” generation (or re-generation) for that source
            • Classification of that source
       [Diagram labels: source generation; feature generation; source classification]
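The per-object work described on this slide can be sketched roughly as follows. This is a minimal illustration, not the TCP code: a standard-library `ThreadPoolExecutor` stands in for the parallel IPython clients, an in-memory list stands in for the source database, and the field names (`ra`, `dec`, `mag`), the 1-arcsecond match radius, the features, and the classification rule are all hypothetical.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev

MATCH_RADIUS_DEG = 1.0 / 3600.0  # hypothetical 1-arcsecond association radius


def generate_features(epochs):
    """Toy features computed from all epochs of one source."""
    mags = [e["mag"] for e in epochs]
    return {"n_epochs": len(mags), "mean_mag": mean(mags), "std_mag": pstdev(mags)}


def classify(features):
    """Placeholder rule, not a trained classifier."""
    return "variable" if features["std_mag"] > 0.1 else "constant"


def run_pipeline(difference_objects, n_workers=4):
    sources = []             # in-memory stand-in for the source database
    lock = threading.Lock()  # guards the shared source table

    def process(obj):
        with lock:
            # Source correlation: attach to a nearby source, else create one.
            # Flat-sky approximation; RA wrap-around is ignored.
            for src in sources:
                d_ra = (obj["ra"] - src["ra"]) * math.cos(math.radians(src["dec"]))
                d_dec = obj["dec"] - src["dec"]
                if math.hypot(d_ra, d_dec) <= MATCH_RADIUS_DEG:
                    src["epochs"].append(obj)
                    break
            else:
                src = {"ra": obj["ra"], "dec": obj["dec"], "epochs": [obj]}
                sources.append(src)
            epochs = list(src["epochs"])  # snapshot taken under the lock
        # Feature (re-)generation and classification run outside the lock.
        features = generate_features(epochs)
        label = classify(features)
        with lock:
            # Keep the result computed from the most complete set of epochs.
            stored = src.get("features", {}).get("n_epochs", 0)
            if features["n_epochs"] >= stored:
                src["features"], src["class"] = features, label
        return src

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(process, difference_objects))
    return sources
```

Each worker repeats the three steps from the slide for one difference object; features are regenerated whenever a new epoch is added to an existing source.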
  9. Parallelized source correlation and classification
       • Realtime TCP runs on 22 dedicated cores
       • LCOGT’s 96-core Beowulf cluster
            • non run-time tasks
            • Classifier generation
       • Additional resources (for future classification work):
            • Yahoo! M45 cluster
            • Amazon EC2 cluster
       [Diagram labels: source generation; feature generation; source classification]
  10. Warehouse of light-curves
       • Need representative light-curves for all science
       • With these we can model each science class
       • We’ve built a warehouse of example light-curves
            • TCP-TUTOR: internal interface
            • DotAstro.org: public interface
  11. “Noisifying to the Survey”
       • Well sampled light-curves
            • Can make good classifiers for well-sampled data.
            • Don’t immediately make good classifiers for noisy, sparse data.
       • We need classifiers which are trained using:
            • sampling cadence of our survey
            • sparseness of our survey data
            • noise and sensitivity limitations of our instrument
       • We need “Noisification” software which:
            • Resamples well-sampled light-curves
            • Outputs noisified sources which are used for generating classifiers
  12. “Noisifying to the Survey”
  13. “Noisifying to the Survey”
       • For PTF:
            • Code uses PTF pointing and survey observing plans
            • Occasionally PTF observes using a faster cadence:
                 • 7.5 minutes between revisiting an RA, Dec
            • Faster cadence requires a separate set of noisified light-curves and classifiers.
       • Other surveys:
            • Other pointing and observing plans could be used.
            • Can generate noisified light-curves for other surveys.
            • Then we can generate science classifiers for these surveys.
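The “noisification” idea from these slides can be illustrated with a short sketch: resample a well-sampled light-curve at the survey's observation times, add photometric noise, and drop points fainter than the instrument can detect. The function names, the Gaussian noise model, the single magnitude limit, and the evenly spaced epochs below are assumptions made for illustration; the slides say the real code draws epochs from PTF pointing and observing plans.

```python
import math
import random
from bisect import bisect_left


def interpolate_mag(times, mags, t):
    """Linearly interpolate a well-sampled light-curve at time t.

    Assumes `times` is sorted and times[0] <= t <= times[-1].
    """
    i = bisect_left(times, t)
    if times[i] == t:
        return mags[i]
    t0, t1 = times[i - 1], times[i]
    return mags[i - 1] + (t - t0) / (t1 - t0) * (mags[i] - mags[i - 1])


def noisify(times, mags, survey_epochs, sigma_mag, mag_limit, rng):
    """Resample at survey epochs, add Gaussian noise, apply a sensitivity cut."""
    noisified = []
    for t in survey_epochs:
        if not times[0] <= t <= times[-1]:
            continue  # epoch falls outside the template's time span
        mag = interpolate_mag(times, mags, t) + rng.gauss(0.0, sigma_mag)
        if mag <= mag_limit:  # larger magnitude means fainter
            noisified.append((t, mag))
    return noisified


# Example: sinusoidal template with a 0.4-day period, sampled every 0.01 day.
template_t = [0.01 * k for k in range(1001)]
template_m = [15.0 + 0.3 * math.sin(2 * math.pi * t / 0.4) for t in template_t]

# Hypothetical sparse plan: 20 epochs, one every half day.
sparse_epochs = [0.5 * k for k in range(1, 21)]
sparse = noisify(template_t, template_m, sparse_epochs, 0.05, 21.0, random.Random(42))
```

For the faster PTF cadence mentioned above, the same function would be given epochs spaced 7.5 minutes (7.5 / 1440 day) apart, producing a separate training set for a separate classifier.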
  14. Classifiers
       • General Classifier
            • Identify:
                 • well sampled (periodic & nonperiodic)
                 • interesting sources near known galaxies
                 • periodic variable science class when confidence is high
            • Filter out:
                 • poorly subtracted sources
                 • minor planets / rocks
                 • cosmic rays
                 • detector defects
       • Timeseries Classifiers
            • Weighted combination of WEKA classifiers
                 • bagged Random Forest classifier using a cost-matrix
                 • Each classifier trained on differently cadenced noisified data
            • Astronomer-crafted classifiers for specific science types
                 • Microlens, Supernova
  15. Interesting near-galaxy PTF sources
       • Identified by TCP during end of Aug ‘09
       • Classification triggered by latest epoch added to the source
  16. Periodic variable classifiers
       • Currently, science classes are determined by combining the weighted probabilities generated by different classification models, for a source.
       • Each machine-learned classification model is trained using “noisified” lightcurves which were generated using different parameters.
       [Figure annotations, reconstructed from interleaved text:]
       • ~0.4 day period RR Lyrae using 10 epoch noisification
       • ~0.14 day period RR Lyrae using 20 epoch noisification
       • 0.1 - 0.17 day period RR Lyrae using 15 epoch noisification
       • Clicking on a class for one of dozens of ML models... ...shows highest classification probability sources for that model::class
       • Overplotting of period-folded model still needs work
       • period-fold plotting probably failed here
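As a rough illustration of combining weighted probabilities from several models, the sketch below takes a weighted average of each model's class probabilities for one source. The model names, weights, and probabilities are made up for the example, and the slide does not say that TCP normalizes by the sum of weights; that is an assumption here.

```python
def combine_weighted(model_probs, weights):
    """Weighted average of per-model class probabilities for one source.

    model_probs: {model_name: {class_name: probability}}
    weights:     {model_name: non-negative weight}
    """
    total_weight = sum(weights[name] for name in model_probs)
    combined = {}
    for name, probs in model_probs.items():
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + weights[name] * p / total_weight
    return combined


# Hypothetical models trained on 10-epoch and 20-epoch noisified light-curves.
model_probs = {
    "10_epoch": {"RR Lyrae": 0.8, "Eclipsing": 0.2},
    "20_epoch": {"RR Lyrae": 0.4, "Eclipsing": 0.6},
}
weights = {"10_epoch": 3.0, "20_epoch": 1.0}

combined = combine_weighted(model_probs, weights)
best_class = max(combined, key=combined.get)
```

Because each model's probabilities sum to one and the weights are normalized, the combined values also sum to one and can be ranked directly.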
  17. Evaluating and Combining Classifiers
       • Issues when using multiple classifiers:
            • How to combine classifiers when using:
                 • weighted classifiers
                 • tree-hierarchy of sub-classifiers
            • How to generate final classification “probabilities” when using:
                 • Widely varying types of classifiers
                 • Classifiers which contain sub-classifications & probabilities
       • Evaluate the final combination of classifiers
            • Classify PTF09xxx user classified sources, determine efficiencies
            • Classify noisified sources, determine efficiencies
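One plausible reading of “determine efficiencies” is the fraction of sources of each known class that the combined classifier labels correctly. The slides do not define the term, so this definition, the function name, and the labels below are assumptions for illustration only.

```python
from collections import Counter


def class_efficiencies(true_labels, predicted_labels):
    """Per-class fraction of sources whose predicted class matches the known class."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / n for cls, n in totals.items()}


# Hypothetical user-classified sources and the pipeline's predictions for them.
known = ["SN", "SN", "RR Lyrae", "RR Lyrae", "RR Lyrae", "RR Lyrae"]
predicted = ["SN", "RR Lyrae", "RR Lyrae", "RR Lyrae", "RR Lyrae", "SN"]

efficiencies = class_efficiencies(known, predicted)
```

The same function applies to noisified sources, where the true class is known from the template light-curve each source was generated from.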
