Joshua Bloom: Machine Learning and Classification in the Synoptic Survey Era
This is a talk given at the "From Data to Knowledge" Workshop on streaming data in Berkeley, California.

YouTube: http://www.youtube.com/watch?v=aEoj7eHh6Gg&feature=plcp
Twitter: @profjsb

http://lyra.berkeley.edu/CDIConf/program.html

Published in: Technology, Education
  • time-domain in astronomy. Crucial new discoveries come from looking at the sky with new tools (new eyes). Ptolemaic order: planets were supposed to be fixed spherical orbs, not bodies with their own moons; that didn't fit the world view. opportunistic tools. emphasizes the crucial roles of humans in data collection, data analysis, and inference.
  • happy to acknowledge. big effort. industry support.
  • needle in the haystack
  • constrained in time. decisions with incomplete information. Extreme rarities: maybe a few a year of interest. imbalance and robust
  • Teeming with things we know and don't know about. exploration of the knowns and the unknowns. Rumsfeldian. short timescales.
  • Simply must understand that our roles must change.
  • identification different than discovery: Galileo. Galileo's drawings show that he first observed Neptune on December 28, 1612, and again on January 27, 1613. On both occasions, Galileo mistook Neptune for a fixed star when it appeared very close (in conjunction) to Jupiter in the night sky; hence, he is not credited with Neptune's discovery. 234 years later: Johann Gottfried Galle, 23 September 1846.
  • 1.5M per night
  • it should be easy: there's a bunch of classes of objects which vary, we measure their light curves and that's it. Even remarkably homogeneous classes such as Ia and RRL exhibit huge variations.
  • however, in practice
  • dynamic time warping. hundreds of features: n log n, n^2, etc. some of these are results of external queries. Same things we, as experts, look at in a light curve and ancillary data.
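The note above names dynamic time warping as one way to compare light curves. A minimal sketch of the textbook O(n·m) DTW distance follows, assuming two ordered sequences of magnitudes and absolute difference as the local cost. The function name `dtw_distance` is illustrative and does not come from the talk.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the best of: step in a only, step in b only, step in both
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

The quadratic table is what makes a feature like this one of the more expensive (n^2) entries in a feature set.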
  • discovery of physical intuition, like what Alex talked about.
  • how you observed the data impacts what you think it is. This is obvious. approach is to craft ground truth from one survey to look like another, either in light curve space or in feature space.
  • say goodbye to black and white catalogs
  • posterior probabilities, not likelihoods: convolved with the priors. prescription for adaptation
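The distinction between likelihoods and posteriors in this note can be made concrete with Bayes' rule. The sketch below is illustrative only: the class names and all numbers are made-up assumptions, and `posterior` is a hypothetical helper, not code from the talk.

```python
def posterior(priors, likelihoods):
    """Combine class priors with per-class likelihoods via Bayes' rule."""
    unnormalized = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}

# Assumed numbers: a rare class whose light curve fits well, and a common
# class whose light curve fits poorly.
priors = {"rare class": 0.01, "common class": 0.99}
likelihoods = {"rare class": 0.9, "common class": 0.1}
post = posterior(priors, likelihoods)
```

With these assumed inputs the rare class ends up at roughly 8% posterior probability despite its much higher likelihood, which is why a catalog should report posteriors rather than likelihoods.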
  • best way to find needles in the haystack is to get really good at finding and identifying hay.
  • 8 of 13. ∆mV up to ∼8 mag, aperiodic declines in brightness. At maximum light RCB stars are bright supergiants. Merrill-Sanford bands of SiC2 in three of our candidates: ASAS 162232−5349.2, ASAS 065113+0222.1, and ASAS 182658+0109.0. To our knowledge this is the first identification of SiC2 in a DYPer spectrum.
  • last one not so important if we can wait for the answer.
  • richer
  • Transcript

    • 1. January 7 - March 2, 1610
    • 2. Classification of Astronomical Time-Series Data in the Synoptic Survey Era. Josh Bloom, Joseph Richards, University of California, Berkeley. Berkeley Streaming Workshop; 7 May 2012
    • 3. Center for Time-Domain Informatics
         UC Berkeley (UCB):
         Faculty/Staff: JSB, Dan Starr (Astro), John Rice, Noureddine El Karoui (Stats), Martin Wainwright, Masoud Nikravesh (CS)
         Postdocs: Joey Richards (stat/astro), Berian James, Damian Eads, Dovi Poznanski (→Tel Aviv), Brad Cenko, Nat Butler, Nino Cucchiara, Damian Eads (→Cambridge)
         Grad Students: Dan Perley (→Caltech), Adam Miller, Adam Morgan, Chris Klein, James Long, Tamara Broderick (stats), Sahand Negahban (EECS), John Brewer (→Yale), Henrik Brink (←Copenhagen)
         Undergrads: Anthony Paredes, Tatyana Gavrilchenko, Stuart Gegenheimer, Maxime Rischard, Justin Higgins, Rachel Kennedy, Arien Crellin-Quick, Michelle Kislak (→UCLA), Allison Merritt (→Yale)
         Lawrence Berkeley National Laboratory (LBNL): Peter Nugent, David Schlegel, Nic Ross, Horst Simon
         Visit our website: http://cftd.info/
    • 9. Understanding & Exploiting the Dynamic Universe
         • Twinkle, twinkle... Everything changes at some level (brightness, color, position, ...)
         • Stars die... and blow up: supernovae, gamma-ray bursts, new phenomena ...
         • Discovery is only the start. Greatest insights require follow-up (imaging, spectroscopy, archive introspection). Follow-up is EXPENSIVE (i.e., people, time, telescope, resources, $)
    • 10. Gamma-Ray Burst Transients [Figure: “static” γ-ray sky]
    • 16. Gamma-Ray Burst Transients
         • Short-lived blasts of high energy light (γ-rays & X-rays)
         • random & rare - found by specialized satellites
         • also: brightest optical events in universe (transient “afterglow”)
         • two origins: exploding massive stars & colliding compact objects
         Challenge: how can we maximize our science return on discovery with optimized follow up?
         [Figure: axis labeled “power”, ticks 0.01, 1, 100, 10,000, 10^6]
    • 17. Follow-Up-Resource-Aware Classification: collect burst data from satellite feed; predict which events are “high redshift” GRBs in real-time. [Diagram labels: “high redshift”, less interesting, unclassified, “immediately” available data]
    • 18. Follow-Up-Resource-Aware Classification: “59% (86%) of high-z GRBs can be captured from following up the top 20% (40%) of the ranked candidates” (Morgan+11). [Figure: Efficiency vs α. x-axis: Fraction of GRBs Followed Up, α (0.0 to 1.0); y-axis: Fraction of high (z>4) GRBs observed (0.0 to 1.0); labels: predicted fraction of high-redshift GRBs, improvement (90% c.l.), random, reduced]
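The efficiency-versus-α curve on this slide can be read as: rank events by predicted score, follow up the top fraction α, and count how many true high-redshift events were captured. The sketch below assumes a list of classifier scores and known truth labels; the name `efficiency_at_alpha` and the toy data are hypothetical and are not the Morgan+11 implementation.

```python
def efficiency_at_alpha(scores, is_high_z, alpha):
    """Fraction of true high-z events captured by following up the top alpha."""
    n_follow = int(round(alpha * len(scores)))
    # Rank events from most to least likely to be high redshift.
    ranked = sorted(zip(scores, is_high_z), key=lambda pair: pair[0], reverse=True)
    captured = sum(1 for _, high in ranked[:n_follow] if high)
    total = sum(1 for high in is_high_z if high)
    return captured / total if total else 0.0

# Toy data (assumed): ten bursts, two of which are truly high redshift.
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.4, 0.05, 0.6, 0.5]
truth = [True, False, False, True, False, False, False, False, False, False]
curve = [(a, efficiency_at_alpha(scores, truth, a)) for a in (0.2, 0.4, 1.0)]
```

A random ranking would capture a fraction α on average, so the gap between this curve and the diagonal is the gain from classification.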
    • 19. Extragalactic Transient Universe: Explosive Systems. [Figure, E. Ramirez-Ruiz (UCSC): log(brightness), MH from -22 to -10, vs Days Since Explosion (50 to 200); classes shown: Pair Production Supernovae, Type Ia, Type IIp, IMBH + WD Collision, NS + RSG Collision, NS + NS Mergers; markers at z=0.45 and 200 Mpc]
    • 22. Data Deluge Challenge
         Large Synoptic Survey Telescope (LSST) - 2018!
           Light curves for 800M sources every 3 days
           10^6 supernovae/yr, 10^5 eclipsing binaries
           3.2 gigapixel camera, 20 TB/night
         LOFAR & SKA
           150 Gps (27 Tflops) → 20 Pps (~100 Pflops)
         Gaia space astrometry mission - 2013
           1 billion stars observed ∼70 times over 5 years
           Will observe 20K supernovae
         Many other astronomical surveys are already producing data: SDSS, PTF, CRTS, Pan-STARRS, Hipparcos, OGLE, ASAS, Kepler, LINEAR, DES (soon), etc.
         How do we do discovery, follow-up, and inference when the data rates (& requisite timescales) preclude human involvement?
    • 25. Machine Learning As Surrogate - trained to quickly make concrete, deterministic, & repeatable statements about abstract concepts
         Discovery: “Is this varying source astrophysical in nature or spurious?”
         PTF: 1.5M candidates/night; 1:1000 are astrophysical; machine has opined on 800M candidates (Bloom+11; Poznanski, Brink, this workshop)
    • 26. [Figure: image panels labeled Reference, New, Difference for 11kly and 11kx] also, cf., Bailey+07
    • 27. 2011fe identified w/ Machine-Learned Discovery Algorithms. Discovery image was ~11 hours after explosion. Within a few hours, a spectrum confirmed it to be a SN Ia. Nearest SN Ia in more than 3 decades. 5th brightest supernova in 100 years.
    • 29. Machine Learning As Surrogate - trained to quickly make concrete, deterministic, & repeatable statements about abstract concepts
         Classification: “What is the nature (origin/reason...) of the variability?”
    • 30. [Figure: class taxonomy tree for variable sources; subclasses in brackets]
         Pulsating Stars: Alpha Cygni (ACYG); Beta Cephei (BCEP) [Short Period (BCEPS)]; Anomalous (BLBOO); Cepheids (CEP) [Multiple Modes (CEPB)]; W Virginis (CW) [Long Period (CWA), Short Period (CWB)]; Delta Cep (DCEP) [Symmetrical (DCEPS)]; Delta Scuti (DSCT) [Low Amplitude (DSCTC)]; Slow Irregular (L) [Late Spectral Type (K, M, C, S) (LB), Supergiants (LC)]; Mira (M); PV Telescopii (PVTEL); RR Lyrae (RR) [Dual Mode (RRB), Asymmetric (RRAB), Near Symmetric (RRC)]; RV Tauri (RV) [Constant Mean Magnitude (RVA), Variable Mean Magnitude (RVB)]; Semiregular (SR) [Persistent Periodicity (SRA), Poorly Defined Periodicity (SRB), Supergiants (SRC), F, G, or K (SRD)]; Pulsating Subdwarfs (SXPHE); ZZ Ceti (ZZ) [Only H Absorption (ZZA), Only He Absorption (ZZB), HeII Absorption (ZZO)]
         Cataclysmic Variables: U Geminorum (UG) [SS Cygni, SU Ursae Majoris, Z Camelopardalis]; Supernovae (SN) [Type I Supernovae (SNI) [SNIa, SNIb, SNIc], Type II Supernovae (SNII) [SNIIL, SNIIN, SNIIP]]; Novae (N) [Fast Novae (NA), Slow Novae (NB), Very Slow Novae (NC)]; Novalike Variables (NL); Recurrent Novae (NR); Gamma-ray Bursts (GRB) [Long Gamma-ray Burst (LSB), Short Gamma-ray Burst (SHB)]; Soft Gamma-ray Repeater (SGR); Symbiotic Variables (ZAND)
         Eclipsing Systems: Systems with White Dwarfs (WD); Semidetached (SD); Contact Systems (K) [Early (O-A) (KE), W Ursae Majoris (KW)]; RS Canum Venaticorum (RS); Planetary Nebulae (PN); Systems with Supergiant(s) (GS); Eclipsing Binary Systems (E) [Algol (Beta Persei) (EA), Beta Lyrae (EB), W Ursae Majoris (EW)]; Detached (D) [Main Sequence (DM), With Subgiant (DS), Detached - AR Lacertae (AR), W Ursae Majoris (DW)]; Wolf-Rayet Stars (WR)
    • 33. Considerable Complications with Time-Series Data
         • noisy, irregularly sampled
         • spurious data
         • telltale signature event may not have happened yet
    • 35. Machine-Learning Approach to Classification
         Features: homogenize the data; real-number metrics that describe the time-domain characteristics & context of a source. ~100 features computed in < 1 sec (including periodogram analysis)
         • variability metrics: e.g. Stetson indices, χ²/dof (constant hypothesis)
         • periodic metrics: e.g. dominant frequencies in Lomb-Scargle, phase offsets between periods
         • shape analysis: e.g. skewness, kurtosis, Gaussianity
         • context metrics: e.g. distance to nearest galaxy, type of nearest galaxy, location in the ecliptic plane
         Wózniak et al. 2004; Protopapas+06; Willemsen & Eyer 2007; Debosscher et al. 2007; Mahabal et al. 2008; Sarro et al. 2009; Blomme et al. 2010; Kim+11; Richards+11
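To illustrate how an irregularly sampled light curve becomes a fixed-length vector of real numbers, here is a minimal pure-Python sketch of four of the feature types named on this slide: χ²/dof against a constant, skewness, kurtosis, and the dominant Lomb-Scargle frequency. All function names, the frequency grid, and the dictionary keys are assumptions for illustration. It is not the feature code from Richards+11 and covers only a handful of the ~100 features mentioned.

```python
import math

def chi2_per_dof(mags, errs):
    """Reduced chi-square against a constant (weighted-mean) brightness."""
    weights = [1.0 / e ** 2 for e in errs]
    wmean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    chi2 = sum(((m - wmean) / e) ** 2 for m, e in zip(mags, errs))
    return chi2 / (len(mags) - 1)

def skewness_and_kurtosis(mags):
    """Sample skewness and excess kurtosis from central moments."""
    n = len(mags)
    mean = sum(mags) / n
    m2, m3, m4 = (sum((m - mean) ** k for m in mags) / n for k in (2, 3, 4))
    if m2 == 0.0:
        return 0.0, 0.0  # constant light curve: no shape information
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def lomb_scargle_power(times, mags, freq):
    """Classical Lomb-Scargle power at one frequency (cycles per time unit)."""
    omega = 2.0 * math.pi * freq
    n = len(mags)
    mean = sum(mags) / n
    var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    if var == 0.0:
        return 0.0
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * omega * t) for t in times),
                     sum(math.cos(2 * omega * t) for t in times)) / (2 * omega)
    yc = sum((m - mean) * math.cos(omega * (t - tau)) for t, m in zip(times, mags))
    ys = sum((m - mean) * math.sin(omega * (t - tau)) for t, m in zip(times, mags))
    cc = sum(math.cos(omega * (t - tau)) ** 2 for t in times)
    ss = sum(math.sin(omega * (t - tau)) ** 2 for t in times)
    return (yc ** 2 / cc + ys ** 2 / ss) / (2.0 * var)

def extract_features(times, mags, errs, freqs):
    """Map one light curve to a small dictionary of real-valued features."""
    skew, kurt = skewness_and_kurtosis(mags)
    best = max(freqs, key=lambda f: lomb_scargle_power(times, mags, f))
    return {"chi2_per_dof": chi2_per_dof(mags, errs), "skewness": skew,
            "kurtosis": kurt, "dominant_freq": best}
```

Scanning a frequency grid this way costs O(n) per frequency; production code would normally use a faster periodogram, which is one reason the notes distinguish n log n from n^2 features.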
    • 38. Variable Star Classification Confusion
         - global classification errors on well-observed sources approaching 15%
         - random forest with missing data imputation emerging as superior (e.g., Dubath+11, Richards+11)
         [Figure: confusion matrix by True Class; class groups: pulsating, eruptive, multi-star (Richards+11)]
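A confusion matrix and the global error rate quoted on this slide can both be computed from paired true and predicted labels. The sketch below is a generic illustration with made-up labels; the function names are hypothetical and the data are not from Richards+11.

```python
from collections import Counter

def confusion_matrix(true_labels, pred_labels):
    """Return (classes, matrix) with rows = true class, columns = predicted class."""
    counts = Counter(zip(true_labels, pred_labels))
    classes = sorted(set(true_labels) | set(pred_labels))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    return classes, matrix

def global_error_rate(true_labels, pred_labels):
    """Fraction of sources whose predicted class differs from the true class."""
    wrong = sum(t != p for t, p in zip(true_labels, pred_labels))
    return wrong / len(true_labels)

# Made-up labels for five sources.
truth = ["RR Lyrae", "RR Lyrae", "Mira", "Mira", "Algol"]
preds = ["RR Lyrae", "Mira", "Mira", "Mira", "Algol"]
classes, matrix = confusion_matrix(truth, preds)
```

Off-diagonal entries show which classes are confused with which, information that a single error rate hides.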
    • 39. Structured Learning / Structured Classification: Let class taxonomy guide classifier. 5% gross misclassification rate! (Richards+11)
         HSC: Hierarchical single-label classification. Fit separate classifier at each non-terminal node.
         HMC: Hierarchical multi-label classification. Fit one classifier, where L(y, f(x)) … depth … w0
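The HSC idea, one classifier per non-terminal node of the taxonomy with predictions made by descending the tree, can be sketched as follows. Everything here is an assumption for illustration: the two-level `TAXONOMY`, the 2-D toy features, and the nearest-centroid rule, which stands in for whatever classifier is actually fit at each node.

```python
import math

# Hypothetical two-level taxonomy: parent -> children.
TAXONOMY = {
    "root": ["pulsating", "eclipsing"],
    "pulsating": ["RR Lyrae", "Mira"],
    "eclipsing": ["Algol", "W UMa"],
}

def leaves_under(node):
    """All terminal classes reachable from a node."""
    children = TAXONOMY.get(node)
    if not children:
        return {node}
    return set().union(*(leaves_under(c) for c in children))

def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def fit_hsc(X, y):
    """Fit one nearest-centroid classifier at each non-terminal node."""
    model = {}
    for node, children in TAXONOMY.items():
        model[node] = {
            child: centroid([x for x, label in zip(X, y)
                             if label in leaves_under(child)])
            for child in children
        }
    return model

def predict_hsc(model, x):
    """Descend from the root, choosing the closest child centroid each time."""
    node = "root"
    while node in model:
        node = min(model[node], key=lambda c: math.dist(x, model[node][c]))
    return node

# Toy training set (assumed): two sources per terminal class.
X = [(0.0, 0.1), (0.2, -0.1), (0.1, 2.0), (-0.1, 2.2),
     (5.0, 0.0), (5.2, 0.2), (4.9, 2.1), (5.1, 1.9)]
y = ["RR Lyrae", "RR Lyrae", "Mira", "Mira",
     "Algol", "Algol", "W UMa", "W UMa"]
model = fit_hsc(X, y)
```

A mistake made near the root cannot be undone further down, which is one motivation for the alternative HMC formulation with a single classifier.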
    • 40. Decision Boundaries are Survey Specific. How do we transfer learning from one survey to the next? (Long+12; Richards+11)
         [Figure: panels (a) Hipparcos and (b) OGLE-III; axes feature #1 vs feature #2]
         Fig. 1: (a) The grey lines represent the CART classifier constructed using Hipparcos data. The points are Hipparcos sources. This classifier separates Hipparcos sources well (0.6% error as measured by cross-validation). (b) Here the OGLE sources are plotted over the same decision boundaries. There is now significant class overlap (30% error rate). This is …
    • 41. Decision Boundaries are Survey Specific. How do we transfer learning from one survey to the next? (Long+12; Richards+11)
         [Diagram labels: “Expert”; ASAS (testing); OGLE+Hip (training)]
         Fig. 8: Active learning samples on a single iteration of the algorithm. Yellow circles signify points that at least 65% of users were able to classify. These points are included …
    • 42. Decision Boundaries are Survey Specific. How do we transfer learning from one survey to the next? (Long+12; Richards+11)
         [Figure: two panels vs AL Iteration; y-axis of right panel: Percent of Confident ASAS RF Labels, 0.15 to 0.40]
         Left: Percent agreement of the Random Forest classifier with the ACVS labels, … of AL iteration. Right: Percent of ASAS data with confident RF classification …
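One iteration of an active-learning loop needs a rule for choosing which unlabeled sources to send to human experts, and a way to track how much of the new survey the classifier labels confidently. The sketch below uses the simplest uncertainty-sampling rule (lowest top-class probability). This is an assumption chosen for illustration and is not necessarily the selection criterion used in Long+12 or Richards+11; the function names and the 0.6 threshold are likewise hypothetical.

```python
def select_queries(class_probs, k):
    """Indices of the k sources whose highest class probability is lowest."""
    confidence = [max(p) for p in class_probs]
    order = sorted(range(len(class_probs)), key=lambda i: confidence[i])
    return order[:k]

def confident_fraction(class_probs, threshold):
    """Fraction of sources whose top class probability reaches the threshold."""
    confident = sum(1 for p in class_probs if max(p) >= threshold)
    return confident / len(class_probs)

# Made-up class-probability vectors for four unlabeled sources.
probs = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5], [0.7, 0.3]]
to_label = select_queries(probs, 2)
```

After experts label the selected sources, they join the training set, the classifier is refit, and `confident_fraction` can be tracked per iteration, as in the right-hand panel of the slide.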
    • 44. Classification Statements are Inherently Fuzzy
         - classification probabilities should reflect uncertainty in the data & training
         - higher confidence with greater proximity to training data
         - calibration of classification probability vector. E.g.: 20% of transients classified as supernova of type “Ib” with P=0.2 should be supernova of type “Ib”
         Catalogs of Transients and Variable Stars Must Become Probabilistic
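The calibration statement on this slide can be checked with a reliability table: bin sources by predicted probability and compare each bin's mean prediction with the fraction that truly belong to the class. This is a minimal sketch with made-up data; `reliability_table` and the choice of five equal-width bins are assumptions.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Per non-empty bin: (mean predicted P, observed fraction, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # P = 1.0 goes in the last bin
        bins[idx].append((p, hit))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac = sum(1 for _, hit in members if hit) / len(members)
            table.append((mean_p, frac, len(members)))
    return table

# Made-up, perfectly calibrated example: of ten sources given P=0.2, two are
# truly in the class; of ten given P=0.8, eight are.
probs = [0.2] * 10 + [0.8] * 10
outcomes = [True] * 2 + [False] * 8 + [True] * 8 + [False] * 2
table = reliability_table(probs, outcomes)
```

For a calibrated classifier the first two columns agree within sampling noise; systematic gaps indicate over- or under-confidence.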
    • 45. http://bigmacc.info
    • 46. Doing Science with Probabilistic Catalogs
         Demographics (with little followup): high purity at the cost of lower efficiency, e.g., using RRL to find new Galactic structure
         Novelty Discovery (with lots of followup): high efficiency at the cost of lower purity, e.g., discovering new instances of rare classes
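With a probabilistic catalog, the purity/efficiency trade-off on this slide becomes a choice of probability threshold. The sketch below assumes purity means the fraction of selected sources that truly belong to the class and efficiency means the fraction of true members that get selected. The function name and the toy catalog are hypothetical.

```python
def purity_efficiency(probs, is_member, threshold):
    """Purity and efficiency of the sample selected with P >= threshold."""
    selected = [m for p, m in zip(probs, is_member) if p >= threshold]
    true_selected = sum(selected)
    total_true = sum(is_member)
    # Convention assumed here: an empty selection has no contaminants.
    purity = true_selected / len(selected) if selected else 1.0
    efficiency = true_selected / total_true if total_true else 0.0
    return purity, efficiency

# Made-up catalog: class probability and true membership for six sources.
probs = [0.95, 0.9, 0.6, 0.55, 0.3, 0.1]
truth = [True, True, True, False, True, False]
strict = purity_efficiency(probs, truth, 0.8)  # demographics-style cut
loose = purity_efficiency(probs, truth, 0.2)   # discovery-style cut
```

The strict cut gives a clean but incomplete sample; the loose cut recovers every member at the price of contaminants that follow-up must weed out.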
    • 47. Discovery of Bright Galactic R Coronae Borealis and DY Persei Variables: Rare Gems Mined from ASAS
         A. A. Miller¹*, J. W. Richards¹,², J. S. Bloom¹, S. B. Cenko¹, J. M. Silverman¹, D. L. Starr¹, and K. G. Stassun³,⁴
         arXiv:1204.4181v1 [astro-ph.SR] 18 Apr 2012
         ABSTRACT: We present the results of a machine-learning (ML) based search for new R Coronae Borealis (RCB) stars and DY Persei-like stars (DYPers) in the Galaxy using cataloged light curves obtained by the All-Sky Automated Survey (ASAS). RCB stars, a rare class of hydrogen-deficient carbon-rich supergiants, are of great interest owing to the insights they can provide on the late stages of stellar evolution. DYPers are possibly the low-temperature, low-luminosity analogs to the RCB phenomenon, though additional examples are needed to fully establish this connection. While RCB stars and DYPers are traditionally identified by epochs of extreme dimming that occur without regularity, the ML search framework more fully captures the richness and diversity of their photometric behavior. We demonstrate that our ML method recovers ASAS candidates that would have been missed by traditional search methods employing hard cuts on amplitude and periodicity. Our search yields 13 candidates that we consider likely RCB stars/DYPers: new and archival spectroscopic observations confirm that four of these candidates are RCB stars and four are DYPers. Our discovery of four new DYPers increases the number of known Galactic DYPers from two to six; noteworthy is that one of the new DYPers has a measured parallax and is m ≈ 7 mag, making it the brightest known DYPer to date. Future observations of these new DYPers should prove instrumental in establishing the RCB connection. We consider these results, derived from a machine-learned probabilistic …
         ¹ Department of Astronomy, University of California, Berkeley, CA 94720-3411, USA
         ² Statistics Department, University of California, Berkeley, CA, 94720-7450, USA
         Fig. 2: ASAS V-band light curves of newly discovered RCB stars and DYPers. Note the differing magnitude ranges shown for each light curve. Spectroscopic observations confirm the top four candidates to be RCB stars, while the bottom four are DYPers.
         [Overlay: 17 known Galactic RCB/DY Per]
    • 49. Variety of Open Questions
         1. How do we bootstrap learning from one survey to the next, given inherent differences? “active learning” (e.g., Richards+11b)
         2. How do we detect and quantify real outliers? e.g. clustering, semi-supervised learning (e.g., Protopapas+06, Rebbapragada+09, Bhattacharyya+, in prep)
         3. How do we imbue domain knowledge into classifiers? hybridization, metalearning
         4. How do we weigh classification value with computational cost? resource allocation
    • 50. Summary
         - science maximization with synoptic surveys demands a more distant human role than before
         - machine learning in time-domain astrophysics is not just talk... it's working and enabling novel science
         - yet, real-time discovery & classification is far from solved
         - helpful to view endeavor as a resource-limited problem
    • 51. See you tomorrow! food starting 8am; talks starting 9am; group picture before lunch
