Computational Training & Data
Literacy for Domain Scientists
Joshua Bloom
UC Berkeley, Astronomy
@profjsb
“Training Students to Extract Value from Big Data” National Academies of Science, DC 11 April 2014
What is the toolbox of the modern (data-driven) scientist?
[Word cloud: domain, training, statistics, advanced, computing, database, GUI, parallel, visualization, Bayesian, machine learning, Physics, laboratory techniques, MCMC, MapReduce]
And... how do we teach this with what little time the students have?
Astronomical Data Deluge
Serious Challenge to Traditional Approaches & Toolkits
Large Synoptic Survey Telescope (LSST) - 2020
‣ Light curves for 800M sources every 3 days
‣ 10⁶ supernovae/yr, 10⁵ eclipsing binaries
‣ 3.2 gigapixel camera, 20 TB/night
LOFAR & SKA
‣ 150 Gps (27 Tflops) → 20 Pps (~100 Pflops)
Gaia space astrometry mission - 2014
‣ 1 billion stars observed ∼70 times over 5 years
‣ Will observe 20K supernovae
Many other astronomical surveys are already producing data:
SDSS, iPTF, CRTS, Pan-STARRS, Hipparcos, OGLE, ASAS, Kepler, LINEAR, DES, etc.
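As a back-of-envelope check on the scale quoted above, the following Python sketch turns the LSST figures into yearly totals. It assumes (hypothetically) that every night of the year is observed, so the results are upper bounds; the function names are illustrative only.

```python
# Back-of-envelope LSST arithmetic using the figures quoted on the slide.
# Assumption (hypothetical): every night of the year is observed, so the
# results are upper bounds that ignore weather and maintenance downtime.

TB_PER_NIGHT = 20          # quoted: 20 TB/night
SOURCES = 800_000_000      # quoted: light curves for 800M sources
CADENCE_DAYS = 3           # quoted: every 3 days


def yearly_volume_pb(tb_per_night: float = TB_PER_NIGHT, nights: int = 365) -> float:
    """Raw imaging volume per year in petabytes (decimal units: 1 PB = 1000 TB)."""
    return tb_per_night * nights / 1000


def yearly_light_curve_points(sources: int = SOURCES,
                              cadence_days: float = CADENCE_DAYS,
                              days: int = 365) -> float:
    """Source measurements per year if every source is revisited at a fixed cadence."""
    return sources * days / cadence_days


if __name__ == "__main__":
    print(f"~{yearly_volume_pb():.1f} PB/yr of raw imaging")
    print(f"~{yearly_light_curve_points():.1e} light-curve points per year")
```

Under these assumptions the quoted rates imply roughly 7 PB of raw imaging and on the order of 10¹¹ light-curve points per year, which is the scale that strains traditional toolkits.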
Towards a Fully Automated Scientific Stack for Transients
Stack stages: strategy, scheduling, observing, reduction, finding, discovery, classification, followup, inference
[Diagram labels: current state-of-the-art stack; automated (e.g. iPTF); not (yet) automated; published work; NSF/CDI; NSF/BIGDATA]
Our ML framework found the Nearest Supernova in 3 Decades...
‣ Built & deployed a real-time ML framework, discovering >10,000 events in >10 TB of imaging → 50+ journal articles
‣ Built probabilistic event classification catalogs with innovative active learning
http://timedomain.org https://www.nsf.gov/news/news_summ.jsp?cntn_id=122537
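The slide does not describe the active-learning scheme behind the classification catalogs. As a generic illustration only, uncertainty sampling (one common active-learning strategy) sends the events a probabilistic classifier is least sure about to a human for labeling. The event IDs, scores, and function name below are hypothetical.

```python
# Minimal uncertainty-sampling sketch for active learning (a generic textbook
# strategy; the slide does not say which active-learning scheme was used).
# Given a classifier's probability that each unlabeled event is "real", ask a
# human to label the events the model is least sure about (p closest to 0.5).

from typing import Dict, List


def select_for_labeling(probs: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` event IDs ordered from most to least uncertain."""
    # Uncertainty is highest when p is near 0.5, i.e. when |p - 0.5| is small.
    ranked = sorted(probs, key=lambda event_id: abs(probs[event_id] - 0.5))
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical classifier outputs for five candidate transients.
    scores = {"evt-a": 0.97, "evt-b": 0.52, "evt-c": 0.10, "evt-d": 0.45, "evt-e": 0.80}
    print(select_for_labeling(scores, budget=2))
```

After each round of human labels the classifier would be retrained and the selection repeated, so labeling effort goes where it most changes the model.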
Data-Centric Coursework, Bootcamps, Seminars, & Lecture Series
BDAS: Berkeley Data Analytics Stack [Spark, Shark, ...]
parallel programming bootcamp
...and entire degree programs
Taught by CS/Stats
Aimed at engineers & programmers heading toward industry
Python Bootcamps at Berkeley
2010: 85 campers; 2012a: 135 campers; 2012b: 210 campers; 2013a: 253 campers
‣ 3 days of live/archive-streamed lectures
‣ all material openly available on GitHub
‣ widely disseminated (e.g., @ NASA)
‣ funded (~$18k) by the Vice Chancellor for Research & NSF (BIGDATA)
http://pythonbootcamp.info
Python: a modern superglue computing language for science
‣ high-level scripting language
‣ open source, huge & growing community in academia & industry
‣ just-in-time compilation available (e.g., PyPy, Numba), plus fast numerical computation
‣ extensive interfaces to 3rd-party frameworks
A reasonable lingua franca for scientists...
Part of the Designated Emphasis in Computational Science & Engineering at Berkeley
Topics: visualization; machine learning; database interaction; user interface & web frameworks; timeseries & numerical computing; interfacing to other languages; Bayesian inference & MCMC; hardware control; parallelism
[Charts: enrollment by gender, 36% female and 64% male; enrollment by department (Psychology, Astronomy, Neuroscience, Biostatistics, Physics, Chemical Engineering, ISchool, Earth and Planetary Sciences, Industrial Engineering, Mechanical Engineering), with individual department shares between 4% and 16%]
“Parallel Image Reconstruction from Radio Interferometry Data”
“Graph Theory Analysis of Growing Graphs” http://mb3152.github.io/Graph-Growth/
“Realtime Prediction of Activity Behavior from Smartphone”
“Bus Arrival Time Prediction in Spain”
Time domain preprocessing
- Start with raw photometry
- Gaussian process detrending
- Calibration
- Petigura & Marcy 2012
Transit search
- Matched filter
- Similar to the BLS algorithm (Kovács+ 2002)
- Leverages the Fast-Folding Algorithm, O(N²) → O(N log N) (Staelin+ 1968)
Data validation
- Flag significant periodogram peaks that are inconsistent with an exoplanet transit
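The preprocessing step above removes slow variability before searching for transits. TERRA uses Gaussian process detrending; the sketch below swaps in a simpler running-median filter (a cruder, commonly used substitute) only to show the idea of dividing out a trend while keeping short dips intact. It is not TERRA's code, and the function name is illustrative.

```python
# Minimal detrending sketch. The slide's pipeline uses Gaussian-process
# detrending; this swaps in a running-median filter purely to illustrate the
# idea: estimate slow stellar/instrumental variability, then divide it out so
# that dips much shorter than the window survive.

from statistics import median
from typing import List


def running_median_detrend(flux: List[float], window: int) -> List[float]:
    """Divide each point by the median of a centered window (odd `window`)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    detrended = []
    for i in range(len(flux)):
        # The window is truncated at the edges of the time series.
        lo, hi = max(0, i - half), min(len(flux), i + half + 1)
        trend = median(flux[lo:hi])
        detrended.append(flux[i] / trend)
    return detrended
```

The window must be long compared with a transit (so the median ignores the dip) but short compared with the variability being removed; a Gaussian process makes that trade-off with an explicit covariance model instead of a fixed window.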
TERRA – optimized for small planets
[Figure: detrended/calibrated photometry from TERRA; panels show raw flux (ppt) and calibrated flux]
Erik Petigura, Berkeley Astro Grad Student
Petigura, Howard, & Marcy (20
Prevalence of Earth-size planets orbiting Sun-like stars
Erik A. Petigura (a, b, 1), Andrew W. Howard (b), and Geoffrey W. Marcy (a)
(a) Astronomy Department, University of California, Berkeley, CA 94720; and (b) Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822
Contributed by Geoffrey W. Marcy, October 22, 2013 (sent for review October 18, 2013)
Determining whether Earth-like planets are common or rare looms
as a touchstone in the question of life in the universe. We searched
for Earth-size planets that cross in front of their host stars by
examining the brightness measurements of 42,000 stars from
National Aeronautics and Space Administration’s Kepler mission.
We found 603 planets, including 10 that are Earth size (1 − 2 R⊕)
and receive comparable levels of stellar energy to that of Earth
(0.25 − 4 F⊕). We account for Kepler’s imperfect detectability of
such planets by injecting synthetic planet–caused dimmings into
the Kepler brightness measurements and recording the fraction
detected. We find that 11 ± 4% of Sun-like stars harbor an
Earth-size planet receiving between one and four times the stellar
intensity as Earth. We also find that the occurrence of Earth-size planets is
constant with increasing orbital period (P), within equal intervals of
log P up to ∼200 d. Extrapolating, one finds $5.7^{+1.7}_{-2.2}\%$ of Sun-like stars
harbor an Earth-size planet with orbital periods of 200–400 d.
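The completeness correction in the abstract can be sketched in a simplified form: estimate the recovery fraction from injected synthetic transits, then weight each detected planet by the inverse of its transit probability times that completeness. This inverse-weighting estimator is a generic textbook version assumed here for illustration, not necessarily the exact procedure in the paper; all numbers in the check are hypothetical.

```python
# Simplified sketch of (1) completeness from injection-recovery tests and
# (2) an occurrence rate corrected for missed planets. Generic inverse-weighting
# form, assumed for illustration; not necessarily the paper's exact procedure.

from typing import Iterable, List, Tuple


def completeness(recovered: Iterable[bool]) -> float:
    """Fraction of injected synthetic transits that the search recovered."""
    flags = list(recovered)
    if not flags:
        raise ValueError("need at least one injection")
    return sum(flags) / len(flags)


def occurrence_rate(detections: List[Tuple[float, float]], n_stars: int) -> float:
    """Planets per star, weighting each detection by 1 / (p_transit * completeness).

    Each detection is (p_transit, completeness): the geometric probability that
    the orbit is aligned to transit, and the recovery fraction measured for
    injected signals of similar size and period.
    """
    return sum(1.0 / (p_tr * comp) for p_tr, comp in detections) / n_stars
```

For example, two detections that each have a 1% transit probability and 50% completeness stand in for about 400 planets, or 0.4 planets per star in a hypothetical 1000-star sample; this shows why a handful of small, long-period detections can imply a substantial occurrence rate.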
extrasolar planets | astrobiology
The National Aeronautics and Space Administration’s (NASA’s)
Kepler mission was launched in 2009 to search for planets
that transit (cross in front of) their host stars (1–4). The resulting
dimming of the host stars is detectable by measuring their brightness,
and Kepler monitored the brightness of 150,000 stars every
30 min for 4 y. To date, this exoplanet survey has detected more
than 3,000 planet candidates (4).
The most easily detectable planets in the Kepler survey are
those that are relatively large and orbit close to their host stars,
especially those stars having lower intrinsic brightness fluctuations
(noise). These large, close-in worlds dominate the list of
known exoplanets. However, the Kepler brightness measurements
can be analyzed and debiased to reveal the diversity of planets,
We searched for transiting planets in Kepler brightness measurements
using our custom-built TERRA software package
described in previous works (6, 9) and in SI Appendix. In brief,
TERRA conditions Kepler photometry in the time domain, removing
outliers, long timescale variability (>10 d), and systematic
errors common to a large number of stars. TERRA then searches
for transit signals by evaluating the signal-to-noise ratio (SNR) of
prospective transits over a finely spaced 3D grid of orbital period,
P, time of transit, t0, and transit duration, ΔT. This grid-based
search extends over the orbital period range of 0.5–400 d.
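The grid search described above can be illustrated with a minimal box-search sketch in the spirit of BLS: for each trial period P, transit time t0, and duration ΔT, fold the light curve, compare in-transit and out-of-transit mean flux, and score the dip by an SNR. The SNR definition (depth / σ × √n_in, assuming white noise of known σ) and all function names are assumptions for illustration; this is not TERRA's implementation. In the paper's terms, a trial whose SNR exceeds 12 would count as a threshold crossing event.

```python
import math
from typing import Dict, Sequence


def in_transit(t: float, period: float, t0: float, duration: float) -> bool:
    """True if time t lies within `duration` after a transit epoch t0 + k*period."""
    return (t - t0) % period < duration


def box_search(times: Sequence[float], flux: Sequence[float], sigma: float,
               periods: Sequence[float], durations: Sequence[float],
               t0_step: float) -> Dict[str, float]:
    """Return the (period, t0, duration) trial with the highest box SNR.

    Assumes white noise with known per-point scatter `sigma`, so
    SNR = depth / sigma * sqrt(n_in).
    """
    total = sum(flux)
    best: Dict[str, float] = {"snr": -math.inf}
    for period in periods:
        # Trial transit times t0 = 0, t0_step, ... up to (but excluding) P.
        for k in range(int(round(period / t0_step))):
            t0 = k * t0_step
            for duration in durations:
                inside = [f for t, f in zip(times, flux)
                          if in_transit(t, period, t0, duration)]
                n_in, n_out = len(inside), len(flux) - len(inside)
                if n_in == 0 or n_out == 0:
                    continue
                mean_in = sum(inside) / n_in
                mean_out = (total - sum(inside)) / n_out
                depth = mean_out - mean_in
                snr = depth / sigma * math.sqrt(n_in)
                if snr > best["snr"]:
                    best = {"snr": snr, "period": period, "t0": t0,
                            "duration": duration, "depth": depth}
    return best
```

Usage: inject a 1% box dip with P = 3.0 d into a flat light curve and search a small grid; the best trial should recover the injected period, while aliases such as 2P score lower because they fold in fewer transits. The cost grows with the product of the three grid sizes and the number of points, which is why speedups such as the Fast-Folding Algorithm matter for a finely spaced grid.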
TERRA produced a list of “threshold crossing events” (TCEs)
that meet the key criterion of a photometric dimming SNR > 12.
Unfortunately, an unwieldy 16,227 TCEs met this criterion,
many of which are inconsistent with the periodic dimming
profile from a true transiting planet. Further vetting was performed
by automatically assessing which light curves were consistent with
theoretical models of transiting planets (10). We also visually
inspected each TCE light curve, retaining only those exhibiting a
consistent, periodic, box-shaped dimming, and rejecting those
caused by single epoch outliers, correlated noise, and other data
anomalies. The vetting process was applied homogeneously to all
TCEs and is described in further detail in SI Appendix.
To assess our vetting accuracy, we evaluated the 235 Kepler
objects of interest (KOIs) among Best42k stars having P > 50 d,
which had been found by the Kepler Project and identified as planet
candidates in the official Exoplanet Archive (exoplanetarchive.ipac.caltech.edu;
accessed 19 September 2013). Among them, we
found four whose light curves are not consistent with being
planets. These four KOIs (364.01, 2,224.02, 2,311.01, and 2,474.01)
have long periods and small radii (SI Appendix). This exercise
suggests that our vetting process is robust and that careful scrutiny
of the light curves of small planets in long period orbits is useful to
identify false positives.
Bootcamp/Seminar Alum
Python
DOE/NERSC computation
PNAS [2013]
“Are we alone in the universe? What makes up the missing mass
of the universe? ... And maybe the biggest question of all: How in
the wide world can you add $3 billion in market capitalization
simply by adding .com to the end of a name?”
President William Jefferson Clinton
Science and Technology Policy Address
21 January 2000
“Add Data Science or Big Data to your course name to increase
enrollment tenfold.”
Joshua Bloom
Just Now
Python for Data Science @ Berkeley [Sept 2013]
‣ Where do bootcamps & seminars fit into traditional domain science curricula?
- formal coursework competes with research obligations for graduate students
‣ Are they too vocational/practical for higher ed?
‣ Who should teach them & how do we credit them?
first this... ...then this.
Undergraduate & Graduate Training Mission
Thinking Data Literacy before
Thinking Big Data Proficiency
Data analysis recipes:
Fitting a model to data
David W. Hogg
Center for Cosmology and Particle Physics, Department of Physics, New York University
Max-Planck-Institut für Astronomie, Heidelberg
Jo Bovy
Center for Cosmology and Particle Physics, Department of Physics, New York University
Dustin Lang
Department of Computer Science, University of Toronto
Princeton University Observatory
Abstract
We go through the many considerations involved in fitting a model
to data, using as an example the fit of a straight line to a set of points
in a two-dimensional plane. Standard weighted least-squares fitting
is only appropriate when there is a dimension along which the data
points have negligible uncertainties, and another along which all the
uncertainties can be described by Gaussians of known variance; these
[Figure: “Fitting a straight line to data” (p. 34), two x–y panels (x from 0 to 300, y from 0 to 700); one panel shows the best-fit line y = 1.33 x + 164]
arXiv:1008.4686v1 [astro-ph.IM] 27 Aug 2010
Statistical Inference
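The standard recipe the Hogg, Bovy & Lang abstract warns about, weighted least squares with negligible x uncertainties and known Gaussian y uncertainties, has a closed-form solution. A minimal Python sketch follows, minimizing χ² = Σ (y_i − m x_i − b)² / σ_i²; the function name is illustrative, and it deliberately does not handle x errors or outliers, which is where the paper argues this recipe breaks down.

```python
# Minimal weighted least-squares straight-line fit: x values with negligible
# uncertainty, Gaussian y uncertainties of known standard deviation sigma_i.
# Minimizes chi^2 = sum_i (y_i - m*x_i - b)^2 / sigma_i^2 in closed form.
# It does NOT model x errors or outliers.

import math
from typing import Sequence, Tuple


def weighted_line_fit(x: Sequence[float], y: Sequence[float],
                      sigma_y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (slope m, intercept b, sigma_m, sigma_b)."""
    w = [1.0 / s ** 2 for s in sigma_y]          # inverse-variance weights
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx ** 2
    if delta == 0:
        raise ValueError("degenerate x values: cannot fit a line")
    m = (s * sxy - sx * sy) / delta
    b = (sxx * sy - sx * sxy) / delta
    # Parameter uncertainties follow from the inverse of the 2x2 normal matrix.
    return m, b, math.sqrt(s / delta), math.sqrt(sxx / delta)
```

Points lying exactly on y = 2x + 1 return m = 2 and b = 1 for any choice of σ_i; the weights only matter once the data scatter, which is exactly when the validity of the Gaussian, known-variance assumption starts to decide the answer.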
Versioning & Reproducibility
“Recently, the scientific community was shaken by reports that
a troubling proportion of peer-reviewed preclinical studies are
not reproducible.” McNutt, 2014
http://www.sciencemag.org/content/343/6168/229.summary
- Git has emerged as the de facto versioning tool
- Berkeley Common Environment (BCE) Software Stack
- “Reproducible and Collaborative Statistical Data Science” (Statistics 157: P. Stark)
- Next up: Versioning (big) data?
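One way to think about the “versioning (big) data” question is content addressing, which is how Git identifies file contents: a blob’s ID is the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the bytes themselves. The sketch below applies that scheme to data files, streaming in chunks so large files need little memory. It illustrates the idea only; the function names are hypothetical and no particular data-versioning tool is implied.

```python
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """SHA-1 over the header "blob <size>" + NUL + content, as Git does for blobs."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def file_blob_id(path: str, chunk_size: int = 1 << 20) -> str:
    """Same ID for a file on disk, streamed in chunks so large files need little memory."""
    sha = hashlib.sha1(f"blob {os.path.getsize(path)}\0".encode())
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()
```

For empty content this gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same ID Git assigns to an empty blob. Any change to the bytes changes the ID, which is what makes it usable as a fingerprint for an exact dataset version.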
Fernando Pérez
IPython Creator
IPython notebook is ~agnostic to the backend (e.g., Julia, R)
Established CS/Stats/Math in service of novelty in domain science
vs.
Novelty in domain science driving & informing novelty in CS/Stats/Math
“novelty² problem”
Extra Burden for Forefront Scientists
https://medium.com/tech-talk/dd88857f662
Berkeley Institute for Data Sciences (BIDS)
‣ Physical space & new entity dedicated to the Moore/Sloan Data Science principles
‣ Goal: a rich resource and ecosystem for domain scientists to connect & collaborate with methodologists
http://bitly.com/bundles/fperezorg/1
“Bold new partnership launches to harness potential of data scientists and big data”
Towards an Inclusive Ecosystem
Expanding Participation Among
Underrepresented Groups
[Chart: 2013 Python bootcamp attendees by gender (female, male, decline to state); segment values 11%, 56%, 33%]
- 2013 AMP Camp: < 5% women
- This Workshop: 2 women out of 22 speakers
- 2013 Python Seminar: 36% women
Summary
‣ Data Literacy before Big Data Proficiency
‣ Domain Science increasingly dependent upon methodological competencies
‣ Higher-Ed Role of such training still TBD
• formal courses compete for time
‣ Need to create inclusive, collaborative environments bridging domains & methodologies
Thank you.

More Related Content

What's hot

Solar System Processing with LSST: A Status Update
Solar System Processing with LSST: A Status UpdateSolar System Processing with LSST: A Status Update
Solar System Processing with LSST: A Status Update
Mario Juric
 
LSST Solar System Science: MOPS Status, the Science, and Your Questions
LSST Solar System Science: MOPS Status, the Science, and Your QuestionsLSST Solar System Science: MOPS Status, the Science, and Your Questions
LSST Solar System Science: MOPS Status, the Science, and Your Questions
Mario Juric
 
Cosmology Research in China
Cosmology Research in ChinaCosmology Research in China
Cosmology Research in China
EnergyCosmicLaborato
 
Insights to the Morphology of Planetary Nebulae from 3D Spectroscopy
Insights to the Morphology of Planetary Nebulae from 3D SpectroscopyInsights to the Morphology of Planetary Nebulae from 3D Spectroscopy
Insights to the Morphology of Planetary Nebulae from 3D Spectroscopy
Ashkbiz Danehkar
 
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
EarthCube
 
Ultra-Fast Outflows in Seyfert I AGN
Ultra-Fast Outflows in Seyfert I AGNUltra-Fast Outflows in Seyfert I AGN
Ultra-Fast Outflows in Seyfert I AGN
Ashkbiz Danehkar
 
A seven-planet resonant chain in TRAPPIST-1
A seven-planet resonant chain in TRAPPIST-1A seven-planet resonant chain in TRAPPIST-1
A seven-planet resonant chain in TRAPPIST-1
Sérgio Sacani
 
Kinematical Properties of Planetary Nebulae with WR-type Nuclei
Kinematical Properties of Planetary Nebulae with WR-type NucleiKinematical Properties of Planetary Nebulae with WR-type Nuclei
Kinematical Properties of Planetary Nebulae with WR-type Nuclei
Ashkbiz Danehkar
 
Astronomical Research in the Classroom with the Faulkes Telescope Project by ...
Astronomical Research in the Classroom with the Faulkes Telescope Project by ...Astronomical Research in the Classroom with the Faulkes Telescope Project by ...
Astronomical Research in the Classroom with the Faulkes Telescope Project by ...
GTTP-GHOU-NUCLIO
 
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...
csandit
 
Comaskey_William_Poster_SULI_FALL_2014
Comaskey_William_Poster_SULI_FALL_2014Comaskey_William_Poster_SULI_FALL_2014
Comaskey_William_Poster_SULI_FALL_2014William Comaskey
 
20131107 damasso great
20131107 damasso great20131107 damasso great
20131107 damasso greatOAVdA_APACHE
 
FDL 2017 3D Shape Modeling
FDL 2017 3D Shape ModelingFDL 2017 3D Shape Modeling
FDL 2017 3D Shape Modeling
Leonard Silverberg
 
The kepler 10_planetary_system_revisited_by_harps
The kepler 10_planetary_system_revisited_by_harpsThe kepler 10_planetary_system_revisited_by_harps
The kepler 10_planetary_system_revisited_by_harpsSérgio Sacani
 
The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
J On The Beach
 
Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...
Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...
Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...
Advanced-Concepts-Team
 
A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...
A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...
A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...Sérgio Sacani
 

What's hot (20)

Solar System Processing with LSST: A Status Update
Solar System Processing with LSST: A Status UpdateSolar System Processing with LSST: A Status Update
Solar System Processing with LSST: A Status Update
 
LSST Solar System Science: MOPS Status, the Science, and Your Questions
LSST Solar System Science: MOPS Status, the Science, and Your QuestionsLSST Solar System Science: MOPS Status, the Science, and Your Questions
LSST Solar System Science: MOPS Status, the Science, and Your Questions
 
Cosmology Research in China
Cosmology Research in ChinaCosmology Research in China
Cosmology Research in China
 
Insights to the Morphology of Planetary Nebulae from 3D Spectroscopy
Insights to the Morphology of Planetary Nebulae from 3D SpectroscopyInsights to the Morphology of Planetary Nebulae from 3D Spectroscopy
Insights to the Morphology of Planetary Nebulae from 3D Spectroscopy
 
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
 
Ultra-Fast Outflows in Seyfert I AGN
Ultra-Fast Outflows in Seyfert I AGNUltra-Fast Outflows in Seyfert I AGN
Ultra-Fast Outflows in Seyfert I AGN
 
Space Tug Rendezvous
Space Tug RendezvousSpace Tug Rendezvous
Space Tug Rendezvous
 
A seven-planet resonant chain in TRAPPIST-1
A seven-planet resonant chain in TRAPPIST-1A seven-planet resonant chain in TRAPPIST-1
A seven-planet resonant chain in TRAPPIST-1
 
Kinematical Properties of Planetary Nebulae with WR-type Nuclei
Kinematical Properties of Planetary Nebulae with WR-type NucleiKinematical Properties of Planetary Nebulae with WR-type Nuclei
Kinematical Properties of Planetary Nebulae with WR-type Nuclei
 
Astronomical Research in the Classroom with the Faulkes Telescope Project by ...
Astronomical Research in the Classroom with the Faulkes Telescope Project by ...Astronomical Research in the Classroom with the Faulkes Telescope Project by ...
Astronomical Research in the Classroom with the Faulkes Telescope Project by ...
 
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO...
 
Comaskey_William_Poster_SULI_FALL_2014
Comaskey_William_Poster_SULI_FALL_2014Comaskey_William_Poster_SULI_FALL_2014
Comaskey_William_Poster_SULI_FALL_2014
 
20131107 damasso great
20131107 damasso great20131107 damasso great
20131107 damasso great
 
CV-AIHyaAAS
CV-AIHyaAASCV-AIHyaAAS
CV-AIHyaAAS
 
FDL 2017 3D Shape Modeling
FDL 2017 3D Shape ModelingFDL 2017 3D Shape Modeling
FDL 2017 3D Shape Modeling
 
The kepler 10_planetary_system_revisited_by_harps
The kepler 10_planetary_system_revisited_by_harpsThe kepler 10_planetary_system_revisited_by_harps
The kepler 10_planetary_system_revisited_by_harps
 
The big data Universe. Literally.
The big data Universe. Literally.The big data Universe. Literally.
The big data Universe. Literally.
 
AGanguly_ALuo
AGanguly_ALuoAGanguly_ALuo
AGanguly_ALuo
 
Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...
Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...
Science Coffee - Algorithms to Monitor Telemetry for Subtle Indications of De...
 
A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...
A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...
A dynamically packed_planetary _system_around_gj667_c_with_three_superearths_...
 

Viewers also liked

Joshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at BerkeleyJoshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at Berkeley
PyData
 
Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...IL Group (CILIP Information Literacy Group)
 
Building Data Literacy Among Middle School Administrators and Teachers
Building Data Literacy Among Middle School Administrators and TeachersBuilding Data Literacy Among Middle School Administrators and Teachers
Building Data Literacy Among Middle School Administrators and Teachers
North Carolina Association for Middle Level Education
 
Introduction to Ethics of Big Data
Introduction to Ethics of Big DataIntroduction to Ethics of Big Data
Introduction to Ethics of Big Data
28 Burnside
 
Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)
Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)
Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)
Adam Beauchamp
 
Ethics of Big Data
Ethics of Big DataEthics of Big Data
Ethics of Big Data
Matti Vesala
 
Service Design as Method: Library Services Developed from User's Needs
Service Design as Method: Library Services Developed from User's NeedsService Design as Method: Library Services Developed from User's Needs
Service Design as Method: Library Services Developed from User's Needs
LIBER Europe
 
HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...
HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...
HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...
LIBER Europe
 
Introduction to Ethics of Big Data
Introduction to Ethics of Big DataIntroduction to Ethics of Big Data
Introduction to Ethics of Big Data28 Burnside
 

Viewers also liked (9)

Joshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at BerkeleyJoshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at Berkeley
 
Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...Corrall - Data literacy conceptions and pedagogies: Redefining information li...
Corrall - Data literacy conceptions and pedagogies: Redefining information li...
 
Building Data Literacy Among Middle School Administrators and Teachers
Building Data Literacy Among Middle School Administrators and TeachersBuilding Data Literacy Among Middle School Administrators and Teachers
Building Data Literacy Among Middle School Administrators and Teachers
 
Introduction to Ethics of Big Data
Introduction to Ethics of Big DataIntroduction to Ethics of Big Data
Introduction to Ethics of Big Data
 
Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)
Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)
Promoting Data Literacy at the Grassroots (ACRL 2015, Portland, OR)
 
Ethics of Big Data
Ethics of Big DataEthics of Big Data
Ethics of Big Data
 
Service Design as Method: Library Services Developed from User's Needs
Service Design as Method: Library Services Developed from User's NeedsService Design as Method: Library Services Developed from User's Needs
Service Design as Method: Library Services Developed from User's Needs
 
HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...
HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...
HUMANITIES DATA LITERACY: STUDENT PERSPECTIVE ON DIGITAL CULTURAL HERITAGE CO...
 
Introduction to Ethics of Big Data
Introduction to Ethics of Big DataIntroduction to Ethics of Big Data
Introduction to Ethics of Big Data
 

Similar to Computational Training and Data Literacy for Domain Scientists

Computational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data LiteracyComputational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data Literacy
Joshua Bloom
 
Data Science at Berkeley
Data Science at BerkeleyData Science at Berkeley
Data Science at Berkeley
Joshua Bloom
 
ExoSGAN and ExoACGAN: Exoplanet Detection using Adversarial Training Algorithms
ExoSGAN and ExoACGAN: Exoplanet Detection using Adversarial Training AlgorithmsExoSGAN and ExoACGAN: Exoplanet Detection using Adversarial Training Algorithms
ExoSGAN and ExoACGAN: Exoplanet Detection using Adversarial Training Algorithms
IRJET Journal
 
Astronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache SparkAstronomical Data Processing on the LSST Scale with Apache Spark
Astronomical Data Processing on the LSST Scale with Apache Spark
Databricks
 
GrantProposalSethKrantzler_Fra
GrantProposalSethKrantzler_FraGrantProposalSethKrantzler_Fra
GrantProposalSethKrantzler_FraSeth Krantzler
 
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean SciencesThe Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
Larry Smarr
 
Toward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing CyberinfrastructureToward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing Cyberinfrastructure
Larry Smarr
 
A Review on Astronomy, Space Science and Technology Development in Thailand
A Review on Astronomy, Space Science and Technology Development in ThailandA Review on Astronomy, Space Science and Technology Development in Thailand
A Review on Astronomy, Space Science and Technology Development in Thailand
ILOAHawaii
 
Identifying Exoplanets with Machine Learning Methods: A Preliminary Study
Identifying Exoplanets with Machine Learning Methods: A Preliminary StudyIdentifying Exoplanets with Machine Learning Methods: A Preliminary Study
Identifying Exoplanets with Machine Learning Methods: A Preliminary Study
IJCI JOURNAL
 
A Search for Technosignatures Around 11,680 Stars with the Green Bank Telesco...
A Search for Technosignatures Around 11,680 Stars with the Green Bank Telesco...A Search for Technosignatures Around 11,680 Stars with the Green Bank Telesco...
A Search for Technosignatures Around 11,680 Stars with the Green Bank Telesco...
Sérgio Sacani
 
Teletransportacion marcikic2003
Teletransportacion  marcikic2003Teletransportacion  marcikic2003
Teletransportacion marcikic2003
Asesor De Tesis Doctorados
 
201109021 mcguinness ska_meeting
201109021 mcguinness ska_meeting201109021 mcguinness ska_meeting
201109021 mcguinness ska_meeting
Deborah McGuinness
 
FDL 2017 Lunar Water and Volatiles
FDL 2017 Lunar Water and VolatilesFDL 2017 Lunar Water and Volatiles
FDL 2017 Lunar Water and Volatiles
Leonard Silverberg
 
The Possible Tidal Demise of Kepler’s First Planetary System
The Possible Tidal Demise of Kepler’s First Planetary SystemThe Possible Tidal Demise of Kepler’s First Planetary System
The Possible Tidal Demise of Kepler’s First Planetary System
Sérgio Sacani
 
Cyberinfrastructure to Support Ocean Observatories
Cyberinfrastructure to Support Ocean ObservatoriesCyberinfrastructure to Support Ocean Observatories
Cyberinfrastructure to Support Ocean Observatories
Larry Smarr
 
MIRI & the James Webb Space Telescope
MIRI & the James Webb Space TelescopeMIRI & the James Webb Space Telescope
Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)Research Data Infrastructure for Geochemistry (DFG Roundtable)
Research Data Infrastructure for Geochemistry (DFG Roundtable)
Kerstin Lehnert
 
Kepler 432b a_massive_warm_jupiter_in_a_52days_eccentric_orbit_transiting_a_g...
Kepler 432b a_massive_warm_jupiter_in_a_52days_eccentric_orbit_transiting_a_g...Kepler 432b a_massive_warm_jupiter_in_a_52days_eccentric_orbit_transiting_a_g...
Kepler 432b a_massive_warm_jupiter_in_a_52days_eccentric_orbit_transiting_a_g...
Sérgio Sacani
 

Similar to Computational Training and Data Literacy for Domain Scientists (20)

Computational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data LiteracyComputational Training for Domain Scientists & Data Literacy
Computational Training for Domain Scientists & Data Literacy
 
Data Science at Berkeley
Data Science at BerkeleyData Science at Berkeley
Data Science at Berkeley
 
ExoSGAN and ExoACGAN: Exoplanet Detection using Adversarial Training Algorithms
Computational Training and Data Literacy for Domain Scientists

  • 1. Computational Training & Data Literacy for Domain Scientists. Joshua Bloom, UC Berkeley, Astronomy (@profjsb). “Training Students to Extract Value from Big Data,” National Academies of Science, DC, 11 April 2014.
  • 2. What is the toolbox of the modern (data-driven) scientist? Domain training, statistics, advanced computing, databases, GUIs, parallel computing, visualization, Bayesian methods, machine learning, physics, laboratory techniques, MCMC, MapReduce.
  • 3. What is the toolbox of the modern (data-driven) scientist? And... how do we teach this with what little time the students have?
  • 4. Astronomical Data Deluge: A Serious Challenge to Traditional Approaches & Toolkits
  • 5. Astronomical Data Deluge: A Serious Challenge to Traditional Approaches & Toolkits. Large Synoptic Survey Telescope (LSST), 2020: light curves for 800M sources every 3 days; 10^6 supernovae/yr and 10^5 eclipsing binaries; 3.2-gigapixel camera, 20 TB/night. LOFAR & SKA: 150 Gps (27 Tflops) → 20 Pps (~100 Pflops). Gaia space astrometry mission, 2014: 1 billion stars observed ∼70 times over 5 years; will observe 20K supernovae. Many other astronomical surveys are already producing data: SDSS, iPTF, CRTS, Pan-STARRS, Hipparcos, OGLE, ASAS, Kepler, LINEAR, DES, etc.
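The volumes on slide 5 are easier to grasp as rates. Below is a back-of-envelope sketch in Python: the 20 TB/night, 800M sources, and 3-day cadence come from the slide, while the 300 usable observing nights per year is a hypothetical round number, not an LSST specification.

```python
# Back-of-envelope survey rates from the LSST figures quoted on slide 5.
TB_PER_NIGHT = 20            # raw imaging volume per night (slide 5)
SOURCES = 800_000_000        # sources with light curves (slide 5)
CADENCE_DAYS = 3             # each source revisited roughly every 3 days (slide 5)
NIGHTS_PER_YEAR = 300        # HYPOTHETICAL assumption, not from the slide


def yearly_volume_pb(tb_per_night: float, nights: int) -> float:
    """Raw data volume per year in petabytes (1 PB = 1000 TB)."""
    return tb_per_night * nights / 1000


def measurements_per_night(sources: int, cadence_days: float) -> float:
    """Average number of new light-curve points per night."""
    return sources / cadence_days


print(f"~{yearly_volume_pb(TB_PER_NIGHT, NIGHTS_PER_YEAR):.0f} PB/yr of raw imaging")
print(f"~{measurements_per_night(SOURCES, CADENCE_DAYS):.2e} light-curve points/night")
```

Under these assumptions, the survey adds several petabytes and a few hundred million photometric measurements per night-averaged cycle, which is why per-object human inspection stops being an option.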
  • 6. Towards a Fully Automated Scientific Stack for Transients. Stages: strategy, scheduling, observing, reduction, finding, discovery, classification, followup, inference. Legend: current state-of-the-art stack; automated (e.g., iPTF); not (yet) automated; published work; NSF/CDI; NSF/BIGDATA.
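One way to picture the automated stack on slide 6 is as a chain of stage functions, each enriching an event record before handing it on. This is a minimal sketch: the stage names borrow the slide's vocabulary, but the record fields, the 5-sigma threshold, and the scoring rule are hypothetical placeholders, not the iPTF implementation.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Stage = Callable[[Event], Event]


def reduction(ev: Event) -> Event:
    # Hypothetical: difference the new flux against a reference-image flux.
    return {**ev, "diff_flux": ev["new_flux"] - ev["ref_flux"]}


def finding(ev: Event) -> Event:
    # Hypothetical: flag a candidate if the difference exceeds 5 sigma.
    return {**ev, "candidate": ev["diff_flux"] / ev["sigma"] > 5.0}


def classification(ev: Event) -> Event:
    # Placeholder score in [0, 1]; a real system would call a trained classifier.
    score = min(1.0, ev["diff_flux"] / (10.0 * ev["sigma"])) if ev["candidate"] else 0.0
    return {**ev, "p_real": score}


def run_pipeline(ev: Event, stages: List[Stage]) -> Event:
    """Apply each stage in order; every stage returns a new, enriched record."""
    for stage in stages:
        ev = stage(ev)
    return ev


result = run_pipeline({"new_flux": 180.0, "ref_flux": 100.0, "sigma": 10.0},
                      [reduction, finding, classification])
print(result)
```

Because every stage shares one signature, a step that is "not (yet) automated" can be replaced by an automated one without touching the rest of the chain.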
  • 7. Our ML framework found the nearest supernova in 3 decades. ‣ Built & deployed a real-time ML framework, discovering >10,000 events in >10 TB of imaging → 50+ journal articles ‣ Built probabilistic event classification catalogs with innovative active learning. http://timedomain.org https://www.nsf.gov/news/news_summ.jsp?cntn_id=122537
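Slide 7 mentions active learning for the classification catalogs. As a generic illustration only (not the framework's actual method), uncertainty sampling asks an expert to label the sources whose predicted class probabilities have the highest entropy, since those labels should teach the classifier the most. The source names and probabilities below are made up.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_most_uncertain(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k unlabeled sources whose predictions are most uncertain."""
    return sorted(pool, key=lambda name: entropy(pool[name]), reverse=True)[:k]


# Hypothetical classifier outputs over three classes for four unlabeled sources.
pool = {
    "src_a": [0.98, 0.01, 0.01],   # confident: little to learn from a label
    "src_b": [0.40, 0.35, 0.25],
    "src_c": [0.70, 0.20, 0.10],
    "src_d": [0.34, 0.33, 0.33],   # nearly uniform: most informative to label
}
print(query_most_uncertain(pool, 2))  # ['src_d', 'src_b']
```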
  • 8. Data-Centric Coursework, Bootcamps, Seminars, & Lecture Series: BDAS, the Berkeley Data Analytics Stack [Spark, Shark, ...]; parallel programming bootcamp; ...and entire degree programs.
  • 9. Data-Centric Coursework, Bootcamps, Seminars, & Lecture Series: BDAS, the Berkeley Data Analytics Stack [Spark, Shark, ...]; parallel programming bootcamp; ...and entire degree programs. Taught by CS/Stats; aimed at engineers & programmers heading toward industry.
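Spark, part of the BDAS stack on slides 8 and 9, builds on the MapReduce pattern listed on slide 2. The sketch below shows that pattern in plain, sequential Python (a word count, the canonical example); it is not Spark's API, and the input records are made up.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit a (key, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's values with an associative operation (+)."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


records = ["supernova candidate", "variable star", "Supernova confirmed"]
mapped = [pair for record in records for pair in map_phase(record)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because the map step handles each record independently and the reduce operation is associative, both phases can be spread across many machines; distributed frameworks automate that partitioning and the shuffle between them.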
  • 10. Python Bootcamps at Berkeley: 2010, 85 campers; 2012a, 135 campers.
  • 11. Python: a modern superglue computing language for science ‣ high-level scripting language ‣ open source, with a huge & growing community in academia & industry ‣ just-in-time compilation available, as well as fast numerical computation ‣ extensive interfaces to 3rd-party frameworks. A reasonable lingua franca for scientists...
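The "fast numerical computation" point usually refers to array libraries such as NumPy (an assumption here; the slide names none). This sketch computes a root-mean-square two ways: an explicit Python loop, and a vectorized NumPy expression whose loop runs in compiled code. Both give the same answer; the vectorized form is typically much faster on large arrays.

```python
import numpy as np


def rms_loop(values) -> float:
    """Root-mean-square with an explicit interpreted Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return (total / len(values)) ** 0.5


def rms_vectorized(values: np.ndarray) -> float:
    """Same quantity; NumPy performs the loop in compiled code."""
    return float(np.sqrt(np.mean(values ** 2)))


x = np.linspace(-1.0, 1.0, 100_001)
print(rms_loop(x), rms_vectorized(x))  # both close to sqrt(1/3)
```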
  • 12. Python Bootcamps at Berkeley: 2012b, 210 campers; 2013a, 253 campers.
  • 13. ‣ 3 days of lectures, streamed live and archived ‣ all material openly available on GitHub ‣ widely disseminated (e.g., at NASA) ‣ funded (~$18k) by the Vice Chancellor for Research & NSF (BIGDATA). http://pythonbootcamp.info
  • 14. Part of the Designated Emphasis in Computational Science & Engineering at Berkeley. Topics: visualization, machine learning, database interaction, user interface & web frameworks, timeseries & numerical computing, interfacing to other languages, Bayesian inference & MCMC, hardware control, parallelism.
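Of the topics on slide 14, Bayesian inference & MCMC fits in a few lines. Below is a minimal random-walk Metropolis sampler in standard-library Python, drawing from a standard normal as a stand-in target; the step size, chain length, and burn-in are illustrative choices, not tuned values.

```python
import math
import random
from typing import Callable, List


def log_target(x: float) -> float:
    """Unnormalized log-density of a standard normal (the stand-in target)."""
    return -0.5 * x * x


def metropolis(log_p: Callable[[float], float], n_steps: int,
               step: float = 1.0, x0: float = 0.0, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        chain.append(x)  # on rejection, the current state is recorded again
    return chain


samples = metropolis(log_target, 20_000, seed=1)[1_000:]  # drop burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should be near 0 and 1
```

Only the ratio of target densities enters the acceptance test, which is why an unnormalized posterior is enough in practice.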
  • 15. Participants: 36% female, 64% male. Departments represented (individual shares from 4% to 16%): Psychology, Astronomy, Neuroscience, Biostatistics, Physics, Chemical Engineering, ISchool, Earth and Planetary Sciences, Industrial Engineering, Mechanical Engineering. Projects: “Parallel Image Reconstruction from Radio Interferometry Data”; “Graph Theory Analysis of Growing Graphs” (http://mb3152.github.io/Graph-Growth/); “Realtime Prediction of Activity Behavior from Smartphone”; “Bus Arrival Time Prediction in Spain”.
  • 16. TERRA: optimized for small planets. Time-domain preprocessing: start with raw photometry; Gaussian process detrending; calibration (Petigura & Marcy 2012). Transit search: matched filter, similar to the BLS algorithm (Kovács+ 2002), leveraging the Fast-Folding Algorithm to go from O(N^2) to O(N log N) (Staelin+ 1968). Data validation: identify significant periodogram peaks that are nevertheless inconsistent with an exoplanet transit. [Figure: raw flux (ppt) and TERRA's detrended/calibrated flux.] Erik Petigura, Berkeley Astro grad student. Petigura, Howard, & Marcy (20…). Prevalence of Earth-size planets orbiting Sun-like stars. Erik A. Petigura (a, b), Andrew W. Howard (b), and Geoffrey W. Marcy (a); (a) Astronomy Department, University of California, Berkeley, CA 94720; (b) Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822. Contributed by Geoffrey W. Marcy, October 22, 2013 (sent for review October 18, 2013). Determining whether Earth-like planets are common or rare looms as a touchstone in the question of life in the universe. We searched for Earth-size planets that cross in front of their host stars by examining the brightness measurements of 42,000 stars from National Aeronautics and Space Administration’s Kepler mission. We found 603 planets, including 10 that are Earth size (1–2 R⊕) and receive comparable levels of stellar energy to that of Earth (0.25–4 F⊕). We account for Kepler’s imperfect detectability of such planets by injecting synthetic planet-caused dimmings into the Kepler brightness measurements and recording the fraction detected. We find that 11 ± 4% of Sun-like stars harbor an Earth-size planet receiving between one and four times the stellar intensity as Earth. We also find that the occurrence of Earth-size planets is constant with increasing orbital period (P), within equal intervals of log P up to ∼200 d. Extrapolating, one finds 5.7 (+1.7/−2.2)% of Sun-like stars harbor an Earth-size planet with orbital periods of 200–400 d.
Keywords: extrasolar planets | astrobiology. The National Aeronautics and Space Administration’s (NASA’s) Kepler mission was launched in 2009 to search for planets that transit (cross in front of) their host stars (1–4). The resulting dimming of the host stars is detectable by measuring their brightness, and Kepler monitored the brightness of 150,000 stars every 30 min for 4 y. To date, this exoplanet survey has detected more than 3,000 planet candidates (4). The most easily detectable planets in the Kepler survey are those that are relatively large and orbit close to their host stars, especially those stars having lower intrinsic brightness fluctuations (noise). These large, close-in worlds dominate the list of known exoplanets. However, the Kepler brightness measurements can be analyzed and debiased to reveal the diversity of planets, (…) We searched for transiting planets in Kepler brightness measurements using our custom-built TERRA software package described in previous works (6, 9) and in SI Appendix. In brief, TERRA conditions Kepler photometry in the time domain, removing outliers, long-timescale variability (>10 d), and systematic errors common to a large number of stars. TERRA then searches for transit signals by evaluating the signal-to-noise ratio (SNR) of prospective transits over a finely spaced 3D grid of orbital period, P, time of transit, t0, and transit duration, ΔT. This grid-based search extends over the orbital period range of 0.5–400 d. TERRA produced a list of “threshold crossing events” (TCEs) that meet the key criterion of a photometric dimming SNR > 12. Unfortunately, an unwieldy 16,227 TCEs met this criterion, many of which are inconsistent with the periodic dimming profile from a true transiting planet. Further vetting was performed by automatically assessing which light curves were consistent with theoretical models of transiting planets (10).
We also visually inspected each TCE light curve, retaining only those exhibiting a consistent, periodic, box-shaped dimming, and rejecting those caused by single epoch outliers, correlated noise, and other data anomalies. The vetting process was applied homogeneously to all TCEs and is described in further detail in SI Appendix. To assess our vetting accuracy, we evaluated the 235 Kepler objects of interest (KOIs) among Best42k stars having P > 50 d, which had been found by the Kepler Project and identified as planet candidates in the official Exoplanet Archive (exoplanetarchive.ipac.caltech.edu; accessed 19 September 2013). Among them, we found four whose light curves are not consistent with being planets. These four KOIs (364.01, 2,224.02, 2,311.01, and 2,474.01) have long periods and small radii (SI Appendix). This exercise suggests that our vetting process is robust and that careful scrutiny of the light curves of small planets in long period orbits is useful to identify false positives. Slide annotations: Bootcamp/Seminar alum; Python; DOE/NERSC computation; PNAS [2014].
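The paper excerpt on slide 16 describes scoring prospective transits over a grid of period P, transit time t0, and duration ΔT. The sketch below is a simplified, brute-force box search in that spirit (close to the idea behind BLS); it is not TERRA's code, does not implement the Fast-Folding Algorithm, and runs on a synthetic light curve whose period, depth, duration, and noise level are all hypothetical.

```python
import numpy as np


def box_search(t, flux, periods, durations, bin_width):
    """Brute-force grid search for a periodic box-shaped dimming.

    Returns (snr, period, t0, duration) for the best-scoring trial.
    """
    resid = flux - np.median(flux)   # dips become negative residuals
    sigma = np.std(resid)            # crude per-point noise estimate
    best = (-np.inf, np.nan, np.nan, np.nan)
    for period in periods:
        nbins = int(np.ceil(period / bin_width))
        # Phase-fold, then bin: residual sum and point count per phase bin.
        idx = np.minimum(((t % period) / bin_width).astype(int), nbins - 1)
        sums = np.bincount(idx, weights=resid, minlength=nbins)
        counts = np.bincount(idx, minlength=nbins)
        for duration in durations:
            k = max(1, int(round(duration / bin_width)))
            # Cyclic sums over k consecutive bins = a box of ~duration at every t0.
            s = sum(np.roll(sums, -j) for j in range(k))
            n = sum(np.roll(counts, -j) for j in range(k))
            # SNR = depth / (sigma / sqrt(n)), with depth = -s / n.
            with np.errstate(divide="ignore", invalid="ignore"):
                snr = np.where(n > 0, -s / (sigma * np.sqrt(n)), 0.0)
            i = int(np.argmax(snr))
            if snr[i] > best[0]:
                best = (float(snr[i]), float(period), i * bin_width, k * bin_width)
    return best


# Synthetic light curve with an injected transit (all values hypothetical).
rng = np.random.default_rng(42)
t = np.arange(0.0, 60.0, 0.02)                  # days
flux = 1.0 + rng.normal(0.0, 0.001, t.size)
in_transit = ((t - 1.2) % 3.7) < 0.16           # P = 3.7 d, t0 = 1.2 d, duration 0.16 d
flux[in_transit] -= 0.01                        # 1% deep box-shaped dip

snr, period, t0, duration = box_search(t, flux, np.arange(3.0, 4.5, 0.001),
                                       durations=[0.08, 0.16, 0.24], bin_width=0.02)
print(f"P = {period:.3f} d, t0 = {t0:.2f} d, duration = {duration:.2f} d, SNR = {snr:.1f}")
```

The loop over trial periods costs one fold per period; the Fast-Folding Algorithm cited on the slide reuses partial sums across neighboring periods to avoid most of that repeated work.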
  • 17. “Are we alone in the universe? What makes up the missing mass of the universe? ... And maybe the biggest question of all: How in the wide world can you add $3 billion in market capitalization simply by adding .com to the end of a name?” (President William Jefferson Clinton, Science and Technology Policy Address, 21 January 2000). “Add Data Science or Big Data to your course name to increase enrollment by tenfold.” (Joshua Bloom, just now)
  • 18. Python for Data Science @ Berkeley [Sept 2013]
  • 19. ‣ Where do bootcamps & seminars fit into traditional domain-science curricula? Formal coursework competes with research obligations for graduate students. ‣ Are they too vocational/practical for higher ed? ‣ Who should teach them, and how should they be credited?
  • 20. Undergraduate & Graduate Training Mission: Thinking Data Literacy before Thinking Big Data Proficiency. First this... ...then this.
  • 21. Undergraduate & Graduate Training Mission: Thinking Data Literacy before Thinking Big Data Proficiency. Statistical Inference. Data analysis recipes: Fitting a model to data. David W. Hogg (Center for Cosmology and Particle Physics, Department of Physics, New York University; Max-Planck-Institut für Astronomie, Heidelberg), Jo Bovy (Center for Cosmology and Particle Physics, Department of Physics, New York University), Dustin Lang (Department of Computer Science, University of Toronto; Princeton University Observatory). arXiv:1008.4686v1 [astro-ph.IM], 27 Aug 2010. Abstract: We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these … [Figure: “Fitting a straight line to data”, y vs. x with best-fit line y = 1.33 x + 164.]
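For the case the Hogg et al. abstract singles out (negligible x uncertainties, Gaussian y uncertainties of known variance), the weighted least-squares line fit has a closed form. A minimal NumPy sketch: build the design matrix A with columns [1, x], weight by the inverse covariance C⁻¹ = diag(1/σ²), and solve (Aᵀ C⁻¹ A) X = Aᵀ C⁻¹ Y for X = (b, m). The test data are synthetic and noise-free, so the fit should recover the true line exactly.

```python
import numpy as np


def weighted_line_fit(x, y, sigma_y):
    """Fit y = b + m*x by weighted least squares; returns (b, m, covariance)."""
    x, y, sigma_y = (np.asarray(a, dtype=float) for a in (x, y, sigma_y))
    A = np.column_stack([np.ones_like(x), x])      # design matrix, columns [1, x]
    Cinv = np.diag(1.0 / sigma_y ** 2)             # inverse covariance, independent y errors
    ATCA = A.T @ Cinv @ A
    b, m = np.linalg.solve(ATCA, A.T @ Cinv @ y)   # best-fit intercept and slope
    cov = np.linalg.inv(ATCA)                      # covariance of (b, m)
    return float(b), float(m), cov


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]                     # exactly y = 1 + 2x
sigma = [1.0, 0.5, 2.0, 1.0, 0.1]
b, m, cov = weighted_line_fit(x, y, sigma)
print(f"b = {b:.3f}, m = {m:.3f}, sigma_m = {np.sqrt(cov[1, 1]):.3f}")
```

If the x values also carry uncertainties, or the y errors are not Gaussian with known variance, this estimator is no longer appropriate, which is the central caution of the paper quoted on the slide.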
  • 22. Versioning & Reproducibility. “Recently, the scientific community was shaken by reports that a troubling proportion of peer-reviewed preclinical studies are not reproducible.” (McNutt, 2014; http://www.sciencemag.org/content/343/6168/229.summary) - Git has emerged as the de facto versioning tool - Berkeley Common Environment (BCE) software stack - “Reproducible and Collaborative Statistical Data Science” (Statistics 157: P. Stark) - Next up: versioning (big) data? Undergraduate & Graduate Training Mission: Thinking Data Literacy before Thinking Big Data Proficiency.
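Slide 22 asks how to version (big) data. One common building block is content addressing, which Git itself uses: a file's contents are identified by a hash, so any change to the bytes yields a new ID. The sketch reproduces the SHA-1 blob ID Git assigns in its default object format (repositories using Git's SHA-256 object format differ); the `manifest` helper and the file names are hypothetical.

```python
import hashlib
from typing import Dict


def git_blob_id(data: bytes) -> str:
    """SHA-1 object ID that Git (default SHA-1 format) assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Hypothetical helper: map each file name to the content ID of its bytes."""
    return {name: git_blob_id(content) for name, content in files.items()}


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
print(manifest({"photometry.csv": b"t,flux\n0.0,1.0\n"}))
```

Recording such a manifest alongside an analysis ties each result to the exact bytes it consumed, without storing large data files in the repository itself.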
  • 26. Established CS/Stats/Math in service of novelty in domain science vs. novelty in domain science driving & informing novelty in CS/Stats/Math: the “novelty² problem”. Extra burden for forefront scientists. https://medium.com/tech-talk/dd88857f662
  • 27. Berkeley Institute for Data Sciences (BIDS) ‣ Physical Space & New Entity dedicated to the Moore/Sloan Data Science principles ‣ Goal: rich resource and ecosystem for domain scientists to connect & collaborate with methodologists http://bitly.com/bundles/fperezorg/1 “Bold new partnership launches to harness potential of data scientists and big data”
  • 28. Berkeley Institute for Data Sciences
  • 29. Berkeley Institute for Data Sciences
  • 30. Towards an Inclusive Ecosystem: Expanding Participation Among Underrepresented Groups. 2013 Python bootcamp breakdown (female, male, decline to state; chart values 11%, 56%, 33%). 2013 AMP Camp: < 5% women. This workshop: 2 women out of 22 speakers. 2013 Python Seminar: 36% women.
  • 31. Summary ‣ Data literacy before big data proficiency ‣ Domain science is increasingly dependent upon methodological competencies ‣ The higher-ed role of such training is still TBD: formal courses compete for time ‣ Need to create inclusive, collaborative environments bridging domains & methodologies. “Training Students to Extract Value from Big Data,” National Academies of Science, DC, 11 April 2014. @profjsb