Building A Newsfeed From The Universe: Data Streams In Astronomy
Maria Patterson, PhD
Data Scientist
@OpenSciPinay
High Alpha Studio: A venture studio that conceives, launches, and scales enterprise cloud companies.
High Alpha Capital: A highly focused fund that invests in best-in-class enterprise cloud companies from our Studio and around the world.
Maria Patterson @OpenSciPinay
A new model for entrepreneurship
that unites company creation with
venture funding.
Since 2015 our studio has launched 16 companies; we are currently launching ~1 new company every other month.
High Alpha Studio
(Figure: portfolio company logos, four of them marked PRE-LAUNCH)
Modern Astronomy 101
Picture from CC-IN2P3 in Lyon, France
Astronomical “Sky Surveys”
Listening to the sky in all directions
The Sloan Digital Sky Survey (SDSS) is an early pioneer of the survey technique, collecting tens of TBs of image data on nearly 1 billion objects.
Image Credit: PS1SC/R.Ratkowski/Sloan Digital Sky Survey
Why survey the sky?
• Changing object detection
• Faint object detection
Zwicky Transient Facility
(Image labels: Palomar, SDSS)
• First light November 1, 2017
• 48-inch telescope at Palomar Observatory
• Field of view = 235 × the area of the full Moon
• Images the entire Northern sky every 3 nights
• Images the Milky Way plane twice a night
• Designed to detect transients (supernovae, gamma-ray bursts, etc.) and moving objects (comets, asteroids)
Large Synoptic Survey Telescope
• Under construction; full operations planned for 2022
• 8.4 m mirror in northern Chile
• 3.2-gigapixel camera, the largest ever built
• Images the entire Southern sky every few nights
• 20 TB of raw data per night for 10 years
• 60 PB by survey end; 15 PB catalog database
• All data public, and all code open source!
Astronomical alert data streams
Transients are detected by image “differencing”
New image − Template = Difference
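The subtraction above can be sketched in a few lines. This is a toy illustration on plain Python lists, assuming the new image and the template are already aligned and PSF-matched (real difference-imaging pipelines must do both first, and operate on NumPy arrays); the function names and the fixed detection threshold are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def difference_image(new: Image, template: Image) -> Image:
    """Pixel-wise new - template.

    Assumes (hypothetically) both images are already aligned and PSF-matched.
    """
    if len(new) != len(template) or any(
        len(row_new) != len(row_tpl) for row_new, row_tpl in zip(new, template)
    ):
        raise ValueError("new image and template must have the same shape")
    return [
        [n - t for n, t in zip(row_new, row_tpl)]
        for row_new, row_tpl in zip(new, template)
    ]


def detect_transients(diff: Image, threshold: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for pixels whose |difference| exceeds threshold.

    Positive values mean the source brightened; negative values mean it faded.
    """
    return [
        (r, c, value)
        for r, row in enumerate(diff)
        for c, value in enumerate(row)
        if abs(value) > threshold
    ]


template = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0], [10.0, 30.0, 10.0]]
new = [[10.0, 10.0, 10.0], [10.0, 60.0, 10.0], [10.0, 12.0, 11.0]]
print(detect_transients(difference_image(new, template), threshold=5.0))
# → [(1, 1, 50.0), (2, 1, -18.0)]
```

Static sources cancel in the difference, so only the object that brightened (pixel 1,1) and the one that faded (pixel 2,1) survive the threshold; the small 1-count change at (2,2) is treated as noise.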
Sky surveys detect millions of objects nightly
New bottleneck to scientific discovery
Requirements for an astronomical alert system
(Diagram labels: Permanent Archive; Filter Service. Caption: “Telescopes don’t actually stick out of the dome.”)
How should we package alert data?
Traditional Format
XML-based
• Measurements characterizing objects
• Verbose, redundant, and heavy
• Non-standard / non-typed fields
• Meant more for human inspection
• How do we include images?
• How can we better scale?
Data formatting: Apache Avro
• Compact, as opposed to XML’s verbosity
• Fast parsing with structured messages
• Easy to characterize with a simple JSON schema
• Availability of user-friendly Python modules
  • avro-python3
  • fastavro
• Strictly enforced schemas, but allows schema evolution
• Allows “postage stamp” image cutouts to be included
(Diagram: Schema + Data)
Including image data “postage stamps”
Data formatting: Apache Avro
https://github.com/ZwickyTransientFacility/ztf-avro-alert
https://github.com/lsst-dm/sample-avro-alert
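To see why Avro messages are compact, here is a hand-rolled sketch of the Avro binary encoding for a few primitive types, following the Apache Avro specification: longs are zigzag-encoded variable-length integers, doubles are 8 little-endian bytes, strings and bytes are a length prefix followed by the raw data, and a record is simply its fields concatenated in schema order, with no field names in the payload. The `ToyAlert` schema is hypothetical and far simpler than the real ZTF or LSST alert schemas linked above; in practice you would use fastavro or avro-python3, which also handle unions, arrays, and container files.

```python
import json
import struct

# Hypothetical toy schema, NOT the real ZTF/LSST alert schema.
TOY_ALERT_SCHEMA = {
    "type": "record",
    "name": "ToyAlert",
    "fields": [
        {"name": "candid", "type": "long"},
        {"name": "ra", "type": "double"},
        {"name": "dec", "type": "double"},
        {"name": "objectId", "type": "string"},
        {"name": "cutoutDifference", "type": "bytes"},  # "postage stamp" payload
    ],
}


def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-encode, then emit a little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0->0, -1->1, 1->2, -2->3, ... (64-bit range)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_record(schema: dict, record: dict) -> bytes:
    """Encode a flat record of primitive fields in schema order, without field names."""
    out = bytearray()
    for field in schema["fields"]:
        value, kind = record[field["name"]], field["type"]
        if kind == "long":
            out += encode_long(value)
        elif kind == "double":
            out += struct.pack("<d", value)
        elif kind == "string":
            data = value.encode("utf-8")
            out += encode_long(len(data)) + data
        elif kind == "bytes":
            out += encode_long(len(value)) + value
        else:
            raise NotImplementedError(f"type {kind!r} is not handled in this sketch")
    return bytes(out)


alert = {
    "candid": 123456789,
    "ra": 150.1,
    "dec": 2.2,
    "objectId": "toy-object-1",  # hypothetical identifier
    "cutoutDifference": bytes(16),  # stand-in for a compressed image cutout
}
avro_bytes = encode_record(TOY_ALERT_SCHEMA, alert)
json_bytes = json.dumps({**alert, "cutoutDifference": alert["cutoutDifference"].hex()}).encode()
print(len(avro_bytes), "bytes as Avro vs", len(json_bytes), "bytes as JSON")
```

The payload can omit field names because the schema travels separately (for example in an Avro container file header); the reader needs the writer's schema to decode it, which is also what makes strictly enforced schemas and schema evolution possible.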
How should we distribute alert data?
Traditional Method
Event-driven Python module
• Must be connected to get data
• Difficult to filter (uses XML’s XPath)
• Not easy to sink to a database
• Does not scale to LSST data rates
Data transport: Apache Kafka
• Scalability: many consumers, in parallel
• Feeds astronomical “community brokers”
• Keeps the database archive in sync
• Retains message history (subject to retention settings)
• Lets consumers “rewind” if disconnected
• Availability of user-friendly Python packages
• Runs in Docker, easy to develop with
https://github.com/lsst-dm/alert_stream
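The two properties that matter most to alert consumers, parallel consumption and the ability to “rewind”, come from Kafka's design as a partitioned, append-only log in which each consumer tracks its own offset. The sketch below is a toy in-memory model of that idea, not the Kafka client API; the class names, the topic name, the partition count, and the use of `zlib.crc32` for key hashing (Kafka's default partitioner uses murmur2) are all assumptions for illustration.

```python
import zlib
from typing import Dict, List


class ToyTopic:
    """In-memory stand-in for a Kafka topic: N append-only partition logs."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def produce(self, key: bytes, value: bytes) -> int:
        """Append to the partition chosen by hashing the key; return that partition."""
        p = zlib.crc32(key) % len(self.partitions)  # same key -> same partition
        self.partitions[p].append(value)
        return p


class ToyConsumer:
    """Reads its assigned partitions, remembering its own offset in each."""

    def __init__(self, topic: ToyTopic, assigned: List[int]) -> None:
        self.topic = topic
        self.offsets: Dict[int, int] = {p: 0 for p in assigned}

    def poll(self) -> List[bytes]:
        """Return every message past the current offsets, then advance them."""
        batch: List[bytes] = []
        for p, offset in self.offsets.items():
            log = self.topic.partitions[p]
            batch.extend(log[offset:])
            self.offsets[p] = len(log)
        return batch

    def seek_to_beginning(self) -> None:
        """'Rewind': the log is retained, so re-reading only means resetting offsets."""
        for p in self.offsets:
            self.offsets[p] = 0


# Hypothetical nightly topic with 4 partitions, split across two parallel consumers.
topic = ToyTopic("alerts-night-hypothetical", num_partitions=4)
for i in range(10):
    topic.produce(key=f"object-{i}".encode(), value=f"alert-{i}".encode())

consumer_a = ToyConsumer(topic, assigned=[0, 1])
consumer_b = ToyConsumer(topic, assigned=[2, 3])
seen = consumer_a.poll() + consumer_b.poll()
print(len(seen))  # each alert is read exactly once across the two consumers
```

Because reading never deletes anything, a consumer that disconnects can resume from its last offset or rewind to the start; in real deployments offsets are committed back to Kafka, and how far back a consumer can rewind is bounded by the topic's retention settings.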
How do we find objects of interest?
Data filtering: Write your own Python
• Allows complex operations and machine learning Python modules
• If True: write the alert to a new topic
• If False: drop it
• Deployed in separate Docker containers
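A minimal sketch of the “if True, write to a new topic; if False, drop” pattern. The alert fields (`mag`, `real_bogus_score`), the cut values, and the helper names are hypothetical, and a plain list stands in for the output Kafka topic; in the systems described here each filter would run in its own Docker container and consume from the alert stream.

```python
from typing import Any, Callable, Dict, Iterable, List

Alert = Dict[str, Any]


def bright_and_likely_real(alert: Alert) -> bool:
    """Hypothetical filter: bright detections that a classifier scores as likely real."""
    return alert["mag"] < 18.0 and alert["real_bogus_score"] > 0.65


def run_filter(
    alerts: Iterable[Alert],
    keep: Callable[[Alert], bool],
    output_topic: List[Alert],
) -> int:
    """Apply keep() to each alert: True -> append to output_topic, False -> drop."""
    kept = 0
    for alert in alerts:
        if keep(alert):
            output_topic.append(alert)
            kept += 1
    return kept


stream = [
    {"id": "a", "mag": 17.2, "real_bogus_score": 0.91},
    {"id": "b", "mag": 19.8, "real_bogus_score": 0.95},  # too faint
    {"id": "c", "mag": 16.4, "real_bogus_score": 0.12},  # probably an artifact
]
filtered: List[Alert] = []
print(run_filter(stream, bright_and_likely_real, filtered), [a["id"] for a in filtered])
# → 1 ['a']
```

Because the predicate is any Python callable, the same loop accepts a trained machine learning classifier in place of the hand-written cuts.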
Putting it all together: ZTF Alert Distribution System (ZADS)
(Diagram components: IPAC; UW; Avro Archive; Filter service with Dockerized filters; cloud-based service)
IPAC
• Data processing (difference imaging)
• Archive to databases
• Enrich with supplementary data
• Package into Avro format
(custom open source code)
UW
• Three-broker Kafka cluster
• confluentinc/cp-kafka Docker image
• confluentinc/cp-zookeeper Docker image
• MirrorMaker in Docker (confluentinc/cp-kafka)
• One topic per night
• UW cluster: 16 partitions
Avro Archive
• Individual consumers
• Custom open source code
• Subscribe to topics per night
• Open data archive
Filter service
• Dockerized filters
Cloud-based service
• MirrorMaker in Docker (confluentinc/cp-kafka)
ZTF Alert Distribution System (ZADS)
• 600,000 to 1.2 million alerts per night
• Up to ~75 GB archived per night
• ~20 minutes from image taken to availability for consumers
• ~4 seconds packaging/transfer time
• https://ztf.uw.edu/alerts/public/
"Design of the LSST Alert Distribution System": https://dmtn-093.lsst.io/
• 189 CCDs, processed in parallel
• 4k × 4k pixels per CCD; 3.2 billion pixels per image
• Alert generator at the end of the pipeline
• 10,000 alerts every 39 seconds
(Diagram components: Alert Hub, a central node or cluster; Archive; Filter Cluster running Filter 1, Filter 2, … Filter N; Filter Submission Service with a User Interface)
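The figures on this slide are mutually consistent. Assuming “4k” means 4096 pixels on a side, and (for the last ratio only) that alerts were spread evenly across CCDs, which real fields will not be:

```latex
189 \times (4096 \times 4096) = 189 \times 16{,}777{,}216 = 3{,}170{,}893{,}824 \approx 3.2 \times 10^{9}\ \text{pixels per image}

\frac{10{,}000\ \text{alerts}}{39\ \text{s}} \approx 256\ \text{alerts/s},
\qquad
\frac{10{,}000\ \text{alerts}}{189\ \text{CCDs}} \approx 53\ \text{alerts per CCD per image}
```

A sustained rate of roughly 256 alerts per second is the load the Alert Hub and every filter in the Filter Cluster must keep up with.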
See ztf.uw.edu for details
For more information
Large Synoptic Survey Telescope
• lsst.org
• Code repositories: dm.lsst.org/browse
• github.com/lsst-dm repos:
  • alert_stream
  • sample-avro-alert
Zwicky Transient Facility
• ztf.caltech.edu
• dirac.astro.washington.edu
• github.com/ZwickyTransientFacility repos:
  • ztf-avro-alert
  • alert_stream
ALL THE KUDOS to the ZTF team and the LSST DM team (especially Eric Bellm, John Swinbank, and Simon Krughoff)!
Maria Patterson
maria@highalpha.com @OpenSciPinay
See also
• mtpatter.github.io - 3 technical notes on architecture
• github.com/mtpatter/postgres-kafka-demo

Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Building a newsfeed from the Universe: Data streams in astronomy (Maria Patterson, High Alpha) Kafka Summit SF 2019

  • 1. Building A Newsfeed From The Universe: Data Streams In Astronomy. Maria Patterson, PhD, Data Scientist, @OpenSciPinay
  • 3. High Alpha Studio: a venture studio that conceives, launches, and scales enterprise cloud companies. High Alpha Capital: a highly focused fund that invests in best-in-class enterprise cloud companies from our Studio and around the world. Maria Patterson @OpenSciPinay
  • 4. A new model for entrepreneurship that unites company creation with venture funding.
  • 5. High Alpha Studio: since 2015 our studio has launched 16 companies; we are currently launching about one new company every other month.
  • 6. Modern Astronomy 101
  • 8. Picture from CC-IN2P3 in Lyon, France
  • 9. Astronomical “Sky Surveys”: listening to the sky in all directions. The Sloan Digital Sky Survey (SDSS) is an early pioneer of the survey technique, collecting tens of TBs of image data from nearly 1 billion objects. Image credit: PS1SC/R.Ratkowski/Sloan Digital Sky Survey
  • 10. Why survey the sky? Faint object detection and changing object detection. Images: Palomar, SDSS
  • 11. Zwicky Transient Facility • First light November 1, 2017 • 48-inch telescope at Palomar Observatory • Image size = 235 × the area of the Moon • Images the entire northern sky every 3 nights • Images the Milky Way plane twice a night • Designed to detect transients (supernovae, gamma-ray bursts, etc.) and moving objects (comets, asteroids)
  • 13. Large Synoptic Survey Telescope • Under construction, with full operations planned for 2022 • 8.4 m mirror in northern Chile • 3.2-gigapixel camera, the largest ever built • Images the entire southern sky every few nights • 20 TB of raw data per night for 10 years • 60 PB by survey end, 15 PB catalog database • All data public, and the code open source!
  • 15. Astronomical alert data streams
  • 16. Transients are detected by image “differencing”: Template
  • 17. Transients are detected by image “differencing”: New image
  • 20. Transients are detected by image “differencing”: Difference
  • 21. Transients are detected by image “differencing”: New image − Template
  • 22. Transients are detected by image “differencing”: New image − Template = Difference
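The differencing idea on slides 16-22 (New image − Template = Difference) can be sketched in a few lines of plain Python. This is a toy under stated assumptions: both images are already aligned and on the same flux scale, and a simple `threshold` flags changed pixels. Real difference-imaging pipelines also match the point-spread functions of the two images before subtracting, which this sketch skips.

```python
from typing import List, Tuple

Image = List[List[float]]  # a grayscale image as a list of pixel rows


def difference_image(new: Image, template: Image) -> Image:
    """Subtract the template from the new image, pixel by pixel.

    Assumes both images are already aligned and on the same flux scale.
    """
    if len(new) != len(template) or any(
        len(row_new) != len(row_tmpl) for row_new, row_tmpl in zip(new, template)
    ):
        raise ValueError("images must have the same shape")
    return [
        [n - t for n, t in zip(row_new, row_tmpl)]
        for row_new, row_tmpl in zip(new, template)
    ]


def detect_sources(diff: Image, threshold: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for every pixel whose |difference| exceeds threshold.

    Positive values mean the sky got brighter there; negative values mean it faded.
    """
    return [
        (r, c, value)
        for r, row in enumerate(diff)
        for c, value in enumerate(row)
        if abs(value) > threshold
    ]


if __name__ == "__main__":
    template = [[10.0, 10.0, 10.0],
                [10.0, 12.0, 10.0],   # a steady star in the middle
                [10.0, 10.0, 10.0]]
    new = [[10.0, 10.0, 10.0],
           [10.0, 12.0, 10.0],        # the steady star cancels out...
           [10.0, 10.0, 60.0]]        # ...but a new source appears in the corner
    diff = difference_image(new, template)
    print(detect_sources(diff, threshold=5.0))  # [(2, 2, 50.0)]
```

Everything that did not change cancels out, so only the new or varying source survives in the difference image, which is what makes the technique attractive for finding transients among millions of static objects.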
  • 23. Sky surveys detect millions of objects nightly
  • 26. New bottleneck to scientific discovery
  • 31. Requirements for an astronomical alert system
  • 32. Requirements for an astronomical alert system (Telescopes don’t actually stick out of the dome.)
  • 33. Requirements for an astronomical alert system. Diagram: Permanent Archive (Telescopes don’t actually stick out of the dome.)
  • 34. Requirements for an astronomical alert system. Diagram: Permanent Archive
  • 38. Requirements for an astronomical alert system. Diagram: Permanent Archive, Filter Service
  • 41. How should we package alert data?
  • 42. Traditional Format: XML-based • Measurements characterizing objects • Verbose, redundant, and heavy • Non-standard / non-typed fields • Meant more for human inspection • How do we include images? • How can we better scale?
  • 43. Data formatting: Apache Avro (schema + data) • Compact, as opposed to XML’s verbosity • Fast parsing with structured messages • Easy to characterize with a simple JSON schema • User-friendly Python modules available: avro-python3, fastavro • Strictly enforced schemas that still allow evolution • Allows “postage stamp” cutout files
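To make the “compact, as opposed to XML’s verbosity” point concrete, here is a hand-rolled sketch of how Avro's binary encoding lays out one record, following the Avro specification: `long` values are zigzag-encoded variable-length integers, `double` is 8 bytes little-endian, `string` is a length prefix followed by UTF-8 bytes, and a record is just its fields in schema order. Field names live in the schema, not in each message. The four-field alert schema and its XML counterpart are hypothetical, made up for illustration; in practice fastavro or avro-python3 would do this encoding for you.

```python
import struct
from typing import Any, Dict, List, Tuple

# Hypothetical alert schema: (field name, Avro type) in schema order.
SCHEMA: List[Tuple[str, str]] = [
    ("candid", "long"),
    ("objectId", "string"),
    ("ra", "double"),
    ("dec", "double"),
]


def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-map the signed value, then write a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # low 7 bits plus a "more bytes follow" flag
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length (as a long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(record: Dict[str, Any]) -> bytes:
    """Avro record: field values concatenated in schema order, with no field names."""
    out = bytearray()
    for name, avro_type in SCHEMA:
        value = record[name]
        if avro_type == "long":
            out += encode_long(value)
        elif avro_type == "double":
            out += struct.pack("<d", value)  # 8-byte little-endian IEEE 754
        elif avro_type == "string":
            out += encode_string(value)
        else:
            raise TypeError(f"type not handled in this sketch: {avro_type}")
    return bytes(out)


def to_xml(record: Dict[str, Any]) -> bytes:
    """The same record as naive XML, where every value carries its tag names."""
    body = "".join(f"<{name}>{record[name]}</{name}>" for name, _ in SCHEMA)
    return f"<alert>{body}</alert>".encode("utf-8")


if __name__ == "__main__":
    alert = {"candid": 123456789, "objectId": "ZTF18abcdefg", "ra": 150.1, "dec": 2.2}
    print(len(encode_record(alert)), len(to_xml(alert)))  # 33 102
```

The binary record is about a third the size of the XML here, and the gap grows with the number of fields. The price is that a reader needs the schema to decode anything, which is why Avro's object container files carry the schema once in their header.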
  • 44. Data formatting: Apache Avro. Including image data “postage stamps”: https://github.com/ZwickyTransientFacility/ztf-avro-alert https://github.com/lsst-dm/sample-avro-alert
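The “postage stamps” on slide 44 are small image cutouts shipped inside each alert, so a consumer can inspect a detection without fetching the full frame. A minimal sketch, assuming a plain list-of-rows image and a hypothetical half-width `half`: cut a square around the detected pixel, pack the pixels into bytes with `struct`, and gzip them so the result could travel in an Avro `bytes` field. As far as we know, the real ZTF and LSST schemas carry compressed FITS files rather than raw packed floats, so treat the packing format below as an assumption.

```python
import gzip
import struct
from typing import List

Image = List[List[float]]  # a grayscale image as a list of pixel rows


def cutout(image: Image, row: int, col: int, half: int) -> Image:
    """Square stamp of side 2*half+1 centred on (row, col), clipped at the image edges."""
    r0, r1 = max(0, row - half), min(len(image), row + half + 1)
    c0, c1 = max(0, col - half), min(len(image[0]), col + half + 1)
    return [pixels[c0:c1] for pixels in image[r0:r1]]


def pack_stamp(stamp: Image) -> bytes:
    """Hypothetical stamp format: uint16 height, uint16 width, float32 pixels, gzipped."""
    height = len(stamp)
    width = len(stamp[0]) if height else 0
    flat = [value for pixels in stamp for value in pixels]
    raw = struct.pack(f"<HH{len(flat)}f", height, width, *flat)
    return gzip.compress(raw)


def unpack_stamp(blob: bytes) -> Image:
    """Inverse of pack_stamp."""
    raw = gzip.decompress(blob)
    height, width = struct.unpack_from("<HH", raw, 0)
    flat = struct.unpack_from(f"<{height * width}f", raw, 4)
    return [list(flat[i * width:(i + 1) * width]) for i in range(height)]


if __name__ == "__main__":
    image = [[float(10 * r + c) for c in range(5)] for r in range(5)]
    stamp = cutout(image, row=2, col=2, half=1)
    blob = pack_stamp(stamp)  # bytes, ready for an Avro `bytes` field
    print(stamp)
    print(unpack_stamp(blob) == stamp)  # True
```

Because float32 is used, pixel values round-trip exactly only when they are representable in single precision; the whole-number values in the demo are.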
  • 45. How should we distribute alert data?
  • 46. Traditional Method • Event-driven Python module • Must be connected to get data • Difficult to filter (uses XML’s XPath) • Not easy to sink to a database • Not scalable to LSST scale
  • 47. Data transport: Apache Kafka • Scalability: many consumers, in parallel • Feeds astronomical “community brokers” • Keeps the database archive in sync • Maintains all history • Lets consumers “rewind” if disconnected • User-friendly Python packages available • Runs in Docker, easy to develop with • https://github.com/lsst-dm/alert_stream
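Several of the reasons on slide 47 (many parallel consumers, keeping all history, letting a disconnected consumer “rewind”) follow from Kafka's model of a topic as an append-only log in which each consumer tracks its own offset. The toy below is a plain-Python model of that idea, not the Kafka client API: `AlertLog`, `Consumer`, `poll`, and `seek` here are hypothetical names, and real Kafka adds partitions, replication, and retention limits.

```python
from typing import List


class AlertLog:
    """Toy append-only log: records are only ever appended, never removed."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int) -> List[bytes]:
        return self._records[offset:offset + max_records]

    def end_offset(self) -> int:
        return len(self._records)


class Consumer:
    """Toy consumer: keeps its own position, so readers never interfere."""

    def __init__(self, log: AlertLog, name: str) -> None:
        self.log = log
        self.name = name
        self.position = 0

    def poll(self, max_records: int = 10) -> List[bytes]:
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """'Rewind' (or skip ahead) by moving only this consumer's position."""
        if not 0 <= offset <= self.log.end_offset():
            raise ValueError("offset out of range")
        self.position = offset


if __name__ == "__main__":
    log = AlertLog()
    for i in range(5):
        log.append(f"alert-{i}".encode())
    broker_a = Consumer(log, "community-broker-a")
    archive = Consumer(log, "archive")
    print(broker_a.poll(3))  # [b'alert-0', b'alert-1', b'alert-2']
    print(len(archive.poll()))  # 5: unaffected by broker_a's reads
    broker_a.seek(1)  # e.g. after a disconnect, re-read from offset 1
    print(broker_a.poll(2))  # [b'alert-1', b'alert-2']
```

Since reading never deletes anything, adding another consumer (a community broker, the archive) costs the producer nothing, and recovery after a dropped connection is just a seek.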
  • 48. How do we find objects of interest?
  • 49. Data filtering: write your own Python • Allows complex operations and machine learning Python modules • If True: write to a new topic • If False: drop • Deployed in separate Docker containers
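The filter contract on slide 49 (a Python function returns True to forward an alert to a new topic, False to drop it) can be sketched without a broker. Here the output topic is a stand-in Python list, and the field names `magpsf` (magnitude) and `rb` (a real/bogus machine-learning score), along with both thresholds, are hypothetical choices for illustration. A deployed filter would instead consume from and produce to Kafka topics inside its own Docker container.

```python
from typing import Callable, Dict, Iterable, List

Alert = Dict[str, float]
Filter = Callable[[Alert], bool]


def bright_and_real(alert: Alert) -> bool:
    """Hypothetical science filter: bright (lower magnitude is brighter) and likely real."""
    return alert["magpsf"] < 18.0 and alert["rb"] > 0.65


def run_filter(alerts: Iterable[Alert], keep: Filter, out_topic: List[Alert]) -> int:
    """Apply one filter to a stream: True -> write to out_topic, False -> drop.

    Returns the number of alerts forwarded.
    """
    forwarded = 0
    for alert in alerts:
        if keep(alert):
            out_topic.append(alert)
            forwarded += 1
        # Otherwise the alert is simply not forwarded by this filter.
    return forwarded


if __name__ == "__main__":
    stream = [
        {"magpsf": 17.2, "rb": 0.90},  # bright and real: kept
        {"magpsf": 19.5, "rb": 0.95},  # too faint: dropped
        {"magpsf": 16.0, "rb": 0.10},  # probably an artifact: dropped
    ]
    interesting: List[Alert] = []
    print(run_filter(stream, bright_and_real, interesting))  # 1
```

Because the predicate is ordinary Python, it can call anything installed in its container, including a trained classifier in place of the fixed `rb` threshold.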
  • 56. Putting it all together: ZTF Alert Distribution System (ZADS)
  • 57. Putting it all together: ZADS. IPAC • Data processing (difference imaging) • Archive to databases • Enrich with supplementary data • Packaging to Avro format (custom open source code)
  • 58. Putting it all together: ZADS. Diagram: IPAC
  • 59. Putting it all together: ZADS. IPAC • MirrorMaker (Docker, confluentinc/cp-kafka image) • One topic per night
  • 60. Putting it all together: ZADS. IPAC, UW • Three-broker cluster • confluentinc/cp-kafka Docker image • confluentinc/cp-zookeeper Docker image • MirrorMaker (Docker, confluentinc/cp-kafka image) • One topic per night
  • 61. Putting it all together: ZADS. Diagram: IPAC, UW
  • 62. Putting it all together: ZADS. Diagram: IPAC, UW, UW Avro Archive • Individual consumers • Custom open source code • Open data archive
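One simple shape for the sink side of the archive consumer on slide 62 is to write every alert from one night into a single compressed tarball, one member file per alert. The sketch uses only the standard library and leaves out the Kafka consumption itself; the file naming (`ztf_public_<night>.tar.gz` with members `<candid>.avro`) is our own assumption for illustration, not a description of the actual UW code or archive layout.

```python
import io
import os
import tarfile
import tempfile
from typing import Iterable, List, Tuple


def archive_night(alerts: Iterable[Tuple[int, bytes]], night: str, directory: str) -> str:
    """Write (candid, avro_payload) pairs for one night into a single .tar.gz.

    File names are hypothetical: ztf_public_<night>.tar.gz with members <candid>.avro.
    """
    path = os.path.join(directory, f"ztf_public_{night}.tar.gz")
    with tarfile.open(path, "w:gz") as tar:
        for candid, payload in alerts:
            info = tarfile.TarInfo(name=f"{candid}.avro")
            info.size = len(payload)  # tarfile needs the size before reading the data
            tar.addfile(info, io.BytesIO(payload))
    return path


def read_archive(path: str) -> List[Tuple[str, bytes]]:
    """Read every member back as (name, payload), e.g. for a later reprocessing job."""
    with tarfile.open(path, "r:gz") as tar:
        result = []
        for member in tar.getmembers():
            handle = tar.extractfile(member)
            result.append((member.name, handle.read() if handle else b""))
        return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the alerts a Kafka consumer would receive during one night.
        night_alerts = [(1001, b"avro-bytes-1"), (1002, b"avro-bytes-2")]
        path = archive_night(night_alerts, "20180601", tmp)
        print(read_archive(path))
```

Keeping each alert as its own Avro member means the archive can be read back with the same decoding code that consumers use on the live stream.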
  • 63. Putting it all together: ZADS. Diagram: IPAC, UW, UW Avro Archive
  • 64. Putting it all together: ZADS. Diagram: IPAC, UW, UW Avro Archive • UW cluster: 16 partitions • Custom open source code • Subscribe to topics per night
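Slide 64 combines two ideas: each night gets its own topic, and each topic is split into 16 partitions so that several consumers in one group can share the load. The sketch shows both in plain Python. The topic-name pattern `ztf_YYYYMMDD_programid1` is an assumption modeled on the public ZTF stream, and the round-robin dealing of partitions is a simplified stand-in for Kafka's consumer-group assignors, not their exact behavior.

```python
from datetime import date, timedelta
from typing import Dict, List

NUM_PARTITIONS = 16  # partitions per nightly topic, per the slide


def topic_for_night(night: date, program_id: int = 1) -> str:
    """Hypothetical per-night topic name, e.g. ztf_20180601_programid1."""
    return f"ztf_{night:%Y%m%d}_programid{program_id}"


def topics_between(start: date, end: date) -> List[str]:
    """Topic names for every night from start to end, inclusive."""
    days = (end - start).days
    return [topic_for_night(start + timedelta(days=i)) for i in range(days + 1)]


def assign_round_robin(consumers: List[str], partitions: int = NUM_PARTITIONS) -> Dict[str, List[int]]:
    """Deal partitions out to the consumers in turn (simplified group assignment)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    print(topics_between(date(2018, 6, 30), date(2018, 7, 1)))
    print(assign_round_robin(["consumer-0", "consumer-1", "consumer-2"]))
```

In a Kafka consumer group each partition is read by at most one member at a time, so the partition count caps useful parallelism: with 16 partitions, a 17th consumer in the same group would sit idle.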
  • 67. Putting it all together: ZADS. Diagram: IPAC, UW, UW Avro Archive, Filter service with Dockerized filters • Cloud-based service • MirrorMaker (Docker, confluentinc/cp-kafka image)
  • 68. Putting it all together: ZADS. Diagram: IPAC, UW, UW Avro Archive, Filter service with Dockerized filters
  • 73. ZTF Alert Distribution System (ZADS) • 600k to 1.2 million alerts per night • Up to ~75 GB per night archived • ~20 minutes from image taken to available to consumers • ~4 seconds packaging/transfer time • https://ztf.uw.edu/alerts/public/
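A rough back-of-the-envelope reading of the slide 73 numbers, under two assumptions of our own: that the ~75 GB peak and the 1.2-million-alert peak describe the same night, and that the alerts arrive over roughly 8 hours of darkness.

```latex
\[
\frac{75\ \text{GB}}{1.2\times10^{6}\ \text{alerts}} \approx 62.5\ \text{kB per alert},
\qquad
\frac{1.2\times10^{6}\ \text{alerts}}{8 \times 3600\ \text{s}} \approx 42\ \text{alerts/s}.
\]
```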
  • 74. "Design of the LSST Alert Distribution System": https://dmtn-093.lsst.io/ • 189 CCDs, processed in parallel • 4k x 4k pixels each, 3.2 billion pixels per image • End-of-pipeline alert generator • 10,000 alerts every 39 seconds
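The figures on slide 74 are mutually consistent. Taking 4k = 4096 pixels on a side (our assumption about the rounding), the pixel count and the alert rate work out as:

```latex
\[
189 \times 4096^{2} = 189 \times 16{,}777{,}216 = 3{,}170{,}893{,}824 \approx 3.2\times10^{9}\ \text{pixels per image},
\qquad
\frac{10{,}000\ \text{alerts}}{39\ \text{s}} \approx 256\ \text{alerts/s}.
\]
```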
  • 82. LSST alert distribution design (https://dmtn-093.lsst.io/) • Diagram components: Alert Hub (central node or cluster), Archive, Filter Cluster running Filter 1, Filter 2, ..., Filter N, Filter Submission Service, User Interface
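The Alert Hub design sends each alert to an archive and to every filter in the filter cluster. This toy in-memory model is a hypothetical illustration of that fan-out, using only the Python standard library. The class `ToyAlertHub`, its methods, and the alert fields `id`, `mag`, and `dec` are invented for the example. They are not the DMTN-093 interfaces.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Alert = Dict[str, Any]
Filter = Callable[[Alert], bool]


class ToyAlertHub:
    """Hypothetical in-memory stand-in for the Alert Hub: every filter sees every alert."""

    def __init__(self) -> None:
        self.archive: List[Alert] = []
        self.filters: Dict[str, Filter] = {}
        self.outputs: Dict[str, List[Alert]] = defaultdict(list)

    def submit_filter(self, name: str, predicate: Filter) -> None:
        # Plays the role of the Filter Submission Service.
        self.filters[name] = predicate

    def publish(self, alert: Alert) -> None:
        # Archive every alert, then offer it to each filter independently.
        self.archive.append(alert)
        for name, predicate in self.filters.items():
            if predicate(alert):
                self.outputs[name].append(alert)


hub = ToyAlertHub()
hub.submit_filter("bright", lambda a: a["mag"] < 18.0)
hub.submit_filter("southern", lambda a: a["dec"] < 0.0)
for alert in [
    {"id": 1, "mag": 17.5, "dec": -30.0},
    {"id": 2, "mag": 19.2, "dec": 10.0},
    {"id": 3, "mag": 16.8, "dec": 5.0},
]:
    hub.publish(alert)

print(len(hub.archive))                            # 3
print([a["id"] for a in hub.outputs["bright"]])    # [1, 3]
print([a["id"] for a in hub.outputs["southern"]])  # [1]
```

Each filter sees the full stream independently, so one alert can land in several outputs. Alert 1 passes both filters. In a Kafka deployment, the analogous behavior comes from giving each filter its own consumer group.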
  • 88. See ztf.uw.edu for details
  • 89. For more information • Large Synoptic Survey Telescope: lsst.org; code repositories at dm.lsst.org/browse; github.com/lsst-dm repos alert_stream and sample-avro-alert • Zwicky Transient Facility: ztf.caltech.edu; dirac.astro.washington.edu; github.com/ZwickyTransientFacility repos ztf-avro-alert and alert_stream • ALL THE KUDOS! to the ZTF team and LSST DM team (especially Eric Bellm, John Swinbank, and Simon Krughoff) • Maria Patterson, maria@highalpha.com, @OpenSciPinay • See also: mtpatter.github.io (3 technical notes on architecture); github.com/mtpatter/postgres-kafka-demo