Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe | InfluxDays Virtual Experience NA 2020
Nov. 11, 2020
Angelo Fausti & Frossie Economou
Vera C. Rubin Observatory
How InfluxDB is helping us in our quest to make the deepest, widest image of the universe
Space is in a state of flux
• Comets and asteroids vary in position
• (Super)novae and variable stars vary in brightness
• Galaxies vary in age
• Dark energy varies in, uh, spacetime? Maybe?
Image: Subaru HSC colour composite of the COSMOS field, NAOJ
How to understand the changing universe in 5 [not very] easy steps
Image: xkcd 1522
Step 2: Build a large but nimble telescope
Media: Rubin Observatory
← 8.4 meter continuous-surface primary-tertiary mirror
Step 3: Haul everything up a mountain
Media: Rubin Observatory
Yes, there’s Internet. No, you can’t count on it.
Step 4: Observe the Sky Relentlessly for 10 years; Issue 10M Alerts Every Night
Media: Rubin Observatory
• “All” sky 2x per week
• 60 seconds to produce alerts
• 10-year images: 0.5 EB
• Final DB size: 15 PB
Legacy Survey of Space & Time (LSST) observing cadence simulation
Step 5: Get People (also a data centre or three), Write Software, Wait for 2022
Media: Rubin Observatory
All our own code is 💯% open source:
github.com/lsst
github.com/lsst-sqre
Photo: Wil O’Mullane (~Oct 2019)
We’ll hang out on #influxdays-virtual for more Q&A (@frossie, @afausti)
Over to Angelo
How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe
InfluxDays North America
November 2020
Frossie Economou
Technical Manager for Data Management,
Vera C. Rubin Observatory
Angelo Fausti
Software Engineer
Vera C. Rubin Observatory
Problems with our in-house solution
● A relational DB is not optimized for time series data
● Stuck with predefined dashboards and visualizations
● Limited exploratory analysis capabilities
● Our in-house development didn’t scale
● Use time more wisely: adopt an existing solution instead of (re)inventing our own
Adopting a TSDB, which one?
[Figure: DB-Engines ranking, log(Score) vs. time (years); source: https://db-engines.com/en/ranking]
“If it takes more than three days to get it working, it is not the right solution for you.”
Frossie Economou
Why InfluxDB?
● It is more than a TSDB: a full platform for storing, visualizing, and alerting on time series
● Open source software and community
● InfluxDB: efficient storage for time series, queried with InfluxQL and the Flux language
● Chronograf: “postdefined” visualizations, built as needed rather than fixed in advance
● Kapacitor: alerts sent to Slack foster collaborative conversation
InfluxDB schema design
Results from the Data Release Production (DRP) pipeline
● The measurement groups the results of the pipeline
● The timestamp is the time when the pipeline run finishes
● Tags are metadata associated with the pipeline run
● Fields are the metrics measured by the pipeline
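To make the schema concrete, here is a minimal sketch of how one pipeline result could be serialized in InfluxDB 1.x line protocol (`measurement,tag_set field_set timestamp`). It is not an official client and not the project's own code: the function names are hypothetical, timestamps are assumed to be integer nanoseconds, and only the basic escaping and field-type rules are handled.

```python
# Minimal sketch of InfluxDB 1.x line protocol formatting (not an official client).
# Assumptions: tags are sorted by key; timestamps are integer nanoseconds.


def _escape(text: str, special: str) -> str:
    """Backslash-escape each character in `special` that occurs in `text`."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value) -> str:
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)  # shortest round-trip representation
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one point as `measurement,tag_set field_set timestamp`."""
    if not fields:
        raise ValueError("a point needs at least one field")
    series = _escape(measurement, ", ")
    for key in sorted(tags):
        series += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    field_set = ",".join(
        f"{_escape(key, ',= ')}={_format_field_value(value)}" for key, value in fields.items()
    )
    return f"{series} {field_set} {timestamp_ns}"


if __name__ == "__main__":
    # Tags and metric values from the slides; the timestamp is hypothetical.
    print(to_line_protocol(
        "drp",
        {"dataset": "HSC", "tract": 509, "filter": "g"},
        {"AM1": 6.42357, "AM2": 6.48177, "AM3": 4.62033},
        1604000000000000000,
    ))
```

The tags come out sorted by key (`dataset,filter,tract`), following InfluxDB's recommendation to sort tags before writing; the tag order does not change which series a point belongs to.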
First the Tags, then the Series
drp,dataset=HSC,tract=509,filter=g {fields} timestamp
● filter is the name of the optical filter used at the telescope at a given time
● A tract identifies a region in the sky (see https://pipelines.lsst.io/modules/lsst.skymap)
For each combination of tag values, there’s a new series.
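Because every distinct combination of tag values starts a new series, the number of series (the series cardinality) grows with the product of the distinct values per tag. The sketch below counts series for a handful of hypothetical pipeline runs; the helper names and tract 510 are illustrative assumptions, not taken from the talk.

```python
from math import prod


def series_key(measurement: str, tags: dict) -> str:
    """Identify a series by its measurement plus its tag set, sorted by tag key."""
    return measurement + "".join(f",{key}={tags[key]}" for key in sorted(tags))


def count_series(points) -> int:
    """Count distinct series among (measurement, tags) pairs."""
    return len({series_key(measurement, tags) for measurement, tags in points})


def max_series(tag_values: dict) -> int:
    """Upper bound on series for one measurement: product of distinct values per tag."""
    return prod(len(set(values)) for values in tag_values.values())


if __name__ == "__main__":
    # Hypothetical runs: one dataset, two tracts, three filters, three pipeline runs.
    tracts, filters = [509, 510], ["g", "r", "i"]
    points = [
        ("drp", {"dataset": "HSC", "tract": tract, "filter": band})
        for _run in range(3)
        for tract in tracts
        for band in filters
    ]
    print(len(points), count_series(points))  # 18 points, 6 series
    print(max_series({"dataset": ["HSC"], "tract": tracts, "filter": filters}))  # 6
```

Repeated runs add points to existing series rather than new series; only a new tag value (another tract or filter) adds series. This is why high-cardinality values are usually stored as fields rather than tags.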
Example of a Series
drp,dataset=HSC,tract=509,filter=g
[Figure: metrics from successive pipeline runs plotted against time (run ID); each point i carries a field set, e.g. AM1: 6.42357, AM2: 6.48177, AM3: 4.62033]
Each point in a series contains the set of metrics measured by a pipeline run, and the results are grouped by the pipeline name.
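Reading a series back amounts to collecting the points that share a series key, ordering them in time, and picking one field. The plain-Python sketch below mimics that in memory; it is an illustration of the data model, not how InfluxDB executes queries. The helper names are hypothetical, the run IDs stand in for timestamps, and apart from the run 1 values shown on the slide the metric values are made up.

```python
from collections import defaultdict


def group_into_series(points):
    """Group (measurement, tags, fields, time) points by series key, ordered by time."""
    series = defaultdict(list)
    for measurement, tags, fields, time in points:
        key = measurement + "".join(f",{k}={tags[k]}" for k in sorted(tags))
        series[key].append((time, fields))
    for rows in series.values():
        rows.sort(key=lambda row: row[0])  # chronological order within each series
    return dict(series)


def metric_over_time(series, key, metric):
    """Return (time, value) pairs for one metric (field) in one series."""
    return [(time, fields[metric]) for time, fields in series[key] if metric in fields]


if __name__ == "__main__":
    tags = {"dataset": "HSC", "tract": 509, "filter": "g"}
    points = [
        # (measurement, tags, fields, time); run 2 and the r-band values are made up
        ("drp", tags, {"AM1": 6.1, "AM2": 6.2}, 2),
        ("drp", tags, {"AM1": 6.42357, "AM2": 6.48177, "AM3": 4.62033}, 1),
        ("drp", {**tags, "filter": "r"}, {"AM1": 7.0}, 1),
    ]
    series = group_into_series(points)
    print(sorted(series))
    print(metric_over_time(series, "drp,dataset=HSC,filter=g,tract=509", "AM1"))
```

Points in one series need not share the same field set: a run that skips a metric simply has no value for it at that time.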
A preview of operations
[Figure: deployment overview]
● Summit, Cerro Pachón, Chile: restricted access, retention policy (RP) ~30 days
● Chilean Data Facility, La Serena, Chile
● US Data Facility, Urbana, IL: project staff access, RP 10 yr
● Two test stands (TestStand), one in Tucson, AZ
● Raw stream: <10 MB/s
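In InfluxDB 1.x a retention policy is declared with InfluxQL's `CREATE RETENTION POLICY <name> ON <database> DURATION <duration> REPLICATION <n> [DEFAULT]`. The sketch below builds such statements for the two durations in the diagram. The policy and database names are hypothetical, and because InfluxQL durations have no year unit, 10 years is approximated as 3650d (ignoring leap days).

```python
def create_retention_policy(name: str, database: str, duration: str,
                            replication: int = 1, default: bool = False) -> str:
    """Build an InfluxQL CREATE RETENTION POLICY statement (InfluxDB 1.x)."""
    statement = (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        statement += " DEFAULT"  # writes that do not name an RP land here
    return statement


if __name__ == "__main__":
    # Hypothetical names; durations follow the diagram.
    print(create_retention_policy("rp_30d", "efd", "30d", default=True))   # summit
    print(create_retention_policy("rp_10y", "efd", "3650d", default=True))  # US Data Facility
```

Data older than a policy's duration is dropped automatically, so the summit instance stays small while the long-term copy lives at the US Data Facility.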
Data Aggregation in Kafka with Faust
https://kafka-aggregator.lsst.io
Faust agents compute summary statistics on non-overlapping windows of N seconds.
Data reduction factor: R ~ 10
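The aggregation described above is a tumbling-window computation: each sample falls into exactly one window `[k·N, (k+1)·N)`, and each window emits one summary point. This plain-Python sketch shows only the arithmetic. It does not use Faust's API or the kafka-aggregator code. The choice of statistics (count, min, max, mean, standard deviation), the function names, and the sample data are assumptions for illustration.

```python
import math
import statistics
from collections import defaultdict


def tumbling_window_summaries(samples, window_seconds):
    """Summarize (timestamp_seconds, value) samples over non-overlapping windows.

    Each sample falls into exactly one window, [start, start + window_seconds).
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        start = math.floor(timestamp / window_seconds) * window_seconds
        buckets[start].append(value)
    summaries = []
    for start in sorted(buckets):
        values = buckets[start]
        summaries.append({
            "time": start,  # window start, used as the timestamp of the summary point
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
            # sample standard deviation needs at least two values
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        })
    return summaries


def reduction_factor(n_samples, summaries):
    """Ratio of input points to output (summary) points."""
    return n_samples / len(summaries)


if __name__ == "__main__":
    # Hypothetical telemetry: one sample per second for 50 s, summarized every 10 s.
    samples = [(t, float(t)) for t in range(50)]
    summaries = tumbling_window_summaries(samples, window_seconds=10)
    print(len(summaries), reduction_factor(len(samples), summaries))  # 5 10.0
    print(summaries[0]["mean"])  # 4.5
```

With uniform sampling, the reduction factor in points is simply the number of samples per window. Each summary point carries several fields, though, so the reduction in bytes is smaller than the reduction in points.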
What’s next
● Migration to InfluxDB 2.0
○ Conversation with the InfluxData design team about Annotations in 2.0
○ Flux training for the Observatory staff
○ Flux Tasks for downsampling and trend analysis
● Rubin Observatory Interim Data Facility on Google Cloud
● The project transition from Construction to Operations is under way
○ New opportunities for using InfluxDB
● Self-monitoring
● Scalability as we load more data, add RPs, etc.
Learn more…
● Vera C. Rubin Observatory
● Data Processing
● Verification Framework
● Engineering and Facilities Database
● Kafka Aggregator
● Rubin Science Platform
● Rubin Technical Documentation