SlideShare a Scribd company logo
Introduction to Presto on
Docker at scale
2020
Federico Palladoro
About Me
Fede Palladoro
Devops & Data Infra Lead @ Jampp
@fedepalladoro
fede@jampp.com
● Intro to Jampp data stack
● Previous Presto setup on EMR
● Migration to containers
● Orchestrators: Nomad vs Kubernetes
● Presto monitoring
Agenda
What do we do at
Jampp?
#1 #2
User
Acquisition
Find more people to install
and use an app.
App
Retargeting
Re-engage existing users.
Real time bidding (RTB)
Ad exchange
ad spot available
impression
< 70ms
Max Latency
Win!
bid
Participant 1
Participant N
.
.
.
auction
Auction Bid Impression Click
Install /
Event
Ad-Tech funnel
■ Each step decreases volume by an order of magnitude
■ The data criticality increases with each step.
■ We can sample auctions to optimize costs but under no circumstances can we lose
clicks, installs or events.
■ Each table has different access patterns and needs a different partitioning scheme.
+1000/h
Presto Queries
1,8TB-6TB
Total cluster memory
150TB
Data processed by
ELBs per day
1M/s
Auctions received
+1.7 billion
Tracked events per day
3
Presto Clusters
Some numbers
An overview of our
Data Infrastructure
Our pipeline operational unit
■ One pipeline per event type.
■ Focused on modularity and separation of concerns.
■ Having them separated allows us to optimize for cost without fear of losing critical
messages.
Internal
Data Source
ETLs and data insertion
■ Spark and Hive are very reliable for ETLs and insertion.
■ We use the Hive Metastore as the main interface between engines.
■ Airflow is an amazing tool for scheduling and orchestration.
■ Storing data on S3 allows us to decouple compute from storage
Presto is the main interface with our Data Warehouse
Through the years it became the main method of
interacting with the Data Warehouse for every team in
the company.
● Feeding our Machine Learning algorithms
● Building automatic audience segments
● Ad-Hoc queries through Apache Superset
● Templated reports through a custom UI
● Monitoring data quality
Presto
AWS EMR clusters
■ 1 ETL cluster (Spark/Hive/Tez)
■ 2 or 3 Presto clusters
■ Data stored on S3, we don’t use
HDFS
■ Each cluster is auto scalable
depending the load
■ Shared EMRFS on DynamoDB table
■ Shared Hive Metastore on RDS
The bad
● Troublesome interaction between YARN (Hive, Spark) and non YARN
apps (Presto).
● Low update frequency for fast pacing applications.
● Limited Presto support (i.e: no monitoring, no autoscaling on fleets)
The good
● Provisions out of the box many popular Big Data tools.
● Flexibility to tune applications and shape clusters as needed.
● Mainstream applications are frequently added to the app catalog, like
PrestoSQL v338!
AWS Elastic
MapReduce The ugly
● They upgraded the OS to Amazon Linux 2 without EMR version change
Getting down to business
Moving Presto to
containers
What?
Why?
● Why self-managed and Docker?
○ Lower costs (no EMR fees, no cluster overhead)
○ Quicker version upgrades
○ Local/ci environments just like prod/stg
○ Simpler configuration management
● Why PrestoSQL?
○ Community and user focused
○ Growing at a faster pace, more active contributors
○ Some known bugs already fixed (like hive bucketed tables)
○ Improved features like Cost Based Optimizer (CBO) and Security
● We decided to do two mayor changes:
○ Switch from PrestoDB to PrestoSQL
○ Take ownership of cluster provisioning and maintenance
Building our
docker image
● Based on the offical PrestoSQL image
● Dynamic configuration
○ Presto config and catalog files with templated values
○ Parameters and secrets stored on AWS SSM Parameter
store
○ Segmentio’s chamber to load parameters as env vars on
runtime
○ Unix’s envsubs to render final config files
● Additional tools like java agent for monitoring
Dynamic
configuration
Orchestrator
candidates
● The Tao of HashiCorp
● Orchestration with low complexity
● Support for non-container workloads
● Limited community - less known
● We already have it running
● Great community and tool ecosystem
● Industry-standard solution and battle tested
● High complexity, lot of internal “moving parts”
● Simple to spin-up using EKS/GKE/AKS
Presto setup on Nomad:
Infra level
■ Elastic autoscaling group
for each component
■ Consul: Service discovery
+ Distributed KV
■ Control plane with Consul
& Nomad
■ Traefik as API Gateway /
HTTP Proxy
Presto setup
on Nomad:
App level
Extra
Features
● Nomad job templating with Hashicorp Levant
○ Terraform-like workflow using a single template and a variable
file per cluster/environment
● Autoscaling:
○ Application level: Nomad native support (CPU based)
○ Cluster level: Nomad official autoscaler agent
● Graceful scale-in of Presto workers
○ Autoscaling group hooks
○ Local node script
○ Put new status on presto node state endpoint /v1/info/state
Local
testing
Kubernetes
Operators
● Custom resource that extends k8s API
● Useful to ease maintenance on staful/complex workloads (a.k.a
Day 2)
● Presto operators:
○ Falarica’s presto operator (open source, just released)
○ Starburst presto operator (official, licenced/enterprise)
Helm charts
● Reusable templates of YAML artifacts
● Reduce duplicated code on multi-cluster environments
● Useful for resource creation/deployment (a.k.a Day 1)
● Presto on Helm:
○ PrestoSQL helm chart (non-official, open source)
○ Starburst helm chart (official, licenced/enterprise)
Kubernetes
on AWS EKS
Presto on
Kubernetes
operator
Presto on
Kubernetes
operator
Monitoring stack
● We expose low level metrics with JMX java agent for Prometheus.
● Developed a custom exporter to get user level usage metrics from
/v1/query endpoint
● Prometheus stack collects mbeans attributes.
● Grafana for dashboards and custom searches.
HTTP
Scraper
Low level (JMX)
● Memory pools, Heap usage.
● Garbage collection frequency and duration.
● Cluster size and nodes status.
● Active, Pending and Blocked queries.
User level (HTTP API)
● Finished, canceled and failed queries per user.
● Normalized query analytics to detect usage patterns.
Monitoring
relevant
metrics
● Leverage CBO to improve query performance.
● Evaluate the usage of a Presto gateway to manage query
routing to multiple clusters.
● Enable autoscaling from Prometheus metrics.
● Define SLI’s and SLO’s to measure reliability.
● Evaluate Presto on k8s + AWS Fargate (serverless containers)
Next
steps
● Segment.io chamber: https://github.com/segmentio/chamber
● The Tao of Hashicorp: https://www.hashicorp.com/tao-of-hashicorp
● Nomad tutorial: https://learn.hashicorp.com/tutorials/nomad/get-started-install
● PrestoSQL helm chart: https://hub.helm.sh/charts/stable/presto/0.2.1
● Starburst helm chart: https://docs.starburstdata.com/latest/k8s/overview.html
● Falarica’s presto operator: https://github.com/falarica/steerd-presto-operator
● Starburst presto operator:
https://docs.starburstdata.com/latest/kubernetes/overview.html
● AWS EKS Architecture:
https://aws.amazon.com/quickstart/architecture/amazon-eks/
Link
references
Thanks!!
fede@jampp.com
We are hiring!!
jampp.com/jobs
@fedepalladoro

More Related Content

What's hot

Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable,  Robust Kafka ReplicatoruReplicator: Uber Engineering’s Scalable,  Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
Michael Hongliang Xu
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
StreamNative
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
Monal Daxini
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
HostedbyConfluent
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafka
Mole Wong
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Peter Bakas
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
Paul Brebner
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
confluent
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
Monal Daxini
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
Shuyi Chen
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Ankur Bansal
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
Monal Daxini
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
Ran Silberman
 
Apache Kafka Streams
Apache Kafka StreamsApache Kafka Streams
Apache Kafka Streams
Apache Kafka TLV
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data Problems
Monal Daxini
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
Brandon O'Brien
 

What's hot (20)

Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable,  Robust Kafka ReplicatoruReplicator: Uber Engineering’s Scalable,  Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafka
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Scalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBERScalable complex event processing on samza @UBER
Scalable complex event processing on samza @UBER
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
Apache Kafka Streams
Apache Kafka StreamsApache Kafka Streams
Apache Kafka Streams
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data Problems
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 

Similar to Big data Argentina meetup 2020-09: Intro to presto on docker

Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
Steven Wu
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
javier ramirez
 
Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the process
Jampp
 
Presto
PrestoPresto
Presto
Knoldus Inc.
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
Ruslan Meshenberg
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Monal Daxini
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
Nicolas Brousse
 
Building a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platformBuilding a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platform
Jampp
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
Rob Skillington
 
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Omid Vahdaty
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
GetInData
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
NETWAYS
 
Public Cloud Workshop
Public Cloud WorkshopPublic Cloud Workshop
Public Cloud Workshop
Amer Ather
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
Alexander Penev
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
Amazon Web Services
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
Amazon Web Services
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
aspyker
 

Similar to Big data Argentina meetup 2020-09: Intro to presto on docker (20)

Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the process
 
Presto
PrestoPresto
Presto
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Building a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platformBuilding a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platform
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Public Cloud Workshop
Public Cloud WorkshopPublic Cloud Workshop
Public Cloud Workshop
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
 

Recently uploaded

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 

Recently uploaded (20)

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 

Big data Argentina meetup 2020-09: Intro to presto on docker

  • 1. Introduction to Presto on Docker at scale 2020 Federico Palladoro
  • 2. About Me Fede Palladoro Devops & Data Infra Lead @ Jampp @fedepalladoro fede@jampp.com
  • 3. ● Intro to Jampp data stack ● Previous Presto setup on EMR ● Migration to containers ● Orchestrators: Nomad vs Kubernetes ● Presto monitoring Agenda
  • 4. What do we do at Jampp?
  • 5. #1 #2 User Acquisition Find more people to install and use an app. App Retargeting Re-engage existing users.
  • 6. Real time bidding (RTB) Ad exchange ad spot available impression < 70ms Max Latency Win! bid Participant 1 Participant N . . . auction
  • 7. Auction Bid Impression Click Install / Event Ad-Tech funnel ■ Each step decreases volume by an order of magnitude ■ The data criticality increases with each step. ■ We can sample auctions to optimize costs but under no circumstances can we lose clicks, installs or events. ■ Each table has different access patterns and needs a different partitioning scheme.
  • 8. +1000/h Presto Queries 1,8TB-6TB Total cluster memory 150TB Data processed by ELBs per day 1M/s Auctions received +1.7 billion Tracked events per day 3 Presto Clusters Some numbers
  • 9. An overview of our Data Infrastructure
  • 10. Our pipeline operational unit ■ One pipeline per event type. ■ Focused on modularity and separation of concerns. ■ Having them separated allows us to optimize for cost without fear of losing critical messages. Internal Data Source
  • 11. ETLs and data insertion ■ Spark and Hive are very reliable for ETLs and insertion. ■ We use the Hive Metastore as the main interface between engines. ■ Airflow is an amazing tool for scheduling and orchestration. ■ Storing data on S3 allows us to decouple compute from storage
  • 12. Presto is the main interface with our Data Warehouse Through the years it became the main method of interacting with the Data Warehouse for every team in the company. ● Feeding our Machine Learning algorithms ● Building automatic audience segments ● Ad-Hoc queries through Apache Superset ● Templated reports through a custom UI ● Monitoring data quality Presto
  • 13. AWS EMR clusters ■ 1 ETL cluster (Spark/Hive/Tez) ■ 2 or 3 Presto clusters ■ Data stored on S3, we don’t use HDFS ■ Each cluster is auto scalable depending the load ■ Shared EMRFS on DynamoDB table ■ Shared Hive Metastore on RDS
  • 14. The bad ● Troublesome interaction between YARN (Hive, Spark) and non YARN apps (Presto). ● Low update frequency for fast pacing applications. ● Limited Presto support (i.e: no monitoring, no autoscaling on fleets) The good ● Provisions out of the box many popular Big Data tools. ● Flexibility to tune applications and shape clusters as needed. ● Mainstream applications are frequently added to the app catalog, like PrestoSQL v338! AWS Elastic MapReduce The ugly ● They upgraded the OS to Amazon Linux 2 without EMR version change
  • 15. Getting down to business Moving Presto to containers
  • 16. What? Why? ● Why self-managed and Docker? ○ Lower costs (no EMR fees, no cluster overhead) ○ Quicker version upgrades ○ Local/ci environments just like prod/stg ○ Simpler configuration management ● Why PrestoSQL? ○ Community and user focused ○ Growing at a faster pace, more active contributors ○ Some known bugs already fixed (like hive bucketed tables) ○ Improved features like Cost Based Optimizer (CBO) and Security ● We decided to do two mayor changes: ○ Switch from PrestoDB to PrestoSQL ○ Take ownership of cluster provisioning and maintenance
  • 17. Building our docker image ● Based on the offical PrestoSQL image ● Dynamic configuration ○ Presto config and catalog files with templated values ○ Parameters and secrets stored on AWS SSM Parameter store ○ Segmentio’s chamber to load parameters as env vars on runtime ○ Unix’s envsubs to render final config files ● Additional tools like java agent for monitoring
  • 19. Orchestrator candidates ● The Tao of HashiCorp ● Orchestration with low complexity ● Support for non-container workloads ● Limited community - less known ● We already have it running ● Great community and tool ecosystem ● Industry-standard solution and battle tested ● High complexity, lot of internal “moving parts” ● Simple to spin-up using EKS/GKE/AKS
  • 20. Presto setup on Nomad: Infra level ■ Elastic autoscaling group for each component ■ Consul: Service discovery + Distributed KV ■ Control plane with Consul & Nomad ■ Traefik as API Gateway / HTTP Proxy
  • 22. Extra Features ● Nomad job templating with Hashicorp Levant ○ Terraform-like workflow using a single template and a variable file per cluster/environment ● Autoscaling: ○ Application level: Nomad native support (CPU based) ○ Cluster level: Nomad official autoscaler agent ● Graceful scale-in of Presto workers ○ Autoscaling group hooks ○ Local node script ○ Put new status on presto node state endpoint /v1/info/state
  • 24. Kubernetes Operators ● Custom resource that extends k8s API ● Useful to ease maintenance on staful/complex workloads (a.k.a Day 2) ● Presto operators: ○ Falarica’s presto operator (open source, just released) ○ Starburst presto operator (official, licenced/enterprise) Helm charts ● Reusable templates of YAML artifacts ● Reduce duplicated code on multi-cluster environments ● Useful for resource creation/deployment (a.k.a Day 1) ● Presto on Helm: ○ PrestoSQL helm chart (non-official, open source) ○ Starburst helm chart (official, licenced/enterprise)
  • 28. Monitoring stack ● We expose low level metrics with JMX java agent for Prometheus. ● Developed a custom exporter to get user level usage metrics from /v1/query endpoint ● Prometheus stack collects mbeans attributes. ● Grafana for dashboards and custom searches. HTTP Scraper
  • 29. Low level (JMX) ● Memory pools, Heap usage. ● Garbage collection frequency and duration. ● Cluster size and nodes status. ● Active, Pending and Blocked queries. User level (HTTP API) ● Finished, canceled and failed queries per user. ● Normalized query analytics to detect usage patterns. Monitoring relevant metrics
  • 30. ● Leverage CBO to improve query performance. ● Evaluate the usage of a Presto gateway to manage query routing to multiple clusters. ● Enable autoscaling from Prometheus metrics. ● Define SLI’s and SLO’s to measure reliability. ● Evaluate Presto on k8s + AWS Fargate (serverless containers) Next steps
  • 31. ● Segment.io chamber: https://github.com/segmentio/chamber ● The Tao of Hashicorp: https://www.hashicorp.com/tao-of-hashicorp ● Nomad tutorial: https://learn.hashicorp.com/tutorials/nomad/get-started-install ● PrestoSQL helm chart: https://hub.helm.sh/charts/stable/presto/0.2.1 ● Starburst helm chart: https://docs.starburstdata.com/latest/k8s/overview.html ● Falarica’s presto operator: https://github.com/falarica/steerd-presto-operator ● Starburst presto operator: https://docs.starburstdata.com/latest/kubernetes/overview.html ● AWS EKS Architecture: https://aws.amazon.com/quickstart/architecture/amazon-eks/ Link references