Deploying Apache Kylin on AWS
And designing a task scheduler for it
Chase Zhang
Strikingly
Outline
Introduction
Strikingly
Analytics Service of Strikingly
Deploy Apache Kylin on AWS
Overview
Containerizing Kylin
Maintenance
Scheduler for Kylin System
Designing Goals
Basic Idea & Implementation
Tasks, Executors and Services
Concurrency and Fault Tolerance
Maintenance and Monitoring
Conclusion
Introduction
Strikingly
https://strikingly.com
https://sxl.com
Introduction
Strikingly
At Strikingly, we are devoted to providing a convenient, one-stop website-building solution
to our customers.
Introduction
Analytics Service of Strikingly
Version 0 of our analytics service was Google Analytics
Strikingly
Google
Analytics
User Pages
User
Register / Get Track ID
Set Track ID
Generate User's website
Collect Page Views Data
Serve User Query
Figure: Google Analytics
Introduction
Analytics Service of Strikingly
Version 1 of our analytics service used Keen IO, a third-party service
Strikingly
User Pages
User
View Analytics
Generate User's website
Keen.IO
Serve User Query
Collect Page Views Data
Figure: Keen IO
Introduction
Analytics Service of Strikingly
Version 2 of our analytics service combines Keen IO and Apache Kylin
Strikingly
User Pages
User
View Analytics
Generate User's website
Keen.IO
Collect Page Views Data
Apache
Kylin
Serve User Query
Figure: Keen IO + Apache Kylin
Deploy Apache Kylin on AWS
Overview
[Diagram labels: EMR and ECS; Application Load Balancer (80) and Target Group; Kylin (query) ×3 and Kylin (job); ports 7070, 33345, 33345, 33347; Hadoop, Hive, HBase, YARN; S3; Keen.IO; Query Requests]
Figure: Deploy Apache Kylin on AWS
Deploy Apache Kylin on AWS
Containerizing Kylin
Hive HBase
HDFS
Apache Kylin
MapReduce / Spark
Figure: Apache Kylin is “Stateless”
Deploy Apache Kylin on AWS
Containerizing Kylin
Problem
We’d like to
▶ Deploy Kylin in multiple regions
▶ Customize behaviors with environment variables
▶ Build a single Docker image and run it everywhere
Deploy Apache Kylin on AWS
Containerizing Kylin
Hive HBase Hadoop YARN Kylin
Configuration Files Templates
Configuration Files
Substitute variables
Start running
Figure: Launching Kylin with customized script
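The launch flow in the figure (configuration templates, variable substitution, then start) can be sketched roughly as follows. This is only an illustration in Python, not the launch script from the talk: the environment variable names `KYLIN_METADATA_URL` and `KYLIN_SERVER_MODE`, the two property lines, and the default values are assumptions chosen for the example.

```python
import os
from string import Template

# Hypothetical two-line template for kylin.properties. The variable names
# are assumptions made for this sketch, not taken from the talk.
KYLIN_PROPERTIES_TEMPLATE = Template(
    "kylin.metadata.url=${KYLIN_METADATA_URL}\n"
    "kylin.server.mode=${KYLIN_SERVER_MODE}\n"
)


def render_config(template: Template, env: dict) -> str:
    """Fill a configuration template from environment-style variables.

    Template.substitute raises KeyError when a referenced variable is
    missing, so a misconfigured container fails before Kylin is started.
    """
    return template.substitute(env)


if __name__ == "__main__":
    # Assumed defaults, overridden by whatever the container environment sets.
    defaults = {
        "KYLIN_METADATA_URL": "kylin_metadata@hbase",
        "KYLIN_SERVER_MODE": "query",
    }
    print(render_config(KYLIN_PROPERTIES_TEMPLATE, {**defaults, **os.environ}))
```

With this approach the same image can be started as a query node in one region and a job node in another by changing only the environment.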
Deploy Apache Kylin on AWS
Maintenance
Problem
Two problems while maintaining this system:
▶ Auto scale and dynamic ports
▶ Clean-up and back-up
Deploy Apache Kylin on AWS
Maintenance
[Diagram labels: ECS; Application Load Balancer (80) and Target Group; Kylin (query) ×4 and Kylin (job); ports 7070, 33345, 33345, 33347, xxxxx; Query Requests]
Figure: Auto Scale and Dynamic Listening Ports
Deploy Apache Kylin on AWS
Maintenance
./bin/metastore.sh backup
./bin/metastore.sh restore
./bin/metastore.sh clean
./bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob
Figure: Clean-up and back-up tools
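One way to make these tools callable from a scheduler is to wrap them as named actions. The sketch below is an assumption about how that could look, written in Python for illustration: the action names, the `kylin_home` parameter, and the `extra_args` parameter are hypothetical, while the command lines themselves are copied from the slide.

```python
import subprocess
from typing import Dict, List, Sequence

# Command lines as shown on the slide, relative to the Kylin install directory.
# The action names used as keys are hypothetical.
MAINTENANCE_COMMANDS: Dict[str, List[str]] = {
    "backup": ["./bin/metastore.sh", "backup"],
    "restore": ["./bin/metastore.sh", "restore"],
    "clean": ["./bin/metastore.sh", "clean"],
    "storage_cleanup": ["./bin/kylin.sh", "org.apache.kylin.tool.StorageCleanupJob"],
}


def build_command(action: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Return the argument list for a maintenance action.

    extra_args carries any additional arguments the caller wants to pass;
    the slide does not list any, so none are assumed here.
    """
    if action not in MAINTENANCE_COMMANDS:
        raise ValueError(f"unknown maintenance action: {action}")
    return MAINTENANCE_COMMANDS[action] + list(extra_args)


def run_maintenance(action: str, kylin_home: str, extra_args: Sequence[str] = ()):
    """Run one maintenance tool inside kylin_home and fail loudly on error."""
    return subprocess.run(build_command(action, extra_args), cwd=kylin_home, check=True)
```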
Deploy Apache Kylin on AWS
Maintenance
Solution
A customized task scheduler.
Scheduler for Kylin System
Designing Goals
▶ Customizing task scheduling
▶ Making system robust and fault tolerant
▶ Solving both previously mentioned maintenance problems
Scheduler for Kylin System
Basic Idea & Overall Design
The Systemd (Anti-UNIX) philosophy
▶ Scheduler works as a central service
▶ Other components work as RPC services
Scheduler for Kylin System
Basic Idea & Overall Design
Scheduler
Target Group
Kylin
(query)
Kylin
(query)
Kylin
(query)
Kylin
(query)
Kylin
(job)
HBase Hive
DynamoDB S3
Keen.IO
Kylin
(query)
Scheduler for Kylin System
Basic Idea & Overall Design
Implementation details:
▶ Applying FP [9] and Actor Model [10] ideas
▶ Implemented with Scala [11] and Akka [12]
▶ Interacts with Hadoop components through Java libraries
[9] https://en.wikipedia.org/wiki/Functional_programming
[10] https://en.wikipedia.org/wiki/Actor_model
[11] http://scala-lang.org/
[12] https://akka.io/
Scheduler for Kylin System
Basic Idea & Overall Design
[Diagram labels: Scheduler; Control Actor; Consistent Hashing Router; Task Actor; Executor; Service; Control Message; Task Message]
Figure: Scheduler’s Actor System
Scheduler for Kylin System
Tasks, Executors and Services
▶ A task is an immutable message
▶ Each task has a type that selects its executor
▶ Executors call services to do the work
▶ Task categories: planning tasks, working tasks, maintaining tasks
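The relationship between tasks, executors and services can be sketched as below. The talk's scheduler is written in Scala with Akka; this Python version only illustrates the idea, and the field names, the `RecordingKylinService` stand-in, and the `dispatch` helper are all assumptions. The task type names are taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    """An immutable task message; frozen=True forbids mutation after creation."""
    task_id: str
    task_type: str
    cube: str


class RecordingKylinService:
    """Stand-in for a Kylin service: records calls instead of contacting Kylin."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def build_cube(self, cube: str) -> None:
        self.calls.append(("build", cube))

    def merge_cube(self, cube: str) -> None:
        self.calls.append(("merge", cube))


def make_executors(kylin: RecordingKylinService) -> Dict[str, Callable[[Task], None]]:
    """Map each task type to the executor that handles it."""
    return {
        "KylinCubeBuild": lambda task: kylin.build_cube(task.cube),
        "KylinCubeMerge": lambda task: kylin.merge_cube(task.cube),
    }


def dispatch(task: Task, executors: Dict[str, Callable[[Task], None]]) -> None:
    """Select the executor by task type and let it call the service."""
    executors[task.task_type](task)
```

Keeping the task a plain immutable value means it can be routed, logged, or persisted without any executor being able to change it along the way.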
Scheduler for Kylin System
Tasks, Executors and Services
PlanDataRefresh
PlanCubeMaintenance
HiveTableRefresh
KylinCubeBuild
KylinCubeRefresh
KylinCubeMerge
Hive Service
Kylin Service
Hourly
Daily
Need to import new data?
Need to build a new segment?
Need to refresh old segments?
Need to fill holes between segments?
Need to merge segments?
Need to fill holes in the Hive table?
Hive table has been refreshed, refresh segment
Planning Tasks Working Tasks Services
Message Storage
Service
Figure: Planning Tasks and Working Tasks
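As one example of what a planning task might compute, the "fill holes between segments" question can be answered by scanning the existing segment ranges for gaps. This is a sketch under assumptions: segments are modelled as half-open `(start, end)` integer ranges, and the `plan_hole_fills` helper and its output format are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # half-open time range [start, end), e.g. epoch hours


def find_holes(segments: List[Segment]) -> List[Segment]:
    """Return the uncovered gaps between the given segments, in time order."""
    if not segments:
        return []
    ordered = sorted(segments)
    holes: List[Segment] = []
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_until:
            holes.append((covered_until, start))
        # Track the furthest covered point so overlapping segments are handled.
        covered_until = max(covered_until, end)
    return holes


def plan_hole_fills(cube: str, segments: List[Segment]) -> List[Dict[str, object]]:
    """Emit one hypothetical KylinCubeBuild working task per hole."""
    return [
        {"task_type": "KylinCubeBuild", "cube": cube, "start": start, "end": end}
        for start, end in find_holes(segments)
    ]
```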
Scheduler for Kylin System
Tasks, Executors and Services
KylinMetadataBackup
KylinMetadataCleanup
KylinMetadataRestore
KylinHBaseTableCleanup
HBase Service
Kylin Service
AWS Service
S3
Apache Kylin
KYLIN_XWFQ12
kylin_metadata
kylin-metadata-backups
Update Cache
Get Cube Info
Delete Table
Read Metadata
Delete Row
Write ZIP File
Read ZIP File
Write Table
Get Cube Info
Figure: Maintaining Tasks
Scheduler for Kylin System
Concurrency and Fault Tolerance
Problem
We’d like to execute tasks in order
▶ Maintaining tasks run exclusively
▶ Tasks of the same cube run exclusively
Scheduler for Kylin System
Concurrency and Fault Tolerance
Solution
Two ways to solve this problem:
▶ ReadWriteLock
▶ ConsistentHashingRouter
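The reason a consistent hashing router keeps tasks of the same cube in order is that every message with the same key is routed to the same worker, which then processes its messages one at a time. The sketch below illustrates the routing idea only; it is not Akka's router, and the class name, the use of SHA-256, and the number of virtual nodes are assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


class ConsistentHashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: List[str], replicas: int = 64) -> None:
        # Each node is placed on the ring `replicas` times to even out the load.
        self._ring: List[Tuple[int, str]] = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def route(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from the key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        index = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[index][1]
```

Using the cube name as the key sends all tasks for one cube to a single task actor, and removing a worker only remaps the cubes that worker owned.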
Scheduler for Kylin System
Concurrency and Fault Tolerance
Problem
We’d like to be fault tolerant:
1. Recovering from failures
2. Filling missed segment gaps
3. Recording history
Scheduler for Kylin System
Concurrency and Fault Tolerance
Solution
We take several measures to solve this problem:
1. Assigning each task a unique ID
2. Persisting each task message, with its progress, to DynamoDB
3. Implementing planning and working tasks carefully so that they are issue-aware
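The first two measures can be sketched as follows. An in-memory dictionary stands in for the DynamoDB table so the example stays self-contained; the record fields, the `InMemoryTaskStore` class, and the helper names are assumptions, while the status values follow the init, running, finish and error states shown in the deck's diagram.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskRecord:
    task_id: str
    task_type: str
    status: str = "init"  # init -> running -> finish, or error on failure


class InMemoryTaskStore:
    """Stand-in for a DynamoDB table keyed by task_id (an assumption)."""

    def __init__(self) -> None:
        self._items: Dict[str, TaskRecord] = {}

    def put(self, record: TaskRecord) -> None:
        self._items[record.task_id] = record

    def get(self, task_id: str) -> TaskRecord:
        return self._items[task_id]

    def unfinished(self) -> List[TaskRecord]:
        """Records a restarted scheduler would need to look at again."""
        return [r for r in self._items.values() if r.status in ("init", "running")]


def new_task(task_type: str) -> TaskRecord:
    """Create a task record with a unique ID."""
    return TaskRecord(task_id=str(uuid.uuid4()), task_type=task_type)


def run_task(store: InMemoryTaskStore, record: TaskRecord, work: Callable[[], None]) -> TaskRecord:
    """Persist every state transition so progress survives a crash."""
    store.put(record)
    store.put(replace(record, status="running"))
    try:
        work()
    except Exception:
        failed = replace(record, status="error")
        store.put(failed)
        return failed
    done = replace(record, status="finish")
    store.put(done)
    return done
```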
Scheduler for Kylin System
Concurrency and Fault Tolerance
[Diagram labels: ControlActor; Consistent HashingRouter; TaskActor; Executor; TaskMessage; task states init, running, finish, error; DynamoDB (×3); Acquire Lock; Release Lock]
Figure: Concurrency and Message Persistence
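The Acquire Lock and Release Lock steps in the figure fit the ReadWriteLock idea named earlier: one plausible arrangement, assumed here rather than confirmed by the talk, is that maintaining tasks take the write side (exclusive) while all other tasks take the read side (shared). Python has no built-in read-write lock, so the sketch implements a small one; it does not prevent writer starvation.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers or one exclusive writer (no fairness guarantee)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        """Shared access: blocks only while a writer holds the lock."""
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        """Exclusive access: waits until no reader and no writer is active."""
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()
```

Under that assumption, a working task would run inside `with lock.read():` and a maintaining task inside `with lock.write():`.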
Scheduler for Kylin System
Maintenance and Monitoring
Problem
We still have two trivial problems to solve:
▶ Manually performing actions
▶ Task monitoring and error notification
Scheduler for Kylin System
Maintenance and Monitoring
How to design the user interface of the scheduler?
Scheduler for Kylin System
Maintenance and Monitoring
Introducing the scheduler Slack bot...
Scheduler for Kylin System
Maintenance and Monitoring
Event Bus
Control Actor
Consistent
Hashing Router Task Actor Executor Service
SlackBot Actor
User Command
Figure: Scheduler Slack Bot
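The flow in the figure, where a user command reaches the actors through an event bus, can be sketched as a small publish/subscribe loop. Everything here is hypothetical: the command syntax (`run <TaskType>`), the topic name, and the `EventBus` class are assumptions made for illustration and say nothing about the real bot's commands.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class EventBus:
    """Tiny synchronous publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(payload)


def parse_command(text: str) -> Dict[str, object]:
    """Split a chat message such as 'run KylinMetadataBackup' into action and args."""
    parts = text.strip().split()
    if not parts:
        raise ValueError("empty command")
    return {"action": parts[0].lower(), "args": parts[1:]}


def handle_chat_message(bus: EventBus, text: str) -> None:
    """What a bot actor might do: parse the message and publish a control event."""
    bus.publish("control", parse_command(text))
```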
Scheduler for Kylin System
Maintenance and Monitoring
Figure: List task status
Scheduler for Kylin System
Maintenance and Monitoring
Figure: List Kylin Job Progress
Conclusion
▶ With Apache Kylin, we’re providing a sub-second web analytics service
▶ With little effort, we managed to deploy Apache Kylin in Docker containers
▶ With the scheduler, we deployed the system on AWS without losing features
▶ We’ve made the system concurrency-safe and robust
Conclusion
Version 3?
But wait, we still have a problem, don’t we?
Conclusion
Version 3?
[Diagram labels: User; Keen.IO; S3 (North America); S3 (Tokyo, Japan); S3 (Beijing, China); User; delays of 5 minutes, 10 minutes, 20 minutes; Page Views]
Figure: Data Transfer Delay of Keen IO
Conclusion
Version 3?
User
S3
Tokyo, Japan
Application
Load Balancer
S3
Beijing, China
User
5 minutes
Application
Load Balancer
5 minutes
Page Views Page Views
Figure: Collecting Data with ALB?
Thank you!
BTW, we’re still hiring Data Platform Engineers:
1. Writing Scala
2. Working on AWS
3. Working with Apache Kylin
4. Working on our “Project Manhattan”

Deploying Apache Kylin on AWS and designing a task scheduler for it

  • 1. Deploying Apache Kylin on AWS And designing a task scheduler for it Chase Zhang Strikingly
  • 2. Outline Introduction Strikingly Analytics Service of Strikingly Deploy Apache Kylin on AWS Overview Containerizing Kylin Maintenance Scheduler for Kylin System Designing Goals Basic Idea & Implementation Tasks, Executors and Services Concurrency and Fault Tolerance Maintenance and Monitoring Conclusion
  • 4. Introduction Strikingly At Strikingly, we are devoted to provide convenient and one stop website building solution to our customers.
  • 5. Introduction Analytics Service of Strikingly The version 0 of our analytics service is Google Analytics Strikingly Google Analytics User Pages User Register / Get Track IDSet Track ID Generate User's website Collect Page Views Data Serve User Query Figure: Google Analytics
  • 6. Introduction Analytics Service of Strikingly Version 1 of our analytics service runs through Keen IO, a third-party service. Figure: Keen IO (diagram labels: Strikingly; User Pages; User; View Analytics; Generate User's website; Keen.IO; Serve User Query; Collect Page Views Data)
  • 7. Introduction Analytics Service of Strikingly Version 2 of our analytics service combines Keen IO and Apache Kylin. Figure: Keen IO + Apache Kylin (diagram labels: Strikingly; User Pages; User; View Analytics; Generate User's website; Keen.IO; Collect Page Views Data; Apache Kylin; Serve User Query)
  • 8. Deploy Apache Kylin on AWS Overview Figure: Deploy Apache Kylin on AWS (diagram labels: EMR; ECS; Application Load Balancer; Target Group; Kylin (query) ×3; Kylin (job); Hadoop; Hive; HBase; YARN; S3; Keen.IO; Query Requests; ports 80, 7070, 33345, 33345, 33347)
  • 9. Deploy Apache Kylin on AWS Containerizing Kylin Hive HBase HDFS Apache Kylin MapReduce / Spark Figure: Apache Kylin is “Stateless”
  • 10. Deploy Apache Kylin on AWS Containerizing Kylin Problem We’d like to ▶ Deploy Kylin in multiple regions ▶ Customize behaviors with environment variables ▶ Build a single Docker image and run it everywhere
  • 11. Deploy Apache Kylin on AWS Containerizing Kylin Hive HBase Hadoop YARN Kylin Configuration Files Templates Configuration Files Substitute variables Start running Figure: Launching Kylin with customized script
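The "substitute variables, then start running" step in the figure above can be sketched roughly as follows. This is a minimal illustration, not the deck's actual launch script: the environment variable names `KYLIN_METADATA_TABLE` and `KYLIN_SERVER_MODE` and the template content are assumptions made for the example.

```python
from string import Template


def render_config(template_text: str, variables: dict) -> str:
    """Replace ${NAME} placeholders in a template with the given values."""
    # substitute() raises KeyError for a missing variable, so a container
    # with incomplete settings fails at start-up instead of running with
    # a half-rendered configuration file.
    return Template(template_text).substitute(variables)


# Hypothetical template; the variable names are illustrative only.
TEMPLATE = (
    "kylin.metadata.url=${KYLIN_METADATA_TABLE}@hbase\n"
    "kylin.server.mode=${KYLIN_SERVER_MODE}\n"
)

# In a container entrypoint the mapping would typically be os.environ.
rendered = render_config(
    TEMPLATE,
    {"KYLIN_METADATA_TABLE": "kylin_metadata", "KYLIN_SERVER_MODE": "query"},
)
print(rendered)
```

With this approach one image can serve every region and role, since only the environment passed to the container changes.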
  • 12. Deploy Apache Kylin on AWS Maintenance Problem Two problems while maintaining this system: ▶ Auto scale and dynamic ports ▶ Clean-up and back-up
  • 13. Deploy Apache Kylin on AWS Maintenance Figure: Auto Scale and Dynamic Listening Ports (diagram labels: ECS; Application Load Balancer; Target Group; Kylin (query) ×4; Kylin (job); Query Requests; ports 80, 7070, 33345, 33345, 33347, xxxxx)
  • 14. Deploy Apache Kylin on AWS Maintenance Figure: Clean-up and back-up tools (commands shown: ./bin/metastore.sh backup, ./bin/metastore.sh restore, ./bin/metastore.sh clean, ./bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob)
  • 15. Deploy Apache Kylin on AWS Maintenance Solution A customized task scheduler.
  • 16. Scheduler for Kylin System Designing Goals ▶ Customizing task scheduling ▶ Making system robust and fault tolerant ▶ Solving both previously mentioned maintenance problems
  • 17. Scheduler for Kylin System Basic Idea & Overall Design The Systemd (Anti-UNIX) philosophy ▶ Scheduler works as a central service ▶ Other components work as RPC services
  • 18. Scheduler for Kylin System Basic Idea & Overall Design Scheduler Target Group Kylin (query) Kylin (query) Kylin (query) Kylin (query) Kylin (job) HBase Hive DynamoDB S3 Keen.IO Kylin (query)
  • 19. Scheduler for Kylin System Basic Idea & Overall Design Implementation details: ▶ Applying functional programming (FP) [9] and Actor Model [10] ideas ▶ Implemented with Scala [11] and Akka [12] ▶ Interacting with Hadoop components through Java libraries. Footnotes: [9] https://en.wikipedia.org/wiki/Functional_programming [10] https://en.wikipedia.org/wiki/Actor_model [11] http://scala-lang.org/ [12] https://akka.io/
  • 20. Scheduler for Kylin System Basic Idea & Overall Design Figure: Scheduler’s Actor System (diagram labels: Scheduler; Control Actor; Consistent Hashing Router; Task Actor; Executor; Service; Control Message; Task Message)
  • 21. Scheduler for Kylin System Tasks, Executors and Services ▶ Task = immutable message ▶ Each task has a type that determines its executor ▶ Executors call services to do the work ▶ Task categories: planning tasks, working tasks, maintaining tasks
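The task, executor and service relationship above might look like the following sketch. The deck's scheduler is written in Scala with Akka; Python is used here only for illustration, and the class names `Task`, `HiveService`, `KylinService` and their methods are hypothetical stand-ins rather than the actual implementation.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable, Dict


@dataclass(frozen=True)
class Task:
    """An immutable task message; task_type selects the executor."""
    task_id: str
    task_type: str
    cube: str


class HiveService:
    """Hypothetical stand-in for a service wrapping Hive."""
    def refresh_table(self, cube: str) -> str:
        return f"hive table for {cube} refreshed"


class KylinService:
    """Hypothetical stand-in for a service wrapping Kylin."""
    def build_cube(self, cube: str) -> str:
        return f"cube {cube} built"


def make_executors(hive: HiveService,
                   kylin: KylinService) -> Dict[str, Callable[[Task], str]]:
    # One executor per task type; each executor delegates to a service.
    return {
        "HiveTableRefresh": lambda task: hive.refresh_table(task.cube),
        "KylinCubeBuild": lambda task: kylin.build_cube(task.cube),
    }


def execute(task: Task, executors: Dict[str, Callable[[Task], str]]) -> str:
    # Look up the executor by the task's type and run it.
    return executors[task.task_type](task)


executors = make_executors(HiveService(), KylinService())
result = execute(Task("t-1", "KylinCubeBuild", "page_views"), executors)
```

Because the message is frozen, any attempt to modify a task after it has been sent raises `FrozenInstanceError`, which keeps the message safe to pass between actors.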
  • 22. Scheduler for Kylin System Tasks, Executors and Services PlanDataRefresh PlanCubeMaintenance HiveTableRefresh KylinCubeBuild KylinCubeRefresh KylinCubeMerge Hive Service Kylin Service Hourly Daily Need import new data? Need build a new segment? Need refresh old segments? Need fill holes between segments? Need merge segments? Need fill holes in hive table? Hive table has been refreshed, refresh segment Planning Tasks Working Tasks Services Message Storage Service Figure: Planning Tasks and Working Tasks
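The "Need fill holes between segments?" check that a planning task performs can be illustrated with a small interval computation. This is a sketch under assumptions: segments are modelled as half-open `(start, end)` pairs of integers (for example epoch hours), and the function name `find_holes` is invented for this example.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open interval [start, end)


def find_holes(segments: List[Segment],
               range_start: int,
               range_end: int) -> List[Segment]:
    """Return the sub-ranges of [range_start, range_end) not covered by any segment."""
    holes = []
    cursor = range_start  # everything before the cursor is already covered
    for seg_start, seg_end in sorted(segments):
        if seg_start > cursor:
            # Uncovered range between the cursor and the next segment.
            holes.append((cursor, min(seg_start, range_end)))
        cursor = max(cursor, seg_end)
        if cursor >= range_end:
            break
    if cursor < range_end:
        # Uncovered tail after the last segment.
        holes.append((cursor, range_end))
    return holes


holes = find_holes([(0, 10), (20, 30)], 0, 40)
```

A planning task could then emit one working task per returned hole, so that missed builds are filled in on the next planning run.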
  • 23. Scheduler for Kylin System Tasks, Executors and Services KylinMetadataBackup KylinMetadataCleanup KylinMetadataRestore KylinHBaseTableCleanup HBase Service Kylin Service AWS Service S3 Apache Kylin KYLIN_XWFQ12 kylin_metadata kylin-metadata-backups Update Cache Get Cube Info Delete Table Read Metadata Delete Row Write ZIP File Read ZIP File Write Table Get Cube Info Figure: Maintaining Tasks
  • 24. Scheduler for Kylin System Concurrency and Fault Tolerance Problem We’d like to execute tasks in order ▶ Maintaining tasks run exclusively ▶ Tasks of the same cube run exclusively
  • 25. Scheduler for Kylin System Concurrency and Fault Tolerance Solution Two ways to solve this problem: ▶ ReadWriteLock ▶ ConsistentHashingRouter
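To show why a consistent hashing router keeps tasks of the same cube from running concurrently, here is a minimal hash-ring sketch. It is not Akka's router: the class `ConsistentHashRouter`, the worker names and the replica count are assumptions for illustration. The idea is that every task is routed by its cube name, so all tasks for one cube land on the same worker and are processed one at a time there.

```python
import bisect
import hashlib
from typing import List


class ConsistentHashRouter:
    """Maps a key (here a cube name) to one worker on a hash ring."""

    def __init__(self, workers: List[str], replicas: int = 100):
        if not workers:
            raise ValueError("at least one worker is required")
        # Each worker is placed on the ring several times (virtual nodes)
        # so that keys spread more evenly across workers.
        self._ring = sorted(
            (self._hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def route(self, cube: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, self._hash(cube)) % len(self._points)
        return self._ring[index][1]


router = ConsistentHashRouter(["task-actor-1", "task-actor-2", "task-actor-3"])
first = router.route("page_views_cube")
second = router.route("page_views_cube")
```

The same cube name always maps to the same worker, and adding or removing a worker only remaps the keys that fall next to its ring points.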
  • 26. Scheduler for Kylin System Concurrency and Fault Tolerance Problem We’d like to be fault tolerant: 1. Recovering from failures 2. Filling missed segment gaps 3. Recording history
  • 27. Scheduler for Kylin System Concurrency and Fault Tolerance Solution We take several measures to solve this problem: 1. Assigning each task a unique ID 2. Persisting each task message, with its progress, to DynamoDB 3. Implementing planning and working tasks carefully so that they are aware of issues
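The unique ID and progress persistence described above can be sketched as follows. An in-memory dictionary stands in for the DynamoDB table, and the names `TaskRecord`, `InMemoryTaskStore`, `new_task` and `run_task` are hypothetical. The status values `init`, `running`, `finish` and `error` are the ones that appear in the deck's persistence figure.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskRecord:
    task_id: str
    task_type: str
    status: str  # one of: init, running, finish, error


class InMemoryTaskStore:
    """Stand-in for a DynamoDB table keyed by task_id (assumption)."""

    def __init__(self) -> None:
        self._items: Dict[str, TaskRecord] = {}

    def put(self, record: TaskRecord) -> None:
        self._items[record.task_id] = record

    def unfinished(self) -> List[TaskRecord]:
        # Candidates for recovery after a scheduler restart.
        return [r for r in self._items.values()
                if r.status in ("init", "running")]


def new_task(task_type: str) -> TaskRecord:
    return TaskRecord(str(uuid.uuid4()), task_type, "init")


def run_task(record: TaskRecord,
             work: Callable[[], None],
             store: InMemoryTaskStore) -> TaskRecord:
    # Persist every state transition so history survives a crash.
    store.put(record)
    running = replace(record, status="running")
    store.put(running)
    try:
        work()
    except Exception:
        failed = replace(running, status="error")
        store.put(failed)
        return failed
    done = replace(running, status="finish")
    store.put(done)
    return done


store = InMemoryTaskStore()
ok = run_task(new_task("KylinCubeBuild"), lambda: None, store)
```

After a restart, the scheduler could read `unfinished()` to find tasks that were interrupted and decide whether to re-plan them.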
  • 28. Scheduler for Kylin System Concurrency and Fault Tolerance Figure: Concurrency and Message Persistence (diagram labels: ControlActor; Consistent Hashing Router; TaskActor; Executor; TaskMessage; init, running, finish, error; DynamoDB ×3; Acquire Lock; Release Lock)
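The Acquire Lock and Release Lock steps, together with the ReadWriteLock mentioned earlier, suggest a scheme like the one sketched below. It assumes that maintaining tasks take the write side (exclusive) while all other tasks take the read side (shared); the deck does not spell out this mapping, so treat it as one plausible reading. The `ReadWriteLock` class here is a simple illustration that does not protect writers from starvation.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers, or exactly one writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            # Readers wait only while a writer holds the lock.
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            # A writer waits until there are no readers and no other writer.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


lock = ReadWriteLock()

with lock.read():
    pass  # e.g. a cube build task (assumed to take the shared side)

with lock.write():
    pass  # e.g. a metadata cleanup task (assumed to take the exclusive side)
```

Under this scheme ordinary tasks can overlap with each other, while a maintaining task waits for them to drain and then runs alone.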
  • 29. Scheduler for Kylin System Maintenance and Monitoring Problem We still have two trivial problems to solve: ▶ Manually performing actions ▶ Task monitoring and error notification
  • 30. Scheduler for Kylin System Maintenance and Monitoring How to design the user interface of the scheduler?
  • 31. Scheduler for Kylin System Maintenance and Monitoring Introducing the scheduler Slack bot...
  • 32. Scheduler for Kylin System Maintenance and Monitoring Event Bus Control Actor Consistent Hashing Router Task Actor Executor Service SlackBot Actor User Command Figure: Scheduler Slack Bot
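The way a bot actor might sit on the event bus, turning user commands into task requests and task errors into notifications, can be sketched like this. No Slack API is used here: the `EventBus` and `SlackBot` classes, the topic names `task.request` and `task.error`, and the `run <TaskType>` command syntax are all invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Minimal in-process publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        for handler in list(self._subscribers[topic]):
            handler(payload)


class SlackBot:
    """Hypothetical bot: outbox collects messages that would go to chat."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.outbox: List[str] = []
        # Error notification: listen for failed tasks on the bus.
        bus.subscribe("task.error", self._on_error)

    def _on_error(self, event: dict) -> None:
        self.outbox.append(f"Task {event['task_id']} failed: {event['reason']}")

    def handle_command(self, text: str) -> str:
        # Manual action: "run <TaskType>" asks the scheduler to start a task.
        parts = text.split()
        if len(parts) == 2 and parts[0] == "run":
            self.bus.publish("task.request", {"task_type": parts[1]})
            return f"Requested {parts[1]}"
        return "Unknown command"


bus = EventBus()
bot = SlackBot(bus)
reply = bot.handle_command("run KylinMetadataBackup")
```

Keeping the bot behind the bus means the scheduler's control logic does not need to know whether a request came from a timer or from a person typing in chat.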
  • 33. Scheduler for Kylin System Maintenance and Monitoring Figure: List task status
  • 34. Scheduler for Kylin System Maintenance and Monitoring Figure: List Kylin Job Progress
  • 35. Conclusion ▶ With Apache Kylin, we’re providing a sub-second web analytics service ▶ With little effort, we managed to deploy Apache Kylin in Docker containers ▶ With the scheduler, we deployed the system on AWS without losing features ▶ We’ve made the system concurrency-safe and robust
  • 36. Conclusion Version 3? But wait, we still have a problem, don’t we?
  • 37. Conclusion Version 3? Figure: Data Transfer Delay of Keen IO (diagram labels: User; Page Views; Keen.IO; S3 North America; S3 Tokyo, Japan; S3 Beijing, China; User; 5 minutes; 10 minutes; 20 minutes)
  • 38. Conclusion Version 3? User S3 Tokyo, Japan Application Load Balancer S3 Beijing, China User 5 minutes Application Load Balancer 5 minutes Page Views Page Views Figure: Collecting Data with ALB?
  • 39. Thank you! BTW, we’re still hiring Data Platform Engineers: 1. Writing Scala 2. Working on AWS 3. Working with Apache Kylin 4. Working on our “Project Manhattan”