Infrastructure for auto scaling distributed system

Kai Sasaki
Software engineer - Treasure Data
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Big Data Conference in Vilnius 2018
Bio
Kai Sasaki (佐々木 海)
• Senior Software Engineer at Arm Treasure Data since 2015
• Hadoop, Presto, Spark, TensorFlow.js, Apache Hivemall
• Books
– Available as paperback and ebook.
• Twitter
– @Lewuathe
Agenda
• Who is Treasure Data?
• What is distributed data analysis?
• What kind of challenges do we have?
– Operational Cost
– Stability and Scalability
• Our Approach
– AWS CodeDeploy & Auto Scaling Group
– Query Simulation
– Graceful/Force Shutdown
Who is Treasure Data?
Treasure Data
Founded in December 2011 in Silicon Valley
• Mountain View, CA
• DMP, eCDP, IoT, Cloud
• We joined Arm in October 2018
Treasure Data
We provide an end-to-end integrated data analysis platform.
• Data Ingestion
– Mobile Device, Automotive, IoT
• Enterprise Customer Data Platform
• Service Integration
– BI tool (e.g. Tableau)
– Marketing tool
Treasure Data
Open Source Lover
• Fluentd
• Embulk
• Digdag
• Apache Hivemall
Enterprise Data Analysis
• Scalable processing
• Reliable platform
• Secure data protection
Arm Pelion Platform
Treasure Data is part of the Arm Pelion IoT Platform
• Flexibility in connectivity management
• Efficient data processing
Distributed Data Analysis
Distributed Data Analysis
A service component that enables us to process huge datasets
Scalability
• Easy to scale horizontally
• Flexible to business requirements
– Interface (e.g. SQL)
– Data format
Throughput
• Impossible to scale with a single-node machine
• Business requirements for batch processing (e.g. daily batch)
Data Consistency
• Write-side operations are possible
– INSERT, DELETE, UPDATE
• Correct measurement is the key to data analysis
Distributed Processing Engines
Many open source software packages are available for distributed processing
• Hadoop
• Presto
• Spark
• Kafka
Typical Architecture
Master-Worker Model
https://www.tutorialspoint.com/apache_presto/apache_presto_architecture.htm
Distributed Plan
select
t1.class,
t2.features,
count(1)
from iris t1
join iris t2
on t1.class = t2.class
group by 1, 2;
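One way to picture how a master-worker engine might distribute the query above: rows are hash-partitioned on the join key, each worker joins and counts its own partition, and the coordinator merges the partial counts. The Python sketch below is a toy simulation under stated assumptions. The sample rows, NUM_WORKERS, and all function names are hypothetical, and this is not how Presto actually builds its plan.

```python
from collections import Counter, defaultdict

NUM_WORKERS = 3  # hypothetical cluster size

# Hypothetical sample of the iris table: (class, features)
IRIS = [
    ("setosa", "a"),
    ("setosa", "b"),
    ("virginica", "c"),
]


def partition(rows, num_workers):
    """Shuffle stage: route each row to a worker by hashing the join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[0]) % num_workers].append(row)
    return parts


def worker_join_and_count(left, right):
    """Worker stage: local join on class, then a partial count per group."""
    features_by_class = defaultdict(list)
    for cls, features in right:
        features_by_class[cls].append(features)
    counts = Counter()
    for cls, _ in left:
        for features in features_by_class[cls]:
            counts[(cls, features)] += 1
    return counts


def run_query(rows, num_workers=NUM_WORKERS):
    """Coordinator: partition the rows, run each worker, merge the results."""
    parts = partition(rows, num_workers)
    total = Counter()
    for worker_id in range(num_workers):
        local = parts.get(worker_id, [])
        # Self-join: both sides of the join live in the same partition.
        total.update(worker_join_and_count(local, local))
    return dict(total)
```

Because both sides of the join are partitioned on the same key, no worker needs rows held by another worker, which is what makes the join and the count easy to run in parallel.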
Challenges
Challenges for Distributed Data Analysis
Maintaining a distributed data analysis platform in the real world is not easy.
• Operation
– Deployment
– Logging Investigation
– Monitoring
• Money
– Large Scale Cluster
– Network Cost
• Stability
– Capacity Sufficiency
Challenges for Distributed Data Analysis
Manual launch/termination?
Is the capacity estimation correct?
Which version is deployed?
What kind of metrics do we need to monitor?
How much does it cost?
MANUALLY
Our Approach
Our Approach
Practical solutions by taking full advantage of public cloud services
• AWS CodeDeploy
– Integration with Auto Scaling Group
• EC2 Auto Scaling Group
– Load test by Query Simulation
– Metric Based Capacity Estimation
– Graceful/Force Instance Termination
CodeDeploy
Managed deployment service in AWS
• Easy to Integrate with Auto Scaling Group
• Available Everywhere
– Supporting On-Premise Instances
• Scalable for distributed system use cases
• https://docs.aws.amazon.com/codedeploy/index.html
Auto Scaling System
The system should scale automatically without any manual operation
• Load test by Query Simulation
• Metric Based Capacity Estimation
• Graceful Termination & Force Termination
Query Simulation
Load tests should be based on the real-world workload.
• Get the query list from our customers' past query history
• Cluster queries by query signature
• Construct the data set and query list based on that list
• This lets us easily run load tests based on the production workload
Query Signature
Query signature represents a query in a shortened format.
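The slides do not spell out the signature format, so the following Python sketch shows one possible scheme as an assumption: literals are replaced with placeholders, whitespace and case are normalized, and the result is hashed to a short string. Queries that differ only in their constants then share a signature and can be clustered together. The function names and the 12-character length are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def query_signature(sql: str) -> str:
    """Reduce a query to a short signature (assumed scheme, not the talk's)."""
    text = sql.strip().lower()
    text = re.sub(r"'[^']*'", "?", text)          # string literals -> ?
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)  # numeric literals -> ?
    text = re.sub(r"\s+", " ", text)              # collapse whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


def cluster_by_signature(queries):
    """Group queries that share the same signature."""
    clusters = defaultdict(list)
    for query in queries:
        clusters[query_signature(query)].append(query)
    return dict(clusters)
```

Picking one representative query per cluster is one way to keep a load-test query list small while still covering the shapes of queries seen in production.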
Query Simulation
Conductor (c5.9xlarge)
1. Get raw query list
2. Construct test data and query list
Metric Based Capacity Estimation
Designed to achieve target metric value by adjusting capacity
• Add or remove instances in proportion to how far the metric is from its target value
• e.g. Target average CPU usage = 40%
• 40% is the threshold to balance the cost and performance
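The proportional adjustment can be sketched as a small Python function: the new instance count is the current count scaled by the ratio of the observed metric to the target. This is a minimal sketch, not the production logic. The function name, the rounding up, and the min/max bounds are assumptions; only the 40% target comes from the slides.

```python
import math


def desired_capacity(current_instances: int, current_metric: float,
                     target_metric: float = 40.0,
                     min_instances: int = 1, max_instances: int = 100) -> int:
    """Scale the instance count in proportion to metric / target."""
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("current_instances and target_metric must be positive")
    raw = current_instances * current_metric / target_metric
    # Round up so the metric ends at or below the target, then clamp.
    return max(min_instances, min(max_instances, math.ceil(raw)))
```

For example, 10 workers at 80% average CPU with a 40% target gives 20 workers, and the same 10 workers at 20% average CPU gives 5.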
Graceful Termination
Terminating instances gracefully
• Avoid degrading the user experience
• Lifecycle hook in auto scaling group
• Cron job to check running tasks
– Number of tasks in the worker
– Send completion to lifecycle hook
https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html
1. The instance is moved to the Terminating:Wait status
2. The cron job moves the state to Terminating:Proceed
3. The instance is gracefully terminated
Cron job sends the lifecycle hook completion
ASG terminates the instance
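The cron check can be sketched in Python as follows. This is a minimal sketch under assumptions: asg_client is assumed to be a boto3 Auto Scaling client (boto3.client("autoscaling")), whose complete_lifecycle_action call signals the lifecycle hook. How the number of running tasks is obtained from the worker is not covered by the slides, so it is passed in as a plain argument. The function name and parameter names are hypothetical.

```python
def check_and_complete(asg_client, running_tasks: int, instance_id: str,
                       asg_name: str, hook_name: str) -> bool:
    """Run once per cron tick on a worker that is in Terminating:Wait.

    Returns True when the lifecycle action was completed.
    """
    if running_tasks > 0:
        # Tasks are still running: keep waiting so queries are not interrupted.
        return False
    # No tasks left: let the Auto Scaling group proceed with termination.
    asg_client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return True
```

Passing the client in as an argument keeps the check easy to exercise with a stand-in object before it is pointed at a real Auto Scaling group.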
Force Termination
Long-running tasks can block graceful termination
• Set a “timeout” limit
• Simulate “how long it takes to terminate gracefully”
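A simulation of this kind can be sketched in Python: given the remaining runtime of each task on a worker at the moment termination starts, estimate how long a purely graceful drain would take and how many tasks a timeout would cut off. The function name and the input format are hypothetical. The 600-second default follows the 10-minute figure used elsewhere in this talk.

```python
def simulate_drain(remaining_seconds, timeout_seconds=600):
    """Estimate graceful drain time and how many tasks a timeout would kill.

    remaining_seconds: remaining runtime of each task on the worker when
    termination starts (hypothetical input, e.g. replayed from query history).
    """
    graceful = max(remaining_seconds, default=0)
    killed = sum(1 for s in remaining_seconds if s > timeout_seconds)
    return {
        "graceful_seconds": graceful,                       # no timeout at all
        "actual_seconds": min(graceful, timeout_seconds),   # with the timeout
        "tasks_killed": killed,
    }
```

Running this over historical workloads with different timeout values gives a rough picture of the trade-off between interrupted queries and the time an instance stays alive after it is marked for termination.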
Instance Termination
Balance between customer experience and cost optimization.
Graceful Termination
Keeping queries running as long as possible satisfies customer expectations.
• Non-fault-tolerant systems such as Presto
• Distributed analysis workloads tend to be too long to be retried
Force Termination
Cost optimization is one of the primary goals of auto scaling.
• Auto scaling out/in within around 10 minutes does not lose agility for capacity adjustment.
• Force termination that affects only queries running over 10 minutes is acceptable
Recap
• Who is Treasure Data?
• What is distributed data analysis?
• What kind of challenges do we have?
– Operational Cost
– Stability and Scalability
• Our Approach
– AWS CodeDeploy & Auto Scaling Group
– Query Simulation
– Graceful/Force Shutdown
Thank You!
Danke!
Merci!
谢谢!
Gracias!
Kiitos!
FIMA 2023 Neo4j & FS - Entity Resolution.pptx
Neo4j17 views
Software evolution understanding: Automatic extraction of software identifier... by Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
FOSSLight Community Day 2023-11-30 by Shane Coughlan
FOSSLight Community Day 2023-11-30FOSSLight Community Day 2023-11-30
FOSSLight Community Day 2023-11-30
Shane Coughlan6 views
Top-5-production-devconMunich-2023.pptx by Tier1 app
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptx
Tier1 app8 views
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated... by TomHalpin9
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
TomHalpin96 views
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with... by sparkfabrik
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
sparkfabrik8 views

Infrastructure for auto scaling distributed system

  • 1. Infrastructure for Auto Scaling Distributed System. Kai Sasaki, Big Data Conference in Vilnius 2018. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
  • 2. Bio: Kai Sasaki (佐々木 海)
      • Senior Software Engineer at Arm Treasure Data since 2015
      • Hadoop, Presto, Spark, TensorFlow.js, Apache Hivemall
      • Books
          – Available as paperback and ebook
      • Twitter
          – @Lewuathe
  • 3. Agenda
      • Who is Treasure Data?
      • What is distributed data analysis?
      • What kinds of challenges do we have?
          – Operational Cost
          – Stability and Scalability
      • Our Approach
          – AWS CodeDeploy & Auto Scaling Group
          – Query Simulation
          – Graceful/Force Shutdown
  • 4. Who is Treasure Data?
  • 5. Treasure Data: founded in Dec 2011 in Silicon Valley
      • Mountain View, CA
      • DMP, eCDP, IoT, Cloud
      • We joined Arm in Oct 2018
  • 6. Treasure Data: we provide an end-to-end integrated data analysis platform.
      • Data Ingestion
          – Mobile Device, Automotive, IoT
      • Enterprise Customer Data Platform
      • Service Integration
          – BI tool (e.g. Tableau)
          – Marketing tool
  • 7. Treasure Data: Open Source Lover
      • Fluentd
      • Embulk
      • Digdag
      • Apache Hivemall
  • 8. Enterprise Data Analysis
      • Scalable processing
      • Reliable platform
      • Secure data protection
  • 9. Arm Pelion Platform: Treasure Data is part of the Arm Pelion IoT Platform
      • Flexibility in connectivity management
      • Efficient data processing
  • 10. Distributed Data Analysis
  • 11. Distributed Data Analysis: a service component that enables us to process huge datasets
      • Scalability
          – Easy to do horizontal scaling
          – Flexible to the business requirement: interface (e.g. SQL), data format
      • Throughput
          – Impossible to scale with a single-node machine
          – Business requirement for batch processing (e.g. daily batch)
      • Data Consistency
          – Write-side operations are possible: INSERT, DELETE, UPDATE
          – Correct measurement is the key for data analysis
  • 12. Distributed Processing Engines: a range of open source software is available for distributed processing
      • Hadoop
      • Presto
      • Spark
      • Kafka
  • 13. Typical Architecture: Master-Worker Model https://www.tutorialspoint.com/apache_presto/apache_presto_architecture.htm
  • 14. Distributed Plan: select t1.class, t2.features, count(1) from iris t1 join iris t2 on t1.class = t2.class group by 1, 2;
  • 15. Challenges
  • 16. Challenges for Distributed Data Analysis: maintaining a distributed data analysis platform in the real world is not easy.
      • Operation
          – Deployment
          – Log Investigation
          – Monitoring
      • Money
          – Large Scale Cluster
          – Network Cost
      • Stability
          – Capacity Sufficiency
  • 17. Challenges for Distributed Data Analysis
      • Manual launch/termination?
      • Is the capacity estimation correct?
      • Which version is deployed?
      • What kind of metrics do we need to monitor?
      • How much does it cost?
  • 18. Challenges for Distributed Data Analysis (questions repeated from slide 17): MANUALLY
  • 19. Our Approach
  • 20. Our Approach: practical solutions that take full advantage of public cloud services
      • AWS CodeDeploy
          – Integration with Auto Scaling Group
      • EC2 Auto Scaling Group
          – Load test by Query Simulation
          – Metric Based Capacity Estimation
          – Graceful/Force Instance Termination
  • 21. CodeDeploy: a service for deployment in AWS
      • Easy to integrate with Auto Scaling Group
      • Available everywhere
          – Supports on-premise instances
      • Scalable for distributed system use cases
      • https://docs.aws.amazon.com/codedeploy/index.html
  • 22. Auto Scaling System: the system should be scaled automatically without any manual operation
      • Load test by Query Simulation
      • Metric Based Capacity Estimation
      • Graceful Termination & Force Termination
  • 23. Query Simulation: load tests should be based on the real-world workload.
      • Get the query list from our customers' past query history
      • Cluster queries by query signature
      • Construct the data set and query list based on that list
      • This lets us run load tests easily against a production-like workload
  • 24. Query Signature: a query signature represents a query in a shortened format.
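The slides do not show the actual signature format, so the following is only a minimal sketch of the idea, assuming a signature is the query text with literals replaced by `?` and whitespace and case normalized. The function names `query_signature` and `cluster_by_signature` are hypothetical and not taken from the talk.

```python
import re
from collections import defaultdict


def query_signature(sql: str) -> str:
    """Reduce a SQL string to a short, literal-free form (hypothetical format)."""
    text = sql.strip().rstrip(";")
    text = re.sub(r"'(?:[^']|'')*'", "?", text)      # string literals -> ?
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)   # numeric literals -> ?
    # Collapse whitespace and lowercase so formatting differences do not matter.
    return re.sub(r"\s+", " ", text).strip().lower()


def cluster_by_signature(queries):
    """Group raw queries that share the same signature."""
    clusters = defaultdict(list)
    for q in queries:
        clusters[query_signature(q)].append(q)
    return dict(clusters)
```

Queries that differ only in constants or formatting fall into the same cluster, so a load test could replay one representative per cluster, weighted by cluster size.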
  • 25. Query Simulation (diagram): Conductor (c5.9xlarge)
      1. Get raw query list
      2. Construct test data and query list
  • 26. Metric Based Capacity Estimation: designed to achieve a target metric value by adjusting capacity
      • Add/reduce instances in proportion to the target metric value
      • e.g. target average CPU usage = 40%
  • 27. Metric Based Capacity Estimation: designed to achieve a target metric value by adjusting capacity
      • 40% is the threshold that balances cost and performance
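A minimal sketch of proportional capacity estimation, assuming the metric scales inversely with the number of instances. The function name `desired_capacity` and the `min_instances`/`max_instances` bounds are illustrative assumptions; only the 40% CPU target comes from the slides.

```python
import math


def desired_capacity(current_instances: int,
                     current_metric: float,
                     target_metric: float = 40.0,
                     min_instances: int = 1,
                     max_instances: int = 100) -> int:
    """Scale the instance count so the average metric moves toward the target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # If 10 instances run at 80% CPU, about 20 are needed to reach 40%.
    raw = current_instances * current_metric / target_metric
    # Round up so the cluster is not under-provisioned, then clamp (assumed bounds).
    return max(min_instances, min(max_instances, math.ceil(raw)))
```

With the 40% target, `desired_capacity(10, 80.0)` gives 20 instances and `desired_capacity(10, 20.0)` gives 5.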
  • 28. Graceful Termination: terminating instances gracefully
      • Avoid degrading the user experience
      • Lifecycle hook in the Auto Scaling Group
      • Cron job to check running tasks
          – Number of tasks in the worker
          – Send completion to the lifecycle hook
      • https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html
  • 29. Graceful Termination: terminating instances gracefully
      1. The instance is moved to the Terminating:Wait status
      2. A cron job makes the state transition to Terminating:Proceed
      3. The instance is gracefully terminated
      • Diagram labels: send complete lifecycle hook; ASG terminates the instance
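The cron-job step above can be sketched as follows. This is not the speaker's implementation: the function name `complete_if_idle` and its parameters are hypothetical, and how the running task count is obtained from the worker is left out. `asg_client` is assumed to be a boto3 Auto Scaling client, whose `complete_lifecycle_action` call moves a waiting instance on to Terminating:Proceed.

```python
def complete_if_idle(asg_client, running_tasks: int, hook_name: str,
                     group_name: str, instance_id: str) -> bool:
    """Complete the termination lifecycle hook once the worker has no tasks.

    Returns True if the completion was sent, False if the worker is still busy.
    """
    if running_tasks > 0:
        # Still processing queries: stay in Terminating:Wait until the next run.
        return False
    asg_client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=group_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return True
```

In a real cron job the client would be created with `boto3.client("autoscaling")`; passing it in as an argument keeps the check easy to test with a stub.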
  • 30. Force Termination: a long-running task can block graceful termination
      • Put a "timeout" limit on the wait
      • Simulate how long it takes to terminate gracefully
      • (Chart axes: Date, Time)
  • 31. Instance Termination: a balance between customer experience and cost optimization.
      • Graceful Termination: keeping queries running as long as possible satisfies customer expectations.
          – Non-fault-tolerant systems such as Presto
          – Distributed analysis workloads tend to be too long to retry
      • Force Termination: cost optimization is one of the primary goals of auto scaling.
          – Scaling out/in within about 10 minutes does not lose agility for capacity adjustment
          – Force-terminating only queries that run longer than 10 minutes is acceptable
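The trade-off between graceful and force termination can be sketched as a small decision function. The name `termination_action` and its return values are hypothetical; the 10-minute default follows the figure on slide 31, and the real timeout mechanism used in the talk is not shown in the slides.

```python
from datetime import datetime, timedelta


def termination_action(running_tasks: int,
                       wait_started_at: datetime,
                       now: datetime,
                       timeout: timedelta = timedelta(minutes=10)) -> str:
    """Decide what to do with an instance that is waiting to be terminated."""
    if running_tasks == 0:
        return "graceful"   # nothing left to run, terminate right away
    if now - wait_started_at >= timeout:
        return "force"      # tasks ran past the limit, stop waiting
    return "wait"           # give the running tasks more time
```

A shorter timeout releases capacity sooner at the cost of failing more long-running queries, which is the balance slide 31 describes.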
  • 32. Recap
      • Who is Treasure Data?
      • What is distributed data analysis?
      • What kinds of challenges do we have?
          – Operational Cost
          – Stability and Scalability
      • Our Approach
          – AWS CodeDeploy & Auto Scaling Group
          – Query Simulation
          – Graceful/Force Shutdown
  • 33. Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!