Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.

Infrastructure for auto scaling distributed system

For Big Data Conference Vilnius 2018

Published in: Software

  1. Big Data Conference in Vilnius 2018
     Kai Sasaki
     Infrastructure for Auto Scaling Distributed System
  2. Bio
     Kai Sasaki (佐々木 海)
     • Senior Software Engineer at Arm Treasure Data since 2015
     • Hadoop, Presto, Spark, TensorFlow.js, Apache Hivemall
     • Books: available as paperback and ebook
     • Twitter: @Lewuathe
  3. Agenda
     • Who is Treasure Data?
     • What is distributed data analysis?
     • What kind of challenges do we have?
       – Operational Cost
       – Stability and Scalability
     • Our Approach
       – AWS CodeDeploy & Auto Scaling Group
       – Query Simulation
       – Graceful/Force Shutdown
  4. Who is Treasure Data?
  5. Treasure Data
     Founded in Dec, 2011 in Silicon Valley
     • Mountain View, CA
     • DMP, eCDP, IoT, Cloud
     • We joined Arm Oct, 2018
  6. Treasure Data
     We provide an end-to-end integrated data analysis platform.
     • Data Ingestion
       – Mobile Device, Automotive, IoT
     • Enterprise Customer Data Platform
     • Service Integration
       – BI tool (e.g. Tableau)
       – Marketing tool
  7. Treasure Data
     Open Source Lover
     • Fluentd
     • Embulk
     • Digdag
     • Apache Hivemall
  8. Enterprise Data Analysis
     • Scalable processing
     • Reliable platform
     • Secure data protection
  9. Arm Pelion Platform
     Treasure Data is a part of the Arm Pelion IoT Platform
     • Flexibility in connectivity management
     • Efficient data processing
  10. Distributed Data Analysis
  11. Distributed Data Analysis
     A service component that enables us to process huge datasets
     Scalability
     • Easy to scale horizontally
     • Flexible to the business requirement
       – Interface (e.g. SQL)
       – Data Format
     Throughput
     • Impossible to scale with a single-node machine
     • Business requirement for batch processing (e.g. daily batch)
     Data Consistency
     • Write-side operations are possible
       – INSERT, DELETE, UPDATE
     • Correct measurement is the key for data analysis
  12. Distributed Processing Engines
     A number of open source software projects are available for distributed processing
     • Hadoop
     • Presto
     • Spark
     • Kafka
  13. Typical Architecture
     Master-Worker Model
     https://www.tutorialspoint.com/apache_presto/apache_presto_architecture.htm
  14. Distributed Plan
     select
       t1.class,
       t2.features,
       count(1)
     from iris t1
     join iris t2
       on t1.class = t2.class
     group by 1, 2;
  15. Challenges
  16. Challenges for Distributed Data Analysis
     Maintaining a distributed data analysis platform in the real world is not easy.
     • Operation
       – Deployment
       – Logging Investigation
       – Monitoring
     • Money
       – Large Scale Cluster
       – Network Cost
     • Stability
       – Capacity Sufficiency
  17. Challenges for Distributed Data Analysis
     • Manual launch/termination?
     • Is the capacity estimation correct?
     • Which version is deployed?
     • What kind of metrics do we need to monitor?
     • How much does it cost?
  18. Challenges for Distributed Data Analysis
     • Manual launch/termination?
     • Is the capacity estimation correct?
     • Which version is deployed?
     • What kind of metrics do we need to monitor?
     • How much does it cost?
     MANUALLY
  19. Our Approach
  20. Our Approach
     Practical solutions by taking full advantage of public cloud services
     • AWS CodeDeploy
       – Integration with Auto Scaling Group
     • EC2 Auto Scaling Group
       – Load test by Query Simulation
       – Metric Based Capacity Estimation
       – Graceful/Force Instance Termination
  21. CodeDeploy
     Deployment service in AWS
     • Easy to integrate with Auto Scaling Group
     • Available everywhere
       – Supports on-premises instances
     • Scalable for distributed system use cases
     • https://docs.aws.amazon.com/codedeploy/index.html
  22. Auto Scaling System
     The system should be scaled automatically without any manual operation
     • Load test by Query Simulation
     • Metric Based Capacity Estimation
     • Graceful Termination & Force Termination
  23. Query Simulation
     Load tests should be based on the real-world workload.
     • Get the query list from our customers' past query history
     • Cluster the queries by query signature
     • Construct the data set and query list based on that list
     • This enables us to run load tests easily based on the production workload
  24. Query Signature
     A query signature represents a query in a shortened format.
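The slides do not spell out the signature format, so the following is only a minimal sketch of the idea, assuming a signature is the query text with literals replaced by placeholders and whitespace collapsed. The function names `query_signature` and `cluster_by_signature` are hypothetical and are not taken from the talk.

```python
import re
from collections import defaultdict


def query_signature(sql):
    """Reduce a SQL string to a literal-free, whitespace-normalized form.

    This normalization is an assumed stand-in for the talk's signature scheme.
    """
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)     # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)                # collapse whitespace
    return s


def cluster_by_signature(queries):
    """Group raw queries that share the same signature."""
    clusters = defaultdict(list)
    for q in queries:
        clusters[query_signature(q)].append(q)
    return dict(clusters)


if __name__ == "__main__":
    history = [
        "SELECT * FROM t1 WHERE id = 42",
        "select  *\nfrom t1 where id = 7;",
        "SELECT name FROM users WHERE name = 'bob'",
    ]
    for signature, members in cluster_by_signature(history).items():
        print(len(members), signature)
```

One representative query per cluster would then be enough to build a load-test query list that mirrors the shape of the production workload.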
  25. Query Simulation
     Diagram labels: Conductor, c5.9xlarge
     1. Get raw query list
     2. Construct test data and query list
  26. Metric Based Capacity Estimation
     Designed to achieve the target metric value by adjusting capacity
     • Add or remove instances in proportion to the target metric value
     • e.g. Target average CPU usage = 40%
  27. Metric Based Capacity Estimation
     Designed to achieve the target metric value by adjusting capacity
     • 40% is the threshold that balances cost and performance
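The proportional adjustment can be sketched as follows. This is an illustration under stated assumptions, not the formula from the talk: it assumes the desired capacity is the current capacity scaled by the ratio of the observed metric to the target metric, rounded up and clamped to group limits. The name `desired_capacity` and the `min_size`/`max_size` parameters are hypothetical.

```python
import math


def desired_capacity(current_instances, current_metric, target_metric,
                     min_size=1, max_size=100):
    """Estimate the instance count that would bring the metric back to target.

    Assumes load is spread evenly, so the metric scales inversely with capacity.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = current_instances * current_metric / target_metric
    # Round up so the cluster is never sized below the estimate, then clamp.
    return max(min_size, min(max_size, math.ceil(raw)))


if __name__ == "__main__":
    # 10 workers at 80% average CPU with a 40% target: scale out.
    print(desired_capacity(10, 80, 40))
    # 10 workers at 20% average CPU with a 40% target: scale in.
    print(desired_capacity(10, 20, 40))
```

Rounding up biases the estimate toward performance over cost; a lower target such as 40% leaves more headroom for bursts at the price of a larger cluster.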
  28. Graceful Termination
     Terminating instances gracefully
     • Avoid degrading the user experience
     • Lifecycle hook in the auto scaling group
     • Cron job to check running tasks
       – Number of tasks in the worker
       – Send completion to the lifecycle hook
     https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html
  29. Graceful Termination
     Terminating instances gracefully
     1. The instance is moved to the Terminating:Wait status
     2. The cron job makes the state transition to Terminating:Proceed by sending the lifecycle hook completion
     3. The instance is gracefully terminated: the ASG terminates the instance
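The cron check in the steps above can be sketched as a single function. This is a minimal sketch, not the implementation from the talk: `check_and_release` and its parameters are hypothetical, and the actual AWS call is injected as a callable so the logic stays testable. On AWS, that callable would typically wrap boto3's `complete_lifecycle_action` on the Auto Scaling client with `LifecycleActionResult="CONTINUE"`.

```python
def check_and_release(running_tasks, complete_hook):
    """One cron tick on a worker in Terminating:Wait.

    running_tasks: number of tasks still running on this worker.
    complete_hook: zero-argument callable that completes the lifecycle hook
                   (assumed to wrap the cloud API call).
    Returns True if the instance was released for termination.
    """
    if running_tasks > 0:
        # Still busy: stay in Terminating:Wait and check again on the next tick.
        return False
    complete_hook()  # moves the instance to Terminating:Proceed
    return True


if __name__ == "__main__":
    released = check_and_release(0, lambda: print("lifecycle hook completed"))
    print("released:", released)
```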
  30. Force Termination
     A long-running task can block graceful termination
     • Put a "timeout" limit on graceful termination
     • Simulate "how long it takes to terminate gracefully"
     Chart axes: Date, Time
  31. Instance Termination
     Balance between customer experience and cost optimization.
     Graceful Termination
     Keeping queries running as long as possible satisfies customer expectations.
     • Non-fault-tolerant systems such as Presto
     • Distributed analysis workloads tend to be too long to be retried
     Force Termination
     Cost optimization is one of the primary goals of auto scaling
     • Scaling out/in within around 10 minutes does not lose agility for capacity adjustment
     • Force termination that affects only queries running longer than 10 minutes is acceptable
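The trade-off behind the 10-minute timeout can be explored with a small simulation. This is a sketch under assumptions: the query durations are made-up sample values, and `fraction_force_killed` and `drain_time_sec` are hypothetical helpers, not tooling described in the talk.

```python
def fraction_force_killed(query_durations_sec, timeout_sec=600):
    """Worst-case share of queries cut off by the timeout.

    Worst case means draining starts just as each query begins, so any query
    longer than the timeout is force-terminated.
    """
    if not query_durations_sec:
        return 0.0
    killed = sum(1 for d in query_durations_sec if d > timeout_sec)
    return killed / len(query_durations_sec)


def drain_time_sec(remaining_sec, timeout_sec=600):
    """Time until a worker can be terminated, given remaining task times."""
    if not remaining_sec:
        return 0
    # Graceful drain waits for the slowest task, capped by the force timeout.
    return min(max(remaining_sec), timeout_sec)


if __name__ == "__main__":
    durations = [30, 120, 900, 45]  # assumed sample durations in seconds
    print(fraction_force_killed(durations))
    print(drain_time_sec([30, 120, 900]))
```

Running this over a real duration history would show how many queries a given timeout would interrupt and how long scale-in would be delayed.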
  32. Recap
     • Who is Treasure Data?
     • What is distributed data analysis?
     • What kind of challenges do we have?
       – Operational Cost
       – Stability and Scalability
     • Our Approach
       – AWS CodeDeploy & Auto Scaling Group
       – Query Simulation
       – Graceful/Force Shutdown
  33. Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!
