Fine-Grained Scheduling with Helix (ApacheCon NA 2014)

Talk from ApacheCon 2014 on how Helix can integrate with provisioners such as YARN or Mesos to manage the entire lifecycle of a distributed system.

Fine-Grained Scheduling with Helix
Kanak Biscuitwala and Jason Zhang, Apache Helix Committers @ LinkedIn
helix.apache.org, @apachehelix

About Helix
Helix automates assignment of partitioned and replicated distributed tasks in the face of node failure and recovery, cluster expansion, and reconfiguration.

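A minimal sketch of what "assignment of partitioned and replicated tasks" means, assuming a naive round-robin placement (not the algorithm Helix's rebalancers actually use): replica r of partition p goes to node (p + r) mod n, so the replicas of one partition land on distinct nodes. `ReplicaPlacement` and `assign` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: naive round-robin replica placement,
// not the placement algorithm used by Helix's rebalancers.
final class ReplicaPlacement {
    private ReplicaPlacement() {}

    /**
     * Places `replicas` copies of each of `partitions` partitions on the given live nodes.
     * Replica r of partition p goes to node (p + r) mod n, so no node holds two copies
     * of the same partition as long as replicas <= nodes.size().
     */
    static Map<Integer, List<String>> assign(int partitions, int replicas, List<String> nodes) {
        if (replicas > nodes.size()) {
            throw new IllegalArgumentException("need at least as many live nodes as replicas");
        }
        Map<Integer, List<String>> placement = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            List<String> owners = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                owners.add(nodes.get((p + r) % nodes.size()));
            }
            placement.put(p, owners);
        }
        return placement;
    }
}
```

For example, `assign(3, 2, List.of("N1", "N2", "N3"))` yields {0=[N1, N2], 1=[N2, N3], 2=[N3, N1]}; recomputing with only N1 and N2 after N3 fails re-homes every replica N3 held. A full recomputation like this also reshuffles replicas that did not need to move, which is why a production rebalancer would typically take the current assignment into account as well.
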
Helix at LinkedIn
[Diagram of Helix-managed systems in production: User Writes, OracleDB, Change Capture, Change Consumers, Search Index, Data Replicator, Backup/Restore, ETL, HDFS, and Analytics.]

Helix at LinkedIn: In Production (all numbers are per datacenter)
• Over 1,000 instances covering over 30,000 database partitions
• Over 1,000 instances for change capture consumers
• As many as 500 instances in a single Helix cluster

Others Using Helix

Datacenter Diversity

Intersection of Job Types
[Diagram, built up over four slides: OracleDB instances, then Backup jobs, then HDFS and ETL jobs, placed together on the same infrastructure.]
Long-running and batch jobs running together!

Cloud Deployment
[Diagram: applications A (online), B (nearline), and C (batch), labeled DB, Backup, and ETL in the second build, run as instances A1-A3, B1-B5, and C1-C4, some of them replicated, across shared machines.]
Applications with diverse requirements running together in a datacenter.

Processes on Machines
[Diagram comparing three models, with a legend of Machine, Container, Process, VM, and Task: No Isolation (tasks run inside a single process), VM-based Isolation (three processes, each in a 128 MB VM), and Container-based Isolation (processes in 256 MB and 64 MB containers).]

Processes on Machines
• Run as individual processes
  – Poor isolation or poor utilization
• Virtual machines
  – Better isolation
  – Xen, Hyper-V, ESX, KVM
• Containers
  – cgroups
  – YARN, Mesos
  – Super lightweight; sized dynamically based on application requirements

Processes on Machines
Virtualization and containerization significantly improve process isolation and open up possibilities for efficient utilization of physical resources.

Container-Based Solution

Container-Based Solution: System Requirements
A needs three 64 MB containers, B needs two 128 MB containers, and C needs one 256 MB container.

Container-Based Solution: Allocation
[Diagram: the six containers (64 MB, 64 MB, 128 MB, 256 MB, 128 MB, 64 MB) are allocated on machines, and the processes of A, A, A, B, B, and C run inside them.]
Containerization is powerful! But do processes always fit so nicely?

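Allocating these containers is a bin-packing problem. The sketch below packs the six requests from the System Requirements slide (three 64 MB for A, two 128 MB for B, one 256 MB for C) with first-fit decreasing, a common heuristic. The 512 MB machine size and the `ContainerPacker` name are assumptions; YARN and Mesos use their own schedulers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: first-fit-decreasing packing of container requests (in MB)
// onto identical machines. The machine capacity is an assumed parameter.
final class ContainerPacker {
    private ContainerPacker() {}

    static List<List<Integer>> pack(List<Integer> requestsMb, int machineCapacityMb) {
        List<Integer> sorted = new ArrayList<>(requestsMb);
        sorted.sort(Comparator.reverseOrder()); // largest containers first
        List<List<Integer>> machines = new ArrayList<>();
        List<Integer> freeMb = new ArrayList<>();
        for (int request : sorted) {
            if (request > machineCapacityMb) {
                throw new IllegalArgumentException("container larger than a machine: " + request);
            }
            int target = -1;
            for (int m = 0; m < machines.size(); m++) {
                if (freeMb.get(m) >= request) { // first machine with enough room
                    target = m;
                    break;
                }
            }
            if (target < 0) { // no machine has room: bring up a new one
                machines.add(new ArrayList<>());
                freeMb.add(machineCapacityMb);
                target = machines.size() - 1;
            }
            machines.get(target).add(request);
            freeMb.set(target, freeMb.get(target) - request);
        }
        return machines;
    }
}
```

With 512 MB machines, `pack(List.of(64, 64, 64, 128, 128, 256), 512)` returns [[256, 128, 128], [64, 64, 64]], i.e. two machines.
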
Container-Based Solution: Over-Utilization
[Diagram: Process 1 outgrows its 256 MB container.]
Outcome: Preemption and relaunch in a larger, 384 MB container.

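The relaunch can be sketched as a resizing policy. The growth factor of 1.5 is a hypothetical choice, picked only because it reproduces the slide's 256 MB to 384 MB step, and `OverUtilizationPolicy` is a hypothetical class. The point it illustrates is that in the container model every such resize means killing and restarting the process.

```java
// Hypothetical policy: when a process exceeds its container's memory limit, the
// container is preempted and relaunched with the limit grown by an assumed factor.
final class OverUtilizationPolicy {
    static final double GROWTH_FACTOR = 1.5; // assumption, not from the talk

    private OverUtilizationPolicy() {}

    static boolean needsRelaunch(int limitMb, int usageMb) {
        return usageMb > limitMb;
    }

    /** Size of the replacement container, or the current size if no relaunch is needed. */
    static int relaunchSizeMb(int limitMb, int usageMb) {
        if (limitMb <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        int size = limitMb;
        while (usageMb > size) { // grow until the observed usage fits
            size = (int) Math.ceil(size * GROWTH_FACTOR);
        }
        return size;
    }
}
```

Under this policy, `relaunchSizeMb(256, 300)` is 384 and `relaunchSizeMb(256, 400)` is 576.
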
Container-Based Solution: Under-Utilization
[Diagram: a 384 MB container and a 128 MB container running Process 1 and Process 2.]
Outcome: Over-provisioned until restart.

Container-Based Solution: Failure
[Diagram: a failure takes down the 256 MB container running C and one 64 MB container running A; the remaining containers (A, A, B, B) keep running.]
Outcome: Launch containers elsewhere (a new 256 MB container for C and a 64 MB container for A).
What about stateful systems?

Container-Based Solution: Failure (stateful system)
[Diagram: A's three replicas are SLAVE, SLAVE, and MASTER; the failure takes down the MASTER's container along with C's.]
Without additional information, the master is unavailable until restart.

Container-Based Solution: Scaling
[Diagram: two 256 MB containers, each serving 50% of the workload, are replaced by three 128 MB containers, each serving 33%.]
Outcome: Relaunch with new sharding.

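Why is relaunching with new sharding expensive? A sketch assuming keys are sharded by `key mod n` (the talk does not say how data is sharded): going from two shards (50% each) to three (about 33% each) changes the shard of two thirds of all keys, so most of the data has to move. `Resharding` is a hypothetical helper.

```java
// Hypothetical sketch: assumes keys are sharded by `key mod shardCount`, a scheme
// the talk does not specify, and measures how many keys change shard on rescaling.
final class Resharding {
    private Resharding() {}

    static int shardOf(long key, int shardCount) {
        return (int) Math.floorMod(key, (long) shardCount);
    }

    /** Fraction of keys in [0, keyCount) whose shard changes when going from `from` to `to` shards. */
    static double movedFraction(int from, int to, long keyCount) {
        long moved = 0;
        for (long key = 0; key < keyCount; key++) {
            if (shardOf(key, from) != shardOf(key, to)) {
                moved++;
            }
        }
        return (double) moved / keyCount;
    }
}
```

`movedFraction(2, 3, 6000)` is 4000 / 6000, about 0.67: within each block of six consecutive keys, only the two whose residues mod 2 and mod 3 agree (0 and 1) stay put.
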
Container-Based Solution
• Utilization: application requirements define container size
• Fault tolerance: a new container is started
• Scaling: the workload is repartitioned and new containers are brought up
• Discovery: existence

Container-Based Solution
We need something finer-grained. The container model provides flexibility within machines, but assumes homogeneity of tasks within containers.

Task-Based Solution

Task-Based Solution: System Requirements
The requirements for A, B, and C are now stated as service-level goals rather than container sizes: complete in less than 5 hours; always have 2 containers running; response time should be less than 50 ms.

Task-Based Solution: Allocation
[Diagram: tasks of A, A, B, B, C, and C placed in containers on machines.]

Task-Based Solution: Over-Utilization
[Diagram: when Task 1 outgrows its container, the task moves to another container instead of the container being relaunched.]
Hide the overhead of a container restart.

Task-Based Solution: Under-Utilization
[Diagram: Task 1 and Task 2, initially in a 384 MB and a 128 MB container, are consolidated into the 384 MB container.]
Optimize container allocations based on usage.

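Consolidating tasks by measured usage can be sketched as first-fit decreasing over the containers that are already running, largest first, so the smaller containers end up empty and can be released without restarting the others. `TaskConsolidator` and the usage figures in the example (200 MB for Task 1, 100 MB for Task 2) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pack tasks into as few running containers as their measured
// usage allows; containers left empty can be released while the rest keep running.
final class TaskConsolidator {
    private TaskConsolidator() {}

    /** Returns container -> tasks. Containers absent from the result are empty. */
    static Map<String, List<String>> consolidate(Map<String, Integer> containerCapacityMb,
                                                 Map<String, Integer> taskUsageMb) {
        // Try the largest containers first, so the small ones are the ones left empty.
        List<String> containers = new ArrayList<>(containerCapacityMb.keySet());
        containers.sort((a, b) -> Integer.compare(containerCapacityMb.get(b), containerCapacityMb.get(a)));
        // Place the biggest tasks first (first-fit decreasing).
        List<String> tasks = new ArrayList<>(taskUsageMb.keySet());
        tasks.sort((a, b) -> Integer.compare(taskUsageMb.get(b), taskUsageMb.get(a)));

        Map<String, Integer> freeMb = new HashMap<>(containerCapacityMb);
        Map<String, List<String>> placement = new LinkedHashMap<>();
        for (String task : tasks) {
            int need = taskUsageMb.get(task);
            String target = null;
            for (String c : containers) {
                if (freeMb.get(c) >= need) {
                    target = c;
                    break;
                }
            }
            if (target == null) {
                throw new IllegalStateException("no running container has room for " + task);
            }
            freeMb.put(target, freeMb.get(target) - need);
            placement.computeIfAbsent(target, k -> new ArrayList<>()).add(task);
        }
        return placement;
    }
}
```

With containers of 384 MB and 128 MB, both tasks fit in the 384 MB container and the 128 MB container can be released, as on the slide.
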
Task-Based Solution: Failure
[Diagram: Tasks 1-3 each have a Leader replica and Standby replicas spread across containers; when the container holding Task 3's Leader fails, a Task 3 Standby becomes Leader.]
Some systems cannot wait for new containers to start.

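Promoting a standby in place, rather than waiting for a new container, can be sketched as follows. The leader/standby vocabulary resembles Helix's LeaderStandby state model, but `FailoverSketch` and its methods are hypothetical and skip the coordination a real controller performs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch of leader failover among task replicas. The class and its
// methods are illustrative only and are not part of the Helix API.
final class FailoverSketch {
    enum ReplicaState { LEADER, STANDBY }

    // task -> (container -> state of that task's replica in the container)
    private final Map<String, TreeMap<String, ReplicaState>> replicas = new TreeMap<>();

    void place(String task, String container, ReplicaState state) {
        replicas.computeIfAbsent(task, t -> new TreeMap<>()).put(container, state);
    }

    /** Drops every replica on the failed container and promotes a standby wherever a leader was lost. */
    void onContainerFailure(String failedContainer) {
        for (TreeMap<String, ReplicaState> byContainer : replicas.values()) {
            byContainer.remove(failedContainer);
            if (!byContainer.containsValue(ReplicaState.LEADER)) {
                // Promote the first surviving standby (TreeMap order keeps the choice deterministic).
                for (Map.Entry<String, ReplicaState> e : byContainer.entrySet()) {
                    if (e.getValue() == ReplicaState.STANDBY) {
                        e.setValue(ReplicaState.LEADER);
                        break;
                    }
                }
            }
        }
    }

    Optional<String> leaderOf(String task) {
        return replicas.getOrDefault(task, new TreeMap<>()).entrySet().stream()
                .filter(e -> e.getValue() == ReplicaState.LEADER)
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

If Task 3's leader runs in container c1 with standbys in c2 and c3, `onContainerFailure("c1")` makes c2 the leader immediately; a replacement standby can be started later, once a new container is available.
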
Task-Based Solution: Discovery
[Diagram: containers N1 and N2. Task 1: Leader at N1, Standby at N2. Task 2: Leader at N2, Standby at N1.]
Learn where everything runs, and what state each task is in.

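Discovery amounts to a lookup table from each task to the nodes hosting it and the state on each node. Helix exposes a comparable view to spectators (for example through its `RoutingTableProvider`); the `RoutingTable` class below is a hypothetical stand-in, not that API, populated with the slide's example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical routing table: for each task, which nodes host it and in what state.
final class RoutingTable {
    private final Map<String, Map<String, String>> stateByNodeByTask = new HashMap<>();

    void update(String task, String node, String state) {
        stateByNodeByTask.computeIfAbsent(task, t -> new TreeMap<>()).put(node, state);
    }

    /** Nodes currently hosting `task` in `state`, e.g. where to send writes for a LEADER. */
    List<String> nodesFor(String task, String state) {
        List<String> nodes = new ArrayList<>();
        for (Map.Entry<String, String> e : stateByNodeByTask.getOrDefault(task, Map.of()).entrySet()) {
            if (e.getValue().equals(state)) {
                nodes.add(e.getKey());
            }
        }
        return nodes;
    }
}
```

A client that must write to Task 1 asks for `nodesFor("task1", "LEADER")` and gets N1; knowing only that containers exist, as in the container-based model, it could not tell N1 from N2.
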
58. Task-Based Solution: Scaling (diagram: tasks T1 to T6 distributed across containers; legend: Machine, Container, Task)
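Scaling in the task-based model moves tasks between containers instead of repartitioning the workload. A sketch under simple assumptions (every task costs the same, and each container takes at most ceil(tasks / containers) of them) that leaves tasks in place whenever it can; `ScalingSketch.rebalance` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: spreads equal-cost tasks over a new set of containers with few moves.
final class ScalingSketch {
    private ScalingSketch() {}

    static Map<String, List<String>> rebalance(Map<String, List<String>> current, List<String> containers) {
        if (containers.isEmpty()) {
            throw new IllegalArgumentException("need at least one container");
        }
        int total = current.values().stream().mapToInt(List::size).sum();
        int cap = (total + containers.size() - 1) / containers.size(); // ceil(total / containers)

        Map<String, List<String>> next = new LinkedHashMap<>();
        for (String c : containers) {
            next.put(c, new ArrayList<>());
        }
        Deque<String> toMove = new ArrayDeque<>();
        for (Map.Entry<String, List<String>> e : current.entrySet()) {
            List<String> kept = next.get(e.getKey()); // null if this container was removed
            for (String task : e.getValue()) {
                if (kept != null && kept.size() < cap) {
                    kept.add(task); // stays where it is: no move
                } else {
                    toMove.add(task); // container is over its share or gone
                }
            }
        }
        // Total capacity is containers * cap >= total, so every displaced task finds a slot.
        for (List<String> tasks : next.values()) {
            while (tasks.size() < cap && !toMove.isEmpty()) {
                tasks.add(toMove.poll());
            }
        }
        return next;
    }
}
```

Going from two containers holding T1 to T6 to three containers moves only T3 and T6 into the new container.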
62. Comparing Solutions: Container Solution vs. Task + Container Solution
    Utilization. Container: application requirements define the container size. Task + container: tasks are distributed as needed to a minimal set of containers, per the SLA.
    Fault tolerance. Container: a new container is started. Task + container: an existing task can assume a new state while waiting for the new container.
    Scaling. Container: the workload is repartitioned and new containers are brought up. Task + container: tasks are moved across containers.
    Discovery. Container: existence. Task + container: existence and state.
63. Comparing Solutions: Benefits of a Task-Based Solution. Container reuse; minimize the overhead of container relaunch; fine-grained scheduling.
64. Benefits of a Task-Based Solution (continued). Task : Container :: Thread : Process. The task is the right level of abstraction.
65. Comparing Solutions: Working at task granularity is powerful. We need a performance-centric approach to resource assignment.
66. Comparing Solutions (continued): How can Helix help?
67. Comparing Solutions (continued): YARN/Mesos: containers bring flexibility within a machine. Helix: tasks bring flexibility within a container.
68. Task Management with Helix
69. Application Lifecycle
    Capacity Planning: allocating physical resources for your load
    Provisioning: deploying and launching tasks
    Fault Tolerance: staying available, ensuring success
    State Management: determining what code should be running and where
70. Helix Overview: Resources (Task Groups). All resources can be partitioned (the partitions are the tasks). All partitions can be replicated. Each replica is in a state, such as master, slave, or offline.
71. Helix Overview: State Model and Constraints. States: Offline, Master, Slave (diagram labels: StateCount=1 for Master, StateCount=2 for Slave).
    State constraints. Partition: Master [1, 1], Slave [0, 2]. Resource: none. Node: no more than 10 partitions. Cluster: none.
    Transition constraints. At most T1 transitions in parallel per partition, T2 per resource, T3 per node, and T4 per cluster.
72. State Model and Constraints (continued): Helix manages task state.
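The state constraints can be read as a validity check on an assignment. A sketch that checks per-partition state counts (such as Master in [1, 1] and Slave in [0, 2]) and a per-node partition limit; transition constraints (T1 to T4) are not modeled, and `ConstraintCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator for state-count and per-node constraints; not the Helix API.
final class ConstraintCheck {
    private ConstraintCheck() {}

    /**
     * @param partitionStates      partition -> (node -> state of the replica on that node)
     * @param stateBounds          state -> inclusive {min, max} replicas per partition, e.g. MASTER -> {1, 1}
     * @param maxPartitionsPerNode node-level limit, e.g. 10
     * @return human-readable violations; empty if the assignment is valid
     */
    static List<String> violations(Map<String, Map<String, String>> partitionStates,
                                   Map<String, int[]> stateBounds,
                                   int maxPartitionsPerNode) {
        List<String> problems = new ArrayList<>();
        Map<String, Integer> perNode = new HashMap<>();
        partitionStates.forEach((partition, nodeToState) -> {
            Map<String, Integer> counts = new HashMap<>();
            nodeToState.forEach((node, state) -> {
                counts.merge(state, 1, Integer::sum);  // replicas of this partition per state
                perNode.merge(node, 1, Integer::sum);  // partitions hosted per node
            });
            stateBounds.forEach((state, bounds) -> {
                int n = counts.getOrDefault(state, 0);
                if (n < bounds[0] || n > bounds[1]) {
                    problems.add(partition + ": " + n + " " + state + " replica(s), allowed ["
                            + bounds[0] + ", " + bounds[1] + "]");
                }
            });
        });
        perNode.forEach((node, n) -> {
            if (n > maxPartitionsPerNode) {
                problems.add(node + ": hosts " + n + " partitions, limit " + maxPartitionsPerNode);
            }
        });
        return problems;
    }
}
```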
73. Helix Overview: Cluster Roles. Controllers manage the Participants, which run the tasks; Spectators observe the cluster (the diagram shows three Controllers).
74. Helix Controller: High-Level Overview. The Rebalancer combines the Nodes with Constraints (e.g. "single master", "no more than 3 tasks per instance") to produce a Task Assignment.
75. Helix Controller: Rebalancer
    ResourceAssignment computeResourceMapping(
        RebalancerConfig rebalancerConfig,
        ResourceAssignment prevAssignment,
        Cluster cluster,
        ResourceCurrentState currentState);
    Based on the current nodes in the cluster and the constraints, find an assignment of tasks to nodes.
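To make the rebalancer's job concrete, here is a greedy sketch that computes a partition-to-node mapping under two constraints like those named on slide 74: a single MASTER per partition and a cap on replicas per node. It is not Helix's `computeResourceMapping`; `RebalancerSketch` and the least-loaded heuristic are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical greedy rebalancer; not the Helix implementation.
final class RebalancerSketch {
    private RebalancerSketch() {}

    /**
     * Places `replicas` copies of every partition on distinct nodes: the first copy is MASTER,
     * the rest SLAVE. Each copy goes to the least-loaded node that is under `maxPerNode`.
     * @return partition -> (node -> state)
     */
    static Map<String, Map<String, String>> assign(List<String> partitions, List<String> nodes,
                                                   int replicas, int maxPerNode) {
        Map<String, Integer> load = new HashMap<>();
        nodes.forEach(n -> load.put(n, 0));
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String partition : partitions) {
            List<String> candidates = new ArrayList<>(nodes);
            // Least-loaded first; ties broken by node name so the output is deterministic.
            candidates.sort(Comparator.comparing((String n) -> load.get(n))
                    .thenComparing(Comparator.<String>naturalOrder()));
            Map<String, String> placement = new LinkedHashMap<>();
            for (String node : candidates) {
                if (placement.size() == replicas) break;
                if (load.get(node) >= maxPerNode) continue; // per-node constraint
                placement.put(node, placement.isEmpty() ? "MASTER" : "SLAVE");
                load.merge(node, 1, Integer::sum);
            }
            if (placement.size() < replicas) {
                throw new IllegalStateException("not enough capacity for " + partition);
            }
            result.put(partition, placement);
        }
        return result;
    }
}
```

Unlike a real rebalancer, this sketch ignores the previous assignment and current state, so it may move more replicas than necessary when nodes come and go.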
76. Helix Controller: Rebalancer (continued). What else do we need?
77. Helix Controller: What Is Missing? Dynamic container allocation; container isolation; automated service deployment; resource utilization monitoring.
78. Helix Controller: Target Provider. Based on a set of constraints, determine how many containers the system requires. Policies: Fixed, CPU, Memory, Bin Packing. We're working on integrating with monitoring systems so that usage information can be queried.
79. Helix Controller: Target Provider (continued)
    TargetProviderResponse evaluateExistingContainers(
        Cluster cluster,
        ResourceId resourceId,
        Collection<Participant> participants);

    class TargetProviderResponse {
        List<ContainerSpec> containersToAcquire;
        List<Participant> containersToRelease;
        List<Participant> containersToStop;
        List<Participant> containersToStart;
    }
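A sketch of the simplest policy listed, Fixed, which compares existing containers against a fixed count and fills the four lists of a response. The types here (`ContainerState`, `Container`, `ContainerSpec`, `TargetResponse`, `FixedTargetProvider`) are hypothetical, simplified stand-ins for the Helix types on the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the slide's types; not Helix classes.
enum ContainerState { ACQUIRED, RUNNING, FAILED }

record Container(String id, ContainerState state) {}

record ContainerSpec(int memoryMb) {}

record TargetResponse(List<ContainerSpec> toAcquire, List<Container> toRelease,
                      List<Container> toStop, List<Container> toStart) {}

final class FixedTargetProvider {
    private final int target;
    private final ContainerSpec spec;

    FixedTargetProvider(int target, ContainerSpec spec) {
        this.target = target;
        this.spec = spec;
    }

    /** Compares existing containers with the fixed target and lists what should change. */
    TargetResponse evaluate(List<Container> existing) {
        List<Container> running = new ArrayList<>();
        List<Container> idle = new ArrayList<>();
        List<Container> toRelease = new ArrayList<>();
        for (Container c : existing) {
            switch (c.state()) {
                case RUNNING -> running.add(c);
                case ACQUIRED -> idle.add(c);
                case FAILED -> toRelease.add(c); // hand failed containers back
            }
        }
        List<ContainerSpec> toAcquire = new ArrayList<>();
        List<Container> toStart = new ArrayList<>();
        List<Container> toStop = new ArrayList<>();
        int missing = target - running.size();
        for (int i = 0; i < missing; i++) {
            // Reuse containers that are already acquired before asking for new ones.
            if (i < idle.size()) {
                toStart.add(idle.get(i));
            } else {
                toAcquire.add(spec);
            }
        }
        for (int i = Math.max(missing, 0); i < idle.size(); i++) {
            toRelease.add(idle.get(i)); // acquired but not needed
        }
        for (int i = target; i < running.size(); i++) {
            toStop.add(running.get(i)); // surplus: simply the last ones in the list
        }
        return new TargetResponse(toAcquire, toRelease, toStop, toStart);
    }
}
```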
80. Helix Controller: Adding a Target Provider. Diagram: Nodes, Constraints, and a Target Provider feed the Rebalancer, which produces the Task Assignment.
81. Adding a Target Provider (continued): How do we use the target provider response?
82. Helix Controller: Container Provider. Given the container requirements, ensure that the required number of containers is running. Implementations: YARN, Mesos, Logical.
83. Helix Controller: Container Provider (continued)
    ListenableFuture<ContainerId> allocateContainer(ContainerSpec spec);
    ListenableFuture<Boolean> deallocateContainer(ContainerId containerId);
    ListenableFuture<Boolean> startContainer(ContainerId containerId, Participant participant);
    ListenableFuture<Boolean> stopContainer(ContainerId containerId);
84. Helix Controller: Logical Container Provider
85. Logical Container Provider (continued). Diagram labels: "use", "use", "startContainer".
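One plausible reading of a logical provider is that it satisfies the container interface without a resource manager, so a container is only a bookkeeping entry. A minimal in-memory sketch under that assumption, with string container IDs and participant names; it swaps Guava's `ListenableFuture` for the JDK's `CompletableFuture` so it needs no dependencies, and `LogicalContainerProvider` here is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory provider: containers exist only as entries in these maps.
// CompletableFuture stands in for the ListenableFuture used on the slide.
final class LogicalContainerProvider {
    private final Set<String> allocated = ConcurrentHashMap.newKeySet();
    private final Map<String, String> runningParticipant = new ConcurrentHashMap<>(); // containerId -> participant

    /** memoryMb is ignored: a logical container reserves no real resources. */
    CompletableFuture<String> allocateContainer(int memoryMb) {
        String id = "logical-" + UUID.randomUUID();
        allocated.add(id);
        return CompletableFuture.completedFuture(id); // nothing to wait for
    }

    CompletableFuture<Boolean> startContainer(String containerId, String participant) {
        boolean ok = allocated.contains(containerId)
                && runningParticipant.putIfAbsent(containerId, participant) == null;
        return CompletableFuture.completedFuture(ok);
    }

    CompletableFuture<Boolean> stopContainer(String containerId) {
        return CompletableFuture.completedFuture(runningParticipant.remove(containerId) != null);
    }

    CompletableFuture<Boolean> deallocateContainer(String containerId) {
        runningParticipant.remove(containerId); // stop first if still running
        return CompletableFuture.completedFuture(allocated.remove(containerId));
    }
}
```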
86. Helix Controller: Adding a Container Provider. Diagram: Nodes, Constraints, Target Provider, Container Provider, Rebalancer, Task Assignment. Target Provider + Container Provider = Provisioner.
87. Application Lifecycle with Helix and the Task Abstraction
    Capacity Planning: Target Provider
    Provisioning: Container Provider
    Fault Tolerance: existing Helix Controller (enhanced by the Provisioner)
    State Management: existing Helix Controller (enhanced by the Provisioner)
88. System Architecture
89. System Architecture: Resource Provider
90. System Architecture: a Client submits a job to the Resource Provider.
91. System Architecture: adds the Controller (Container Provisioner and Rebalancer) and the App Launcher.
92. System Architecture: the Controller sends a container request to the Resource Provider.
93. System Architecture: adds a Participant Container holding a Participant Launcher, a Helix Participant, and the App.
94. System Architecture: the Controller assigns tasks to the Helix Participants.
95. Helix + YARN: YARN Architecture. The Client submits a job to the Resource Manager; Node Managers report node status; the Application Master sends container requests and assigns work to Containers, which report status; the App Package is grabbed from HDFS or a common area.
96. Helix + YARN Architecture: the same flow, but the Application Master runs the Helix Controller (with its Rebalancer) and assigns tasks, and each Container runs a Helix Participant hosting the App.
97. Helix + Mesos: Mesos Architecture. The Mesos Master offers resources to the Scheduler, which sends an offer response; Mesos Slaves on the slave machines report node status and run a Mesos Executor, grabbed from the Executor Package in HDFS or a common area.
98. Helix + Mesos Architecture: the same flow, with a Helix Controller alongside the Scheduler that assigns tasks to Participants running in Helix Executors, grabbed from the Executor Package in HDFS or a common area.
99. Example
100. Distributed Document Store: Overview. Diagram: three Oracle instances, each holding Partitions 0, 1, and 2 as Master or Slave replicas; P0, P1, and P2 Backup tasks; HDFS; ETL tasks.
101. Distributed Document Store: Overview (continued). Diagram labels: Oracle (Partitions 0, 1, 2); Oracle (Partitions 0, 1, 2); P0, P1, and P2 Backup; HDFS; ETL, ETL; Partitions 0, 1, 2; legend: Master, Slave.
102. Distributed Document Store: YARN Example. The Client submits a job to the Resource Manager; the Application Master, running the Helix Controller and Rebalancer, requests containers and assigns work; a Container runs a Helix Participant hosting Oracle (Partitions 0 and 1), a P1 Backup task, and an ETL task; Node Managers report node status.
103. Distributed Document Store: YAML Specification
    appConfig: { config: { k1: v1 } }
    appPackageUri: 'file://path/to/myApp-pkg.tar'
    appName: myApp
    services: [DB, ETL] # the task containers
    serviceConfigMap: {DB: { num_containers: 3, memory: 1024 }, ... ETL: { time_to_complete: 5h, ... }, ...}
    servicePackageURIMap: { DB: 'file://path/to/db-service-pkg.tar', ... }
    ...
104. Distributed Document Store: YAML Specification (continued), highlighting the TargetProvider specification.
105. Distributed Document Store: Service/Container Implementation
    public class MyQueuerService extends StatelessParticipantService {
        @Override
        public void init() { ... }
        @Override
        public void onOnline() { ... }
        @Override
        public void onOffline() { ... }
    }
106. Distributed Document Store: Task Implementation
    public class BackupTask extends Task {
        @Override
        public ListenableFuture<Status> start() { ... }
        @Override
        public ListenableFuture<Status> cancel() { ... }
        @Override
        public ListenableFuture<Status> pause() { ... }
        @Override
        public ListenableFuture<Status> resume() { ... }
    }
107. Distributed Document Store: State Model-Style Callbacks
    public class StoreStateModel extends StateModel {
        public void onBecomeMasterFromSlave() { ... }
        public void onBecomeSlaveFromMaster() { ... }
        public void onBecomeSlaveFromOffline() { ... }
        public void onBecomeOfflineFromSlave() { ... }
    }
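The callback names follow an onBecome<To>From<From> pattern, so a state change that spans several states, such as OFFLINE to MASTER, becomes a chain of callbacks. A sketch that finds the shortest chain by breadth-first search over the transitions implied by the four callbacks; `TransitionPlanner` is hypothetical and is not how Helix itself schedules transitions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical planner over the master/slave/offline transitions implied by the callbacks.
final class TransitionPlanner {
    private TransitionPlanner() {}

    // One edge per callback: SlaveFromOffline, MasterFromSlave, SlaveFromMaster, OfflineFromSlave.
    static final Map<String, List<String>> ALLOWED = Map.of(
            "OFFLINE", List.of("SLAVE"),
            "SLAVE", List.of("MASTER", "OFFLINE"),
            "MASTER", List.of("SLAVE"));

    /** Shortest chain of callback names that moves a replica from `from` to `to`. */
    static List<String> callbacks(String from, String to) {
        Map<String, String> parent = new HashMap<>(); // state -> state it was reached from
        parent.put(from, null);
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        while (!queue.isEmpty()) {
            String s = queue.poll();
            if (s.equals(to)) break;
            for (String next : ALLOWED.getOrDefault(s, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, s);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(to)) {
            throw new IllegalArgumentException("no path from " + from + " to " + to);
        }
        List<String> steps = new ArrayList<>();
        for (String s = to; parent.get(s) != null; s = parent.get(s)) {
            steps.add("onBecome" + cap(s) + "From" + cap(parent.get(s)));
        }
        Collections.reverse(steps); // walked backwards from the target
        return steps;
    }

    private static String cap(String state) { // "SLAVE" -> "Slave"
        return state.charAt(0) + state.substring(1).toLowerCase(Locale.ROOT);
    }
}
```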
108. Distributed Document Store: Spectator (for Discovery)
    class RoutingLogic {
        public void write(Request request) {
            partition = getPartition(request.key);
            List<Participant> nodes =
                routingTableProvider.getInstance(partition, "MASTER");
            nodes.get(0).write(request);
        }
        public void read(Request request) {
            partition = getPartition(request.key);
            List<Participant> nodes =
                routingTableProvider.getInstance(partition);
            random(nodes).read(request);
        }
    }
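The routing logic on slide 108 is pseudocode. A self-contained sketch of the same idea, assuming a routing table shaped as partition, then state, then nodes, and hash-based partitioning of keys; `Router`, the "Partition N" naming, and the hashing scheme are assumptions for illustration, not Helix's `RoutingTableProvider`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical spectator-side router over a routing table: partition -> (state -> nodes).
final class Router {
    private final Map<String, Map<String, List<String>>> table;
    private final int numPartitions;
    private final Random random;

    Router(Map<String, Map<String, List<String>>> table, int numPartitions, Random random) {
        this.table = table;
        this.numPartitions = numPartitions;
        this.random = random;
    }

    /** Maps a document key to a partition name such as "Partition 0". */
    String partitionFor(String key) {
        return "Partition " + Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Writes go to the partition's MASTER replica. */
    String writeTarget(String key) {
        String partition = partitionFor(key);
        List<String> masters = table.getOrDefault(partition, Map.of()).getOrDefault("MASTER", List.of());
        if (masters.isEmpty()) {
            throw new IllegalStateException("no MASTER for " + partition);
        }
        return masters.get(0);
    }

    /** Reads go to a randomly chosen replica in any state. */
    String readTarget(String key) {
        String partition = partitionFor(key);
        List<String> replicas = new ArrayList<>();
        table.getOrDefault(partition, Map.of()).values().forEach(replicas::addAll);
        if (replicas.isEmpty()) {
            throw new IllegalStateException("no replicas for " + partition);
        }
        return replicas.get(random.nextInt(replicas.size()));
    }
}
```

Reading from any replica trades consistency for load spreading; a store that needs read-your-writes would route reads to the MASTER as well.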
109. Recap
    • The container abstraction has become a huge win.
    • With Helix, we can go a step further and make tasks the unit of work.
    • With the TargetProvider and ContainerProvider abstractions, any popular provisioner can be plugged in.
110. Questions?
    Jason: zzhang@apache.org
    Kanak: kanak@apache.org; @kinselofant
    Website: helix.apache.org
    Dev mailing list: dev@helix.apache.org
    User mailing list: user@helix.apache.org
    Twitter: @apachehelix
