Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
YUMING LIANG
VOIDBOX - DOCKER ON YARN
WHAT IS HULU
GAMING • CONNECTED TVS • MOBILE • COMPUTER
540CONTENT PARTNERS
+!
75+DISTRIBUTION PARTNERS
100+HULU BEIJING
WHAT IS AUDIENCE PLATFORM
WORKFLOW ENGINE
HADOOP ECOSYSTEM: YARN + HDFS + MAP/REDUCE + HBASE + ZOOKEEPER
CONFIGURATION SER...
AGENDA
•  Why Voidbox"
•  What is Voidbox"
•  Architecture"
•  How to use Voidbox"
•  Voidbox in Hulu"
•  What’s next"
•  ...
WHY VOIDBOX	
YARN: DATA OPERATION SYSTEM
CLUSTER
MAP /
REDUCE
TEZ SLIDER
HIVE PIG HBASESTORM
USER APPLICATION
Why only big...
WHY VOIDBOX	
YARN: DISTRIBUTED OPERATION SYSTEM
CLUSTER
MAP /
REDUCE
TEZ SLIDER
HIVE PIG HBASESTORM
USER APPLICATION
TASK
...
WHY VOIDBOX	
20+ TASK IN
DIFFERENT
LANGUAGE
20+ VIRTUAL
MACHINE
LOW UTILIZATION
MACHINE FAILURE
VOIDBOX
WHAT IS VOIDBOX
LIGHTWEIGHT
VM
RUNTIME
ISOLATION
UNION
FILE SYSTEM
RESOURCE
MANAGEMENT
SCHEDULING
FAULT
TOLERANCE
DISTRIBU...
WHAT IS VOIDBOX
Core
VOIDBOX MASTER VOIDBOX PROXY
RUNTIME MODE REGISTRY / MGMT
API
Framework
DAG SERVICE TASK
ARCHITECTURE – DOCKER	
Host OS
Server
bins/libs
App
A
App
B
Hypervisor
App
A
Guest OS Guest OS
bins/libs bins/libs
App
B
D...
ARCHITECTURE – DOCKER
ARCHITECTURE – VOIDBOX	
YARN modules	
Resource
Manager
Client
NodeManager
HDFS
NodeManager
HDFS
NodeManager
HDFS
YARN appl...
ARCHITECTURE – VOIDBOX	
YARN modules	
Resource
Manager
Voidbox
Client
NodeManager
NodeManager
NodeManager
Docker Engine
Do...
ARCHITECTURE – VOIDBOX YARN CLUSTER MODE	
YARN modules	
Resource
Manager
Voidbox
Client
NodeManager
NodeManager
Docker Eng...
ARCHITECTURE – VOIDBOX YARN CLIENT MODE	
YARN modules	
Resource
Manager
Voidbox
Client
NodeManager
NodeManager
Docker Engi...
ARCHITECTURE – VOIDBOX YARN LOCAL MODE	
YARN modules	
Resource
Manager
Voidbox
Client
NodeManager
NodeManager
Docker Engin...
ARCHITECTURE – VOIDBOX FAULT TOLERANCE	
•  Work preserving in case ResourceManager restarts
–  https://issues.apache.org/j...
ARCHITECTURE – VOIDBOX FAULT TOLERANCE	
yarn-site.xml
<property>
<name>yarn.resourcemanager.work-preserving-recovery.enabl...
ARCHITECTURE – VOIDBOX RESOURCE MANAGEMENT	
•  Management Center
–  Add/remove Docker engine in cluster
–  Garbage collect...
ARCHITECTURE – LOGS	
•  VoidboxMaster’s log
–  Specify log4j.properties to rotate master’s log when submit an application
...
CORE
ARCHITECTURE – KEY COMPONENTS	
FRAMEWORK
APP
CORE
ARCHITECTURE – KEY COMPONENTS	
•  Functionality
•  Allocate/release container
•  Launch/stop task in container
•  Mul...
CORE
ARCHITECTURE – KEY COMPONENTS	
•  DAG framework
•  DAGDriver
•  Task scheduler
•  Task retry policy
•  DAGContext
•  ...
CORE
ARCHITECTURE – KEY COMPONENTS	
•  DAG APP
•  DAG programming based on Framework
•  Service APP
•  Service programming...
HOW TO USE VOIDBOX	
Advanced API:
•  AMClient
–  requestResource(cpu, memory)
–  releaseResource(container)
•  DockerTaskR...
P1
P2
C1
C2
C3
How to increase / decrease number of consumers automatically?
HOW TO USE VOIDBOX
HOW TO USE VOIDBOX	
P1
P2
Message Queue
Monitor
Consumer
Manager
Docker
Consumer
Docker
Consumer
Docker
Consumer
AMClient
...
VOIDBOX IN HULU
VOIDBOX IN HULU	
RESOURCE BOUND
HIGHLY
PARALLELIZABLE
COMPLEX
ENVIRONMENT
DEPENDENCIES
VOIDBOX APPLICATION
VOIDBOX BENEFITS	
•  Ease creating distributed applications
–  Build-in service discovery, fault tolerance, scheduling, co...
WHAT’S NEXT	
YARN
VOIDBOX
MESOS
MAP
REDUCE
/ SPARK /
AURORA /
MARATHON /
KUBERNETES
DOCKER
M / R
SPARK
HIVE
TEZ
HBASE
STOR...
QUESTIONS
Q & A
THANK YOU
BDTC2015 hulu-梁宇明-voidbox - docker on yarn
BDTC2015 hulu-梁宇明-voidbox - docker on yarn
BDTC2015 hulu-梁宇明-voidbox - docker on yarn
Upcoming SlideShare
Loading in …5
×

BDTC2015 hulu-梁宇明-voidbox - docker on yarn

549 views

Published on

From http://www.csdn.net/article/2015-12-17/2826501
《Hulu 资深研发主管梁宇明 :Voidbox - Docker On YARN在Hulu的实践》
Docker 技术越来越得到了很多开发者的青睐,而YARN对于多数爱好者来说还是一个比较新的产品平台。如果两者放在一起融化会发生什么事情呢?来自Hulu公司的资深研发主管梁宇明为大家讲解了这一神奇的经历。他的演讲题目是《Voidbox - Docker On YARN在Hulu的实践》。因为基于YARN的大数据计算平台使得不同的计算框架可以在同一集群中混合部署,进而提升了集群资源利用率。

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

BDTC2015 hulu-梁宇明-voidbox - docker on yarn

  1. 1. YUMING LIANG VOIDBOX - DOCKER ON YARN
  2. 2. WHAT IS HULU GAMING • CONNECTED TVS • MOBILE • COMPUTER
  3. 3. 540CONTENT PARTNERS +! 75+DISTRIBUTION PARTNERS 100+HULU BEIJING
  4. 4. WHAT IS AUDIENCE PLATFORM WORKFLOW ENGINE HADOOP ECOSYSTEM: YARN + HDFS + MAP/REDUCE + HBASE + ZOOKEEPER CONFIGURATION SERVICE KILLUA / MATRIX FIREWORK SPARK VOIDBOX DOCKER ON YARN ESTIMATION SERVICE NESTOBATCH FLOW: ETL + PROCESSING + SERVING REAL-TIME FLOW: ETL + PROCESSING + SERVING Lambda Architecture SEGMENTATION SYSTEM OLAP
  5. 5. AGENDA •  Why Voidbox" •  What is Voidbox" •  Architecture" •  How to use Voidbox" •  Voidbox in Hulu" •  What’s next" •  Q&A"
  6. 6. WHY VOIDBOX YARN: DATA OPERATION SYSTEM CLUSTER MAP / REDUCE TEZ SLIDER HIVE PIG HBASESTORM USER APPLICATION Why only big data application? Can we run other application on yarn? What about task processing, web service?
  7. 7. WHY VOIDBOX YARN: DISTRIBUTED OPERATION SYSTEM CLUSTER MAP / REDUCE TEZ SLIDER HIVE PIG HBASESTORM USER APPLICATION TASK SYSTEM WEB SERVICE OTHER Data App Other App Is there any demand? VOIDBOX
  8. 8. WHY VOIDBOX 20+ TASK IN DIFFERENT LANGUAGE 20+ VIRTUAL MACHINE LOW UTILIZATION MACHINE FAILURE VOIDBOX
  9. 9. WHAT IS VOIDBOX LIGHTWEIGHT VM RUNTIME ISOLATION UNION FILE SYSTEM RESOURCE MANAGEMENT SCHEDULING FAULT TOLERANCE DISTRIBUTE COMPUTATION ARBITRARY APPLICATION FAULT TOLERANCE VOIDBOX IS A PROGRAMMING FRAMEWORK
  10. 10. WHAT IS VOIDBOX Core VOIDBOX MASTER VOIDBOX PROXY RUNTIME MODE REGISTRY / MGMT API Framework DAG SERVICE TASK
  11. 11. ARCHITECTURE – DOCKER Host OS Server bins/libs App A App B Hypervisor App A Guest OS Guest OS bins/libs bins/libs App B Docker Engine App A bins/libs bins/libs App B Docker Image Docker Registry DOCKER IMAGE DOCKER ENGINE DOCKER CONTAINERDOCKER CONTAINERDOCKER CONTAINER Runtime Isolation bins/libs App B’
  12. 12. ARCHITECTURE – DOCKER
  13. 13. ARCHITECTURE – VOIDBOX YARN modules Resource Manager Client NodeManager HDFS NodeManager HDFS NodeManager HDFS YARN application1 Container (MapTask) MR AppMaster Container (MapTask) Client Application Master Container YARN application2
  14. 14. ARCHITECTURE – VOIDBOX YARN modules Resource Manager Voidbox Client NodeManager NodeManager NodeManager Docker Engine Docker Engine Voidbox Master State Server Container Voidbox Proxy Container Voidbox Proxy Docker Registry In GlusterFs HDFS HDFS HDFS Voidbox Driver Pull Voidbox modules Docker modules Voidbox is a standard appliction on top of YARN, just like MapReduce server server server server
  15. 15. ARCHITECTURE – VOIDBOX YARN CLUSTER MODE YARN modules Resource Manager Voidbox Client NodeManager NodeManager Docker Engine Voidbox Master Container Voidbox Proxy HDFS HDFS Voidbox Driver Voidbox modules Docker modules
  16. 16. ARCHITECTURE – VOIDBOX YARN CLIENT MODE YARN modules Resource Manager Voidbox Client NodeManager NodeManager Docker Engine Voidbox Master Container Voidbox Proxy HDFS HDFS Voidbox modules Docker modules Voidbox Driver
  17. 17. ARCHITECTURE – VOIDBOX YARN LOCAL MODE YARN modules Resource Manager Voidbox Client NodeManager NodeManager Docker Engine Voidbox Master Container Voidbox Proxy HDFS HDFS Voidbox modules Docker modules Voidbox Driver
  18. 18. ARCHITECTURE – VOIDBOX FAULT TOLERANCE •  Work preserving in case ResourceManager restarts –  https://issues.apache.org/jira/browse/YARN-556 target version:2.6.0 –  Improvement of ResourceManager HA •  Container preserving in case NodeManager restarts –  https://issues.apache.org/jira/browse/YARN-1336 –  Avoid multiple ApplicationMasters –  Avoid out-of-control containers in NM crashed machine •  ApplicationMaster retry count reset window –  https://issues.apache.org/jira/browse/YARN-611 •  YARN container -> Docker cli -> Docker container –  Add shutdown hook to recycle docker container in case container unexpected exit –  Deal with voidboxproxy exits without running shutdown hook(kill -9 proxy.pid)
  19. 19. ARCHITECTURE – VOIDBOX FAULT TOLERANCE yarn-site.xml <property> <name>yarn.resourcemanager.work-preserving-recovery.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.am.max-attempts</name> <value>100</value> </property> <property> <name>yarn.nodemanager.recovery.enabled</name> <value>true</value> </property> ApplicationSubmissionContext public abstract void setMaxAppAttempts(int maxAppAttempts); public abstract void setAttemptFailuresValidityInterval(long attemptFailuresValidityInterval); public abstract void setKeepContainersAcrossApplicationAttempts(boolean keepContainers);
  20. 20. ARCHITECTURE – VOIDBOX RESOURCE MANAGEMENT •  Management Center –  Add/remove Docker engine in cluster –  Garbage collection of out-of-control Container(NodeManage crashes) –  History Log Server –  Service discovery
  21. 21. ARCHITECTURE – LOGS •  VoidboxMaster’s log –  Specify log4j.properties to rotate master’s log when submit an application •  Docker container’s log –  Stdout, stderr log •  Copyfile, truncate(0) to rotate docker container’s log •  Rotate log file in host(default:/var/lib/docker/containers/../containerid-json.log) –  A log dir in Docker image •  Map specify log volume, by user-defined log rotate in Docker image
  22. 22. CORE ARCHITECTURE – KEY COMPONENTS FRAMEWORK APP
  23. 23. CORE ARCHITECTURE – KEY COMPONENTS •  Functionality •  Allocate/release container •  Launch/stop task in container •  Multi run mode: yarn-cluster, yarn-client, yarn-local •  Component •  VoidboxMaster •  Container resource allocation/release/preserving •  Container context management •  VoidboxProxy •  Docker container lifecycle management •  Management Center •  Docker engine management •  Garbage collectionFRAMEWORK APP
  24. 24. CORE ARCHITECTURE – KEY COMPONENTS •  DAG framework •  DAGDriver •  Task scheduler •  Task retry policy •  DAGContext •  Describe DAG docker job •  Service framework •  ServiceDriver •  Replication controller •  Health check, warm up •  ServiceContext •  Describe service docker job FRAMEWORK APP
  25. 25. CORE ARCHITECTURE – KEY COMPONENTS •  DAG APP •  DAG programming based on Framework •  Service APP •  Service programming based on Framework •  Register to load balance FRAMEWORK APP
  26. 26. HOW TO USE VOIDBOX Advanced API: •  AMClient –  requestResource(cpu, memory) –  releaseResource(container) •  DockerTaskRunner –  run(task)
  27. 27. P1 P2 C1 C2 C3 How to increase / decrease number of consumers automatically? HOW TO USE VOIDBOX
  28. 28. HOW TO USE VOIDBOX P1 P2 Message Queue Monitor Consumer Manager Docker Consumer Docker Consumer Docker Consumer AMClient TaskRunner
  29. 29. VOIDBOX IN HULU
  30. 30. VOIDBOX IN HULU RESOURCE BOUND HIGHLY PARALLELIZABLE COMPLEX ENVIRONMENT DEPENDENCIES VOIDBOX APPLICATION
  31. 31. VOIDBOX BENEFITS •  Ease creating distributed applications –  Build-in service discovery, fault tolerance, scheduling, communication –  Support complex environment dependencies •  Improve cluster efficiency –  Share cluster with other data application •  Simplify deployment –  Package only –  No deployment •  Flexibility –  Programming interface
  32. 32. WHAT’S NEXT YARN VOIDBOX MESOS MAP REDUCE / SPARK / AURORA / MARATHON / KUBERNETES DOCKER M / R SPARK HIVE TEZ HBASE STORM SLIDER PAAS SAAS DISTRIBUTED APPLICATION
  33. 33. QUESTIONS Q & A
  34. 34. THANK YOU

×