
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale






  1. Managing Hadoop, HBase and Storm Clusters at Yahoo Scale. Presented by Dheeraj Kapur and Savitha Ravikrishnan | June 30, 2016
  2. Agenda (HadoopSummit 2016)
     ▪ Introduction, HDFS RU, HBase RU & Storm RU: Dheeraj Kapur
     ▪ YARN RU, Component RU, Distributed Cache & Sharelib: Savitha Ravikrishnan
     ▪ Q&A: all presenters
  3. Hadoop at Yahoo
  4. Grid Infrastructure at Yahoo
     ▪ A multi-tenant, secure, distributed compute and storage environment based on the Hadoop stack for large-scale data processing
     ▪ 3 data centers, over 45k physical nodes
     ▪ 18 YARN (Hadoop) clusters of 350 to 5,200 nodes
     ▪ 9 HBase clusters of 80 to 1,080 nodes
     ▪ 13 Storm clusters of 40 to 250 nodes
  5. Grid Stack
     ▪ Hadoop storage: HDFS, HBase as NoSQL store, HCatalog for metadata registry
     ▪ Hadoop compute: YARN (MapReduce) and Tez for batch processing, Storm for stream processing, Spark for iterative programming, CaffeOnSpark for ML
     ▪ Hadoop services: Pig for ETL, Hive for SQL, Oozie for workflows, proxy services, GDM for data management
     ▪ Backend support: ZooKeeper, monitoring, Starling for logging, Support Shop
  6. Deployment Model: NameNode and DataNodes (HDFS); ResourceManager and NodeManagers (YARN); HBase Master and RegionServers; Nimbus and Supervisors (Storm); ZooKeeper pools; HTTP/HDFS/GDM load proxies; Oozie server and HS2/HCat; administration, management and monitoring of applications, data feeds and data stores.
  7. HDFS
  8. Hadoop Rolling Upgrade
     ▪ Complete CI/CD for HDFS and YARN upgrades
     ▪ Build software and config "tgz" archives and push them to repo servers
     ▪ Software and configs are installed in the pre-deploy phase and activated during the upgrade
     ▪ Slow upgrade: one node per cycle
     ▪ Each component is upgraded independently, i.e. HDFS, YARN and clients
     Release bundle (auto-generated):
       packages:
         - label: hadoop
           version: 2.7.2.13.1606200235-20160620-000
         - label: conf
           version: 2.7.2.13.1606200235-20160620-000
         - label: gridjdk
           version: 1.7.0_17.1303042057-20160620-000
         - label: yjava_jdk
           version: 1.8.0_60.51-20160620-000
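The pre-deploy phase amounts to comparing each node's installed package versions against the release bundle and fetching only what differs. A minimal sketch of that check (helper names are hypothetical; the real work is done by ygrid-deploy-software driven by Jenkins):

```python
# Hypothetical pre-deploy version check against a parsed release bundle.

def parse_bundle(bundle):
    """Map package label -> target version from a release-bundle dict."""
    return {p["label"]: p["version"] for p in bundle["packages"]}

def packages_to_deploy(bundle, installed):
    """Return labels whose installed version differs from the release bundle."""
    target = parse_bundle(bundle)
    return sorted(label for label, ver in target.items()
                  if installed.get(label) != ver)

bundle = {
    "packages": [
        {"label": "hadoop", "version": "2.7.2.13.1606200235-20160620-000"},
        {"label": "conf", "version": "2.7.2.13.1606200235-20160620-000"},
    ]
}
installed = {"hadoop": "2.6.0.40.1503061943-20150306-000",
             "conf": "2.7.2.13.1606200235-20160620-000"}
print(packages_to_deploy(bundle, installed))  # ['hadoop']
```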
  9. Package download (pre-deploy): Jenkins reads the release info from Git, pulls packages from the repo farm, and ygrid-deploy-software installs them on NameNodes, DataNodes, ResourceManagers, HBase Masters, RegionServers and gateways before the servers/cluster are started.
  10. HDFS upgrade CI/CD process:
      1. Jenkins reads the release info from Git and starts the HDFS upgrade.
      2. Create the directory structure and put the NameNode in RU mode.
      3. Upgrade the standby NameNode, fail over, then upgrade the former active NameNode (failover moves both the service and its IP between NN and SNN).
      4. For each DataNode: select a DN, check the installed version, stop the DN (safeupgrade-dn), activate the new software, start the DN and wait for it to rejoin.
      5. After every 100 successfully upgraded hosts, check HDFS used percentage and live-node consistency on the NameNodes.
      6. Stop/terminate the RU on more than X failures; otherwise finalize the rolling upgrade.
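The per-DataNode loop above can be sketched as a small driver with a failure cutoff and the every-100-hosts health check. This is an illustration only: `upgrade_fn` and `health_fn` stand in for the real stop/activate/start sequence and the NameNode consistency checks.

```python
# Sketch of the one-node-per-cycle DataNode loop with failure threshold
# and periodic cluster health check (function names are hypothetical).

def rolling_upgrade(datanodes, upgrade_fn, health_fn, max_failures=5):
    failures, upgraded = 0, []
    for i, dn in enumerate(datanodes, start=1):
        if upgrade_fn(dn):  # stop DN, activate new software, start, wait to rejoin
            upgraded.append(dn)
        else:
            failures += 1
            if failures > max_failures:
                raise RuntimeError("terminating RU: too many failed DataNodes")
        if i % 100 == 0 and not health_fn():  # HDFS used %, live-node consistency
            raise RuntimeError("terminating RU: cluster health check failed")
    return upgraded

done = rolling_upgrade(["dn%d" % i for i in range(3)],
                       upgrade_fn=lambda dn: True,
                       health_fn=lambda: True)
print(done)  # ['dn0', 'dn1', 'dn2']
```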
  11. Hadoop 2.7.x improvements over 2.6.x
      Performance:
      ▪ Faster NameNode failover by parallelizing quota initialization
      ▪ Fixed a DataNode layout inefficiency that caused high I/O load; an offline upgrade script speeds up the layout upgrade
      ▪ Added a fake metrics sink to work around a JMX cache fix that was delaying DataNode upgrades/health checks
      ▪ Improved DataNode shutdown speed
      Failure handling:
      ▪ Fewer read/write failures by blocking clients until the DataNode is fully initialized
  12. YARN
  13. YARN Rolling Upgrade
      ▪ Minimize downtime, maximize service availability
      ▪ Work-preserving restart on the RM and NMs; state is retained for 10 minutes
      ▪ Ensures that applications continue to run during an RM restart
      ▪ Save state, update software, restart and restore state
      ▪ Uses LevelDB as the state store
      ▪ After the RM restarts, it loads all application metadata and other credentials from the state store and repopulates them in memory
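In stock Hadoop, work-preserving restart with a LevelDB state store is enabled through yarn-site.xml. A minimal sketch of the relevant properties (the recovery directory path and any site-specific retention tuning are illustrative, not taken from the deck):

```xml
<!-- Work-preserving restart, assuming the stock Hadoop 2.7 property names. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/grid/0/yarn/nm-recovery</value>
</property>
```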
  14. YARN upgrade CI/CD process:
      1. Jenkins reads the release info from Git and starts the YARN upgrade.
      2. Create the directory structure; then, for each NodeManager: select an NM, check the installed version, safestop the NM (kill -9), activate the new software, start the NM and wait for it to rejoin. Stop/terminate the RU on more than X failures.
      3. ResourceManager upgrade. 4. HistoryServer upgrade. 5. Timeline Server upgrade.
  15. Distributed Cache & Sharelib
  16. Distributed Cache
      ▪ The distributed cache distributes large, read-only, application-specific files efficiently
      ▪ Applications specify the files to be cached as URLs (hdfs://) in the job
      ▪ DistributedCache tracks the modification timestamps of the cached files
      ▪ It can distribute simple read-only data or text files as well as more complex types such as archives and JAR files
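The timestamp tracking mentioned above boils down to: reuse a localized copy only while its recorded modification time still matches. A toy model of that rule (class and method names are invented for illustration, not the Hadoop API):

```python
# Toy model of modification-timestamp tracking for a distributed cache.

class CacheTracker:
    def __init__(self):
        self._seen = {}  # url -> mtime recorded at localization

    def needs_localization(self, url, current_mtime):
        """True if the file was never cached or changed since it was cached."""
        if self._seen.get(url) != current_mtime:
            self._seen[url] = current_mtime
            return True
        return False

c = CacheTracker()
print(c.needs_localization("hdfs://nn/apps/lookup.dat", 1000))  # True (first use)
print(c.needs_localization("hdfs://nn/apps/lookup.dat", 1000))  # False (unchanged)
print(c.needs_localization("hdfs://nn/apps/lookup.dat", 2000))  # True (modified)
```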
  17. Sharelib
      ▪ "Sharelib" is a management system for an HDFS directory named /sharelib that exists on every cluster
      ▪ Shared libraries simplify the deployment and management of applications
      ▪ Layout under the target directory /sharelib:
        • /sharelib/v1 - where all the packages are
        • /sharelib/v1/conf - where the cluster's unique metafile is (and all previous versions)
        • /sharelib/v1/{tez, pig, ...} - where the package versions are kept
      ▪ The links/tags (metafile) are unique per cluster
      ▪ Grid Ops maintains the shared libraries on each cluster's HDFS
      ▪ Packages in the shared libraries include mapreduce, pig, hbase, hcatalog, hive and oozie
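Given that layout, resolving a component tag through the per-cluster metafile is a simple lookup into the versioned directory. A sketch under an assumed metafile format (the deck describes only the layout, so the mapping shape and version strings here are hypothetical):

```python
# Hypothetical resolution of a sharelib tag to a versioned package path.

def resolve(metafile, component, tag="current"):
    """Map a component tag (via the cluster metafile) to /sharelib/v1/..."""
    version = metafile[component][tag]
    return "/sharelib/v1/{}/{}".format(component, version)

metafile = {"pig": {"current": "pig-0.15.0.2", "previous": "pig-0.14.0.9"}}
print(resolve(metafile, "pig"))              # /sharelib/v1/pig/pig-0.15.0.2
print(resolve(metafile, "pig", "previous"))  # /sharelib/v1/pig/pig-0.14.0.9
```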
  18. Sharelib uploader: Jenkins reads bundles from Git, verifies the distributed cache, downloads the to-do packages from the distribution repo, re-packages and uploads them, regenerates the meta info in HDFS, updates the Oozie sharelib and generates the client updates.
  19. Subsystems
  20. Component Upgrade
      ▪ New releases: the CI environment continuously releases certified builds and their versions
      ▪ Generate state: package rulesets contain the list of core packages and their dependencies for each and every cluster
      ▪ Deploy cookbooks: contain the Chef code and configuration that is pushed to the Chef server
      ▪ Deploy pipelines: YAML files that specify the flow and order of the deploy for every environment/cluster
      ▪ Validation jobs: run after a deploy completes on all nodes, ensuring end-to-end functionality works as expected
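The deck only says that deploy pipelines are YAML files specifying flow and order, so the schema below is assumed; the check names are borrowed from the CD slide later in the deck, and everything else is illustrative:

```yaml
# Illustrative deploy-pipeline file (field names assumed, not the actual schema).
environment: clusterX
flow:
  - component: zookeeper
    checks: [min_size, zerodowntime, targetsize]
    node_steps: [chef_client_converge, graceful_shutdown, healthcheck]
  - component: proxy
    after: zookeeper
    validation_job: end_to_end_smoke
```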
  21. Components upgrade, CI process:
      1. New release: build farms publish certified releases and component versions to Git bundles.
      2. Package rulesets: cluster- and component-specific ruleset files plus certified package version info are turned into statefiles via Ruby (Rake): rspec and rubocop checks, state generate/compare/upload, then validate and increment the version.
      3. Deploy cookbooks: cookbook, role, environment and attribute files go from Git through the build farms and Artifactory to the Chef server.
  22. Components upgrade, CD process:
      4. The deploy pipeline (Ruby/Rake) reads the statefiles and release info from Git and drives Chef on each component node: min-size, zero-downtime and target-size checks, then chef-client cookbook converge with graceful shutdown and healthcheck.
  23. HBase
  24. HBase Rolling Upgrade
      ▪ Workflow-based system
      ▪ Complete CI/CD for HDFS and HBase upgrades
      ▪ Build tgz archives and push them to repo servers
      ▪ Software is installed beforehand; the new release is activated during the upgrade
      ▪ Each component and region group is upgraded independently, i.e. HDFS, then groups of RegionServers
      Release config:
        default:
          group: 'all'
          command: 'start'
          system: 'ALL'
          verbose: 'true'
          retry: 3
          upgradeREST: 'false'
          upgradeGateway: 'true'
          dryrun: 'false'
          force: 'false'
          upgrade_type: 'rolling'
          skip_nn_upgrade: 'false'
          skip_master_upgrade: 'false'
      Workflow definitions:
        default:
          continue_on_failure:
            - broken
            - badnodes
        relux.red:
          - master
          - default
          - user
          - ca_soln-stage
          - perf,perf2,projects
          - restALL
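The group-by-group upgrade with the workflow's continue_on_failure list can be sketched as follows: failures in listed groups (e.g. 'broken', 'badnodes') are tolerated, while failures anywhere else abort the run. The driver itself is hypothetical; only the group names and semantics come from the config above.

```python
# Sketch of group-wise RegionServer upgrade with continue_on_failure semantics.

def upgrade_groups(groups, continue_on_failure, upgrade_fn):
    done = []
    for name, servers in groups:
        ok = all(upgrade_fn(s) for s in servers)  # upgrade each RS in the group
        if not ok and name not in continue_on_failure:
            raise RuntimeError("aborting upgrade in group %r" % name)
        done.append(name)
    return done

groups = [("broken", ["rs1"]), ("user", ["rs2", "rs3"])]
result = upgrade_groups(groups, {"broken", "badnodes"},
                        upgrade_fn=lambda rs: rs != "rs1")
print(result)  # ['broken', 'user']
```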
  25. HBase upgrade CI/CD process:
      1. Jenkins reads the release info from Git; package and config versions come from the repo server.
      2. HDFS rolling upgrade: put the NameNode in RU mode and upgrade the NN, then the SNN.
      3. Master upgrade, then the RegionServer upgrade process, iterating over each group and over each server in a group: stop the RegionServer, safeupgrade and stop the co-located DataNode, upgrade and start the DN, then upgrade and start the RS.
      4. Stargate upgrade. 5. Gateway upgrade.
  26. Storm
  27. Storm Rolling Upgrade
      ▪ Complete CI/CD system; statefiles are built per component and pushed to Artifactory before the upgrade
      ▪ Software is installed beforehand; the new release is activated during the upgrade
      ▪ Each component is upgraded independently, i.e. Pacemaker, Nimbus, DRPC & Supervisor
      Release config:
        default:
          parallel: 10
          verbose: 'true'
          retry: 3
          dryrun: 'false'
          upgrade_type: 'rolling'
          quarantine: 'true'
          terminate_on_failure: 'true'
          sup_failure_threshold: 10
          sendmail_to: 'dheerajk@yahoo-inc.com'
          sendmail_cc: 'storm-devel@yahoo-inc.com, grid-ops@yahoo-inc.com'
        cluster_workflow:
          cluster1.colo1: pacemaker_drpc
          cluster2.colo2: default
      Workflow definition:
        default:
          rolling_task:
            - upgradeNimbus
            - bounceNimbus
            - upgradeSupervisor
            - bounceSupervisor
            - upgradeDRPC
            - bounceDRPC
            - upgradeGateways
            - doGatewayTask
            - verifySupervisor
            - runDRPCTestTopology
            - verifySoftwareVersion
          full_upgrade_task:
            - killAllTopologies
            - specifyOperation_stop
            - sleep10
            - bounceNimbus
            - bounceSupervisor
            - bounceDRPC
            - clearDiskCache
            - cleanZKP
            - upgradeNimbus
            - upgradeSupervisor
            - upgradeDRPC
            - specifyOperation_start
            - bounceNimbus
            - bounceSupervisor
            - bounceDRPC
            - upgradeGateways
            - doGatewayTask
            - verifySupervisor
            - runDRPCTestTopology
            - verifySoftwareVersion
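Selecting a cluster's task list from the config above could look like this: cluster_workflow maps each cluster to a workflow name, and the upgrade type picks rolling_task versus full_upgrade_task. The driver is hypothetical; the key names are taken from the config.

```python
# Sketch of workflow/task-list selection from the Storm release config.

def tasks_for(cluster, config, workflows):
    wf_name = config["cluster_workflow"].get(cluster, "default")
    wf = workflows.get(wf_name, workflows["default"])
    key = ("rolling_task" if config["default"]["upgrade_type"] == "rolling"
           else "full_upgrade_task")
    return wf[key]

config = {"default": {"upgrade_type": "rolling"},
          "cluster_workflow": {"cluster1.colo1": "pacemaker_drpc"}}
workflows = {"default": {"rolling_task": ["upgradeNimbus", "upgradeSupervisor"],
                         "full_upgrade_task": ["killAllTopologies"]}}
print(tasks_for("cluster2.colo2", config, workflows))  # ['upgradeNimbus', 'upgradeSupervisor']
```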
  28. Storm upgrade CI/CD process:
      ▪ RE Jenkins drives statefile generation for each component and updates Git with the release info; statefiles are published to Artifactory and downloaded during the upgrade.
      ▪ Flow: Pacemaker upgrade, Nimbus upgrade, Supervisor upgrade (bounce workers), DRPC upgrade, verify supervisors, run a test/validation topology, audit all components.
      ▪ The upgrade fails if more than X supervisors fail to upgrade.
  29. Rolling upgrade timeline (X = not applicable):
      Component (parallelism)  | Hadoop 2.6.x | Hadoop 2.7.x | HBase 0.98.x | Storm 0.10.1.x
      HDFS, 4k nodes (1)       | 4 days       | 1 day        | X            | X
      YARN, 4k nodes (1)       | 1 day        | 1 day        | X            | X
      HBase, 1k nodes (1-4)    | 4-5 days     | X            | 4-5 days     | X
      Storm, 350 nodes (10)    | X            | X            | X            | 4-6 hrs
      Components (1)           | 1-2 hrs      | 1-2 hrs      | 1-2 hrs      | X
  30. Rolling upgrade impact: YTD availability by cluster (chart) - all clusters stay between roughly 99.69% and 99.99% availability.
  31. Thank You
