02.28.13 WANdisco ApacheCon 2013


WANdisco is a provider of non-stop software for global enterprises to meet the challenges of Big Data and distributed software development.

KEY HIGHLIGHTS, Session 1: Tuesday, Feb. 26, 5:15 p.m.-6 p.m.
Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation

Cloud infrastructure is a flexible tool for orchestrating multiple Hadoop and HBase clusters, providing strict isolation of data and compute resources for multiple customers. Most importantly, our benchmarks show that a virtualized environment allows for higher average utilization of per-node resources. For more session information, visit http://na.apachecon.com/schedule/presentation/131/.

CO-PRESENTERS, Dr. Konstantin V. Shvachko, Chief Architect, Big Data, WANdisco and Jagane Sundar, CTO/VP Engineering, Big Data, WANdisco
A veteran Hadoop developer and respected author, Konstantin Shvachko is a technical expert specializing in efficient data structures and algorithms for large-scale distributed storage systems. Konstantin joined WANdisco through the AltoStor acquisition; before that he was founder and Chief Scientist at AltoScale, a Hadoop and HBase-as-a-Platform company acquired by VertiCloud. Konstantin played a lead architectural role at eBay, building two generations of the organization's Hadoop platform. At Yahoo!, he worked on the Hadoop Distributed File System (HDFS). Konstantin has dozens of publications and presentations to his credit and is currently a member of the Apache Hadoop PMC. He holds a Ph.D. in Computer Science and an M.S. in Mathematics from Moscow State University, Russia.

Jagane Sundar has extensive big data, cloud, virtualization, and networking experience and joined WANdisco through its AltoStor acquisition. Before AltoStor, Jagane was founder and CEO of AltoScale, a Hadoop and HBase-as-a-Platform company acquired by VertiCloud. His experience with Hadoop began as Director of Hadoop Performance and Operability at Yahoo! Jagane's accomplishments include the creation of Livebackup, the development of a user-mode TCP stack for Precision I/O, and the development of the NFS and PPP clients and parts of the TCP stack for JavaOS at Sun Microsystems. Jagane received his B.E. in Electronics and Communications Engineering from Anna University.

About WANdisco
WANdisco ( LSE : WAND ) is a provider of enterprise-ready, non-stop software solutions that enable globally distributed organizations to meet today's data challenges of secure storage, scalability and availability. WANdisco's products are differentiated by the company's patented, active-active data replication technology, serving crucial high availability (HA) requirements, including Hadoop Big Data and Application Lifecycle Management (ALM). Fortune Global 1000 companies including AT&T, Motorola, Intel and Halliburton rely on WANdisco for performance, reliability, security and availability. For additional information, please visit www.wandisco.com.


  1. Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation
     Konstantin V. Shvachko, Jagane Sundar
     February 26, 2013
  2. Authors
     • Founders of AltoStor and AltoScale
     • Jagane: WANdisco, CTO and VP Engineering of Big Data
       – Director of Hadoop Performance and Operability at Yahoo!
       – Big data, cloud, virtualization, and networking experience
     • Konstantin: WANdisco, Chief Architect
       – Hadoop, HDFS at Yahoo! and eBay
       – Efficient data structures and algorithms for large-scale distributed storage systems
       – Giraffa: a file system with distributed metadata and data, built on HDFS and HBase; hosted on Apache Extras
  3. What is Apache Hadoop
     A reliable, scalable, high-performance distributed computing system
     • The Hadoop Distributed File System (HDFS)
       – Reliable storage layer
       – NameNode: namespace and block management
       – DataNodes: block replica containers
     • MapReduce: distributed computation framework
       – Simple computational model
       – JobTracker: job scheduling, resource management, lifecycle coordination
       – TaskTracker: task execution module
     • Analysis and transformation of very large amounts of data using commodity servers
     [Diagram: NameNode and JobTracker coordinating slave nodes, each running a DataNode holding blocks and a TaskTracker running tasks]
  4. What is Apache HBase
     A distributed key-value store for real-time access to semi-structured data
     • Table: big, sparse, loosely structured
       – Collection of rows, sorted by row keys
       – Rows can have an arbitrary number of columns
     • Table is split horizontally into regions
       – Dynamic table partitioning
       – RegionServers serve regions to applications
     • Columns are grouped into column families
       – Vertical partitioning of tables
     • Distributed cache: regions are loaded into nodes' RAM
       – Real-time access to data
     [Diagram: HBase Master and RegionServers colocated with the NameNode, JobTracker, DataNodes, and TaskTrackers]
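The horizontal partitioning described above can be sketched in a few lines: because regions cover disjoint, sorted ranges of row keys, locating the server for a key is a binary search over region start keys. This is an illustrative sketch, not HBase's actual client code; the region boundaries and server names below are made up.

```python
import bisect

# Hypothetical region layout: each region owns row keys in
# [start_key, next region's start_key); "" covers the lowest keys.
REGION_START_KEYS = ["", "row-h", "row-p"]                 # sorted start keys
REGION_SERVERS = ["rs1:60020", "rs2:60020", "rs3:60020"]   # hosting RegionServers

def locate_region(row_key: str) -> str:
    """Return the RegionServer responsible for row_key."""
    # bisect_right finds the last region whose start key is <= row_key
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[idx]
```

Because the ranges are dynamic, HBase can split a hot region and move one half to another server without touching the rest of the table.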
  5. What is MapReduce
     A parallel computational model and distributed framework
     [Diagram: word-count example over "dogs like cats" documents — mappers emit per-word counts, the shuffle groups them by word, and reducers aggregate the totals]
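The word-count example on the slide can be sketched as a single-process version of the model: a map phase emitting (word, 1) pairs, a shuffle grouping pairs by key, and a reduce phase summing each group. Real Hadoop distributes the map and reduce tasks across the cluster; this sketch only shows the dataflow.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["dogs like cats", "cats like dogs", "dogs like dogs"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(pairs))  # → {'dogs': 4, 'like': 3, 'cats': 2}
```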
  6. What is the Problem
     Low average CPU utilization on Hadoop clusters
     • I/O utilization
       – Hadoop can run at the full speed of the spinning drives
       – Examples: DFSIO, Terasort (well tuned)
     • Network utilization: optimized by design
       – Data locality: tasks execute on the nodes where their input data resides; no massive transfers
       – Block replication of 3 requires two data transfers
       – Map writes transient output locally
       – Shuffle requires cross-node transfers
     • CPU utilization
       1. I/O-bound workloads preclude using more CPU time
       2. Cluster provisioning: a peak-load performance vs. average utilization tradeoff
  7. CPU Load
     • Computation of Pi
       – Pure CPU workload, no input or output data
       – Enormous numbers of FFTs computing extremely large numbers
       – A record Pi run over-heated the datacenter
       – The two-quadrillionth (10^15th) digit of π is 0
     • Well-tuned Terasort is CPU intensive
     • Compression: marginal utilization gain
     • Production clusters run cold
       1. I/O-bound workloads
       2. Conservative provisioning of cluster resources to meet strict SLAs
  8. Cluster Provisioning Dilemma
     Rule of thumb
     • 72 GB total RAM per node
       – 4 GB: DataNode
       – 2 GB: TaskTracker
       – 16 GB: RegionServer
       – 2 GB per individual task: 25 task slots (17 maps and 8 reduces)
     • Average utilization vs. peak-load performance
       – Oversubscription (28 task slots)
       – Better average utilization
       – MapReduce tasks can starve HBase RegionServers
     • Better isolation of resources → more aggressive resource allocation
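The slot arithmetic behind this rule of thumb is simple enough to check: subtract the resident daemons from the node's RAM and divide what remains by the per-task budget. The numbers are the slide's; the helper function itself is just a back-of-the-envelope sketch.

```python
# All figures in GB, taken from the slide.
TOTAL_RAM = 72
DATANODE, TASKTRACKER, REGIONSERVER = 4, 2, 16
PER_TASK = 2

def task_slots(total=TOTAL_RAM, oversubscribe=0):
    """RAM left after the daemons, divided by the per-task budget."""
    remaining = total - DATANODE - TASKTRACKER - REGIONSERVER
    return remaining // PER_TASK + oversubscribe

# (72 - 4 - 2 - 16) / 2 = 25 slots, split 17 maps / 8 reduces on the slide.
# Oversubscribing by 3 gives the 28 slots mentioned for better utilization,
# at the risk of MapReduce tasks starving the RegionServer.
```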
  9. Increasing IO Rate
     With non-spinning storage
     • Goal: eliminate disk I/O contention
     • Faster non-volatile storage devices improve I/O performance
       – Advantage in random reads
       – Similar performance for sequential I/O
     • More RAM: HBase caching
  10. What is DFSIO
      A standard Hadoop benchmark measuring HDFS performance
      • DFSIO measures average throughput for I/O operations
        – Write
        – Read (sequential)
        – Append
        – Random Read (new)
      • MapReduce job
        – Map: every mapper performs the same operation (write or read) and measures its throughput
        – Single reducer: aggregates the performance results
      • Random reads (MAPREDUCE-4651)
        – Random Read: DFSIO randomly chooses an offset
        – Backward Read: DFSIO reads files in reverse order
        – Skip Read: DFSIO seeks ahead after every portion read
        – All three avoid read-ahead buffering
        – Similar results for all three random-read modifications
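The three access patterns added in MAPREDUCE-4651 can be illustrated as offset generators: each one yields the byte offset of the next fixed-size read, and in every pattern the next read does not start where the previous one ended, which is what defeats OS read-ahead. The function names and signatures are ours for illustration, not DFSIO's code.

```python
import random

def random_read(file_size, read_size, rng, n):
    """Random Read: pick an arbitrary offset for every read."""
    for _ in range(n):
        yield rng.randrange(0, file_size - read_size)

def backward_read(file_size, read_size):
    """Backward Read: read the file in reverse order, one chunk at a time."""
    for offset in range(file_size - read_size, -1, -read_size):
        yield offset

def skip_read(file_size, read_size, skip):
    """Skip Read: seek ahead by `skip` bytes after every portion read."""
    offset = 0
    while offset + read_size <= file_size:
        yield offset
        offset += read_size + skip
```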
  11. Benchmarking Environment: DFSIO
      • Four-node cluster: Hadoop 1.0.3, HBase 0.92.1
        – 1 master node: NameNode, JobTracker
        – 3 slave nodes: DataNode, TaskTracker
      • Node configuration
        – Intel 8-core processor with hyper-threading
        – 24 GB RAM
        – Four 1 TB 7200 rpm SATA drives
        – 1 Gbps network interfaces
      • DFSIO dataset
        – 72 files of 10 GB each
        – Total data read: 7 GB
        – Single read size: 1 MB
        – Concurrent readers: from 3 to 72
  12. Random Reads
      Increasing load with random reads
      [Chart: aggregate throughput (MB/sec), 0–1800, vs. number of concurrent readers (3, 12, 24, 48, 72), disks vs. flash]
  13. What is YCSB
      Yahoo! Cloud Serving Benchmark
      • YCSB defines a mix of read/write operations and measures latency and throughput
        – Compares different databases: relational and NoSQL
        – Data is represented as a table of records with a fixed number of fields
        – A unique key identifies each record
      • Main operations
        – Insert: insert a new record
        – Read: read a record
        – Update: update a record by replacing the value of one field
        – Scan: scan a random number of consecutive records, starting at a random record key
  14. Benchmarking Environment: YCSB
      • Four-node cluster
        – 1 master node: NameNode, JobTracker, HBase Master, ZooKeeper
        – 3 slave nodes: DataNode, TaskTracker, RegionServer
        – Physical master node
        – 2 to 4 VMs per slave node; max 12 VMs
      • YCSB datasets of two different sizes: 10 and 30 million records
      • dstat collects system resource metrics: CPU, memory usage, disk and network stats
  15. YCSB Workloads

      Workload                        | Insert % | Read % | Update % | Scan %
      --------------------------------|----------|--------|----------|-------
      Data Load                       | 100      |        |          |
      Reads with heavy insert load    | 55       | 45     |          |
      Short range scans (workload E)  | 5        |        |          | 95
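A YCSB-style client drives these mixes by drawing each operation at random according to the workload's proportions. The proportions below are the table's; the chooser function is a sketch of ours, not YCSB's actual generator.

```python
import random
from collections import Counter

# Operation mixes from the workload table (percentages).
MIXES = {
    "data_load": {"insert": 100},
    "reads_with_inserts": {"insert": 55, "read": 45},
    "short_range_scans": {"insert": 5, "scan": 95},
}

def next_op(workload, rng):
    """Draw the next operation type according to the workload's mix."""
    ops, weights = zip(*MIXES[workload].items())
    return rng.choices(ops, weights=weights, k=1)[0]

rng = random.Random(42)
sample = Counter(next_op("reads_with_inserts", rng) for _ in range(10_000))
# Over many draws, roughly 55% inserts and 45% reads.
```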
  16. Average Workload Throughput
      Random reads and scans are substantially faster with flash
      [Chart: throughput (% ops/sec), 0–100, for Data Load, Reads with Inserts, and Short Range Scans, disks vs. flash]
  17. Short Range Scans: Throughput
      Adding one VM per node increases overall performance by 20% on average
      [Chart: throughput (ops/sec), 0–3000, vs. concurrent threads (64, 256, 512, 1024), for physical nodes and 6, 9, and 12 VMs]
  18. Short Range Scans: Latency
      Latency grows linearly with the number of threads on physical nodes
      [Chart: average latency (ms), 0–1200, vs. concurrent threads (64, 256, 512, 1024), for physical nodes and 6, 9, and 12 VMs]
  19. CPU Utilization Comparison
      A virtualized cluster drastically increases CPU utilization
      • Physical-node cluster generates a very light CPU load: 4% user, 3% system, 1% wait, 92% idle
      • With VMs, CPU utilization can be driven close to 100% at peaks: 55% user, 23% system, 1% wait, 21% idle
  20. Reads with Inserts
      Latency of reads on a mixed workload: 45% reads and 55% inserts
      [Chart: read latency (ms), 0–700, for 10 million and 30 million records, disks vs. flash]
  21. Conclusions
      VMs make the random-read advantage of flash usable for Hadoop
      • HDFS
        – Sequential I/O is handled well by disk storage
        – Flash substantially outperforms disks on workloads with random reads
      • The HBase write-only workload shows only marginal improvement on flash
      • Multiple VMs per node provide 100% peak utilization of hardware resources
        – CPU utilization on physical-node clusters is a fraction of capacity
      • The combination of flash storage and virtualization yields high HBase performance on random-read and mixed read/write workloads
      • Virtualization serves two main functions:
        – Resource utilization: running more server processes per node
        – Resource isolation: designating a percentage of resources to each server so they cannot starve each other
  22. Thank you
      Konstantin V. Shvachko, Jagane Sundar