Optimizing Dell PowerEdge Configurations for Hadoop
Hadoop hardware configurations for Dell PowerEdge servers. (Presentation from the 2013 Dell Enterprise Forum.)

Transcript

  • 1. Optimizing PowerEdge Configurations for Hadoop
      Michael Pittaro, Principal Architect, Big Data Solutions, Dell
  • 2. What is Big Data?
      Big Data is when the data itself is part of the problem.
      • Volume: a large amount of data, growing at large rates
      • Velocity: the speed at which the data must be processed
      • Variety: the range of data types and data structures
  • 3. Dell | Cloudera Apache Hadoop Solution
      Retail, Telco, Media, Web, Finance
  • 4. Dell | Cloudera Apache Hadoop Solution
      • A Proven Big Data Platform
          – Cloudera CDH4 Hadoop Distribution with Cloudera Manager
          – Validated and Supported Reference Architecture
          – Production deployments across all verticals
      • Dell Crowbar provides deployment and management at scale
          – Integrated with Cloudera Manager
          – Bare metal to deployed cluster in hours
          – Lifecycle management for ongoing operations
      • Dell Partner Ecosystem
          – Pentaho for Data Integration
          – Pentaho for Reporting and Visualization
          – Datameer for spreadsheet-style analytics and visualization
          – Clarity and Dell Implementation Services
  • 5. The Problem with Big Data Projects
      • Customers want results
          – Performance
          – Predictability
          – Reliability
          – Availability
          – Management
          – Monitoring
      • Customers want value
      • Big Data has many options
          – Servers
          – Networking
          – Software
          – Tools
          – Application Code
          – Fast Evolution
      • Wide range of applications
  • 6. A Reference Architecture Fills The Gap
      • Tested Server Configurations
      • Tested Network Configurations
      • Base Software Configuration
          – Big Data Software
          – OS Infrastructure
          – Operational Infrastructure
      • Predefined configuration
          – Recommended starting point
      • Patterns, Use Cases, and Best Practices are emerging in Big Data
      • Reference Architectures help package this knowledge for reuse
  • 7. Reference Architecture: Servers
      • PowerEdge R720, R720XD
          – Balanced Compute and Storage
      • PowerEdge C6105
          – Scale Out Computing
          – Large Disk Capacity
      • PowerEdge C8000
          – Scale Out Computing
          – Flexible Configuration
  • 8. Reference Architecture: Networking
      • Top of Rack
          – 1GbE: Force 10 S60
          – 10GbE: Force 10 S4810
      • Cluster Aggregation
          – 1GbE: Force 10 S4810
          – 10GbE: Force 10 S4810
      • Bonded Connections
      • Redundant Networking
  • 9. Reference Architecture: Software
      • Hadoop
          – Cloudera CDH 4
          – Cloudera Manager
          – Hadoop Tools
      • Infrastructure Management
          – Nagios
          – Ganglia
      • Configuration Management
          – Predefined parameters
          – Role based configuration
      • Tools shown: Hive, Pig, HBase, Sqoop, Oozie, Hue, Flume, Whirr, Zookeeper
  • 10. Tying it all Together: Crowbar
      [Diagram: Dell “Crowbar” Ops Management]
      • Layers: Physical Resources; Core Components & Operating Systems; Big Data Infrastructure & Dell Extensions; APIs, User Access, & Ecosystem Partners
      • Components shown: Crowbar, Deployer, Provisioner, Network, RAID, BIOS, IPMI, NTP, DNS, Logging, HDFS, HBase, Hive, Pig, Nagios, Ganglia, Pentaho, Cloudera, Force10
  • 11. Hadoop Node Architecture
      [Diagram]
      • Node types: Admin Node, Edge Node, Data Node (three shown), Master Name Node, Secondary Name Node, High Availability Node
      • Components shown: Crowbar, Nagios, Ganglia, Cloudera Manager, Hadoop Clients, TaskTracker, DataNode, JobTracker, Active NameNode, Standby NameNode, JournalNode
      Slide footer: Revolutionary Cloud Solutions, Confidential
  • 12. Hadoop Cluster Scaling
  • 13. Learning The Reference Architecture
      • Read it!
          – Read it again
          – Keep it under your pillow
      • Three Documents
          – Reference Architecture
          – Deployment Guide
          – Users Guide
      • Deploy it
          – Works on 4 or 5 nodes
      • Available through the Dell Sales Team
  • 14. Leveraging the Reference Architecture
      • Start with the base configuration
          – It works, and eliminates mix and match problems
          – There are a lot of subtle details hidden behind the configurations
      • Easy changes: processor, memory, disk
          – Will generally not break anything
          – Will affect performance, however
      • Harder changes: Hadoop configuration
          – Mainly, you need to know what you're doing here
          – We have experience and recommendations
      • Hardest changes: optimization for workloads
          – The default configuration is a general purpose one
          – Specific workloads must be tested and benchmarked
  • 15. Selecting Processors
      • Assume 1.5 Hadoop tasks per physical core
          – Turn Hyperthreading on
          – This allows headroom for other processes
      • Configure Hadoop task slots
          – 2/3 map tasks
          – 1/3 reduce tasks
      • Dual socket 6 core Xeon example (12 physical cores, 18 task slots)
          › mapred.tasktracker.map.tasks.maximum: 12
          › mapred.tasktracker.reduce.tasks.maximum: 6
      • Faster is better
          – Hadoop compression uses processor cycles
          – Most Hadoop jobs are I/O bound, not processor bound
          – The Map / Reduce balance depends on actual workload
          – It’s hard to optimize more without knowing the actual workload
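
The slot arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the presentation: the function name `task_slots` and the choice to round down are assumptions, and the two property names are the MRv1 TaskTracker settings that apply to CDH4-era clusters.

```python
import math


def task_slots(sockets: int, cores_per_socket: int, tasks_per_core: float = 1.5) -> dict:
    """Apply the slide's rule of thumb: 1.5 tasks per physical core,
    split 2/3 map and 1/3 reduce. Rounding down is an assumption."""
    physical_cores = sockets * cores_per_socket
    total_slots = math.floor(physical_cores * tasks_per_core)
    reduce_slots = total_slots // 3          # one third for reduce tasks
    map_slots = total_slots - reduce_slots   # the remainder for map tasks
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


# The slide's example: dual socket, 6 cores per socket.
slots = task_slots(sockets=2, cores_per_socket=6)
for name, value in slots.items():
    print(f"{name} = {value}")
```

For the dual socket 6 core example this gives 12 map slots and 6 reduce slots, matching the slide. As the slide notes, the right split depends on the actual workload, so treat the result as a starting point.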
  • 16. Spindle / Core / Storage Depth Optimization
      • Hadoop scales processing and storage together
          – The cluster grows by adding more data nodes
          – The ratio of processor to storage is the main adjustment
      • Generally, aim for a 1 spindle / 1 core ratio
          – I/O is large blocks (64 MB to 256 MB)
          – Primarily sequential read/write, very little random I/O
          – 8 tasks will be reading or writing 8 individual spindles
      • Drive Sizes and Types
          – NL SAS or Enterprise SATA 6 Gb/sec
          – Drive size is mainly a price decision
      • Depth per node
          – Up to 48 TB/node is common
          – 112 TB/node is possible
          – Consider how much data is ‘active’
          – Very deep storage impacts recovery performance
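
A rough sketch of the depth-per-node arithmetic follows. It is illustrative only: `node_storage` is a hypothetical helper, the 12 x 4 TB drive layout is an assumed example that happens to reach the 48 TB figure, and the usable estimate simply divides by the HDFS default replication factor of 3 while ignoring temporary space, OS overhead, and compression.

```python
def node_storage(spindles: int, drive_tb: float, cores: int, replication: int = 3) -> dict:
    """Estimate storage depth and spindle/core balance for one data node.

    Assumptions: every spindle is an HDFS data disk, and usable capacity is
    raw capacity divided by the replication factor (no other overhead).
    """
    raw_tb = spindles * drive_tb
    return {
        "spindles_per_core": spindles / cores,
        "raw_tb": raw_tb,
        "usable_tb": raw_tb / replication,
    }


# Hypothetical node: 12 cores, 12 spindles, 4 TB drives.
deep = node_storage(spindles=12, drive_tb=4.0, cores=12)
print(deep)
```

The same helper makes the recovery trade-off visible: a deeper node holds more raw data, so losing it means more blocks to re-replicate across the cluster.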
  • 17. PowerEdge C8000 Hadoop Scaling - 16 core Xeon
      [Chart: (1) 12 spindle 3 TB versus (3) 6 spindle 3 TB. Axis label: TB Storage. Series: Cores (1), Storage (1), IOPS (1), Storage (3), IOPS (3)]
  • 18. Workload Optimization: Hadoop has widely varying workloads
      • Workload optimization requires profiling and benchmarking
      • HBase and pure Map/Reduce are different
          – I/O patterns are different
          – HBase requires more memory
          – Cloudera RTQ (Impala) is I/O intensive
      • Map/Reduce usage varies
          – I/O intensive to CPU intensive
      • Ingestion and transfer impact the edge (gateway) nodes
      • Heterogeneous cluster versus dedicated clusters?
          – Cloudera have added support for heterogeneous clusters and nodes
          – A dedicated cluster makes sense if the workload is consistent
              › Primarily for ‘data’ businesses
  • 19. Reference Architecture Options
      • High Availability
          – Networking configuration
          – Master / Secondary Name Node configuration
      • Alternative Switches
          – It’s possible
          – Contact us for advice
      • Cluster Size
          – The Reference Architecture scales easily to around 720 nodes
          – Beyond that, a network engineer needs to take a closer look
      • Node Size
          – Memory recommendations are a starting point
          – Disk / Core balance is a never ending debate
  • 20. In the Wild – Dell Customer Hadoop Configurations
      Columns: Model; Data Node Configuration; Comments; RA
      • R720xd: dual socket, 12 cores, 24 x 2.5” spindles. Most popular platform for Hadoop
      • C8000: dual socket, 16 cores, 16 x 3.5” spindles. Popular for deep/dense Hadoop applications
      • C6100 / C6105: dual socket, 8/12 cores, 12 x 3.5” spindles. Two node version. C6100 is hardware EOL
      • C2100: dual socket, 12 cores, 12 x 3.5” spindles. Popular, hardware EOL but often repurposed for Hadoop
      • R620: dual socket, 8 cores, 10 x 2.5” spindles. 1U form factor
      • C6220: dual socket, 8 cores, 6 x 2.5” spindles. Core/spindle ratio is not ideal for Hadoop
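
To compare these configurations against the 1 spindle per core guideline, the ratios can be computed directly. This is a sketch under one loudly stated assumption: the core counts in the table are taken as totals per node, which the slide does not state explicitly. The C6100 / C6105 row is entered twice to cover its 8 and 12 core variants.

```python
# (cores, spindles) per data node, copied from the table above.
# Assumption: the listed core counts are totals per node.
configs = {
    "R720xd": (12, 24),
    "C8000": (16, 16),
    "C6100/C6105 (8 cores)": (8, 12),
    "C6100/C6105 (12 cores)": (12, 12),
    "C2100": (12, 12),
    "R620": (8, 10),
    "C6220": (8, 6),
}

# Spindles available per core for each model.
ratios = {model: spindles / cores for model, (cores, spindles) in configs.items()}

# Print from the lowest ratio to the highest.
for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    note = "below 1 spindle per core" if ratio < 1 else "at or above 1 spindle per core"
    print(f"{model:24s} {ratio:.2f}  ({note})")
```

Under that assumption, the C6220 is the only entry that falls below one spindle per core, which is consistent with the comment in its row.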
  • 21. SecureWorks: Based on R720xd Reference Architecture
      • SecureWorks
          – 24 hours a day, 365 days a year, helping protect the security of its customers’ assets in real time
      • Challenge
          – Collecting, processing, and analyzing massive amounts of data from customer environments
      • Results
          – Reduced cost of data storage to ~21 cents per gigabyte
          – 80% savings over previous proprietary solution
          – 6 months faster deployment
          – < 1 yr. payback on entire investment
          – Data doubles every 18 months, magnifying savings
  • 22. Further Information
      • Dell Hadoop Home Page
          – http://www.dell.com/hadoop
      • Dell Cloudera Apache Hadoop install with Crowbar (video)
          – http://www.youtube.com/watch?v=ZWPJv_OsjEk
      • Cloudera CDH4 Documentation
          – http://ccp.cloudera.com/display/CDH4DOC/CDH4+Documentation
      • Crowbar homepage and documentation on GitHub
          – http://github.com/dellcloudedge/crowbar/wiki
      • Open Source Crowbar Installers
          – http://crowbar.zehicle.com/
  • 23. Q&A
  • 24. Thank you!
  • 25. Notices & Disclaimers
      Copyright © 2013 by Dell, Inc. No part of this document may be reproduced or transmitted in any form without the written permission from Dell, Inc.
      This document could include technical inaccuracies or typographical errors. Dell may make improvements or changes in the product(s) or program(s) described herein at any time without notice. Any statements regarding Dell’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
      References in this document to Dell products, programs, or services do not imply that Dell intends to make such products, programs or services available in all countries in which Dell operates or does business. Any reference to a Dell Program Product in this document is not intended to state or imply that only that program product may be used. Any functionally equivalent program that does not infringe Dell’s intellectual property rights may be used.
      The information provided in this document is distributed “AS IS” without any warranty, either expressed or implied. Dell EXPRESSLY DISCLAIMS any warranties of merchantability, fitness for a particular purpose OR INFRINGEMENT. Dell shall have no responsibility to update this information.
      The provision of the information contained herein is not intended to, and does not, grant any right or license under any Dell patents or copyrights.
      Dell, Inc., 300 Innovative Way, Nashua, NH 03063 USA
