HIGH PERFORMANCE HARDWARE FOR DATA ANALYSIS
Michael Pittaro
Michael_Pittaro@dell.com
OPEN DATA SCIENCE CONFERENCE
BOSTON 2015
@opendatasci
WWW.SLIDESHARE.NET/LHRC_MIKEYP
WWW.GITHUB.COM/LHRC-MIKEYP
@pmikeyp
mikeyp@acm.org
About This Talk
• We can’t cover everything about hardware in a 30 minute session.
• We can go deep enough to help you
– Understand tradeoffs and balanced architectures
– Ask the right questions about choices
– Learn from what others are doing
• My Approach Today
1. Why look at high performance hardware?
2. Look at a production cluster design
3. Look at the choices and tradeoffs behind the scenes
Why Consider High Performance Hardware?
• Choice of hardware can have large impacts
– On performance
– On budget
• Understanding the hardware helps with the software
– Scalable and parallel systems deal with both
• Data is heavy
– Local clusters are persistent
– Large data transfer may not be a viable option.
• Cloud hosting may not be an option
– You can’t or won’t delegate critical infrastructure to third parties.
– You need every bit of performance you can get.
Choices, Choices - The Hardware Toolbox
• Servers
• Processors
• Memory
• Disk Drives
• Networking
• Jargon
• Lack of Trusted Information
What the Customer Wants
• Performance
• Reliability
• Predictability
• Cost
• Management
• Proven Solutions
• Tested Configurations
Reference Architectures Fill The Gap
• Tested Server Configurations
• Tested Network Configurations
• Recommended Software Configuration
– Application and Workload Software
– OS Infrastructure
– Operational Infrastructure
• Opinionated Point of View
– Based on real world experience
• Recommended starting point
– Customization is possible
The secret to a good architecture is balance
Price
Performance
Fault Zones
Application Workload
Software
Cluster Architecture
• The Dell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance – Summary Specs
Cluster | Starter | Mid-Size | Small Enterprise | Maximum
Data Nodes | 4 | 12 | 20 | 44
Total Memory | 1536 GB | 4608 GB | 7680 GB | 26896 GB
Total Storage | 176 TB | 528 TB | 880 TB | 2112 TB
Processing Cores | 80 | 280 | 400 | 880
Racks (42U) | 1 | 2 | 2 | 4

Data Node Characteristic | Configuration
Server | Dell R720xd (2 Rack Units)
Processor | Two Intel Xeon E5-2670v2 2.5 GHz, 25M Cache, 10 Core
Memory | 384 GB
Memory Speed | 1866 MT/s DRAM
Disks | 12 x 4 TB SATA, 3.0 Gbps (48 TB)
Networking | Dual 10GbE interfaces, with active bonding
Management Network | Two x 1GbE interfaces
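
Cluster totals like these can be sanity-checked with per-node arithmetic. The sketch below multiplies the data node figures by the node count. `DataNode` and `cluster_totals` are illustrative names, not part of any Dell tooling, and the result is raw capacity only: published totals may reserve disks or memory for other purposes, so some differences from a vendor table are to be expected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Per-node figures taken from the data node configuration table."""
    memory_gb: int = 384
    cores: int = 20      # two 10-core E5-2670v2 processors
    disks: int = 12
    disk_tb: int = 4


def cluster_totals(nodes: int, node: DataNode = DataNode()) -> dict:
    """Multiply per-node resources by the number of data nodes (raw figures)."""
    return {
        "data_nodes": nodes,
        "memory_gb": nodes * node.memory_gb,
        "cores": nodes * node.cores,
        "raw_storage_tb": nodes * node.disks * node.disk_tb,
    }


if __name__ == "__main__":
    for n in (4, 12, 20, 44):
        print(cluster_totals(n))
```

Keeping the per-node figures in one place makes it easy to rerun the numbers when a different DIMM size or drive size is being considered.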
Server Examples
M1000e Blade Chassis (10U)
4 Socket R920 (4U)
2 Socket R730xd (2U)
Server Choices
• 4 Socket Servers (e.g. Dell R920)
– Optimized for enterprise applications - Large RDBMS servers, SAP, SAP HANA, Microsoft Exchange
– Very large memory available (6 TB)
– Often use direct or network attached storage
• ‘Blade’ Servers (e.g. Dell M620, M1000e Chassis)
– Pluggable Processor and Storage modules
– Backplane and chassis have a lot of shared interconnect logic
– Flexibility for enterprise applications - Virtualization is popular
• 2 Socket Servers (e.g. Dell R620, R630, R720, R730)
– Many options available
– 1U and 2U chassis footprints
– Developed for Web Hosting and Large Scale-Out Clusters
– Dell Internal Storage – 12 x 3.5” drives, 24 x 2.5” drives (in chassis)
Selecting Processors
• Assume 1-1.5 Hadoop tasks per core
– Allows headroom for other processes
• Hyperthreading
– Enable for Hadoop, Spark
– For others: it depends
• Hadoop: aim for 1 core / disk spindle
• Impala: can handle more spindles and cores easily
• Spark: I/O depends on back end storage
• Faster processor is better
– Most Hadoop jobs are I/O bound, not processor bound
– Hadoop compression uses processor cycles
– Fewer cores with a faster clock is often a good tradeoff
– The Map / Reduce balance depends on actual workload
– It’s hard to optimize more without knowing the actual workload
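
The 1 to 1.5 tasks-per-core rule of thumb translates directly into a task count per node. This is a minimal sketch: `hadoop_task_slots` and its `reserved_cores` parameter are hypothetical names introduced here for illustration, and the right multiplier still depends on the actual workload.

```python
def hadoop_task_slots(physical_cores: int,
                      tasks_per_core: float = 1.0,
                      reserved_cores: int = 0) -> int:
    """Estimate concurrent Hadoop tasks for one node.

    tasks_per_core follows the 1 to 1.5 rule of thumb from the slide;
    reserved_cores is an assumed allowance for the OS and daemons.
    """
    if not 1.0 <= tasks_per_core <= 1.5:
        raise ValueError("rule of thumb covers 1.0 to 1.5 tasks per core")
    usable = max(physical_cores - reserved_cores, 0)
    return int(usable * tasks_per_core)


low = hadoop_task_slots(20, 1.0)
high = hadoop_task_slots(20, 1.5)
print(f"20-core node: {low} to {high} concurrent tasks")
```

For a 20-core data node this gives a range of 20 to 30 tasks, which is the number to compare against the spindle count when checking the core-to-disk balance.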
Intel Xeon Dual Socket Processor Architecture
Haswell CPU: Up to 18 cores
TDP: Up to 145 W (SVR); 160 W (WS)
Socket: Socket-R3
Scalability: 2S capability
Memory: 4 x DDR4 channels; 1333, 1600, 1866 (2 DPC), 2133 (1 DPC); RDIMM, LRDIMM
QPI: 2 x QPI 1.1 channels; 6.4, 8.0, 9.6 GT/s
PCIe: PCIe 3.0 (2.5, 5, 8 GT/s); PCIe Extensions: Dual Cast, Atomics; 40 x PCIe 3.0
[Block diagram: two Intel Xeon processor E5-2600 v3 sockets connected by 2 QPI channels; DDR4 memory channels on each socket; PCIe 3.0, 40 lanes; Intel C610 series chipset (WBG); LAN up to 4x10GbE]
Intel Processor Generations
Product | Xeon E5-2600 | E5-2600 V2 | E5-2600 V3
Microarchitecture | Sandy Bridge | Ivy Bridge | Haswell
Cores / Threads | 8 / 16 | 12 / 24 | 18 / 36
Last Level Cache | Up to 20 MB | Up to 30 MB | Up to 45 MB
Max Memory Speed | 1600 MT/s DDR3 | 1866 MT/s DDR3 | 2133 MT/s DDR4
QPI (GT/s) | 2 channels; 6.4, 7.2, 8.0 | 2 channels; 6.4, 7.2, 8.0 | 2 channels; 6.4, 8.0, 9.6
Max DIMMs | 12 | 12 | 12
Max Clock Speed | 3.1 GHz / 3.8 GHz | 3.7 GHz / 3.8 GHz | 3.7 GHz / 3.8 GHz
Process Tech | 32 nm | 22 nm | 22 nm
Year | 2012 | 2013 | 2014
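
The memory speeds in the table can be turned into a theoretical peak bandwidth per socket. The sketch below assumes four memory channels per socket for all three generations and the standard 64-bit (8 byte) DDR channel width. `peak_memory_bandwidth_gbs` is an illustrative helper, and sustained bandwidth on real workloads will be lower than this ceiling.

```python
BYTES_PER_TRANSFER = 8  # a DDR3/DDR4 channel is 64 bits wide


def peak_memory_bandwidth_gbs(mt_per_s: int, channels: int = 4) -> float:
    """Theoretical peak bandwidth per socket in decimal GB/s."""
    return mt_per_s * BYTES_PER_TRANSFER * channels / 1000


for name, speed in [("E5-2600", 1600),
                    ("E5-2600 V2", 1866),
                    ("E5-2600 V3", 2133)]:
    print(f"{name}: {peak_memory_bandwidth_gbs(speed):.1f} GB/s per socket")
```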
Selecting Memory
• DDR3 versus DDR4, RDIMM versus LRDIMM
– DDR3 is cheaper now, DDR4 is faster (15%)
• DIMM Sizes
– 8GB, 16GB, 32GB, 64GB, 128GB
• Sweet Spot Varies
– DDR4 around 32GB right now
• Balance the memory banks
– 4 memory channels per processor
– 4 x 16GB better than 2 x 32GB
• Server Class Memory
– It’s all ECC checked
– Dell Server BIOS options to optimize checking method
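
Balancing the memory banks means putting the same number of identical DIMMs on every channel. As a rough sketch, the helper below lists the populations that reach a target capacity on a dual socket server with four channels per processor. `balanced_dimm_options` is a hypothetical name, and the limit of three DIMMs per channel is an assumption that should be checked against the specific server.

```python
def balanced_dimm_options(target_gb: int,
                          sockets: int = 2,
                          channels_per_socket: int = 4,
                          dimm_sizes_gb=(8, 16, 32, 64, 128),
                          max_dimms_per_channel: int = 3):
    """Return (dimm_size_gb, dimm_count, dimms_per_channel) tuples that reach
    target_gb with the same number of identical DIMMs on every channel."""
    channels = sockets * channels_per_socket
    options = []
    for size in dimm_sizes_gb:
        for per_channel in range(1, max_dimms_per_channel + 1):
            count = channels * per_channel
            if count * size == target_gb:
                options.append((size, count, per_channel))
    return options


print(balanced_dimm_options(384))
print(balanced_dimm_options(256))
```

With these assumptions, 384 GB is only reachable as 24 x 16 GB, while 256 GB can be built from 16 x 16 GB or 8 x 32 GB; price and the memory speed supported at each DIMMs-per-channel count then decide between them.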
Selecting Disks
• 3.5” Drives
– 3TB, 4TB, 6TB per drive
– Pricing sweet spot is 3TB
– Use enterprise grade drives, not consumer drives!
– SATA or SAS. SAS is slightly faster.
– 3.0 Gb/sec is fine, 6.0 Gb/sec is a waste with spinning drives
• 2.5” Drives
– 800GB and 1.2 TB
– More expensive than 3.5” drives
– more spindles and performance
• SATA Solid State Drives
– 6.0 Gb/sec
– 2.5” and 1.8” options
– Expensive for now
– Not as deterministic as spindles
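
The point about interface speed can be checked with simple arithmetic. SATA uses 8b/10b line encoding, so a 3.0 Gb/sec link carries about 300 MB/s of data and a 6.0 Gb/sec link about 600 MB/s. The sketch below compares that with a spinning drive; the 150 MB/s sustained rate is an assumed round figure for illustration, not a measurement from the talk.

```python
def sata_payload_mb_s(line_rate_gbps: float) -> float:
    """Usable SATA bandwidth in MB/s.

    8b/10b encoding carries 8 data bits in every 10 line bits.
    """
    data_bits_per_s = line_rate_gbps * 1000 * 8 / 10  # in megabits
    return data_bits_per_s / 8                          # bits to bytes


def link_utilization(drive_mb_s: float, line_rate_gbps: float) -> float:
    """Fraction of the link a single drive can fill."""
    return drive_mb_s / sata_payload_mb_s(line_rate_gbps)


ASSUMED_DRIVE_MB_S = 150.0  # assumed sustained rate of one spinning drive

for rate in (3.0, 6.0):
    used = link_utilization(ASSUMED_DRIVE_MB_S, rate)
    print(f"{rate} Gb/sec link: {sata_payload_mb_s(rate):.0f} MB/s usable, "
          f"{used:.0%} filled by one drive")
```

Under that assumption a spinning drive fills only half of a 3.0 Gb/sec link, so the faster interface buys nothing until the media itself gets faster, as with solid state drives.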
Spindle / Core / Storage Depth Optimization
• Hadoop scales processing and storage together
– The cluster grows by adding more data nodes
– The ratio of processor to storage is the main adjustment
• Generally, aim for a 1 spindle / 1 core ratio
– I/O is large blocks (64 MB to 256 MB)
– Primarily sequential read/write, very little random I/O
– 8 tasks will be reading or writing 8 individual spindles
• Drive Sizes and Types
– NL SAS or Enterprise SATA 6 Gb/sec
– Drive size is mainly a price decision
• Depth per node
– Up to 48 TB/node is common
– 112 TB / node is possible
– Consider how much data is ‘active’
– Very deep storage impacts recovery performance
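
To see why very deep storage affects recovery, it helps to estimate how long a cluster needs to re-create the replicas lost with one node. This is a back-of-the-envelope sketch only: it assumes the failed node was full, that the copy work spreads evenly across the surviving nodes, and that each node contributes a fixed rate (100 MB/s here, an assumed value). The function names are illustrative.

```python
def spindles_per_core(spindles: int, cores: int) -> float:
    """Ratio to compare against the 1 spindle / 1 core guideline."""
    return spindles / cores


def usable_tb(raw_tb: float, replication: int = 3) -> float:
    """Capacity left after HDFS replication (3 is the HDFS default)."""
    return raw_tb / replication


def rereplication_hours(node_tb: float, surviving_nodes: int,
                        per_node_mb_s: float) -> float:
    """Hours to re-copy one full node's data across the surviving nodes."""
    total_mb = node_tb * 1_000_000  # decimal TB to MB
    return total_mb / (surviving_nodes * per_node_mb_s) / 3600


for depth_tb in (48, 112):
    hours = rereplication_hours(depth_tb, surviving_nodes=19, per_node_mb_s=100)
    print(f"{depth_tb} TB node: about {hours:.1f} h to re-replicate, "
          f"{usable_tb(depth_tb):.1f} TB usable at 3x replication")
```

The recovery time grows linearly with node depth, so more than doubling the storage per node more than doubles the window during which the cluster runs with reduced redundancy.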
PowerEdge C8000 Hadoop Scaling - 16 core Xeon
[Chart: (1) 12 spindle 3 TB versus (3) 6 spindle 3 TB. Series: Cores (1), Storage (1), IOPS (1), Storage (3), IOPS (3). Axis label: TB Storage.]
Network Architecture – Layer 2 Switching
Network and Switches
• Simple Tree Structure
– Top of Rack (TOR) for each rack / group of nodes
– Racks feed up to a Cluster or Aggregation Switch
– All switching is at Layer 2 (Ethernet)
› No fancy routing or layer 3 (IP) packet inspection
– Most switches are 48 ports in this class
• Switch Characteristics
– Line rate switching at 10Gbps
– Deep buffers to handle bursts
– Virtual Link Trunking (VLT): two switches act as one, with failover
– Uplinks are 40GbE
• High Availability and Performance
– Use two 10GbE links to alternate switches
– Bond at the Linux level into a single device
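
A useful check on a tree design like this is the oversubscription ratio of each top-of-rack switch: node-facing capacity divided by uplink capacity. The sketch below uses the 48 ports at 10 Gbps and the 40GbE uplinks mentioned above; the count of four uplinks is an assumption for illustration, and `oversubscription_ratio` is a hypothetical helper name.

```python
def oversubscription_ratio(node_ports: int, node_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Downlink capacity divided by uplink capacity for one TOR switch."""
    return (node_ports * node_gbps) / (uplinks * uplink_gbps)


# 48 x 10GbE node ports and 40GbE uplinks come from the slide;
# four uplinks per switch is an assumed value.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio:.1f}:1 oversubscription")
```

A lower ratio leaves more headroom for traffic that crosses racks, such as shuffle phases and re-replication after a node failure.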
In the Wild – Dell Customer Hadoop Configurations
Model | Data Node Configuration | Comments
R730xd | Dual socket, 12 cores, 24 x 2.5” spindles | Most popular platform for Hadoop
C8000 | Dual socket, 16 cores, 16 x 3.5” spindles | Popular for deep/dense Hadoop applications
C6100 / C6105 | Dual socket, 8/12 cores, 12 x 3.5” spindles | Two node version. C6100 is hardware EOL
C2100 | Dual socket, 12 cores, 12 x 3.5” spindles | Popular, hardware EOL but often repurposed for Hadoop
R620 | Dual socket, 8 cores, 10 x 2.5” spindles | 1U form factor
C6220 | Dual socket, 8 cores, 6 x 2.5” spindles | Core/spindle ratio is not ideal for Hadoop.
What I haven’t talked about!
• GPUs
– Possible, not seen too often with Hadoop
• Ingest / Streaming
– Usually a custom configuration for high speed capture/loading (e.g. Kafka, Storm)
• Dell PowerEdge VRTX
– Designed as a ‘mini-blade’ for branch offices
– Could make a killer data science workstation
Download Links / References
• Dell.com/hadoop
– Hadoop Reference Architectures
– Optimizing PowerEdge Configurations for Hadoop
• Slideshare
– http://www.slideshare.net/lhrc-mikeyp
High Performance Hardware for Data Analysis
• Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
• This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical, advice-oriented session, and will focus on performance and cost tradeoffs for many different options.

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 

High Performance Hardware for Data Analysis

  • 1. High Performance Hardware for Data Analysis. Michael Pittaro, Michael_Pittaro@dell.com. Open Data Science Conference, Boston 2015, @opendatasci
  • 2. WWW.SLIDESHARE.NET/LHRC_MIKEYP, WWW.GITHUB.COM/LHRC-MIKEYP, @pmikeyp, mikeyp@acm.org
  • 3. About This Talk
      • We can’t cover everything about hardware in a 30-minute session.
      • We can go deep enough to help you
          – Understand tradeoffs and balanced architectures
          – Ask the right questions about choices
          – Learn from what others are doing
      • My Approach Today
          1. Why look at high performance hardware?
          2. Look at a production cluster design
          3. Look at the choices and tradeoffs behind the scenes
  • 4. Why consider High Performance Hardware?
      • Choice of hardware can have large impacts
          – On performance
          – On budget
      • Understanding the hardware helps with the software
          – Scalable and parallel systems deal with both
      • Data is heavy
          – Local clusters are persistent
          – Large data transfer may not be a viable option.
      • Cloud hosting may not be an option
          – You can’t or won’t delegate critical infrastructure to third parties.
          – You need every bit of performance you can get.
  • 5. Choices, Choices - The Hardware Toolbox: Servers, Processors, Memory, Disk Drives, Networking, Jargon, Lack of Trusted Information
  • 7. Reference Architectures Fill The Gap
      • Tested Server Configurations
      • Tested Network Configurations
      • Recommended Software Configuration
          – Application and Workload Software
          – OS Infrastructure
          – Operational Infrastructure
      • Opinionated Point of View
          – Based on real world experience
      • Recommended starting point
          – Customization is possible
  • 8. The secret to a good architecture is balance: Price, Performance, Fault Zones, Application Workload, Software
  • 9. Cluster Architecture: The Dell In-Memory Appliance for Cloudera Enterprise
  • 10. Dell In-Memory Appliance – Summary Specs
      Cluster             Starter    Mid-Size   Small Enterprise   Maximum
      Data Nodes          4          12         20                 44
      Total Memory        1536 GB    4608 GB    7680 GB            26896 GB
      Total Storage       176 TB     528 TB     880 TB             2112 TB
      Processing Cores    80         280        400                880
      Racks (42U)         1          2          2                  4

      Data Node Characteristic    Configuration
      Server                      Dell R720xd (2 Rack Units)
      Processor                   Two Intel Xeon E5-2670v2 2.5GHz, 25M Cache, 10 Core
      Memory                      384GB
      Memory Speed                1866 MT/s DRAM
      Disks                       12 x 4TB SATA, 3.0 Gbps (48 TB)
      Networking                  Dual 10GbE interfaces, with active bonding
      Management Network          Two x 1GbE interfaces
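As a rough illustration of how per-node specs roll up into cluster totals, the sketch below multiplies the data node configuration from this slide (two 10-core processors, 384 GB, 12 x 4 TB disks) by a node count. The class and function names are hypothetical, and the result is a plain multiple of the node spec; it does not try to reproduce the totals printed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Per-node spec taken from the data node table on the slide."""
    sockets: int = 2
    cores_per_socket: int = 10
    memory_gb: int = 384
    disks: int = 12
    disk_tb: int = 4

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def raw_storage_tb(self) -> int:
        return self.disks * self.disk_tb


def cluster_totals(node: DataNode, data_nodes: int) -> dict:
    """Simple multiples of the node spec (hypothetical helper)."""
    return {
        "cores": node.cores * data_nodes,
        "memory_gb": node.memory_gb * data_nodes,
        "raw_storage_tb": node.raw_storage_tb * data_nodes,
    }


if __name__ == "__main__":
    for n in (4, 12, 20, 44):
        print(n, cluster_totals(DataNode(), n))
```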
  • 11. Server Examples: M1000e Blade Chassis (10U); 4 Socket R920 (4U); 2 Socket R730xd (2U)
  • 12. Server Choices
      • 4 Socket Servers (e.g. Dell R920)
          – Optimized for enterprise applications: large RDBMS servers, SAP, SAP HANA, Microsoft Exchange
          – Very large memory available (6 TB)
          – Often use direct or network attached storage
      • ‘Blade’ Servers (e.g. Dell M620, M1000e Chassis)
          – Pluggable Processor and Storage modules
          – Backplane and Chassis have a lot of shared interconnect logic
          – Flexibility for enterprise applications: virtualization is popular
      • 2 Socket Servers (e.g. Dell R620, R630, R720, R730)
          – Many options available
          – 1U and 2U chassis footprints
          – Developed for Web Hosting and Large Scale-Out Clusters
          – Dell Internal Storage: 12 x 3.5” drives, 24 x 2.5” drives (in chassis)
  • 13. Selecting Processors
      • Assume 1-1.5 Hadoop tasks per core
          – allows headroom for other processes
      • Hyperthreading
          – Enable for Hadoop, Spark
          – for others: it depends
      • Hadoop: aim for 1 core / disk spindle
      • Impala: can handle more spindles and cores easily
      • Spark: I/O depends on back end storage
      • Faster processor is better
          – Most Hadoop jobs are I/O bound, not processor bound
          – Hadoop compression uses processor cycles
          – Fewer cores with a faster clock is often a good tradeoff
          – The Map / Reduce balance depends on actual workload
          – It’s hard to optimize more without knowing the actual workload
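The sizing rules on this slide can be turned into a small calculation. The sketch below is only an illustration: the function names are hypothetical, and it applies the 1-1.5 tasks per core guideline and the 1 core / spindle target to a 20-core node (two 10-core processors, as on the data node in slide 10).

```python
import math


def task_slots(cores: int, low: float = 1.0, high: float = 1.5) -> tuple:
    """Return (min, max) concurrent Hadoop tasks for a node,
    using the 1-1.5 tasks per core guideline."""
    return math.floor(cores * low), math.floor(cores * high)


def spindle_gap(cores: int, spindles: int) -> int:
    """Spindles minus cores; 0 meets the 1 core / spindle target,
    a negative value means fewer spindles than cores."""
    return spindles - cores


if __name__ == "__main__":
    cores = 20
    print("task slots:", task_slots(cores))
    print("spindle gap with 12 disks:", spindle_gap(cores, 12))
```

A negative gap is not automatically a problem; as the slide notes, the right balance depends on the actual workload.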
  • 14. Intel Xeon Dual Socket Processor Architecture (Haswell, Intel® Xeon® processor E5-2600 v3)
      CPU            Up to 18 cores; TDP up to 145 W (SVR), 160 W (WS)
      Socket         Socket-R3
      Scalability    2S capability
      Memory         4 x DDR4 channels; 1333, 1600, 1866 (2 DPC), 2133 (1 DPC); RDIMM, LRDIMM
      QPI            2 x QPI 1.1 channels; 6.4, 8.0, 9.6 GT/s
      PCIe           PCIe 3.0 (2.5, 5, 8 GT/s); PCIe Extensions: Dual Cast, Atomics; 40 x PCIe 3.0
      [Diagram: two E5-2600 v3 processors linked by 2 QPI channels, each with DDR4 memory channels and 40 lanes of PCIe 3.0; Intel® C610 series chipset (WBG); LAN up to 4x10GbE]
  • 15. Intel Processor Generations
      Product              Xeon E5-2600                E5-2600 V2                  E5-2600 V3
      Microarchitecture    SandyBridge                 IvyBridge                   Haswell
      Cores / Threads      8 / 16                      12 / 24                     18 / 36
      Last Level Cache     Up to 20 MB                 Up to 30 MB                 Up to 45 MB
      Max Memory Speed     1600 MT/s DDR3              1866 MT/s DDR3              2133 MT/s DDR4
      QPI (GT/s)           2 channels; 6.4, 7.2, 8.0   2 channels; 6.4, 7.2, 8.0   2 channels; 6.4, 8.0, 9.6
      Max DIMMS            12                          12                          12
      Max Clock Speed      3.1 GHz / 3.8 GHz           3.7 GHz / 3.8 GHz           3.7 GHz / 3.8 GHz
      Process Tech         32nm                        22nm                        22nm
      Year                 2012                        2013                        2014
  • 16. Selecting Memory
      • DDR3 versus DDR4, RDIMM versus LRDIMM
          – DDR3 is cheaper now, DDR4 is faster (15%)
      • DIMM Sizes
          – 8GB, 16GB, 32GB, 64GB, 128GB
      • Sweet Spot Varies
          – DDR4 around 32GB right now
      • Balance the memory banks
          – 4 memory channels per processor
          – 4 x 16GB better than 2 x 32GB
      • Server Class Memory
          – It’s all ECC checked
          – Dell Server BIOS options to optimize checking method
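To see why 4 x 16GB beats 2 x 32GB for the same capacity, it helps to count populated channels. The sketch below is a simplified model, not a benchmark: it assumes one 64-bit (8-byte) transfer per channel per memory clock tick and that peak bandwidth scales with the number of populated channels. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(dimms_per_cpu: int, mt_per_s: int, channels: int = 4) -> float:
    """Theoretical peak memory bandwidth per processor in GB/s.

    Simplifying assumption: each populated channel moves 8 bytes per
    transfer, and DIMMs are spread one per channel before doubling up.
    """
    populated = min(dimms_per_cpu, channels)
    return populated * mt_per_s * 8 / 1000


if __name__ == "__main__":
    # Same 64 GB per processor, two different layouts at 1866 MT/s
    print("4 x 16GB:", peak_bandwidth_gb_s(4, 1866), "GB/s")
    print("2 x 32GB:", peak_bandwidth_gb_s(2, 1866), "GB/s")
```

Under these assumptions the two-DIMM layout leaves half of the channels empty and so halves the theoretical peak.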
  • 17. Selecting Disks
      • 3.5” Drives
          – 3TB, 4TB, 6TB per drive
          – Pricing sweet spot is 3TB
          – Use enterprise grade drives, not consumer!
          – SATA or SAS. SAS slightly faster.
          – 3.0 Gb/sec is fine, 6.0 Gb/sec is a waste with spinning drives
      • 2.5” Drives
          – 800GB and 1.2 TB
          – More expensive than 3.5” drives
          – more spindles and performance
      • SATA Solid State Drives
          – 6.0 Gb/sec
          – 2.5” and 1.8” options
          – Expensive for now
          – Not as deterministic as spindles
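The claim that 6.0 Gb/sec is wasted on spinning drives can be checked with simple arithmetic. SATA links use 8b/10b encoding, so ten line bits carry one payload byte. The sketch below compares link payload rates with a drive's sustained throughput; the 150 MB/s drive figure is an assumed, illustrative value rather than a number from the talk, and the function names are hypothetical.

```python
def link_payload_mb_s(link_gbps: float) -> float:
    """Usable SATA payload rate in MB/s: 10 line bits per payload byte."""
    return link_gbps * 1000 / 10


def link_is_bottleneck(link_gbps: float, drive_mb_s: float) -> bool:
    """True if the drive can sustain more than the link can carry."""
    return drive_mb_s > link_payload_mb_s(link_gbps)


if __name__ == "__main__":
    assumed_drive_mb_s = 150.0  # assumption: sustained rate of one spinning drive
    for gbps in (3.0, 6.0):
        print(gbps, "Gb/s link:", link_payload_mb_s(gbps), "MB/s,",
              "bottleneck" if link_is_bottleneck(gbps, assumed_drive_mb_s) else "headroom")
```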
  • 18. Spindle / Core / Storage Depth Optimization
      • Hadoop scales processing and storage together
          – The cluster grows by adding more data nodes
          – The ratio of processor to storage is the main adjustment
      • Generally, aim for a 1 spindle / 1 core ratio
          – I/O is large blocks (64 MB to 256 MB)
          – Primarily sequential read/write, very little random I/O
          – 8 tasks will be reading or writing 8 individual spindles
      • Drive Sizes and Types
          – NL SAS or Enterprise SATA 6 Gb/sec
          – Drive size is mainly a price decision
      • Depth per node
          – Up to 48 TB/node is common
          – 112 TB / node is possible
          – Consider how much data is ‘active’
          – Very deep storage impacts recovery performance
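The point about deep storage and recovery can be made concrete with a back-of-the-envelope estimate: when a node fails, its data has to be copied again somewhere else, and that takes time in proportion to how much the node held. The sketch below assumes a fixed aggregate re-replication bandwidth, which is a hypothetical parameter; real clusters spread and throttle this traffic, so treat the output as an order-of-magnitude figure only.

```python
def recovery_hours(stored_tb: float, aggregate_gbps: float) -> float:
    """Hours to re-copy a failed node's data.

    stored_tb      : data held on the failed node, in decimal TB
    aggregate_gbps : assumed bandwidth available for re-replication, in Gb/s
    """
    bits = stored_tb * 1e12 * 8
    seconds = bits / (aggregate_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    # 20 Gb/s is an assumption (e.g. the equivalent of two 10GbE links)
    for depth_tb in (48, 112):
        print(depth_tb, "TB/node:", round(recovery_hours(depth_tb, 20), 1), "hours")
```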
  • 19. PowerEdge C8000 Hadoop Scaling - 16 core Xeon [Chart: (1) 12 spindle 3 TB versus (3) 6 spindle 3 TB, plotted against TB of storage; series: Cores (1), Storage (1), IOPS (1), Storage (3), IOPS (3)]
  • 20. Network Architecture – Layer 2 Switching
  • 21. Network and Switches
      • Simple Tree Structure
          – Top of Rack (TOR) for each rack / group of nodes
          – Racks feed up to a Cluster or Aggregation Switch
          – All switching is at Layer 2 (Ethernet)
              › No fancy routing or layer 3 (IP) packet inspection
          – Most switches are 48 ports in this class
      • Switch Characteristics
          – Line rate switching at 10Gbps
          – Deep buffers to handle bursts
          – Virtual Link Trunking (VLT): two switches act as one, with failover
          – Uplinks are 40GbE
      • High Availability and Performance
          – Use two 10GbE links to alternate switches
          – Bond at the Linux level into a single device
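One number worth checking in a tree design like this is the oversubscription ratio of each top-of-rack switch: total downlink bandwidth to the nodes divided by total uplink bandwidth to the aggregation layer. The sketch below uses the 48-port, 10GbE, 40GbE-uplink figures from the slide; the uplink count of four is an assumption for illustration, and the function name is hypothetical.

```python
def oversubscription(down_ports: int, down_gbps: float,
                     up_ports: int, up_gbps: float) -> float:
    """Ratio of downlink to uplink bandwidth on one switch.

    1.0 means the uplinks can carry all node traffic at line rate;
    higher values mean the uplinks can become the bottleneck.
    """
    return (down_ports * down_gbps) / (up_ports * up_gbps)


if __name__ == "__main__":
    # 48 x 10GbE node ports; 4 x 40GbE uplinks is an assumed count
    print(oversubscription(48, 10, 4, 40))
```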
  • 22. In the Wild – Dell Customer Hadoop Configurations
      Model            Data Node Configuration                        Comments
      RA R730xd        Dual socket, 12 cores, 24 x 2.5” spindles      Most popular platform for Hadoop
      C8000            Dual socket, 16 cores, 16 x 3.5” spindles      Popular for deep/dense Hadoop applications
      C6100 / C6105    Dual socket, 8/12 cores, 12 x 3.5” spindles    Two node version. C6100 is hardware EOL
      C2100            Dual socket, 12 cores, 12 x 3.5” spindles      Popular, hardware EOL but often repurposed for Hadoop
      R620             Dual socket, 8 cores, 10 x 2.5” spindles       1U form factor
      C6220            Dual socket, 8 cores, 6 x 2.5” spindles        Core/spindle ratio is not ideal for Hadoop.
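The core/spindle comment in this table can be checked by computing the ratio for each listed configuration. The sketch below reads the core counts as totals per node, which is an assumption since the slide does not say whether they are per socket, and it leaves out the C6100 / C6105 row because it lists two core counts. Names are hypothetical.

```python
# (model, cores per node as listed, spindles per node), from the table above
CONFIGS = [
    ("R730xd", 12, 24),
    ("C8000", 16, 16),
    ("C2100", 12, 12),
    ("R620", 8, 10),
    ("C6220", 8, 6),
]


def cores_per_spindle(cores: int, spindles: int) -> float:
    """Cores divided by spindles; 1.0 matches the 1 core / spindle guideline."""
    return cores / spindles


def more_cores_than_spindles(configs):
    """Models where tasks would have to share spindles under that guideline."""
    return [model for model, cores, spindles in configs
            if cores_per_spindle(cores, spindles) > 1.0]


if __name__ == "__main__":
    for model, cores, spindles in CONFIGS:
        print(model, round(cores_per_spindle(cores, spindles), 2))
    print("cores exceed spindles:", more_cores_than_spindles(CONFIGS))
```

Read this way, only the C6220 has more cores than spindles, which is consistent with the comment in its row.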
  • 23. What I haven’t talked about!
      • GPUs
          – Possible, not seen too often with Hadoop
      • Ingest / Streaming
          – Usually a custom configuration for high speed capture/loading (e.g. Kafka, Storm)
      • Dell PowerEdge VRTX
          – Designed as a ‘mini-blade’ for branch offices
          – Could make a killer data science workstation
  • 24. Download Links / References
      • Dell.com/hadoop
          – Hadoop Reference Architectures
          – Optimizing PowerEdge Configurations for Hadoop
      • Slideshare
          – http://www.slideshare.net/lhrc-mikeyp
  • 25. High Performance Hardware for Data Analysis
      • Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
      • This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical, advice-oriented session, and will focus on performance and cost tradeoffs for many different options.