- 1. Operate Your Hadoop Cluster
Like a High-Efficiency
Data Goldmine
Greg Bruno
VP Engineering & Co-founder / StackIQ
Thursday 14 June 2012
© 2012 StackIQ. All rights reserved. 1
- 7. You Need a Repeatable Process to
Successfully Deploy Hadoop Clusters
Step 1: Select
Step 2: Provision
Step 3: Operate
- 8. Step 1: Select
Commodity Hardware
Three Server Types
Networking Considerations
- 9. Commodity Hardware
• Standard high-volume equipment = “best bang for the buck”
• Flexible configuration
You pick the CPUs, memory, and disks that make sense for your deployment
• Easy to “tune” your configuration
Need more disk caching? Add memory!
CPU pegged? Upgrade the CPU!
Need more disk I/Os per second? Add spindles or faster-spinning drives!
- 10. Choose Servers Suited to Their Purpose in the Cluster
Name Nodes: maintain metadata
Data Nodes: lots of big disks to store all the data
Cluster Manager: provisions and monitors name nodes and data nodes; needs much less disk storage than data nodes
- 11. Networking Factors to Consider
• Hadoop is designed to access
data locally on the data nodes, so
a high-performance network is
often not needed
• Standard Gigabit Ethernet
generally works well
- 12. Single Rack Configuration
648 TB of raw storage → 216 TB usable!
Gigabit Ethernet switch
Dell PowerConnect 5524
Cluster Manager node
Dell PowerEdge R410
16 GB memory
2 x 1 TB disks
18 Data Nodes
Dell PowerEdge R720xd
96 GB memory
12 x 3 TB 3.5” disks
Name Nodes
Dell PowerEdge R720xd
96 GB memory
12 x 3 TB 3.5” disks
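The raw-to-usable arithmetic on this slide can be checked directly; the factor of three comes from HDFS's default block replication:

```shell
# Single rack: 18 data nodes, each with 12 x 3 TB disks.
raw_tb=$((18 * 12 * 3))        # 648 TB raw
# HDFS keeps 3 copies of every block by default, so usable
# capacity is roughly raw / 3 (ignoring OS and log overhead).
usable_tb=$((raw_tb / 3))      # 216 TB usable
echo "raw=${raw_tb}TB usable=${usable_tb}TB"
```

The same arithmetic gives the expansion-rack numbers: 20 nodes × 12 × 3 TB = 720 TB raw, 240 TB usable.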
- 14. Expansion Rack
• Gigabit Ethernet Switch
• 20 Data Nodes
• 720 TB of raw storage → 240 TB usable
- 15. Interconnection Rack
• 10 Gigabit Ethernet switch
Dell PowerConnect 8024F
• Gigabit Ethernet switch
• 20 Data Nodes
• 720 TB of raw storage → 240 TB usable
- 16. 5 Racks = 1.1 Petabyte of Usable Storage!
- 17. Step 2: Provision
Starting with Bare Metal
Hadoop Dependencies
System Test
- 18. Starting with Bare Metal
• Assume nothing
Our Hadoop provisioning tool installs all
the bits (e.g., OS, libraries, Hadoop
software) and configures all the services
(e.g., network, firewall, disks, Hadoop
services).
Other Hadoop provisioning tools assume
all nodes in the cluster already have a
base OS installed and are up on the network
• Step 1: Install Cluster Manager
node
• Step 2: Install backend nodes
Cluster Manager node is used to install all
backend nodes
- 19. Install Cluster Manager
• Boot the Cluster Manager node
with the StackIQ Enterprise Data
DVD
• Input basic information about the
node (e.g., FQDN, public IP
address, timezone, etc.)
• Installer copies content of DVD to
the local disk and node installs
itself from that repository
• After the node boots it is ready to
install the backend nodes
- 20. Install Backend Nodes
• Set boot order to:
1. PXE
2. Local hard disk
• Put Cluster Manager node into
“discover” mode
• Boot backend nodes
Nodes are discovered by the Cluster
Manager, auto-assigned a name and
an IP address, then sent a kickstart
file
Discovery and installation are fully
automated
- 21. Hadoop Dependencies
• After backend nodes install and reboot, they are up on the
network, with all the Hadoop software installed
And all the required software that the Hadoop services depend on is installed too
• The Cluster Manager has information about every backend node
Name / IP
Disk parameters
Number of CPUs
• Node info is used to auto-tune the backend nodes, specifically,
the Hadoop services
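As an illustrative sketch of what such auto-tuning can look like: Hadoop 1.x TaskTracker slot counts are commonly derived from the discovered CPU count. The one-map-slot-per-core heuristic below is our assumption for illustration, not StackIQ's actual tuning logic; the property names are the standard Hadoop 1.x slot settings.

```shell
# Illustrative only: derive TaskTracker slot counts from the CPU count
# reported by the node (the per-core heuristic is an assumption).
cores=$(nproc)
map_slots=$cores               # mapred.tasktracker.map.tasks.maximum
reduce_slots=$((cores / 2))    # mapred.tasktracker.reduce.tasks.maximum
if [ "$reduce_slots" -lt 1 ]; then reduce_slots=1; fi
echo "map slots: $map_slots, reduce slots: $reduce_slots"
```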
- 22. Now It’s Time
for a Basic
System Test
• Do all nodes
respond to “ssh”?
• Log in to each node via ssh and execute “uptime”:
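This basic system test can be scripted. The host names below follow the compute-&lt;rack&gt;-&lt;rank&gt; scheme the Cluster Manager assigns during discovery; the exact list is illustrative and should be replaced with your own nodes.

```shell
# Check that every backend node answers ssh and report its uptime.
for host in compute-0-0 compute-0-1 compute-0-2; do
  if ssh -o ConnectTimeout=5 "$host" uptime; then
    echo "$host: OK"
  else
    echo "$host: UNREACHABLE"
  fi
done
```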
- 23. Step 3: Operate
Service Configuration
Service Monitoring
Service Management
- 24. Configuring Hadoop Services
• The hardware has been racked and cabled
• All the software is in place
• Time to define and start Hadoop Services
HDFS
MapReduce
ZooKeeper
HBase
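As a concrete example of service configuration, the HDFS replication factor behind the earlier raw-versus-usable capacity numbers lives in hdfs-site.xml. dfs.replication is a standard HDFS property and 3 is its default; the file path here is illustrative (it would normally be written under $HADOOP_CONF_DIR).

```shell
# Write a minimal hdfs-site.xml that pins the replication factor.
# Setting dfs.replication explicitly documents the 3x assumption
# behind the raw-to-usable capacity math.
conf=$(mktemp)   # illustrative path, not the real config location
cat > "$conf" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF
grep '<value>' "$conf"
```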
- 25. Configuring HDFS
[Diagram: the clustermgr node provisioning compute-0-0, compute-0-1, and compute-0-2 in rack 0, and compute-1-0 and compute-1-1 in rack 1]
- 33. HDFS Data Movement Visualization
Node color: rack membership
Vector color: operation (read or write)
Line width: data volume
- 34. Dealing with
the Inevitable
• No matter how well planned,
every cluster will need to
undergo changes
Expansion (add)
Failures (replace)
Changes (reconfigure)
• We manage all the above with
one operation – install!
Expansion (discover and install!)
Failure (replace node and install!)
Changes (add packages to the
distribution and reinstall!)
- 35. We Just Went
From Zero to
Hadoop Cluster!
• You now have a blueprint to:
Procure hardware
Provision an entire software stack
OS, libraries, cluster
management services, Hadoop
software, etc.
Configure Hadoop services
• Now go build your own Hadoop
cluster and start mining the gold
in your data!