Here is a quick summary of Big Data, Hadoop, and the Cloudera Hadoop appliance review presented in my YouTube video here:
http://www.youtube.com/watch?v=fDr5fs28g2A
Contents:
1. Big Data and Hadoop
2. Hadoop software components
3. Hadoop Appliance / Kit setup
4. Possible Practical Use: Using Hadoop for basic analysis of the Unix “messages” file
5. Cloudera Hadoop Appliance Review
Instant Hadoop of your Own (Cloudera Big Data Appliance Review)
1. Instant Hadoop of your Own
Created by Jack Bezalel
Senior IT Architect
As part of the CTE Mentorship Program
and the “CA Software as an Appliance SIG”
CA Technologies
3. Why did we pick this appliance?
• Ranked #21 of 800+ in VMware’s “Most Popular” appliance list
• #9 most popular if we exclude operating systems
• Hadoop is hot (becoming a strategic tool)
• Double value: you’d want the app anyway
4. What’s Hadoop all about?
OPPORTUNITY:
We have access to amazingly valuable data
(Social Media, Mobile, …)
5. What’s Hadoop all about?
• Challenges:
– Data is often unstructured
– Can’t predict queries in advance
– Can’t optimize via SQL / indexing
– Too much data for one node / DB
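A rough back-of-the-envelope calculation shows why “too much data for one node” matters. The figures are illustrative assumptions, not measurements: 1 TB of data, about 100 MB/s of sequential read throughput per node, and 100 nodes each scanning an equal share in parallel:

```latex
\begin{aligned}
t_{\text{1 node}} &= \frac{10^{12}\,\text{B}}{10^{8}\,\text{B/s}} = 10^{4}\,\text{s} \approx 2.8\,\text{h} \\
t_{\text{100 nodes}} &= \frac{10^{12}\,\text{B} / 100}{10^{8}\,\text{B/s}} = 10^{2}\,\text{s} \approx 1.7\,\text{min}
\end{aligned}
```

Coordination, network and startup overhead are ignored here, so real jobs will be slower, but the order-of-magnitude gap is the point.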
6. What’s in Hadoop?
• Reliable data storage using the Hadoop Distributed File System (HDFS)
• High-performance parallel data processing
• MapReduce
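HDFS stores large files by cutting them into fixed-size blocks that can sit on different nodes, which is what lets processing run in parallel close to the data. The Python sketch below only illustrates the splitting arithmetic. The 64 MB block size is an assumption (the Hadoop 1.x-era default for `dfs.block.size`), and `split_into_blocks` is a hypothetical helper, not a Hadoop API.

```python
# Minimal sketch: how a file might be cut into fixed-size HDFS-style blocks.
# ASSUMPTION: 64 MB block size (the Hadoop 1.x / CDH3-era default for
# dfs.block.size); real clusters often configure a different value.
BLOCK_SIZE = 64 * 1024 * 1024  # bytes


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for each block of a file of file_size bytes.

    Every block is block_size bytes except possibly the last one, which only
    holds the remainder (the final block is not padded to full size).
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    MB = 1024 * 1024
    for off, length in split_into_blocks(200 * MB):
        print(f"block at {off // MB:>3} MB, {length // MB} MB long")
```

For a 200 MB file this yields three full 64 MB blocks plus one 8 MB block.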
7. What’s in Hadoop?
MapReduce
Picture Attribution: Lukas Kästner at http://www.flickr.com/photos/lkaestner/
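The MapReduce flow in the picture (map, then shuffle/sort by key, then reduce) can be sketched in plain Python, here counting how many lines each program wrote to a Unix “messages” file. This is a single-process simulation under stated assumptions: the classic syslog line layout, and Hadoop Streaming-style “key<TAB>value” records. `mapper`, `reducer`, `key_of` and `run_local` are hypothetical names.

```python
# Minimal local sketch of the MapReduce flow (map -> shuffle/sort -> reduce),
# counting lines per program in a Unix "messages" file.
# ASSUMPTIONS: classic syslog lines of the form
#   "Mon DD HH:MM:SS host program[pid]: text"
# and Hadoop Streaming-style records "key<TAB>value".
# Everything runs in one process; Hadoop would run many mappers/reducers in parallel.
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit "program<TAB>1" for every syslog line that has a program tag."""
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        tag = fields[4]  # e.g. "sshd[2231]:" or "kernel:"
        program = tag.rstrip(":").split("[", 1)[0]
        yield f"{program}\t1"


def key_of(record: str) -> str:
    """Return the key part of a "key<TAB>value" record."""
    return record.split("\t", 1)[0]


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the values of consecutive records that share a key."""
    for key, group in groupby(sorted_records, key=key_of):
        total = sum(int(record.split("\t", 1)[1]) for record in group)
        yield f"{key}\t{total}"


def run_local(lines: Iterable[str]) -> list[str]:
    """Simulate one job: map, sort by key (Hadoop's shuffle), reduce."""
    return list(reducer(sorted(mapper(lines), key=key_of)))


if __name__ == "__main__":
    sample = [
        "Mar  4 10:01:02 node1 sshd[2231]: Accepted publickey for root",
        "Mar  4 10:01:05 node1 kernel: eth0: link up",
        "Mar  4 10:02:11 node1 sshd[2240]: Connection closed",
    ]
    for record in run_local(sample):
        print(record)
```

On a real cluster, Hadoop Streaming would run the mapper and reducer as separate scripts reading `sys.stdin` and writing to `sys.stdout`, with the framework doing the sort between them; check the Streaming documentation for your CDH release for the exact job invocation.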
8. How does it scale so well?
• Commodity, Shared-Nothing Servers
• Dynamic Node Activation / Deactivation
• Self Healing
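Part of Hadoop’s self healing comes from replication: each HDFS block is stored on several nodes, and when a node disappears the missing copies are re-created on the remaining ones. The sketch assumes a replication factor of 3 (the HDFS default) and uses simple round-robin placement instead of HDFS’s rack-aware policy; `place_blocks` and `heal` are hypothetical helpers, not Hadoop APIs.

```python
# Minimal sketch of replication-based self healing, in the spirit of HDFS.
# ASSUMPTIONS: replication factor 3 (the HDFS default) and round-robin
# placement; real HDFS uses a rack-aware placement policy instead.
# place_blocks() and heal() are hypothetical helpers, not Hadoop APIs.


def place_blocks(blocks: list[str], nodes: list[str], replication: int = 3) -> dict[str, set[str]]:
    """Put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: {nodes[(i + r) % len(nodes)] for r in range(replication)}
        for i, block in enumerate(blocks)
    }


def heal(placement: dict[str, set[str]], live_nodes: list[str], replication: int = 3) -> dict[str, set[str]]:
    """Forget replicas on dead nodes, then re-copy under-replicated blocks.

    Returns a new placement; the input is not modified.
    """
    live = set(live_nodes)
    healed = {block: replicas & live for block, replicas in placement.items()}
    load = {node: 0 for node in live_nodes}  # replicas currently held per live node
    for replicas in healed.values():
        for node in replicas:
            load[node] += 1
    target_count = min(replication, len(live_nodes))
    for block, replicas in healed.items():
        if not replicas:
            raise RuntimeError(f"block {block} lost: no surviving replica")
        while len(replicas) < target_count:
            # Copy from a surviving replica to the least-loaded live node without one.
            target = min((n for n in live_nodes if n not in replicas), key=lambda n: load[n])
            replicas.add(target)
            load[target] += 1
    return healed


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_blocks(["blk_0", "blk_1", "blk_2", "blk_3"], nodes)
    healed = heal(placement, [n for n in nodes if n != "node3"])  # node3 fails
    for block in sorted(healed):
        print(block, sorted(healed[block]))
```

Because no node shares state with another, losing one only means copying its blocks from the surviving replicas, and the cluster keeps serving data meanwhile.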
9. Who uses Hadoop?
• Originally developed and employed by and
• Hadoop is now widely used in
– Finance
– Technology
– Telecom
– Media and entertainment
– Government
– Research institutions and other markets with significant data
10. Why did we use Cloudera’s Hadoop kit?
• Active Hadoop contributor
• Enterprise-ready
• Developer friendly (Java classes)
• Saves time: bundling + rigorous testing
11. Cloudera Free Edition (CDH3)
• Automates the installation and configuration
• Supports an entire cluster (up to 50 nodes)
• Requires only root SSH access to the nodes
• Download here:
https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Download
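Since the setup needs only root SSH access to each node, it can be worth confirming that access before starting. This is a hypothetical preflight script, not part of Cloudera Manager. It assumes the OpenSSH client is installed and key-based root login is already configured (`BatchMode=yes` makes ssh fail instead of prompting for a password); the host names are placeholders.

```python
# Hypothetical preflight check (not part of Cloudera Manager): confirm that
# each node accepts a non-interactive root SSH login.
# ASSUMPTIONS: OpenSSH client on PATH, key-based root login already set up;
# the host names in __main__ are placeholders.
import subprocess
from typing import Callable, Iterable


def ssh_command(host: str, timeout_s: int = 5) -> list[str]:
    """Build an ssh command that runs `true` as root and never prompts."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # fail rather than ask for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable hosts
        f"root@{host}",
        "true",
    ]


def unreachable_nodes(hosts: Iterable[str], run: Callable = subprocess.run) -> list[str]:
    """Return the hosts where the root SSH command exited with a non-zero status."""
    failed = []
    for host in hosts:
        result = run(ssh_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    print(unreachable_nodes(["node1.example.com", "node2.example.com"]))
```

The `run` parameter exists only so the check can be exercised without real hosts; an empty list means every node accepted the login.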
12. Setup Walkthrough
• Not a pre-set appliance (requires an OS)
• Requires Red Hat (CentOS and others supported)
• 64-bit only
• VMs used:
– Cloudera Manager
– Nodes to deploy Hadoop on
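The two hard requirements on this slide (a Red Hat-family OS, 64-bit only) can be checked on each VM before starting. A hypothetical sketch, assuming “64-bit” means an x86_64 machine and using the presence of `/etc/redhat-release` (shipped by RHEL and CentOS) to detect the Red Hat family; other supported distributions would need their own check.

```python
# Hypothetical preflight check for this slide's requirements:
# a 64-bit OS and a Red Hat-family distribution.
# ASSUMPTIONS: "64-bit" means an x86_64 machine, and /etc/redhat-release
# (present on RHEL and CentOS) identifies the distribution family.
import platform
from pathlib import Path
from typing import Optional


def preflight_problems(machine: str, redhat_release: Optional[str]) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if machine != "x86_64":
        problems.append(f"64-bit (x86_64) OS required, found {machine!r}")
    if redhat_release is None:
        problems.append("no /etc/redhat-release: not a Red Hat-family system "
                        "(other distributions may still be supported)")
    return problems


def check_this_host() -> list[str]:
    """Run the checks against the machine this script is running on."""
    release_file = Path("/etc/redhat-release")
    release = release_file.read_text().strip() if release_file.exists() else None
    return preflight_problems(platform.machine(), release)


if __name__ == "__main__":
    problems = check_this_host()
    print("\n".join(problems) if problems else "all checks passed")
```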
21. Appliance Review Time!
• Post any questions you may have in the Q&A section and we’ll answer them ALL
– Either now, using the web Q&A button
– Or here at the Cloud Administration and Virtualization Chatter Group
22. So what makes a great appliance?
• Does the job – no more, no less
• Quick and simple setup
• Quick and easy updates
• Easy control of one or many instances
• Simple Infrastructure requirements
• Reliable underlying system
• No delays doing its job
• What else?
23. Is CDH3 really an appliance? A great one?
• Does the job
• Quick and simple setup
• Quick and easy updates
• Easy control of one or many instances
• Simple Infrastructure requirements
• Reliable underlying system
• No delays doing its job
24. But an appliance should be pre-installed, right?
• Probably
• But still, a quick manual setup is no big deal
• Manual setup = flexibility (you choose the OS)
• Cloudera is a startup; manual setup = faster shipping
• Internal startups could do the same…
• Addressing an urgent need = popular, even if imperfect
25. Q&A Time!
Want to be one of the first to get a copy of a pre-set, ready-to-use Hadoop VM?
Then, before we sign off:
1. Join the Cloud Administration and Virtualization Chatter group
2. Post a request to join the “CA Software as a Virtual Appliance” SIG
A few questions:
1. How much do licenses cost? (Free for up to 50 nodes)
2. Will we have to change our software to feed our logs to Hadoop? (Yes, parallel…)
3. What could IT software companies use Hadoop for? (self-service, mining, sharing I/O)