Instant Hadoop of your Own (Cloudera Big Data Appliance Review)



Here is a quick summary of Big Data, Hadoop and the Cloudera Hadoop appliance review, used in my YouTube video here:


1. Big Data and Hadoop
2. Hadoop software components
3. Hadoop Appliance / Kit setup
4. Possible Practical Use: Using Hadoop for basic analysis of the Unix “messages” file
5. Cloudera Hadoop Appliance Review

Published in: Technology


  1. Instant Hadoop of your Own
     Created by Jack Bezalel, Senior IT Architect
     As part of the CTE Mentorship Program and the “CA Software as an Appliance SIG”
     CA Technologies
  2. Cloudera Hadoop Appliance
  3. Why did we pick this appliance?
     • #21 out of 800+ “Most Popular” @ VMware
     • #9 Most popular if we discount OSs
     • Hadoop is hot (becoming a strategic tool)
     • Double Value - you’d want the app anyway
  4. What’s Hadoop all about?
     OPPORTUNITY: We have access to amazingly valuable data (Social Media, Mobile, …)
  5. What’s Hadoop all about?
     • Challenges:
       – Data is seldom structured
       – Can’t predict queries in advance
       – Can’t optimize via SQL / Indexing
       – Too much data for one node / DB
  6. What’s in Hadoop?
     • Reliable data storage using the Hadoop Distributed File System (HDFS)
     • High-Performance parallel data processing
     • Map / Reduce
  7. What’s in Hadoop? MapReduce
     Picture Attribution: Lukas Kästner at
  8. How does it scale so well?
     • Commodity, Shared-Nothing Servers
     • Dynamic Node Activation / Deactivation
     • Self Healing
  9. Who uses Hadoop?
     • Originally developed and employed by and
     • Hadoop is now widely used in
       – Finance
       – Technology
       – Telecom
       – Media and entertainment
       – Government
       – Research institutions and other markets with significant data
  10. Why did we use Cloudera’s Hadoop kit?
      • Active Hadoop contributor
      • Enterprise-ready
      • Developer friendly (Java classes)
      • Saves time – Bundling + Rigorous testing
  11. Cloudera Free Edition (CDH3)
      • Automates the installation and configuration
      • Allows entire cluster (up to 50 nodes)
      • Requiring only root SSH access to nodes
      • Download Here: oudera+Manager+Free+Edition+Download
  12. Setup Walkthrough
      • Not a pre-set appliance (Requires OS)
      • Requires Redhat (CentOS and others supported)
      • 64bit only
      • VMs used:
        – Cloudera Manager
        – Nodes to deploy Hadoop on
  13. Now enter your 2 or more Hadoop Node names
  14. Yeah!
  15. Starting the Data Import from File
  16. Choosing the format of the data
  17. Let’s load it!
  18. Create a Select QUERY from our new table and Execute it
  19. Monitor the log report as the query is executed
  20. What a wonderful output!
  21. Appliance Review Time!
      • Post any questions you may have in the Q&A section and we’ll answer ALL
        – Either now using the web Q&A button
        – Or here at the Cloud Administration and Virtualization Chatter Group
  22. So what makes a great appliance?
      • Does the job – no more, no less
      • Quick and simple setup
      • Quick and easy updates
      • Easy control of one or many instances
      • Simple infrastructure requirements
      • Reliable underlying system
      • No delays doing its job
      • What else?
  23. Is CDH3 really an appliance? A great one?
      • Does the job
      • Quick and simple setup
      • Quick and easy updates
      • Easy control of one or many instances
      • Simple infrastructure requirements
      • Reliable underlying system
      • No delays doing its job
  24. But an appliance should be Pre-Installed – Right?
      • Probably
      • But still, a quick manual setup is not a big deal
      • Manual setup = flexibility (you choose the OS)
      • Cloudera is a startup; manual = faster to ship
      • Internal startups could do the same…
      • Address an urge = popular even if imperfect
  25. Q&A Time!
      Want to be one of the first to get a copy of a pre-set, ready-to-use Hadoop VM? Then, before we sign off:
      1. Join the Cloud Administration and Virtualization Chatter group
      2. Post a request to join the “CA Software as a Virtual Appliance” SIG
      A few questions:
      1. How much do licenses cost? (Free up to 50 nodes)
      2. Will we have to change our software to feed our logs to Hadoop? (yep, parallel…)
      3. Hadoop uses by and for IT software companies? (self service, mining, sharing I/O)
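
The Map / Reduce model from slides 6 and 7, applied to the Unix “messages” use case from the summary, can be sketched roughly as follows. This is a minimal in-process illustration in plain Python, not the Hadoop API and not the query shown in the video. The names `LINE_RE`, `map_line`, `shuffle`, `reduce_counts` and `count_by_program` are hypothetical, the log lines are made-up samples, and the classic `Mon DD HH:MM:SS host program[pid]: text` syslog layout is an assumption.

```python
import re
from collections import defaultdict

# Assumed classic syslog layout: "Mon DD HH:MM:SS host program[pid]: text"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s+([^\s:\[]+)(?:\[\d+\])?:"
)


def map_line(line):
    """Map step: emit a (program, 1) pair for each parseable log line."""
    match = LINE_RE.match(line)
    if match:
        yield match.group(2), 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one program name."""
    return key, sum(values)


def count_by_program(lines):
    """Run map, shuffle and reduce in one process over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, vs) for k, vs in shuffle(pairs).items())


# Made-up sample lines standing in for /var/log/messages
SAMPLE = [
    "Mar  3 10:15:01 node1 sshd[2112]: Accepted password for root",
    "Mar  3 10:15:09 node1 kernel: eth0: link up",
    "Mar  3 10:16:44 node2 sshd[2150]: Failed password for root",
    "a line that does not look like syslog",
]

if __name__ == "__main__":
    print(count_by_program(SAMPLE))
```

On a real cluster the map and reduce steps would run in parallel on different nodes over blocks of the file stored in HDFS; the sketch only shows how the work divides into the two steps.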