Instant Hadoop of your Own (Cloudera Big Data Appliance Review)
 


Here is a quick summary of Big Data, Hadoop and the Cloudera Hadoop appliance review, used in my YouTube video here:
http://www.youtube.com/watch?v=fDr5fs28g2A

Contents:

1. Big Data and Hadoop
2. Hadoop software components
3. Hadoop Appliance / Kit setup
4. Possible Practical Use: Using Hadoop for basic analysis of the Unix “messages” file
5. Cloudera Hadoop Appliance Review

Usage Rights

© All Rights Reserved

    Instant Hadoop of your Own (Cloudera Big Data Appliance Review) Presentation Transcript

    • Instant Hadoop of your Own. Created by Jack Bezalel, Senior IT Architect, CA Technologies, as part of the CTE Mentorship Program and the “CA Software as an Appliance SIG” 1
    • Cloudera Hadoop Appliance 2
    • Why did we pick this appliance? 3
      • #21 out of 800+ “Most Popular” @ VMware
      • #9 most popular if we discount OSs
      • Hadoop is hot (becoming a strategic tool)
      • Double Value - you’d want the app anyway
    • What’s Hadoop all about? OPPORTUNITY: We have access to amazingly valuable data (Social Media, Mobile, …) 4
    • What’s Hadoop all about? 5
      • Challenges:
        – Data is seldom structured
        – Can’t predict queries in advance
        – Can’t optimize via SQL / indexing
        – Too much data for one node / DB
    • What’s in Hadoop? 6
      • Reliable data storage using the Hadoop Distributed File System (HDFS)
      • High-performance parallel data processing
      • Map / Reduce
    • What’s in Hadoop? MapReduce (picture attribution: Lukas Kästner at http://www.flickr.com/photos/lkaestner/) 7
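
The Map / Reduce idea from the two slides above can be sketched in a few lines of plain Python, using the "messages" file analysis from the contents list as the job. This is only a single-process illustration of the map, shuffle and reduce steps, not Hadoop's Java API; the sample log lines, the regular expression and all function names are assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical sample lines in the classic syslog "messages" layout.
SAMPLE_LINES = [
    "Jan 12 04:02:01 node1 sshd[2211]: Accepted password for root",
    "Jan 12 04:02:05 node1 kernel: eth0: link up",
    "Jan 12 04:03:17 node2 sshd[2250]: Failed password for admin",
    "Jan 12 04:05:40 node2 crond[311]: (root) CMD (run-parts /etc/cron.hourly)",
]

# Month, day, time, host, then the program name up to "[pid]" or ":".
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+([^\[:\s]+)")


def map_phase(line):
    """Map step: emit (program, 1) for every log line that parses."""
    match = LINE_RE.match(line)
    if match:
        yield match.group(1), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one program."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LINES))
```

On a real cluster the map calls would run in parallel on the nodes holding each chunk of the file, and the framework would do the shuffle; the per-line logic would stay about this small.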
    • How does it scale so well? 8
      • Commodity, shared-nothing servers
      • Dynamic node activation / deactivation
      • Self healing
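
As a rough illustration of the "self healing" point, here is a toy model of block replication: every block lives on several nodes, and when a node goes away the missing copies are re-created on the surviving nodes. HDFS's default replication factor is 3; everything else here (function names, node names, the placement strategy) is a simplifying assumption and not how HDFS actually chooses nodes.

```python
import random

REPLICATION = 3  # HDFS's default replication factor


def place_blocks(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(sorted(nodes), replication)) for block in block_ids}


def heal(placement, live_nodes, replication=REPLICATION):
    """Drop replicas on dead nodes and re-copy blocks onto other live nodes."""
    healed = {}
    for block, holders in placement.items():
        alive = holders & live_nodes
        if not alive:
            raise RuntimeError(f"block {block} lost: no surviving replica")
        # Live nodes that do not hold this block yet can receive a new copy.
        candidates = sorted(live_nodes - alive)
        missing = min(replication - len(alive), len(candidates))
        healed[block] = alive | set(candidates[:missing])
    return healed


if __name__ == "__main__":
    nodes = {"n1", "n2", "n3", "n4", "n5"}
    placement = place_blocks(range(6), nodes)
    after_failure = heal(placement, nodes - {"n2"})  # node n2 is deactivated
    for block, holders in sorted(after_failure.items()):
        print(block, sorted(holders))
```

Because no block depends on any single server, nodes can be added or removed while the cluster keeps serving data, which is what makes commodity hardware workable.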
    • Who uses Hadoop? 9
      • Originally developed and employed by [company logos not captured in this transcript]
      • Hadoop is now widely used in
        – Finance
        – Technology
        – Telecom
        – Media and entertainment
        – Government
        – Research institutions and other markets with significant data
    • Why did we use Cloudera’s Hadoop kit? 10
      • Active Hadoop contributor
      • Enterprise-ready
      • Developer friendly (Java classes)
      • Saves time – Bundling + rigorous testing
    • Cloudera Free Edition (CDH3) 11
      • Automates the installation and configuration
      • Supports an entire cluster (up to 50 nodes)
      • Requires only root SSH access to the nodes
      • Download here: https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Download
    • Setup Walkthrough 12
      • Not a pre-set appliance (requires an OS)
      • Requires Red Hat (CentOS and others supported)
      • 64-bit only
      • VMs used:
        – Cloudera Manager
        – Nodes to deploy Hadoop on
    • Now enter your 2 or more Hadoop Node names 13
    • Yeh! 14
    • Starting the Data Import from File 15
    • Choosing the format of the data 16
    • Let’s load it! 17
    • Create a Select QUERY from our new table and Execute it 18
    • Monitor the log report as the query is executed 19
    • What a wonderful output!  20
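
The import, load and query steps in the walkthrough slides follow a familiar pattern: load a delimited file into a table, then run a SELECT against it. The slides do not name the query tool, so the sketch below uses Python's built-in `sqlite3` purely as a local stand-in to show the shape of the workflow. The table name, column names and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited extract of a "messages" file: host, program, message.
RAW = (
    "node1\tsshd\tAccepted password for root\n"
    "node1\tkernel\teth0: link up\n"
    "node2\tsshd\tFailed password for admin\n"
)


def load_and_query(raw_text):
    """Load tab-delimited rows into a table, then count rows per program."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (host TEXT, program TEXT, message TEXT)")

    # "Choosing the format of the data": here, tab-delimited text.
    rows = csv.reader(io.StringIO(raw_text), delimiter="\t")
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)

    # "Create a Select QUERY from our new table and Execute it".
    cursor = conn.execute(
        "SELECT program, COUNT(*) FROM messages "
        "GROUP BY program ORDER BY COUNT(*) DESC, program"
    )
    result = cursor.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for program, count in load_and_query(RAW):
        print(program, count)
```

The difference on the cluster is that the same kind of query is turned into parallel jobs across the nodes, which is why there is a log report to monitor while it runs.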
    • Appliance Review Time! 21
      • Post any questions you may have in the Q&A section and we’ll answer ALL
        – Either now using the web Q&A button
        – Or here at the Cloud Administration and Virtualization Chatter Group
    • So what makes a great appliance? 22
      • Does the job, no more, no less
      • Quick and simple setup
      • Quick and easy updates
      • Easy control of one or many instances
      • Simple infrastructure requirements
      • Reliable underlying system
      • No delays doing its job
      • What else?
    • Is CDH3 really an appliance? A great one? 23
      • Does the job
      • Quick and simple setup
      • Quick and easy updates
      • Easy control of one or many instances
      • Simple infrastructure requirements
      • Reliable underlying system
      • No delays doing its job
    • But an appliance should be pre-installed, right? 24
      • Probably
      • But still, a quick manual setup is no big deal
      • Manual setup = flexibility (you choose the OS)
      • Cloudera is a startup, manual = faster to ship
      • Internal startups could do the same…
      • Addressing an urge = popular even if imperfect
    • Q&A Time! 25
      • Want to be one of the first to get a copy of a pre-set, ready-to-use Hadoop VM? Then, before we sign off:
        1. Join the Cloud Administration and Virtualization Chatter group
        2. Post a request to join the “CA Software as a Virtual Appliance” SIG
      • A few questions:
        1. How much do licenses cost? (Free up to 50 nodes)
        2. Will we have to change our software to feed our logs to Hadoop? (yep, parallel…)
        3. Hadoop uses for, and by, IT software companies? (self service, mining, sharing I/O)