Instant Hadoop of your Own (Cloudera Big Data Appliance Review)




           Created by Jack Bezalel
              Senior IT Architect
   As part of the CTE Mentorship Program
  and the “CA Software as an Appliance SIG”
               CA Technologies                1
Cloudera Hadoop Appliance




                            2
Why did we pick this appliance?




•   #21 out of 800+ “Most Popular” @ VMware
•   #9 Most popular if we discount OSs
•   Hadoop is hot (becoming a strategic tool)
•   Double Value - you’d want the app anyway

                                                3
What’s Hadoop all about?
            OPPORTUNITY:
We have access to amazingly valuable data
       (Social Media, Mobile, …)




                                            4
What’s Hadoop all about?
• Challenges:
  – Data is seldom structured
  – Can’t predict queries in advance
  – Can’t optimize via SQL / indexing
  – Too much data for one node / DB



                                       5
What’s in Hadoop?
• Reliable data storage using the Hadoop
  Distributed File System (HDFS)
• High-performance parallel data processing
• Map / Reduce
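
To make the Map / Reduce bullet concrete, here is a minimal single-process sketch of the model in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are illustrative assumptions, and on a real cluster the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())


if __name__ == "__main__":
    sample = ["Hadoop is hot", "hadoop is a strategic tool"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across machines without the pieces needing to talk to each other.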



                                           6
What’s in Hadoop?
                                   MapReduce




Picture Attribution: Lukas Kästner at http://www.flickr.com/photos/lkaestner/




                                                                                7
How does it scale so well?
• Commodity, Shared-Nothing Servers
• Dynamic Node Activation / Deactivation
• Self Healing
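
The "Self Healing" bullet can be pictured with a toy model of block replication. This is a sketch under stated assumptions, not HDFS's real placement policy (which is more sophisticated): the 64 MB block size, the replication factor of 3, the round-robin placement, and the names `place_blocks` and `heal` are all chosen here for illustration.

```python
import itertools

BLOCK_SIZE_MB = 64   # assumed block size for this sketch
REPLICATION = 3      # assumed number of copies kept per block


def place_blocks(file_size_mb, nodes):
    """Split a file into blocks and give each block REPLICATION holders."""
    if len(nodes) < REPLICATION:
        raise ValueError("need at least REPLICATION nodes")
    n_blocks = -(-file_size_mb // BLOCK_SIZE_MB)  # ceiling division
    ring = itertools.cycle(nodes)  # simple round-robin, not rack-aware
    return {b: [next(ring) for _ in range(REPLICATION)] for b in range(n_blocks)}


def heal(placement, failed, live_nodes):
    """Re-replicate every block that lost a copy on the failed node."""
    for holders in placement.values():
        if failed in holders:
            holders.remove(failed)
            # Copy the block to the first live node that lacks it.
            spare = next(n for n in live_nodes if n not in holders)
            holders.append(spare)
    return placement


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(200, nodes)
    print("before failure:", placement)
    heal(placement, "n4", ["n1", "n2", "n3"])
    print("after n4 fails:", placement)
```

The point of the sketch is that no single node holds the only copy of a block, so losing a commodity server costs a re-copy rather than data.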




                                           8
Who uses Hadoop?
• Originally developed and employed by [logo] and [logo]

• Hadoop is now widely used in
  –   Finance
  –   Technology
  –   Telecom
  –   Media and entertainment
  –   Government
  –   Research institutions and other markets with
      significant data
                                                           9
Why did we use Cloudera’s Hadoop kit?
•   Active Hadoop contributor
•   Enterprise-ready
•   Developer friendly (Java classes)
•   Saves time – bundling + rigorous testing




                                        10
Cloudera Free Edition (CDH3)
•   Automates installation and configuration
•   Supports an entire cluster (up to 50 nodes)
•   Requires only root SSH access to the nodes
•   Download here:
    https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Download



                                               11
Setup Walkthrough
• Not a pre-set appliance (Requires OS)
• Requires Red Hat (CentOS and others supported)
• 64-bit only
• VMs used:
  – Cloudera Manager
  – Nodes to deploy Hadoop on


                                          12
Now enter your 2 or more Hadoop node names




                                  13
Yeh!




       14
Starting the Data Import from File




                                     15
Choosing the format of the data




                                  16
Let’s load it!




                 17
Create a SELECT query from our new table and execute it




                                     18
Monitor the log report as the query is executed




                                     19
What a wonderful output! :)




                             20
Appliance Review Time!
• Post any questions you may have in the Q&A
  section and we’ll answer ALL
  – Either now using the web Q&A button
  – Or here at the Cloud Administration and
    Virtualization Chatter Group




                                               21
So what makes a great appliance?
•   Does the job – no more, no less
•   Quick and simple setup
•   Quick and easy updates
•   Easy control of one or many instances
•   Simple infrastructure requirements
•   Reliable underlying system
•   No delays doing its job
•   What else?

                                            22
Is CDH3 really an appliance? A great one?
•   Does the job
•   Quick and simple setup
•   Quick and easy updates
•   Easy control of one or many instances
•   Simple Infrastructure requirements
•   Reliable underlying system
•   No delays doing its job

                                            23
But an appliance should be Pre-Installed – Right?
•   Probably
•   But still, a quick manual setup is no big deal
•   Manual setup = flexibility (you choose the OS)
•   Cloudera is a startup; manual setup = shipping faster
•   Internal startups could do the same…
•   Addressing an urgent need = popular even if imperfect



                                                      24
Q&A Time!

Want to be one of the first to get a copy of a pre-set, ready-to-use Hadoop VM?
Then, before we sign off:

1.   Join the Cloud Administration and Virtualization Chatter group
2.   Post a request to join the “CA Software as a Virtual Appliance” SIG

A few questions:

1. How much do licenses cost? (Free for up to 50 nodes)
2. Will we have to change our software to feed our logs to Hadoop? (Yep, parallel…)
3. What are the Hadoop uses for, and by, IT software companies? (Self-service, mining, sharing I/O)


                                                                                       25

