Instant Hadoop of your Own




           Created by Jack Bezalel
              Senior IT Architect
   As part of the CTE Mentorship Program
  and the “CA Software as an Appliance SIG”
               CA Technologies                1
Cloudera Hadoop Appliance




                            2
Why did we pick this appliance?




•   #21 out of 800+ “Most Popular” @ VMware
•   #9 Most popular if we discount OSs
•   Hadoop is hot (becoming a strategic tool)
•   Double Value - you’d want the app anyway

                                                3
What’s Hadoop all about?
            OPPORTUNITY:
We have access to amazingly valuable data
       (Social Media, Mobile, …)




                                            4
What’s Hadoop all about?
• Challenges:
  – Data is seldom structured
  – Can’t predict queries in advance
  – Can’t optimize via SQL / indexing
  – Too much data for one node / DB



                                       5
What’s in Hadoop?
• Reliable data storage using the Hadoop
  Distributed File System (HDFS)
• High-performance parallel data processing
• Map / Reduce
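
A minimal sketch of the Map / Reduce idea, assuming a word-count job. It is plain Python running in one process, not Hadoop's Java API; the function names (map_phase, shuffle, reduce_phase) are made up for illustration. In a real cluster the framework runs the map and reduce steps in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop is hot", "hadoop scales"]))
```

Because each map call sees only its own line and each reduce call sees only one key, both steps can be spread over many machines without coordination.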



                                           6
What’s in Hadoop?
                                   MapReduce




Picture Attribution: Lukas Kästner at http://www.flickr.com/photos/lkaestner/




                                                                                7
How does it scale so well?
• Commodity, Shared-Nothing Servers
• Dynamic Node Activation / Deactivation
• Self Healing
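
A toy simulation of the "self healing" point, under stated assumptions: each data block is kept on 3 nodes (HDFS's default replication factor), and when a node disappears the missing copies are recreated on surviving nodes. The round-robin placement and the function names are illustrative only; real HDFS placement is rack-aware and considerably more involved.

```python
REPLICATION = 3  # HDFS's default replication factor


def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes len(nodes) >= replication so the chosen nodes are distinct.
    """
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement


def heal(placement, live_nodes, replication=REPLICATION):
    """Drop replicas on dead nodes, then re-replicate onto live nodes."""
    live = set(live_nodes)
    healed = {block: holders & live for block, holders in placement.items()}
    for holders in healed.values():
        # Candidates are live nodes that do not already hold this block.
        candidates = sorted(live - holders)
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return healed


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]  # hypothetical node names
    before = place_blocks(["b0", "b1", "b2", "b3"], nodes)
    after = heal(before, [n for n in nodes if n != "n2"])  # n2 fails
    for block in sorted(after):
        print(block, sorted(after[block]))
```

No node shares disks or memory with another, so losing one machine only reduces the replica count until the copies are rebuilt elsewhere.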




                                           8
Who uses Hadoop?
• Originally developed and employed by [company logos not preserved in this transcript]

• Hadoop is now widely used in
  –   Finance
  –   Technology
  –   Telecom
  –   Media and entertainment
  –   Government
  –   Research institutions and other markets with
      significant data
                                                           9
Why did we use Cloudera’s Hadoop kit?
•   Active Hadoop contributor
•   Enterprise-ready
•   Developer friendly (Java classes)
•   Saves time: bundling plus rigorous testing




                                        10
Cloudera Manager Free Edition (with CDH3)
•   Automates installation and configuration
•   Manages an entire cluster (up to 50 nodes)
•   Requires only root SSH access to the nodes
•   Download here:
    https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Download
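
Since the installer needs SSH access to every node, a quick pre-flight check can save a failed run. The sketch below is an assumption-laden helper, not part of Cloudera's tooling: it only tests that a TCP connection to the SSH port (assumed to be the standard port 22) succeeds. It does not verify that root login actually works.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def unreachable_nodes(hosts, port=22):
    """Return the hosts whose SSH port could not be reached."""
    return [host for host in hosts if not ssh_port_open(host, port)]
```

Usage, with hypothetical node names: unreachable_nodes(["hadoop-node1.example.com", "hadoop-node2.example.com"]) returns the subset that did not answer; an empty list means every node accepted a connection.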



                                               11
Setup Walkthrough
• Not a pre-set appliance (requires an OS)
• Requires Red Hat; CentOS and other
  distributions are also supported
• 64-bit only
• VMs used:
  – Cloudera Manager
  – Nodes to deploy Hadoop on


                                          12
Now enter your 2 or more Hadoop node names




                                  13
Yeah!




       14
Starting the Data Import from File




                                     15
Choosing the format of the data




                                  16
Let’s load it!




                 17
Create a SELECT query from our new table and execute it
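
The import, format, load and query steps appear in the deck only as screenshots. As a rough local stand-in (not Hadoop, and not the query tool shown on the slides), the same flow can be sketched with Python's built-in csv and sqlite3 modules. The sample data, the table name access_log and its columns are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the imported file (comma-delimited, with header).
SAMPLE = "host,status,bytes\nweb1,200,512\nweb2,404,128\nweb1,200,256\n"


def load_access_log(conn, text, delimiter=","):
    """Parse delimited text and load it into a new access_log table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = [(r["host"], int(r["status"]), int(r["bytes"])) for r in reader]
    conn.execute("CREATE TABLE access_log (host TEXT, status INTEGER, bytes INTEGER)")
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", rows)
    return len(rows)


def bytes_per_host(conn):
    """Run a SELECT with aggregation over the freshly loaded table."""
    query = "SELECT host, SUM(bytes) FROM access_log GROUP BY host ORDER BY host"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_access_log(conn, SAMPLE), "rows loaded")
    print(bytes_per_host(conn))
```

On a Hadoop cluster a query of this shape would be executed as distributed jobs over files in HDFS rather than against a single local database.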




                                     18
Monitor the log report as the query is executed




                                     19
What a wonderful output! 




                             20
Appliance Review Time!
• Post any questions you may have in the Q&A
  section and we’ll answer them all
  – Either now, using the web Q&A button
  – Or here, in the Cloud Administration and
    Virtualization Chatter group




                                               21
So what makes a great appliance?
•   Does the job – no more, no less
•   Quick and simple setup
•   Quick and easy updates
•   Easy control of one or many instances
•   Simple Infrastructure requirements
•   Reliable underlying system
•   No delays doing its job
•   What else?

                                            22
Is CDH3 really an appliance? A great one?
•   Does the job
•   Quick and simple setup
•   Quick and easy updates
•   Easy control of one or many instances
•   Simple Infrastructure requirements
•   Reliable underlying system
•   No delays doing its job

                                            23
But an appliance should be pre-installed, right?
•   Probably
•   But still, a quick manual setup is no big deal
•   Manual setup = flexibility (you choose the OS)
•   Cloudera is a startup; manual = faster to ship
•   Internal startups could do the same…
•   Addressing an urgent need = popular even if imperfect



                                                      24
Q&A Time!

           Want to be one of the first to get a copy
           of a pre-set, ready-to-use Hadoop VM?
Then, before we sign off:

1.   Join the Cloud Administration and Virtualization Chatter group
2.   Post a request to join the “CA Software as a Virtual Appliance” SIG

A few questions:

1. How much do licenses cost? (Free for up to 50 nodes)
2. Will we have to change our software to feed our logs to Hadoop? (yep, parallel…)
3. What uses does Hadoop have for, and by, IT software companies? (self-service, mining, sharing I/O)


                                                                                       25
