WHICH HADOOP TO USE?
Demet Aksoy
Brane
Hadoop Modules
 HDFS (Hadoop Distributed File System): Stores the data across a distributed file system
 Hadoop MapReduce: Parallel processing of large data sets; part of Hadoop since the first version
 Hadoop YARN: Job scheduling and cluster resource management framework
 Introduced in Hadoop 2.0; in Hadoop 1.0, scheduling was handled within MapReduce
 Hadoop Common: Common utilities supporting the other Hadoop modules, e.g., administration
 Others: ZooKeeper, Oozie, Hue, Sqoop, Flume, Kafka, Hive, Pig, Spark, etc.
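To make the MapReduce model above concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only illustrate the map, group-by-key, and reduce steps that the framework distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in a single process (no parallelism here)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data in HDFS", "YARN schedules Hadoop jobs"]))
```

On a real cluster the map and reduce calls would run on different nodes, with the shuffle moving data between them; this sketch only shows the data flow.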
Main Hadoop Distributions
 Cloudera
 Founded in 2008 by a group of engineers from Yahoo, Google, and Facebook
 Largest user base so far
 Core distribution based on Apache Hadoop
 Also offers the proprietary Cloudera Management Suite to automate installation, plus other services for ease of use
 Hortonworks
 Founded in 2011
 Only vendor with completely open source distribution
 Innovations such as YARN
 MapR
 Standard open source Hadoop comes with a number of restrictions
 These tend to lead users toward vendor distributions eventually
 Replaces HDFS with its own proprietary file system, MapRFS, which adds enterprise-grade features and ease of use
Cloudera Internals
(journaldunet.com)
CDH (Cloudera's Distribution including Apache Hadoop) includes the core modules along with additional components for:
• user interface,
• security, and
• data integration
Cloudera – Cloudera Manager
 Run Cloudera Manager to enable visual administration at any scale
 sudo ~/cloudera-manager --force
 Cloudera Manager menu
 Once enabled, the following options are available: Home, Clusters, Hosts, Diagnostics, Audit, Charts, Administration
 Provides a visual dashboard for all modules:
 Hosts, Flume, HBase, HDFS, Hive, Hue, Impala, ks-indexer, Oozie, Sentry, Solr, Spark, Sqoop, YARN, ZooKeeper, ...
 Can start, stop, restart, or rolling-restart services with a click
 Health and configuration issues are flagged, and recent commands are logged
 60-day free trial if you want to check it out
Cloudera Products
 Cloudera Express
 Free download combining CDH with Cloudera Manager
 Provides robust cluster management capabilities like automated deployment, centralized
administration, monitoring, and diagnostic tools
 Cloudera Enterprise
 In addition to CDH, provides advanced system management and data management tools
 Includes dedicated support from Cloudera
 Cloudera Director
 Includes Cloudera Enterprise functionality and extends the enterprise data hub architecture to the cloud
Hortonworks
Truly open source since 2011; one of the leading vendors
Hortonworks Pearls
 Only distribution that can run natively on Windows, without a VM (Virtual Machine)
 Open source; you will not be led to a purchase eventually
 Similar to Cloudera
 Both have been enterprise-ready distributions for a while
 Both have established communities to consult
 Differences
 Hortonworks is open source; Cloudera offers a 60-day free trial
 Both work on Windows, but only Hortonworks runs natively; a Windows-based cluster can be deployed on Windows Azure using the HDInsight service
 Cloudera has Cloudera Manager, Impala (SQL query interface), and Cloudera Search; the Hortonworks counterparts are Ambari, Stinger, and Apache Solr
MapR
 MapR differs from the other two in using its own proprietary file system, MapRFS
mapr.com
MapR Details
 The standard open source edition comes with a number of restrictions
 Vendor distributions aim to cover these issues (so you will have to move to a vendor distribution over time)
 Through a partnership with Canonical (creator of Ubuntu), MapR is offered as a default component of the Ubuntu operating system, starting with the MapR M3 Edition
 Up to the M3 Edition, MapR is free, but the free version lacks some proprietary features such as JobTracker HA, NameNode HA, NFS-HA, mirroring, and snapshots
 MapR M5 Edition and above are not free, but come with 24/7 support under an annual subscription model
Three Distributions
 MapR
 If you can afford it and do not mind an approach that differs from Apache Hadoop, consider MapR
 Provides a complete stack
 Cloudera
 Based on open source Apache Hadoop with proprietary tools
 Like MapR, provides both free and paid distributions, with extra features and support
 Hortonworks
 Only commercial vendor to provide complete open source Hadoop
 Hortonworks has intentionally not developed proprietary software and uses open source tools like Ambari, Stinger, and Solr
So Which One?
 Your goal should be to figure out the best choice for your
business; there is not a single right choice
 Good news: all provide free versions, so you can try them out
 If you do, I suggest checking our benchmarking efforts and developing your own tests
 All three offer consulting, training, and technical
assistance
 Consider added value according to your customer base
What to pay attention to?
 If you are looking at existing benchmarking studies, note that:
 You need to understand the experiment setting and parameters more than the results
 Performance can be altered by using different data sets, cluster sizes, numbers of virtual machines, etc.
 Your typical workload can be very different from the ones used in the study
 Try to use your own workload as the basis for your analysis
 You should stress test your results
 Do you expect to have extreme workloads?
 How critical is it if you do?
 How much slowdown, approximation, etc. can you tolerate?
 Can you generate realistic samples?
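One way to act on the advice above is a small timing harness that runs your own workload at several input sizes and reports both typical and worst-case times. The sketch below is only an illustration under stated assumptions: `sample_workload` is a hypothetical stand-in for a real job, the data is synthetic, and it runs on a single machine rather than on any Hadoop distribution.

```python
import random
import statistics
import time


def sample_workload(records):
    """Hypothetical stand-in for your real job: sort, then sum the smallest tenth."""
    return sum(sorted(records)[: len(records) // 10 + 1])


def benchmark(workload, sizes, repeats=5, seed=42):
    """Time `workload` at each input size; return median and worst time in seconds."""
    rng = random.Random(seed)  # fixed seed so the experiment setting is reproducible
    results = {}
    for size in sizes:
        data = [rng.random() for _ in range(size)]
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload(list(data))  # pass a copy so every run sees identical input
            timings.append(time.perf_counter() - start)
        results[size] = {
            "median_s": statistics.median(timings),
            "worst_s": max(timings),
        }
    return results


if __name__ == "__main__":
    for size, stats in benchmark(sample_workload, sizes=[1_000, 10_000, 100_000]).items():
        print(f"{size:>7} records: median {stats['median_s']:.4f}s, worst {stats['worst_s']:.4f}s")
```

Recording the seed, sizes, and repeat count alongside the timings keeps the experiment setting visible, and the largest size serves as a simple stress case. Replace `sample_workload` with a function that submits a representative job of your own.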
