Hadoop Platforms
11/2/2016
Introduction
 Hadoop was created by Doug Cutting and Mike Cafarella in 2005 and named after a toy elephant belonging to Cutting's son.
It was originally developed to support distribution for the Nutch search engine project.
 Hadoop is an open-source software framework for storing data and running applications on clusters. It provides massive
storage for any kind of data, substantial processing power and the ability to handle a very large number of concurrent tasks.
 Hadoop is a highly scalable analytics platform and can process multiple petabytes of data spread across hundreds or
thousands of physical storage servers or nodes.
 It provides:
 Redundant, fault-tolerant data storage
 Parallel computation framework
 Job Coordination
 Hadoop is a solution for managing Big Data: a framework for running data management applications on a
large cluster built of commodity hardware.
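The "redundant, fault-tolerant data storage" point can be illustrated with a small sketch. It is not HDFS code: it only mimics the idea of cutting a file into fixed-size blocks and keeping several copies of each block on different nodes. The 128 MB block size and replication factor of 3 are the usual HDFS defaults in Hadoop 2.x; the node names, the function names and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the usual HDFS default in Hadoop 2.x
REPLICATION = 3                 # usual HDFS default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification: real HDFS placement is rack-aware; this is not.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """The file stays readable if every block has a replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4"]   # hypothetical node names
blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
placement = place_replicas(len(blocks), nodes)
print(placement)
print(readable_after_failure(placement, "node3"))
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block, which is the property the slide calls fault tolerance.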
Importance of Hadoop
 Ability to store and process huge amounts of any kind of data, quickly.
 Computing power- Hadoop's distributed computing model processes big data
faster.
 Fault tolerance- Data and application processing are protected against hardware
failure. If a node goes down, jobs are automatically redirected to other nodes to
make sure the distributed computing does not fail.
 Flexibility- Both structured and unstructured data can be stored
without pre-processing.
 Low cost- The open-source framework is free and uses commodity hardware to
store large quantities of data.
 Scalability- Nodes can be added as needed, and maintenance costs are
low.
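The fault-tolerance point above (work is redirected when a node goes down) can be sketched as a toy scheduler. This is only an illustration of the idea, not how Hadoop's scheduler is implemented; `run_job`, the task names and the node names are hypothetical.

```python
def run_job(tasks, nodes, failed):
    """Assign each task to a node; if that node is down, try the next live one."""
    assignment = {}
    for i, task in enumerate(tasks):
        for attempt in range(len(nodes)):
            node = nodes[(i + attempt) % len(nodes)]
            if node not in failed:
                assignment[task] = node
                break
        else:
            # Every node was tried and all of them are down.
            raise RuntimeError(f"no live node available for {task}")
    return assignment


tasks = ["t0", "t1", "t2", "t3"]      # hypothetical task names
nodes = ["node1", "node2", "node3"]   # hypothetical node names
assignment = run_job(tasks, nodes, failed={"node2"})
print(assignment)
```

Task `t1` would normally land on `node2`; since that node is marked as failed, it is moved to `node3` and the job as a whole still completes.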
http://www.sas.com/content/sascom/en_us/insights/big-data/hadoop/_jcr_content/par/styledcontainer_8bf1/par/styledcontainer_a643/par/textimage_ea05/image.img.png/1468851612191.png
Hadoop Core Components
Hadoop is a system for large scale data processing.
It has two main components:
1. HDFS – Hadoop Distributed File System (Storage)
 Distributed across “nodes”
 Natively redundant
 NameNode tracks locations
 Self-healing, high-bandwidth clustered storage
2. MapReduce (Processing)
 Splits a task across processors “near” the data and assembles the results
 JobTracker manages the TaskTrackers
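The MapReduce idea of splitting a task and then assembling the results can be sketched with the classic word-count example. The code below runs in a single Python process and only imitates the three stages (map, shuffle, reduce); it does not use the Hadoop API, and the function names and input strings are made up for illustration.

```python
from collections import defaultdict


def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key, between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(values) for word, values in groups.items()}


# Each string stands in for one input split stored on a different node.
splits = ["Hadoop stores data", "Hadoop processes data in parallel"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

In a real cluster each split would be mapped on a node that already holds that block of data, and only the grouped intermediate pairs would travel over the network to the reducers.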
http://cdn.edureka.co/blog/wp-content/uploads/2014/08/hadoop1componenets.png
Top 5 Hadoop Platform Providers
 Hadoop, a software framework that provides the tools needed to
carry out Big Data analysis, is widely used across industries.
 Although it is open source and designed to be user-friendly, in its “raw”
state it still needs considerable specialist knowledge to set up
and run.
 “Hadoop-as-a-Service” has emerged in recent times: the whole
installation takes place within the vendor's own
cloud, with customers paying a subscription to access the
services.
 The top 5 Hadoop platform providers are:
 IBM
 Amazon Web Services
 Hortonworks
 Cloudera
 MapR
https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAclAAAAJDZmZTQwODVlLTAwZGQtNGI3Ny05OTlhLTUzMTEyYTNmMTllMg.jpg
1. IBM
 IBM has deep roots in the computing industry. Its BigInsights package
adds its proprietary analytics and visualization algorithms to the core
Hadoop infrastructure.
 IBM Open Platform with Apache Hadoop
 Native support for rolling upgrades for Hadoop services
 Support for long-running applications within YARN for enhanced
reliability & security
 Heterogeneous storage in HDFS, supporting in-memory and SSD storage in addition to
HDD
 Spark in-memory distributed compute engine, which offers large performance gains over MapReduce and simplifies
the developer experience, with support for Java, Python and Scala
 Apache Hadoop projects included: HDFS, YARN, MapReduce, Ambari, HBase, Hive, Oozie, Parquet, Parquet Format,
Pig, Snappy, Solr, Spark, Sqoop, ZooKeeper, OpenJDK, Knox, Slider
https://www-01.ibm.com/software/in/data/images/bd-platform.jpg
2. Amazon Web Services
 Amazon is a frontrunner in offering Hadoop as part of its cloud services
package.
 Amazon Web Services (AWS) is a hosted solution integrating
Hadoop with Amazon’s Elastic Compute Cloud (EC2) and Simple Storage
Service (S3) cloud-based data processing and storage services.
 AWS offers a broad set of global compute, storage, database,
analytics, application, and deployment services that help
organizations move faster, lower IT costs, and scale applications.
 AWS is trusted by large enterprises and start-ups alike
to power a wide variety of workloads including web and
mobile applications, data processing and warehousing, storage,
archive, and many others.
 Big Data on AWS introduces cloud-based big data solutions such as Amazon Elastic MapReduce (EMR),
Amazon Redshift, Amazon Kinesis and the rest of the AWS big data platform.
http://www.strategism.org/wp-content/uploads/2015/06/amazon-800x600.jpg
3. Hortonworks
 Hortonworks is one of the few vendors that offer 100% open source
Hadoop technology without any proprietary components.
 Hortonworks was also the first to integrate support for Apache
HCatalog, which creates “metadata” – data about data –
simplifying the process of sharing your data across other
layers of service such as Apache Hive or Pig.
 HDP (Hortonworks Data Platform) is the
enterprise-ready open source Apache™
Hadoop® distribution based on a centralized architecture
(YARN).
 HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers
robust analytics that accelerate decision making and innovation.
 Hortonworks focuses on data-in-motion, data-at-rest, and Modern Data Applications. Its Connected
Data Platforms are intended to help customers create actionable intelligence to transform their businesses.
http://hortonworks.com/wp-content/uploads/2014/03/11.png
4. Cloudera
 Cloudera is considered the most popular provider, with the largest number of installations running.
 Cloudera contributed Impala, which offers real-time, massively parallel
processing of Big Data on Hadoop.
 Cloudera's open-source Apache Hadoop distribution, CDH (Cloudera
Distribution Including Apache Hadoop), targets enterprise-class
deployments of that technology.
 Cloudera says that more than 50% of its engineering output is donated
upstream to the various Apache-licensed open source projects (Apache
Hive, Apache Avro, Apache HBase, and so on) that combine to form the
Hadoop platform.
 Cloudera is a sponsor of the Apache Software Foundation.
http://blog.cloudera.com/wp-content/uploads/2013/06/search.png
5. MapR
 MapR uses some differing concepts, such as native support for
UNIX file systems rather than HDFS.
 MapR Technologies is spearheading development of the Apache
Drill project, which provides advanced tools for interactive,
real-time querying of big datasets.
 MapR describes its Converged Data Platform as the industry’s only
platform to integrate Hadoop and Spark
with global event streaming, real-time database capabilities, and
enterprise storage.
 The MapR Hadoop distribution replaces HDFS with its proprietary
file system, MapR-FS, which is designed to provide more efficient
management of data, reliability and ease of use.
 The MapR Converged Data Platform supports big data storage
and processing through the Apache collection of Hadoop
products, as well as its added-value components.
http://2s7gjr373w3x22jf92z99mgm5w-wpengine.netdna-ssl.com/wp-content/uploads/2016/03/Mapr_Zeta_4-1.png
References
1. http://www.sas.com/en_us/insights/big-data/hadoop.html#hadoopimportance
2. http://www.ironsystems.com/products/hadoop-platforms-overview
3. http://www.slideshare.net/billonahill/intro-to-hadoop-14125097/32-Hadoop_provides_Redundant_faulttolerant_data
4. http://www.computerweekly.com/feature/Big-data-storage-Hadoop-storage-basics
5. https://www.linkedin.com/pulse/big-data-top-10-commercial-hadoop-platforms-bernard-marr
6. http://data-informed.com/10-top-commercial-hadoop-platforms/
7. http://www.cloudera.com/partners/solutions/amazon-web-services.html
8. http://hortonworks.com/products/data-center/hdp/
9. http://www-03.ibm.com/software/products/en/ibm-open-platform-with-apache-hadoop
10. https://en.wikipedia.org/wiki/Cloudera
11. https://www.mapr.com/
12. http://searchdatamanagement.techtarget.com/feature/Inside-the-MapR-Hadoop-distribution-for-managing-big-data
13. http://www.ironnetworks.com/
14. http://www.ironsystems.com/
15. http://shop.ironnetworks.com/
Hadoop Platforms - Introduction, Importance, Providers

  • 2. Introduction (11/2/2016)
      - Hadoop was created by Doug Cutting and Mike Cafarella in 2005 and was named after a toy elephant. It was originally developed to support distribution for the Nutch search engine project.
      - Hadoop is an open-source software framework for storing data and running applications on clusters. It provides massive storage for any kind of data, enormous processing power, and the ability to handle a virtually limitless number of concurrent tasks.
      - Hadoop is a highly scalable analytics platform that can process multiple petabytes of data spread across hundreds or thousands of physical storage servers, or nodes.
      - It provides:
          - Redundant, fault-tolerant data storage
          - A parallel computation framework
          - Job coordination
      - Hadoop is a solution for managing Big Data: a framework for running data management applications on a large cluster built of commodity hardware.
  • 3. Importance of Hadoop
      - Ability to store and process huge amounts of any kind of data, quickly.
      - Computing power: Hadoop's distributed computing model processes big data faster.
      - Fault tolerance: data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes so that the distributed computation does not fail.
      - Flexibility: both structured and unstructured data can be stored without pre-processing.
      - Low cost: the open-source framework is free and uses commodity hardware to store large quantities of data.
      - Scalability: nodes can be added as needed, and maintenance costs are low.
      - Image: http://www.sas.com/content/sascom/en_us/insights/big-data/hadoop/_jcr_content/par/styledcontainer_8bf1/par/styledcontainer_a643/par/textimage_ea05/image.img.png/1468851612191.png
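
The fault-tolerance point above can be illustrated with a small, self-contained sketch. This is not Hadoop code and does not use any Hadoop API: the function names (assign_tasks, handle_node_failure) and the node and task names are hypothetical, and the round-robin policy is an assumption chosen for simplicity. It only shows the idea of redirecting a failed node's work to the nodes that are still healthy.

```python
from collections import defaultdict


def assign_tasks(tasks, nodes):
    """Spread tasks across nodes in round-robin order (toy scheduling policy)."""
    assignment = defaultdict(list)
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return dict(assignment)


def handle_node_failure(assignment, failed_node):
    """Return a new assignment in which the failed node's tasks
    have been redirected to the surviving nodes."""
    survivors = [node for node in assignment if node != failed_node]
    if not survivors:
        raise RuntimeError("no healthy nodes left to take over the work")
    orphaned = assignment.get(failed_node, [])
    new_assignment = {node: list(assignment[node]) for node in survivors}
    for i, task in enumerate(orphaned):
        new_assignment[survivors[i % len(survivors)]].append(task)
    return new_assignment


# Hypothetical cluster: three nodes, six tasks.
tasks = [f"task-{i}" for i in range(6)]
nodes = ["node-a", "node-b", "node-c"]

before = assign_tasks(tasks, nodes)
after = handle_node_failure(before, "node-b")

for node, node_tasks in after.items():
    print(node, node_tasks)
```

After the simulated failure of node-b, every task is still assigned to some node, which is the property the slide describes. A real cluster also has to detect the failure and re-run partially completed work, which this sketch leaves out.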
  • 4. Hadoop Core Components
      Hadoop is a system for large-scale data processing. It has two main components:
      1. HDFS: Hadoop Distributed File System (storage)
          - Distributed across "nodes"
          - Natively redundant
          - NameNode tracks locations
      2. MapReduce (processing)
          - Splits a task across processors "near" the data and assembles the results
          - Self-healing, high-bandwidth clustered storage
          - JobTracker manages the TaskTrackers
      Image: http://cdn.edureka.co/blog/wp-content/uploads/2014/08/hadoop1componenets.png
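
As a rough illustration of the "split a task, then assemble the results" idea behind MapReduce, here is a minimal in-memory word count in plain Python. It is a sketch of the programming model only, not of Hadoop's implementation: there is no HDFS, JobTracker, or TaskTracker here, and the function names (map_phase, shuffle, reduce_phase) and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Shuffle: group all emitted values by key across every mapper's output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for one block of a file stored on a different node.
splits = ["hadoop stores data", "hadoop processes data", "data data"]

mapped = [map_phase(split) for split in splits]  # would run in parallel on a cluster
counts = reduce_phase(shuffle(mapped))
print(counts)
```

In a real cluster each map call would run on the node that already holds the corresponding block, which is what the slide means by processing "near" the data.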
  • 5. Top 5 Hadoop Platform Providers
      - Hadoop, a software framework that provides the tools needed to carry out Big Data analysis, is widely used across industries.
      - Although it is open source and designed to be user-friendly, in its "raw" state it still needs considerable specialist knowledge to set up and run.
      - "Hadoop-as-a-Service" has emerged in recent times: all of the installation takes place within the vendor's own cloud, with customers paying a subscription to access the services.
      - The top 5 Hadoop platform providers are:
          - IBM
          - Amazon Web Services
          - Hortonworks
          - Cloudera
          - MapR
      - Image: https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAclAAAAJDZmZTQwODVlLTAwZGQtNGI3Ny05OTlhLTUzMTEyYTNmMTllMg.jpg
  • 6. 1. IBM
      - IBM has deep roots in the computing industry. Its BigInsights package adds IBM's proprietary analytics and visualization algorithms to the core Hadoop infrastructure.
      - IBM Open Platform with Apache Hadoop includes:
          - Native support for rolling upgrades of Hadoop services
          - Support for long-running applications within YARN for enhanced reliability and security
          - Heterogeneous storage in HDFS for in-memory and SSD storage in addition to HDD
          - The Spark in-memory distributed compute engine, for dramatic performance increases over MapReduce and a simpler developer experience using Java, Python, and Scala
          - Apache Hadoop projects included: HDFS, YARN, MapReduce, Ambari, HBase, Hive, Oozie, Parquet, Parquet Format, Pig, Snappy, Solr, Spark, Sqoop, ZooKeeper, OpenJDK, Knox, Slider
      - Image: https://www-01.ibm.com/software/in/data/images/bd-platform.jpg
  • 7. 2. Amazon Web Services
      - Amazon is a frontrunner in offering Hadoop as part of its cloud services package.
      - Amazon Web Services (AWS) is a hosted solution that integrates Hadoop with Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3), its cloud-based data processing and storage services.
      - AWS offers a broad set of global compute, storage, database, analytics, application, and deployment services that help organizations move faster, lower IT costs, and scale applications.
      - AWS is trusted by the largest enterprises and the hottest start-ups to power a wide variety of workloads, including web and mobile applications, data processing and warehousing, storage, archiving, and many others.
      - Big Data on AWS introduces cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis, and the rest of the AWS big data platform.
      - Image: http://www.strategism.org/wp-content/uploads/2015/06/amazon-800x600.jpg
  • 8. 3. Hortonworks
      - Hortonworks is one of the few providers that offer 100% open-source Hadoop technology without any proprietary additions.
      - Hortonworks was also the first to integrate support for Apache HCatalog, which creates "metadata" (data within data), simplifying the process of sharing data across other layers of service such as Apache Hive or Pig.
      - HDP (Hortonworks Data Platform) is the enterprise-ready open-source Apache™ Hadoop® distribution based on a centralized architecture (YARN).
      - HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust analytics that accelerate decision making and innovation.
      - Hortonworks is all about data: data-in-motion, data-at-rest, and Modern Data Applications. Its Connected Data Platforms help customers create actionable intelligence to transform their businesses.
      - Image: http://hortonworks.com/wp-content/uploads/2014/03/11.png
  • 9. 4. Cloudera
      - Cloudera is the most popular provider and has the largest number of installations running.
      - Cloudera contributes Impala, which brings real-time, massively parallel processing of Big Data to Hadoop.
      - Cloudera's open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), targets enterprise-class deployments of that technology.
      - Cloudera says that more than 50% of its engineering output is donated upstream to the various Apache-licensed open-source projects (Apache Hive, Apache Avro, Apache HBase, and so on) that combine to form the Hadoop platform.
      - Cloudera is a sponsor of the Apache Software Foundation.
      - Image: http://blog.cloudera.com/wp-content/uploads/2013/06/search.png
  • 10. 5. MapR
      - MapR uses some differing concepts, such as native support for UNIX file systems rather than HDFS.
      - MapR Technologies is spearheading development of the Apache Drill project, which provides advanced tools for interactive, real-time querying of big datasets.
      - The MapR Converged Data Platform is described as the industry's only platform to integrate the power of Hadoop and Spark with global event streaming, real-time database capabilities, and enterprise storage.
      - The MapR Hadoop distribution replaces HDFS with its proprietary file system, MapR-FS, which is designed to provide more efficient data management, reliability, and ease of use.
      - The MapR Converged Data Platform supports big data storage and processing through the Apache collection of Hadoop products, as well as its own added-value components.
      - Image: http://2s7gjr373w3x22jf92z99mgm5w-wpengine.netdna-ssl.com/wp-content/uploads/2016/03/Mapr_Zeta_4-1.png
  • 11. References
      1. http://www.sas.com/en_us/insights/big-data/hadoop.html#hadoopimportance
      2. http://www.ironsystems.com/products/hadoop-platforms-overview
      3. http://www.slideshare.net/billonahill/intro-to-hadoop-14125097/32-Hadoop_provides_Redundant_faulttolerant_data
      4. http://www.computerweekly.com/feature/Big-data-storage-Hadoop-storage-basics
      5. https://www.linkedin.com/pulse/big-data-top-10-commercial-hadoop-platforms-bernard-marr
      6. http://data-informed.com/10-top-commercial-hadoop-platforms/
      7. http://www.cloudera.com/partners/solutions/amazon-web-services.html
      8. http://hortonworks.com/products/data-center/hdp/
      9. http://www-03.ibm.com/software/products/en/ibm-open-platform-with-apache-hadoop
      10. https://en.wikipedia.org/wiki/Cloudera
      11. https://www.mapr.com/
      12. http://searchdatamanagement.techtarget.com/feature/Inside-the-MapR-Hadoop-distribution-for-managing-big-data
      13. http://www.ironnetworks.com/
      14. http://www.ironsystems.com/
      15. http://shop.ironnetworks.com/