Copyright © 2017, edureka and/or its affiliates. All rights reserved.
HADOOP COMPONENTS
• WHAT IS HADOOP?
• HADOOP CORE COMPONENTS
• HADOOP ARCHITECTURE
• MAJOR HADOOP COMPONENTS
www.edureka.co
WHAT IS HADOOP?
Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications running in clustered systems.
HADOOP CORE COMPONENTS
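• HDFS: distributed storage
• YARN: resource management and job scheduling
• MapReduce: distributed data processing
• Common utilities (Hadoop Common): the shared libraries and utilities that support the other modules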
HADOOP CORE COMPONENTS
Master and slave daemons (diagram):
• HDFS: NameNode and Secondary NameNode (master), DataNode (slave)
• YARN: ResourceManager (master), NodeManager (slave)
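HADOOP ARCHITECTURE
Checkpointing between the NameNode and the Secondary NameNode (diagram):
• The NameNode stores the file system namespace in an FS-image and records every later change in an Edit Log.
• Periodically, the Secondary NameNode copies the FS-image and the Edit Log and merges them into a new FS-image.
• While the merge runs, the NameNode writes new changes to a new Edit Log.
• The merged (final) FS-image is sent back to the NameNode and replaces the old one, so the Edit Log does not grow without bound.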
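The merge step can be sketched as replaying an edit log on top of an FS-image. This is a simplified model, not the HDFS implementation: the namespace is assumed to be a plain dict from path to replication factor, and the operation names (`create`, `delete`, `rename`) and the `checkpoint` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified namespace: path -> replication factor.
FsImage = Dict[str, int]
# Hypothetical edit-log entry: (operation, path, argument).
EditOp = Tuple[str, str, object]


def checkpoint(fs_image: FsImage, edit_log: List[EditOp]) -> FsImage:
    """Replay an edit log on top of an FS-image and return a new FS-image.

    The input image is copied, so the previous checkpoint stays untouched.
    """
    new_image = dict(fs_image)
    for op, path, arg in edit_log:
        if op == "create":
            new_image[path] = int(arg)  # arg: replication factor
        elif op == "delete":
            new_image.pop(path, None)
        elif op == "rename":
            new_image[str(arg)] = new_image.pop(path)  # arg: new path
        else:
            raise ValueError(f"unknown edit-log operation: {op}")
    return new_image


if __name__ == "__main__":
    image = {"/data/a.csv": 3}
    edits = [
        ("create", "/data/b.csv", 3),
        ("rename", "/data/a.csv", "/archive/a.csv"),
        ("delete", "/data/b.csv", None),
    ]
    print(checkpoint(image, edits))  # {'/archive/a.csv': 3}
```

In this sketch the replay writes into a copy, so the old image remains a valid fallback until the new one is complete.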
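HADOOP CORE COMPONENTS
YARN architecture (diagram: Client, ResourceManager, NodeManagers, Application Masters, Containers):
• The Client submits an application, such as a MapReduce job, to the ResourceManager.
• Each NodeManager sends its Node Status (health and available resources) to the ResourceManager and runs Containers on its node.
• For each application, an Application Master runs in a container, sends Resource Requests to the ResourceManager, and launches the application's tasks in the Containers it is granted.
• The Application Master reports MapReduce Status back to the Client.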
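The request/grant cycle above can be sketched as a ResourceManager that tracks the free memory each NodeManager reports and grants container requests from an Application Master. This is a toy model under stated assumptions: memory is the only resource and placement is first fit. YARN's real schedulers (Capacity, Fair) are far more involved, and all class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeStatus:
    """Free capacity a (hypothetical) NodeManager reports to the ResourceManager."""
    node_id: str
    free_memory_mb: int


@dataclass
class ResourceManager:
    nodes: Dict[str, NodeStatus] = field(default_factory=dict)

    def update_node_status(self, status: NodeStatus) -> None:
        # NodeManagers report periodically; keep only the latest status per node.
        self.nodes[status.node_id] = status

    def allocate(self, memory_mb: int) -> Optional[str]:
        """Grant one container on the first node with enough free memory."""
        for node in self.nodes.values():
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                return node.node_id
        return None  # request stays pending until capacity frees up


def handle_resource_request(rm: ResourceManager, containers: int,
                            memory_mb: int) -> List[Optional[str]]:
    """An Application Master asks for several equally sized containers."""
    return [rm.allocate(memory_mb) for _ in range(containers)]


if __name__ == "__main__":
    rm = ResourceManager()
    rm.update_node_status(NodeStatus("node-1", 4096))
    rm.update_node_status(NodeStatus("node-2", 2048))
    print(handle_resource_request(rm, containers=4, memory_mb=2048))
    # ['node-1', 'node-1', 'node-2', None]
```

The `None` entry models a request that cannot be satisfied yet; in this sketch the Application Master would simply ask again after the next node status update.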
MAJOR HADOOP COMPONENTS
Hadoop ecosystem overview (diagram):
• Storage: HDFS
• Resource management: YARN
• Storage managers
• General-purpose execution engines
• Database management engines
• Data abstraction engines
• Real-time data streaming frameworks
• Graph processing frameworks
• Machine learning engines
• Hadoop cluster management software
HADOOP STORAGE MANAGERS
HDFS
• Hadoop Distributed File System.
• Primary data storage unit in Hadoop.
• Used in distributed data processing environments.
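HDFS stores each file as a sequence of fixed-size blocks and replicates every block on several DataNodes. The sketch below assumes the defaults of Hadoop 2.x and later (`dfs.blocksize` of 128 MB, `dfs.replication` of 3) and uses a simple round-robin placement; real HDFS uses a rack-aware placement policy, so treat `place_replicas` as a hypothetical illustration only.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3                 # default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block; the last block may be shorter."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Simplified round-robin placement; real HDFS uses a rack-aware policy."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(300 * mb)
    print([length // mb for _, length in blocks])  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Because every block lives on several DataNodes, losing one node leaves other copies available, and processing tasks can be scheduled near any replica.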
HCATALOG
• Table and storage management layer for Hadoop.
• Exposes the tabular data in the Hive metastore to other applications such as Pig and MapReduce.
ZOOKEEPER
• Centralized open-source server.
• Provides a distributed configuration service, synchronization service, and naming registry for large distributed systems.
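ZooKeeper keeps its data in a hierarchical namespace of znodes, and each znode carries a version number that increases on every update. Making an update conditional on the version a client last read is one building block for synchronization. The toy store below only illustrates that compare-and-set idea; it is not the ZooKeeper client API, and `ToyZNodeStore` and its methods are hypothetical.

```python
from typing import Dict, Tuple


class BadVersionError(Exception):
    """Raised when a conditional update was based on a stale version."""


class ToyZNodeStore:
    """Hypothetical in-memory stand-in for a ZooKeeper namespace (not the real client)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[bytes, int]] = {}  # path -> (data, version)

    def create(self, path: str, data: bytes) -> None:
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (data, 0)

    def get(self, path: str) -> Tuple[bytes, int]:
        return self._nodes[path]

    def set(self, path: str, data: bytes, expected_version: int) -> int:
        """Compare-and-set: update only if nobody changed the node since it was read."""
        _, version = self._nodes[path]
        if version != expected_version:
            raise BadVersionError(f"{path}: expected v{expected_version}, found v{version}")
        self._nodes[path] = (data, version + 1)
        return version + 1


if __name__ == "__main__":
    zk = ToyZNodeStore()
    zk.create("/config/batch_size", b"100")
    _, v = zk.get("/config/batch_size")
    zk.set("/config/batch_size", b"200", expected_version=v)  # succeeds, now v1
    try:
        zk.set("/config/batch_size", b"300", expected_version=v)  # stale v0
    except BadVersionError as err:
        print("rejected:", err)
```

Two processes that read the same version cannot both win: the second write is rejected, and that process must re-read the node before retrying.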
OOZIE
• Server-based workflow scheduling system.
• Schedules Apache Hadoop jobs.
• Manages workflows defined as directed acyclic graphs (DAGs) of actions.
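In a DAG workflow, an action may start only after every action it depends on has finished. Real Oozie workflows are written in XML; the sketch below only illustrates the ordering logic, using Python's standard `graphlib` module (Python 3.9 or later). The action names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError
from typing import Dict, List, Set


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which actions can run so every dependency finishes first.

    `dependencies` maps each action to the set of actions it waits for.
    Raises graphlib.CycleError if the workflow is not acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


def parallel_stages(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group actions into stages; actions in the same stage could run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    # Hypothetical workflow: ingest -> (clean, profile) -> report
    workflow = {
        "clean": {"ingest"},
        "profile": {"ingest"},
        "report": {"clean", "profile"},
    }
    print(parallel_stages(workflow))  # [['ingest'], ['clean', 'profile'], ['report']]
```

The acyclic requirement matters: with a cycle there is no valid starting action, which is why `graphlib` raises `CycleError` instead of returning an order.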
GENERAL-PURPOSE EXECUTION ENGINES
MAPREDUCE
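• Software framework for distributed processing.
• Splits input data into chunks that map tasks process in parallel; the intermediate results are grouped by key and aggregated by reduce tasks.
• Based on the map and reduce operations of functional programming.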
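The classic word-count example shows the map, shuffle, and reduce phases. This single-process Python sketch only illustrates the data flow; it is not Hadoop's Java `Mapper`/`Reducer` API, and each string in `chunks` stands in for one input split.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # In Hadoop, each chunk (input split) would be mapped by a separate task in parallel.
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(splits))
    # {'deer': 2, 'bear': 2, 'river': 2, 'car': 3}
```

Because each map call sees only its own chunk and each reduce call sees only one key, both phases can be spread across machines without shared state.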
SPARK
• General-purpose cluster computing framework.
• Performs real-time data streaming and ETL.
• Processes streaming data in micro-batches.
TEZ
• High-performance data processing framework.
• Executes what would otherwise be a series of MapReduce jobs as a single job, modelled as a DAG of tasks.
• Used in batch processing environments.
HADOOP DATABASE MANAGEMENT ENGINES
HIVE
• Data warehouse software project.
• Enables SQL-like queries over data stored in Hadoop.
• Used for ETL and for Hive DDL and DML operations.
SPARK SQL
• Distributed SQL query engine.
• Enables structured data processing.
• Used to import data from RDDs, Hive, Parquet files, and other sources.
IMPALA
• In-memory query processing engine.
• Integrates with the Hive metastore to share table information between components.
• Used to process data stored in Hadoop clusters.
APACHE DRILL
• Low-latency distributed query engine.
• Combines data from a variety of data stores in a single query.
• Supports different kinds of NoSQL databases.
HBASE
• Open-source, non-relational, distributed database.
• Runs on top of HDFS and provides random, real-time read/write access to very large tables.
HADOOP DATA ABSTRACTION ENGINES
APACHE PIG
• High-level scripting language (Pig Latin).
• Enables users to write complex data transformations.
• Performs ETL and analyses huge datasets.
APACHE SQOOP
• Command-line interface application for transferring data between relational databases and Hadoop.
• Data ingestion tool.
• Imports and exports structured data at enterprise scale.
HADOOP REAL-TIME STREAMING FRAMEWORKS
SPARK STREAMING
• Spark Streaming is an extension of the core Spark API.
• Enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
• Provides a high-level abstraction called a discretized stream (DStream), which represents continuous data as a sequence of small batches.
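A discretized stream cuts a live stream into consecutive time windows and runs the same computation on each resulting micro-batch. The plain-Python sketch below is not the Spark API; it assumes events arrive as `(timestamp_seconds, record)` pairs and uses a hypothetical 2-second batch interval.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def discretize(events: Iterable[Tuple[float, T]], interval: float) -> Dict[int, List[T]]:
    """Assign each (timestamp, record) to the batch covering its time window."""
    batches: Dict[int, List[T]] = defaultdict(list)
    for ts, record in events:
        batches[int(ts // interval)].append(record)
    return dict(sorted(batches.items()))


def process_stream(events: Iterable[Tuple[float, T]], interval: float,
                   per_batch: Callable[[List[T]], R]) -> List[Tuple[int, R]]:
    """Apply the same computation to every micro-batch, in time order."""
    return [(batch_id, per_batch(records))
            for batch_id, records in discretize(events, interval).items()]


if __name__ == "__main__":
    clicks = [(0.4, "home"), (1.2, "cart"), (1.9, "home"), (2.5, "home"), (5.1, "checkout")]
    for batch_id, counts in process_stream(clicks, 2.0, Counter):
        print(batch_id, dict(counts))
```

This sketch groups a finite list after the fact and skips windows with no events; a real streaming engine forms each batch as its interval closes and may also process empty batches.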
APACHE KAFKA
• Open-source stream-processing software platform.
• Ingests and moves large amounts of data very quickly.
• Lets applications publish and subscribe to streams of records.
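Kafka's publish-subscribe model rests on an append-only log: producers append records, each record receives a sequential offset, and each consumer group tracks its own position, so several subscribers can read the same stream independently. The sketch below models a single in-memory partition; it is not the Kafka client API, and `ToyTopic`, `publish`, and `poll` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyTopic:
    """Hypothetical single-partition topic: an append-only list of records."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._committed: Dict[str, int] = defaultdict(int)  # consumer group -> next offset

    def publish(self, record: str) -> int:
        """Append a record and return its offset."""
        self._log.append(record)
        return len(self._log) - 1

    def poll(self, group: str, max_records: int = 10) -> List[Tuple[int, str]]:
        """Return the next records for a consumer group and advance its offset."""
        start = self._committed[group]
        batch = self._log[start:start + max_records]
        self._committed[group] = start + len(batch)
        return [(start + i, rec) for i, rec in enumerate(batch)]


if __name__ == "__main__":
    topic = ToyTopic()
    for event in ["login", "view", "logout"]:
        topic.publish(event)
    print(topic.poll("analytics", max_records=2))  # [(0, 'login'), (1, 'view')]
    print(topic.poll("billing"))                  # [(0, 'login'), (1, 'view'), (2, 'logout')]
    print(topic.poll("analytics"))                # [(2, 'logout')]
```

Unlike a queue, reading does not remove records here: the log stays intact and only each group's offset moves.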
APACHE FLUME
• Open-source, distributed, and reliable software.
• Architecture is based on streaming data flows.
• Collects, aggregates, and moves large amounts of log data.
HADOOP GRAPH PROCESSING FRAMEWORKS
APACHE GIRAPH
• Iterative graph processing framework.
• Utilizes Apache Hadoop's MapReduce
implementation to process graphs.
• Used to analyse social media data
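Giraph follows the Pregel-style bulk-synchronous model: computation runs in supersteps, and in each one every vertex that received messages updates its value and messages its neighbours; the job halts when no messages remain. The single-process sketch below applies that pattern to connected components on a small, hypothetical friendship graph. It is not Giraph's Java API.

```python
from typing import Dict, List


def connected_components(graph: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each vertex with the smallest vertex id in its component.

    `graph` is an undirected adjacency list (each edge listed in both directions).
    """
    value = {v: v for v in graph}
    # Superstep 0: every vertex sends its own label to all neighbours.
    inbox: Dict[str, List[str]] = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(value[v])

    while any(inbox.values()):  # halt when no messages are in flight
        outbox: Dict[str, List[str]] = {v: [] for v in graph}
        for v, messages in inbox.items():
            if not messages:
                continue  # vertex stays inactive this superstep
            smallest = min(messages)
            if smallest < value[v]:
                value[v] = smallest
                for n in graph[v]:
                    outbox[n].append(smallest)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return value


if __name__ == "__main__":
    friends = {
        "ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"],
        "dan": ["eve"], "eve": ["dan"],
    }
    print(connected_components(friends))
    # {'ann': 'ann', 'bob': 'ann', 'cat': 'ann', 'dan': 'dan', 'eve': 'dan'}
```

Each vertex only sends a message when its label shrinks, so the number of messages falls off quickly and the loop terminates once every component agrees on its smallest id.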
APACHE GRAPHX
• GraphX is Apache Spark's API for graphs and
graph-parallel computation.
• Offers performance comparable to the fastest specialized graph processing systems.
• Works seamlessly with both graphs and collections.
• Provides a growing library of graph algorithms.
HADOOP MACHINE LEARNING FRAMEWORKS
H2O
• H2O is open-source software for big-data analysis.
• H2O allows users to fit thousands of potential models as part of discovering patterns in data.
• H2O uses iterative methods that provide quick
answers using all of the client's data.
ORYX
• A generic lambda architecture tier providing batch, speed, and serving layers.
• Specialized for real-time, large-scale machine learning.
• End-to-end implementations of standard ML algorithms as applications.
SPARK MLlib
• Spark MLlib is a scalable machine learning library.
• It enables machine learning operations to run in Spark.
AVRO
• Avro is a row-oriented remote procedure call and data serialization framework.
• Supports dynamic typing and schema evolution, among other features.
• Used for data serialization and RPC.
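Schema evolution in Avro works by resolving the writer's schema against the reader's schema: fields only the writer knows are ignored, fields only the reader knows are filled from the reader's default, and a reader field with neither a written value nor a default is an error. The sketch below mimics just those record-field rules on Python dicts. It is not the Avro library; the schema format is a hypothetical simplification that ignores types, type promotion, and aliases.

```python
from typing import Any, Dict, List

# Simplified record schema: a list of fields, each {"name": ..., optional "default": ...}.
Schema = List[Dict[str, Any]]


class SchemaResolutionError(Exception):
    pass


def resolve(record: Dict[str, Any], writer: Schema, reader: Schema) -> Dict[str, Any]:
    """Read a record written with `writer` as if it had been written with `reader`."""
    written = {f["name"] for f in writer}
    result: Dict[str, Any] = {}
    for field in reader:
        name = field["name"]
        if name in written:
            result[name] = record[name]      # present in both schemas
        elif "default" in field:
            result[name] = field["default"]  # new field: use the reader's default
        else:
            raise SchemaResolutionError(f"no value or default for field '{name}'")
    # Fields that exist only in the writer's schema are not copied (ignored).
    return result


if __name__ == "__main__":
    writer_v1 = [{"name": "id"}, {"name": "email"}, {"name": "legacy_flag"}]
    reader_v2 = [{"name": "id"}, {"name": "email"}, {"name": "country", "default": "unknown"}]
    old_record = {"id": 7, "email": "a@example.com", "legacy_flag": True}
    print(resolve(old_record, writer_v1, reader_v2))
    # {'id': 7, 'email': 'a@example.com', 'country': 'unknown'}
```

Giving new fields a default is what lets a newer reader consume files written under an older schema without rewriting them.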
THRIFT
• Interface definition language and binary communication protocol.
• Lets users define data types and service interfaces in a simple definition file.
• Used to build RPC clients and servers.
MAHOUT
• Provides implementations of distributed machine learning algorithms.
• Runs on Hadoop, which stores and processes big data in a distributed environment across clusters of computers using simple programming models.
HADOOP CLUSTER MANAGEMENT SOFTWARE
AMBARI
• Hadoop cluster management software.
• Enables system administrators to provision, manage, and monitor a Hadoop cluster.
ZOOKEEPER
• Centralized open-source server.
• Manages configuration across nodes.
• Implements reliable messaging.
• Implements redundant services.
• Synchronizes process execution.
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Hadoop Components Guide

  • 1. Copyright © 2017, edureka and/or its affiliates. All rights reserved.
  • 3. WHAT IS HADOOP? • HADOOP CORE COMPONENTS • HADOOP ARCHITECTURE • MAJOR HADOOP COMPONENTS (www.edureka.co)
  • 5. WHAT IS HADOOP? Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications running in clustered systems.
  • 7. HADOOP CORE COMPONENTS: MAPREDUCE • COMMON UTILITIES (Hadoop Common) • HDFS • YARN
  • 8. HADOOP CORE COMPONENTS (MASTER/SLAVE roles) • HDFS: NameNode and Secondary NameNode (MASTER), DataNode (SLAVE) • YARN: ResourceManager (MASTER), NodeManager (SLAVE)
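  • 10. HADOOP ARCHITECTURE: NameNode and Secondary NameNode checkpointing (diagram) • The NameNode stores the file system metadata as an FS-image plus an Edit Log of subsequent changes. • The Secondary NameNode merges the FS-image with the Edit Log into a final FS-image, while the NameNode records new changes in a new Edit Log.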
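The checkpoint on this slide can be sketched as a toy Python model. It is an illustration under stated assumptions, not HDFS code: the FS-image is assumed to be a dict from path to replication factor (`None` for directories), and the edit-log operation names (`mkdir`, `create`, `set_replication`, `delete`) are hypothetical stand-ins for the real edit-log records.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (assumption): path -> replication factor, None for directories.
FsImage = Dict[str, Optional[int]]
# (operation, path, argument); the operation names are hypothetical.
EditOp = Tuple[str, str, Optional[int]]


def apply_edit(image: FsImage, op: EditOp) -> None:
    """Replay one logged namespace change onto an in-memory image."""
    kind, path, arg = op
    if kind == "mkdir":
        image[path] = None
    elif kind == "create":
        image[path] = arg
    elif kind == "set_replication":
        if path not in image:
            raise KeyError(path)
        image[path] = arg
    elif kind == "delete":
        # Remove the path and everything beneath it.
        prefix = path.rstrip("/") + "/"
        for p in [p for p in image if p == path or p.startswith(prefix)]:
            del image[p]
    else:
        raise ValueError(f"unknown edit operation: {kind}")


def checkpoint(fs_image: FsImage, edit_log: List[EditOp]) -> FsImage:
    """Secondary NameNode role: merge the old FS-image with the Edit Log
    into a new (final) FS-image without modifying either input."""
    merged = dict(fs_image)
    for op in edit_log:
        apply_edit(merged, op)
    return merged


old_image: FsImage = {"/": None, "/data": None, "/data/a.csv": 3}
edit_log: List[EditOp] = [
    ("create", "/data/b.csv", 3),
    ("set_replication", "/data/a.csv", 2),
    ("mkdir", "/tmp", None),
    ("delete", "/tmp", None),
]
final_image = checkpoint(old_image, edit_log)
new_edit_log: List[EditOp] = []  # the NameNode carries on with a fresh Edit Log
print(final_image)  # {'/': None, '/data': None, '/data/a.csv': 2, '/data/b.csv': 3}
```

In real HDFS the merged image is sent back to the NameNode, so a restart only has to replay the short new Edit Log; that transfer is omitted here.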
  • 13. Hadoop ecosystem overview (diagram) • Storage: HDFS • Resource Management: YARN • Storage Managers • General Purpose Execution Engines • Database Management Engines • Data Abstraction Engines • Realtime Data Streaming Frameworks • Graph Processing Frameworks • Machine Learning Engines • Hadoop Cluster Management Software
  • 14. HADOOP STORAGE MANAGERS
  • 15. MAJOR HADOOP COMPONENTS: HDFS • Hadoop Distributed File System. • Primary data storage unit in Hadoop. • Used in distributed data processing environments.
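For a sense of how HDFS stores data: each file is split into fixed-size blocks, and each block is replicated on several DataNodes. The sketch below assumes the common Hadoop 2.x/3.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both are configurable per cluster and per file, and the helper names are hypothetical.

```python
from typing import List

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed dfs.blocksize (default in Hadoop 2.x/3.x)
REPLICATION = 3        # assumed dfs.replication (default)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Sizes of the blocks a file of `file_size` bytes occupies.
    The last block holds only the remaining bytes; it is not padded."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Bytes consumed across the cluster once every block is replicated."""
    return file_size * replication


blocks = split_into_blocks(500 * MB)
print(len(blocks), [b // MB for b in blocks])  # 4 [128, 128, 128, 116]
print(raw_storage(500 * MB) // MB)             # 1500
```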
  • 16. MAJOR HADOOP COMPONENTS: HCATALOG • Table and storage management layer for Hadoop. • Exposes the tabular data of the Hive metastore to other applications such as Pig and MapReduce.
  • 17. MAJOR HADOOP COMPONENTS: ZOOKEEPER • Centralized open-source server. • Provides a distributed configuration service, synchronization service and naming registry for large distributed systems.
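ZooKeeper keeps its data in a hierarchical namespace of znodes, and coordination recipes such as leader election are built from sequential znodes. The class below is a hypothetical, in-memory toy that mimics only those two ideas; it has no sessions, watches or replication, and it is not the ZooKeeper client API. The configuration value is likewise a made-up example.

```python
from typing import Dict, List


class ToyZnodeTree:
    """Hypothetical in-memory stand-in for ZooKeeper's hierarchical namespace
    (path -> small data blob, plus sequential child nodes)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {"/": b""}
        self.counters: Dict[str, int] = {}  # per-parent sequence counters

    def create(self, path: str, data: bytes = b"", sequential: bool = False) -> str:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if sequential:
            # Append a zero-padded, monotonically increasing 10-digit counter.
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data
        return path

    def get(self, path: str) -> bytes:
        return self.nodes[path]

    def delete(self, path: str) -> None:
        del self.nodes[path]

    def children(self, parent: str) -> List[str]:
        prefix = parent.rstrip("/") + "/"
        names = [p[len(prefix):] for p in self.nodes if p.startswith(prefix)]
        return sorted(n for n in names if n and "/" not in n)


def current_leader(tree: ToyZnodeTree, election_path: str) -> str:
    """Leader-election recipe: the candidate with the lowest sequence number leads."""
    return tree.children(election_path)[0]


tree = ToyZnodeTree()
tree.create("/config")
tree.create("/config/db_url", b"jdbc:mysql://db-host/app")  # shared configuration
tree.create("/election")
a = tree.create("/election/n_", b"host-a", sequential=True)
b = tree.create("/election/n_", b"host-b", sequential=True)
print(current_leader(tree, "/election"))  # n_0000000000 (host-a)
tree.delete(a)  # host-a fails; in ZooKeeper its ephemeral node would vanish
print(current_leader(tree, "/election"))  # n_0000000001 (host-b)
```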
  • 18. MAJOR HADOOP COMPONENTS: OOZIE • Server-based workflow scheduling system. • Schedules Apache Hadoop jobs. • Manages workflows defined as Directed Acyclic Graphs (DAGs) of actions.
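Because an Oozie workflow is a DAG, an action can start only after the actions it depends on have finished. The scheduling idea can be sketched with Kahn's topological sort. The action names and the dependency-list representation are hypothetical; real Oozie workflows are defined in XML (`workflow.xml`) and executed by the Oozie server.

```python
from collections import deque
from typing import Dict, List


def execution_order(deps: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: `deps` maps each action to the actions it waits on.
    Raises ValueError if the workflow contains a cycle (is not a DAG)."""
    indegree = {action: len(parents) for action, parents in deps.items()}
    children: Dict[str, List[str]] = {action: [] for action in deps}
    for action, parents in deps.items():
        for parent in parents:
            children[parent].append(action)
    ready = deque(sorted(a for a, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        action = ready.popleft()
        order.append(action)
        for child in children[action]:
            indegree[child] -= 1
            if indegree[child] == 0:  # all prerequisites done
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("cycle detected: workflow is not a DAG")
    return order


workflow = {
    "sqoop-import": [],
    "pig-clean": ["sqoop-import"],
    "hive-aggregate": ["pig-clean"],
    "mapreduce-stats": ["pig-clean"],
    "email-report": ["hive-aggregate", "mapreduce-stats"],
}
print(execution_order(workflow))
# ['sqoop-import', 'pig-clean', 'hive-aggregate', 'mapreduce-stats', 'email-report']
```

The cycle check mirrors why Oozie insists on acyclic graphs: with a cycle, no valid start order exists.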
  • 19. GENERAL PURPOSE EXECUTION ENGINES
  • 20. MAJOR HADOOP COMPONENTS: MAPREDUCE • Software framework for distributed processing. • Splits input data into chunks that map tasks process in parallel; reduce tasks then aggregate the intermediate results. • Based on the map and reduce primitives of functional programming.
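The map, shuffle and reduce flow can be illustrated with the classic word-count example. This is a single-process Python simulation of the data flow only, with hypothetical function names; real Hadoop jobs implement `Mapper` and `Reducer` classes in Java (or use Hadoop Streaming), and the framework runs the map tasks in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values that share a key."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
print(word_count(splits))  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```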
  • 21. MAJOR HADOOP COMPONENTS: SPARK • General-purpose cluster computing framework. • Can perform real-time data streaming and ETL. • Used for batch and micro-batch processing.
  • 22. MAJOR HADOOP COMPONENTS: TEZ • High-performance data processing framework. • Executes a series of MapReduce jobs as a single job, modeled as a DAG of tasks. • Used in batch processing environments.
  • 23. HADOOP DATABASE MANAGEMENT ENGINES
  • 24. MAJOR HADOOP COMPONENTS: HIVE • Data warehouse software project. • Enables SQL-like queries (HiveQL) over data stored in Hadoop. • Used for ETL, Hive DDL and DML.
  • 25. MAJOR HADOOP COMPONENTS: SPARK SQL • Distributed SQL query engine. • Enables structured data processing. • Used to import data from RDDs, Hive tables, Parquet files, etc.
  • 26. MAJOR HADOOP COMPONENTS: IMPALA • In-memory query processing engine. • Integrates with the Hive metastore to share table information between components. • Used to query data stored in Hadoop clusters.
  • 27. MAJOR HADOOP COMPONENTS: APACHE DRILL • Low-latency distributed query engine. • Combines a variety of data stores in a single query. • Supports different kinds of NoSQL databases.
  • 28. MAJOR HADOOP COMPONENTS: HBASE • Open-source, non-relational distributed database. • Modeled after Google's Bigtable and runs on top of HDFS, providing random, real-time read/write access to large tables.
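HBase's logical data model is often described as a sparse, sorted map: row key, then column family:qualifier, then timestamped versions of a value. The dict-based sketch below is a conceptual model only; the class and method names are hypothetical and are not the HBase client API. In real HBase the number of retained versions is configured per column family, so `max_versions=3` here is an arbitrary choice.

```python
from typing import Dict, List, Optional, Tuple

Cell = List[Tuple[int, bytes]]  # (timestamp, value), newest first


class ToyHTable:
    """Hypothetical in-memory model of an HBase table's logical layout."""

    def __init__(self, families: List[str], max_versions: int = 3) -> None:
        self.families = set(families)
        self.max_versions = max_versions
        self.rows: Dict[bytes, Dict[str, Cell]] = {}

    def put(self, row: bytes, column: str, value: bytes, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows.setdefault(row, {}).setdefault(column, [])
        cell.append((ts, value))
        cell.sort(key=lambda tv: tv[0], reverse=True)  # newest version first
        del cell[self.max_versions:]                    # drop versions beyond the limit

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        """Return the latest version of a cell, or None if it does not exist."""
        cell = self.rows.get(row, {}).get(column)
        return cell[0][1] if cell else None

    def scan(self, start: bytes, stop: bytes) -> List[bytes]:
        """Row keys are sorted lexicographically, so a range scan is [start, stop)."""
        return [r for r in sorted(self.rows) if start <= r < stop]


t = ToyHTable(["info", "stats"])
t.put(b"user#001", "info:name", b"Ada", ts=1)
t.put(b"user#001", "info:name", b"Ada L.", ts=2)
t.put(b"user#002", "stats:logins", b"7", ts=1)
print(t.get(b"user#001", "info:name"))   # b'Ada L.'
print(t.scan(b"user#000", b"user#002"))  # [b'user#001']
```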
  • 29. HADOOP DATA ABSTRACTION ENGINES
  • 30. MAJOR HADOOP COMPONENTS: APACHE PIG • High-level scripting language (Pig Latin). • Enables users to write complex data transformations. • Performs ETL and analyses huge datasets.
  • 31. MAJOR HADOOP COMPONENTS: APACHE SQOOP • Command-line application for transferring data between relational databases and Hadoop. • Data ingestion tool. • Imports and exports structured data at enterprise scale.
  • 32. HADOOP REAL-TIME STREAMING FRAMEWORKS
  • 33. MAJOR HADOOP COMPONENTS: SPARK STREAMING • Spark Streaming is an extension of the core Spark API. • Enables scalable, high-throughput, fault-tolerant processing of live data streams. • Provides a high-level abstraction called a discretized stream (DStream) for continuous data streaming.
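A DStream is a sequence of small batches (each an RDD in Spark), one per batch interval, and the same batch computation runs on each of them. The pure-Python sketch below imitates only that discretization step; it does not use PySpark, the function names are hypothetical, and the one-second interval is an arbitrary assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def discretize(events: Iterable[Event], batch_interval: float) -> Dict[int, List[str]]:
    """Group a live stream into consecutive, fixed-length micro-batches.
    Batch i holds events with i * interval <= t < (i + 1) * interval.
    Intervals without events are simply absent here; Spark would still
    produce an (empty) batch for them."""
    batches: Dict[int, List[str]] = {}
    for t, payload in events:
        batches.setdefault(int(t // batch_interval), []).append(payload)
    return batches


def process(batches: Dict[int, List[str]]) -> List[Tuple[int, Dict[str, int]]]:
    """Apply the same batch computation (here a count) to every micro-batch."""
    return [(i, dict(Counter(batches[i]))) for i in sorted(batches)]


stream = [(0.2, "click"), (0.7, "view"), (1.1, "click"), (1.9, "click"), (2.5, "view")]
print(process(discretize(stream, batch_interval=1.0)))
# [(0, {'click': 1, 'view': 1}), (1, {'click': 2}), (2, {'view': 1})]
```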
  • 34. MAJOR HADOOP COMPONENTS: APACHE KAFKA • Open-source stream-processing software. • Ingests and moves large amounts of data very quickly. • Lets applications publish and subscribe to streams of records.
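The publish/subscribe model rests on a topic being an append-only log split into partitions, with each subscriber tracking its own read offset. The classes below are a hypothetical in-memory toy, not the Kafka client API; `zlib.crc32` is used only as a deterministic stand-in for Kafka's default murmur2-based key hashing.

```python
import zlib
from typing import List, Tuple


class ToyTopic:
    """Hypothetical in-memory model of a Kafka topic: one append-only log per partition."""

    def __init__(self, partitions: int) -> None:
        self.logs: List[List[Tuple[str, str]]] = [[] for _ in range(partitions)]

    def publish(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return (partition, offset). Records with the
        same key always land in the same partition, which preserves their order."""
        partition = zlib.crc32(key.encode()) % len(self.logs)
        self.logs[partition].append((key, value))
        return partition, len(self.logs[partition]) - 1

    def read(self, partition: int, offset: int, max_records: int = 100) -> List[str]:
        """Return values starting at `offset`; reading never deletes records."""
        log = self.logs[partition]
        return [value for _, value in log[offset:offset + max_records]]


class ToyConsumer:
    """Each subscriber keeps its own offset, so many can read the same records."""

    def __init__(self, topic: ToyTopic, partition: int) -> None:
        self.topic, self.partition, self.offset = topic, partition, 0

    def poll(self) -> List[str]:
        records = self.topic.read(self.partition, self.offset)
        self.offset += len(records)
        return records


topic = ToyTopic(partitions=3)
p1, o1 = topic.publish("sensor-7", "21.5C")
p2, o2 = topic.publish("sensor-7", "21.9C")
dashboard, archiver = ToyConsumer(topic, p1), ToyConsumer(topic, p1)
print(dashboard.poll())  # ['21.5C', '21.9C']
print(archiver.poll())   # ['21.5C', '21.9C'] (independent offset)
print(dashboard.poll())  # [] (nothing new since the last poll)
```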
  • 35. MAJOR HADOOP COMPONENTS: APACHE FLUME • Open-source, distributed and reliable software. • Architecture is based on streaming data flows. • Collects, aggregates and moves large amounts of log data.
  • 36. HADOOP GRAPH PROCESSING FRAMEWORKS
  • 37. MAJOR HADOOP COMPONENTS: APACHE GIRAPH • Iterative graph processing framework. • Uses Apache Hadoop's MapReduce implementation to process graphs. • Used to analyse social media data.
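Giraph follows Google's Pregel model: computation proceeds in supersteps, and in each superstep the active vertices process incoming messages, update their values and send messages along their edges. The single-machine Python sketch below shows that pattern for connected components via minimum-label propagation. It is not Giraph's Java API, the function name is hypothetical, and the graph is assumed to be undirected (every edge listed in both directions).

```python
from typing import Dict, List, Tuple


def connected_components(edges: Dict[str, List[str]]) -> Tuple[Dict[str, str], int]:
    """Pregel-style min-label propagation on an undirected graph.

    Every vertex starts with its own id as label. In each superstep, active
    vertices send their label to their neighbours; a vertex that receives a
    smaller label adopts it and stays active for the next superstep. The job
    halts when no label changed, similar to Pregel's vote-to-halt."""
    label = {v: v for v in edges}
    active = set(edges)  # superstep 0: every vertex is active
    supersteps = 0
    while active:
        inbox: Dict[str, List[str]] = {v: [] for v in edges}
        for v in active:                   # send-messages phase
            for neighbour in edges[v]:
                inbox[neighbour].append(label[v])
        active = set()
        for v, messages in inbox.items():  # compute phase
            if messages and min(messages) < label[v]:
                label[v] = min(messages)
                active.add(v)
        supersteps += 1                    # barrier between supersteps
    return label, supersteps


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "x": ["y"], "y": ["x"]}
labels, steps = connected_components(graph)
print(labels)  # {'a': 'a', 'b': 'a', 'c': 'a', 'x': 'x', 'y': 'x'}
```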
  • 38. MAJOR HADOOP COMPONENTS: SPARK GRAPHX • GraphX is Apache Spark's API for graphs and graph-parallel computation. • Performance comparable to the fastest specialized graph processing systems. • Works seamlessly with both graphs and collections. • Offers a growing library of graph algorithms.
  • 39. HADOOP MACHINE LEARNING FRAMEWORKS
  • 40. MAJOR HADOOP COMPONENTS: H2O • H2O is open-source software for big data analysis. • Allows users to fit thousands of potential models as part of discovering patterns in data. • Uses iterative methods that provide quick answers using all of the client's data.
  • 41. MAJOR HADOOP COMPONENTS: ORYX • A generic lambda architecture tier providing batch, speed and serving layers. • Specialized for real-time, large-scale machine learning. • Provides end-to-end implementations of standard ML algorithms as applications.
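How the three lambda-architecture layers fit together can be sketched with a deliberately trivial "model": per-item event counts. This is a hypothetical illustration, not Oryx code. Oryx itself is built on Apache Spark and Apache Kafka and its layers train and serve real ML models, but the division of labour is the same. The batch layer periodically recomputes from all history, the speed layer updates incrementally, and the serving layer merges the two.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Hypothetical toy lambda architecture with event counts as the 'model'."""

    def __init__(self) -> None:
        self.master: List[str] = []  # immutable, append-only history
        self.batch = Counter()       # batch layer output (rebuilt periodically)
        self.speed = Counter()       # speed layer output (incremental)

    def ingest(self, item: str) -> None:
        self.master.append(item)
        self.speed[item] += 1        # real-time update, visible immediately

    def run_batch(self) -> None:
        """Recompute the batch view from all history, then reset the speed
        layer, whose events are now covered by the batch view."""
        self.batch = Counter(self.master)
        self.speed = Counter()

    def query(self, item: str) -> int:
        """Serving layer: merge the batch view with the real-time delta."""
        return self.batch[item] + self.speed[item]


lc = LambdaCounter()
for item in ["a", "b", "a"]:
    lc.ingest(item)
lc.run_batch()
lc.ingest("a")
lc.ingest("c")
print(lc.query("a"), lc.query("c"))  # 3 1
```

The sketch ignores events that arrive while a batch run is in progress, which a real system must handle.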
  • 42. MAJOR HADOOP COMPONENTS: SPARK MLlib • Spark MLlib is a scalable machine learning library. • Enables machine learning operations such as classification, regression and clustering in Spark.
  • 43. MAJOR HADOOP COMPONENTS: AVRO • Avro is a row-oriented remote procedure call and data serialization framework. • Supports dynamic typing and schema evolution. • Used for data serialization and RPC.
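Schema evolution in Avro means data written with one schema can be read with another. Fields are matched by name, writer fields unknown to the reader are skipped, and reader fields missing from the data take the reader's default. The function below is a simplified, hypothetical sketch of those rules for flat records using Avro-style JSON schemas; it is not the `avro` library and ignores type promotion, aliases and nested types.

```python
from typing import Any, Dict


def resolve(record: Dict[str, Any], writer_schema: Dict[str, Any],
            reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified Avro-style schema resolution for a flat record."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result: Dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = record[name]      # present in both schemas
        elif "default" in field:
            result[name] = field["default"]  # new field: use reader default
        else:
            raise ValueError(f"field {name!r} missing from writer schema and has no default")
    return result                            # writer-only fields are dropped


writer_v1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "fax", "type": "string"},
]}
reader_v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": None},
]}
old_record = {"id": 1, "name": "Ada", "fax": "555-0100"}
print(resolve(old_record, writer_v1, reader_v2))  # {'id': 1, 'name': 'Ada', 'email': None}
```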
  • 44. MAJOR HADOOP COMPONENTS: THRIFT • An interface definition language and binary communication protocol. • Lets users define data types and service interfaces in a simple definition file. • Used to build RPC clients and servers.
  • 45. MAJOR HADOOP COMPONENTS: MAHOUT • Provides implementations of distributed machine learning algorithms. • Runs on big data stored and processed across clusters of computers using simple programming models.
  • 46. HADOOP CLUSTER MANAGEMENT SOFTWARE
  • 47. MAJOR HADOOP COMPONENTS: AMBARI • Hadoop cluster management software. • Ambari enables system administrators to provision, manage and monitor a Hadoop cluster.
  • 48. MAJOR HADOOP COMPONENTS: ZOOKEEPER • Centralized open-source server. • Manages configuration across nodes. • Enables reliable messaging. • Supports redundant services. • Synchronizes process execution.