HADOOP COMPONENTS & ARCHITECTURE:
BIG DATA & HADOOP TRAINING
Understand how the Hadoop ecosystem works to master Apache Hadoop skills and gain in-depth knowledge of the big data ecosystem and Hadoop architecture. Before you enroll for any big data Hadoop training course, it is helpful to have a basic idea of how the Hadoop ecosystem works. This presentation covers the various Hadoop components that constitute the Apache Hadoop architecture.
DEFINING ARCHITECTURE COMPONENTS OF THE BIG
DATA ECOSYSTEM
CORE HADOOP COMPONENTS
• 1) Hadoop Common
• 2) Hadoop Distributed File System (HDFS)
• 3) MapReduce - distributed data processing framework of Apache Hadoop
• 4) YARN (Yet Another Resource Negotiator) - cluster resource management layer of Apache Hadoop
• Read more in detail about Hadoop components - https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
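To make the MapReduce programming model concrete, here is a toy, single-process word count in plain Python. It only illustrates the map, shuffle and reduce phases conceptually; it is not the Hadoop framework itself, and the sample input lines are invented for the example:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data and hadoop", "hadoop architecture and big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

In real Hadoop, the map and reduce functions run on many nodes in parallel and the shuffle moves data across the network; the division of labor, however, is exactly the one shown above.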
DATA ACCESS COMPONENTS OF HADOOP ECOSYSTEM -
PIG AND HIVE
APACHE PIG
• Apache Pig is a convenient tool developed by Yahoo for analysing
huge data sets efficiently and easily. It provides a high-level data
flow language, Pig Latin, that is optimized, extensible and easy to
use.
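For a flavor of Pig Latin, here is a short hypothetical script (the file paths and field layout are assumptions for illustration, and running it requires a Hadoop cluster with Pig installed):

```pig
-- Load a hypothetical tab-separated log file from HDFS
logs = LOAD '/data/access_logs' USING PigStorage('\t')
       AS (user:chararray, url:chararray, bytes:int);

-- Group records by user and sum the bytes transferred per user
by_user = GROUP logs BY user;
usage   = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

STORE usage INTO '/data/usage_by_user';
```

Each statement defines a relation, and Pig compiles the whole data flow into one or more MapReduce jobs behind the scenes.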
APACHE HIVE
• Hive, developed by Facebook, is a data warehouse built on top of
Hadoop. It provides a simple SQL-like language known as HiveQL
for querying, data summarization and analysis. Hive
makes querying faster through indexing.
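To show how close HiveQL is to SQL, here is a hypothetical example (the table name, columns and HDFS location are invented for illustration; it assumes a running Hive installation):

```sql
-- Define an external Hive table over data already sitting in HDFS
CREATE EXTERNAL TABLE page_views (user_id STRING, url STRING, view_time TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/page_views';

-- Summarize views per URL, exactly as one would in SQL
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

Hive translates such queries into MapReduce (or other execution engine) jobs, so analysts can work with familiar SQL syntax over HDFS data.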
DATA INTEGRATION COMPONENTS OF HADOOP ECOSYSTEM -
SQOOP AND FLUME
APACHE SQOOP
• The Sqoop component is used for importing data from external
sources into related Hadoop components like HDFS, HBase or
Hive. It can also be used for exporting data from Hadoop to other
external structured data stores.
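A typical Sqoop invocation looks like the following sketch. The JDBC connection string, table names and directories are placeholders, and the commands assume a running Hadoop cluster plus a reachable MySQL database:

```shell
# Import a relational table into HDFS (all values below are placeholders)
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --target-dir /data/orders

# Export processed results from HDFS back into an external database table
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table order_summary \
  --export-dir /data/order_summary
```

Under the hood, Sqoop generates MapReduce jobs that read from or write to the database in parallel.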
FLUME
• The Flume component is used to gather and aggregate large amounts
of data. Apache Flume collects data from its origin
and delivers it to its resting place, typically HDFS.
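A Flume agent is defined in a properties file as a source, a channel and a sink. The following minimal sketch tails a log file into HDFS; the agent name, file path and HDFS directory are assumptions for illustration:

```properties
# Hypothetical agent: tail a log file (source) through a memory channel into HDFS (sink)
agent1.sources  = tail-src
agent1.channels = mem-ch
agent1.sinks    = hdfs-sink

agent1.sources.tail-src.type = exec
agent1.sources.tail-src.command = tail -F /var/log/app.log
agent1.sources.tail-src.channels = mem-ch

agent1.channels.mem-ch.type = memory

agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /flume/events
agent1.sinks.hdfs-sink.channel = mem-ch
```

The channel buffers events between source and sink, which is what lets Flume absorb bursts of incoming data without losing it.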
DATA STORAGE COMPONENT OF HADOOP ECOSYSTEM –
HBASE
HBASE
• HBase is a column-oriented NoSQL database that uses HDFS for
underlying storage of data. HBase supports random reads and
also batch computations using MapReduce. With the HBase NoSQL
database, enterprises can create large tables with millions of rows
and columns on clusters of commodity hardware.
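The row-key and column-family model can be seen in a few HBase shell commands (the table, family and values are illustrative, and the commands assume a running HBase installation):

```
# Create a table 'users' with one column family 'info'
create 'users', 'info'

# Random write: put a cell at row key 'u1'
put 'users', 'u1', 'info:name', 'Alice'

# Random read: fetch that single row directly by key
get 'users', 'u1'
```

Because rows are looked up by key rather than scanned, reads and writes stay fast even when the table has millions of rows.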
MONITORING, MANAGEMENT AND ORCHESTRATION
COMPONENTS OF HADOOP ECOSYSTEM - OOZIE AND
ZOOKEEPER
OOZIE
• Oozie is a workflow scheduler in which workflows are
expressed as Directed Acyclic Graphs (DAGs). Oozie runs in a Java servlet
container (Tomcat) and makes use of a database to store all the
running workflow instances, their states and variables, along with
the workflow definitions, to manage Hadoop jobs (MapReduce,
Sqoop, Pig and Hive).
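An Oozie workflow definition is an XML file whose actions form the DAG. A minimal hypothetical example with a single Pig action (the workflow name, script name and parameters are assumptions):

```xml
<workflow-app name="daily-etl" xmlns="uri:oozie:workflow:0.5">
  <start to="pig-step"/>
  <action name="pig-step">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>usage.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Pig step failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The ok/error transitions on each action are the edges of the DAG, which is how Oozie decides what to run next or when to abort.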
ZOOKEEPER
• ZooKeeper is the king of coordination and provides simple, fast, reliable and
ordered operational services for a Hadoop cluster. ZooKeeper is responsible
for synchronization, distributed configuration management, and
providing a naming registry for distributed systems.
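ZooKeeper's registry is simply a hierarchy of small data nodes called znodes. From the bundled zkCli.sh command-line shell it looks like this (the paths and values are illustrative, and a running ZooKeeper server is assumed):

```shell
# Create a znode holding a piece of shared configuration
create /app/config "max_workers=8"

# Any node in the cluster can read the same value back
get /app/config

# Ephemeral znodes vanish when their session ends - the basis for locks and leader election
create -e /app/leader "host1"
```

Because all clients see znode updates in the same total order, distributed processes can safely coordinate through them.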
• To know more about other Hadoop components - https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
