Hadoop Technologies
Architecture Overview

@senthil245

Mail - senthil245@gmail.com
An overview of the Hadoop ecosystem, with diagrams to help understand the Hadoop subprojects.
Hadoop Ecosystem Architecture Overview

  1. Hadoop Technologies Architecture Overview – @senthil245, Mail - senthil245@gmail.com
  2. DISTRIBUTED CLUSTER ARCHITECTURE: MASTER/SLAVE
  3. HADOOP CORE
  4. MAPREDUCE PATTERNS
  5. WHEN MAPREDUCE – Because MapReduce runs across a cluster of computing nodes, the architecture is highly scalable.
     • In other words, if the data size grows by a factor of x, performance should stay roughly constant as long as we add a predictable, fixed factor of y nodes. The graph on the right illustrates the relationship between data size (x-axis) and processing time (y-axis).
     • The blue curve shows processing with traditional programming; the black curve shows processing with Hadoop. When the data is small, traditional programming performs better because Hadoop's bootstrap is expensive (copying data across the cluster, inter-node communication, etc.). Once the data is big enough, the penalty of the Hadoop bootstrap becomes negligible.
     • Hence Hadoop is best suited to crunching big data, ideally on the order of petabytes, and is not suited to implementing common data-integration patterns.
  6. APACHE SQOOP
  7. APACHE FLUME
  8. APACHE CHUKWA
  9. HDFS
  10. APACHE OOZIE – WORKFLOW SCHEDULER (ALSO CHECK AZKABAN, LINKEDIN'S OPEN-SOURCE SCHEDULER)
  11. PIG AND HQL (DO NOT USE HQL)
  12. APACHE S4 (STREAM PROCESSING) (ALSO CHECK KAFKA AND STORM)
  13. APACHE ZOOKEEPER SERVICE (ALSO CHECK APACHE HUE)
  14. APACHE HIVE
  15. APACHE HCATALOG, HIVE AND HBASE
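The map/shuffle/reduce pattern behind slide 5's scaling argument can be sketched locally. This is a minimal single-process illustration (not the Hadoop API itself): a hypothetical word count where `map_phase` emits (key, 1) pairs, `shuffle_phase` groups them by key, and `reduce_phase` sums each group — on a real cluster, each phase runs in parallel across nodes, which is what lets throughput grow with cluster size.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values under their key (what Hadoop
    # does over the network between mappers and reducers).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big cluster", "big data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1}
```

The per-phase split also makes slide 5's cost argument visible: the shuffle step is the expensive "bootstrap" on a cluster, which is why Hadoop only pays off once the data is large.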
