Session Objectives
Introduction
Hadoop: Introduction
Hadoop: Architecture
Hadoop: MapReduce
DFS and HDFS
Summary
Introduction
What is Big Data?
Challenges of Big Data
Big Data Sources
Use Cases of Big Data
Big Data Characteristics:
• Volume: extremely large amounts of data
• Velocity: the rate at which data is generated
• Variety: structured, semi-structured, and unstructured data
Big Data Technologies
Hadoop : Introduction
Hadoop is a framework that enables distributed processing of large datasets across clusters of machines.
It is both a data management technology and a processing framework.
Why Hadoop?
• Scalable
• Cost-effective
• Fast
• Resilient to failure
Hadoop : Cluster
Sizing example: one master with 5 slaves, storing 50 TB of data over the next 5 months
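A back-of-envelope sizing for the scenario above, assuming the HDFS default replication factor of 3 (the replication value is a standard Hadoop default, not stated on the slide):

```python
# Illustrative cluster sizing: 50 TB of data, 5 slave nodes (DataNodes).
data_tb = 50        # expected data over the next 5 months
replication = 3     # HDFS default replication factor (assumption)
slaves = 5          # number of slave nodes in the cluster

raw_storage_tb = data_tb * replication   # total raw disk the cluster needs
per_slave_tb = raw_storage_tb / slaves   # raw disk needed on each slave

print(raw_storage_tb)  # 150
print(per_slave_tb)    # 30.0
```

In practice you would also budget headroom for intermediate job output and OS overhead, so real deployments provision more than this minimum.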
Hadoop : Cluster
Hadoop : Components
Hadoop1 : Daemons (NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker)
Hadoop2 : Daemons
1. NameNode
2. DataNode
3. Secondary NameNode
4. Resource Manager
5. Node Manager
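The five Hadoop 2 daemons above split into master-side and slave-side processes; a small sketch of that standard mapping, for orientation:

```python
# Node type that runs each Hadoop 2 daemon (standard Hadoop 2 layout).
daemons = {
    "NameNode": "master",            # HDFS metadata
    "Secondary NameNode": "master",  # periodic checkpoints of NameNode metadata
    "Resource Manager": "master",    # cluster-level YARN scheduling
    "DataNode": "slave",             # HDFS block storage
    "Node Manager": "slave",         # per-node YARN containers
}

masters = [d for d, role in daemons.items() if role == "master"]
slaves = [d for d, role in daemons.items() if role == "slave"]
print(masters)  # ['NameNode', 'Secondary NameNode', 'Resource Manager']
print(slaves)   # ['DataNode', 'Node Manager']
```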
Hadoop : Master Slave Architecture
Hadoop
  HDFS:    Master = NameNode         | Slave = DataNode
  MR/YARN: Master = Resource Manager | Slave = Node Manager
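The MR/YARN layer runs MapReduce jobs. A minimal pure-Python sketch of the map, shuffle, and reduce phases for the classic word count (no Hadoop required; function names are illustrative):

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, as a Hadoop mapper would.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word, as a Hadoop reducer would.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "big data"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'clusters': 1}
```

In a real cluster the map tasks run in parallel near the data on the slave nodes, and the shuffle moves intermediate pairs across the network to the reducers.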
Hadoop : Master Slave Architecture
Hadoop : Modes of Operations
Hadoop1 vs Hadoop2
Hadoop1 Limitations
• Single point of failure
• Low resource utilization
• Less scalable compared to Hadoop2
Ecosystem
DFS and HDFS
DFS and HDFS
HDFS Read Operation
HDFS Write Operation
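A simplified sketch of the HDFS write path: the client asks the NameNode for target DataNodes, then each block is written to a replication pipeline of DataNodes. This is a pure-Python simulation; the class and method names are illustrative, not the real HDFS API:

```python
class DataNode:
    """Slave: stores block data."""
    def __init__(self, name):
        self.name = name
        self.blocks = set()

    def write(self, block_id):
        self.blocks.add(block_id)

class NameNode:
    """Master: holds metadata only and picks DataNodes for each block."""
    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.block_map = {}  # block id -> DataNodes holding a replica

    def allocate(self, block_id):
        # Real HDFS picks nodes rack-aware; here we simply take the first N.
        targets = self.datanodes[: self.replication]
        self.block_map[block_id] = targets
        return targets

def write_file(namenode, block_ids):
    # For each block: ask the NameNode for a pipeline, then write down it.
    for block_id in block_ids:
        for dn in namenode.allocate(block_id):
            dn.write(block_id)

dns = [DataNode(f"dn{i}") for i in range(5)]
nn = NameNode(dns)
write_file(nn, ["blk_1", "blk_2"])
print(sorted(dns[0].blocks))  # ['blk_1', 'blk_2']
```

Note that no file bytes ever pass through the NameNode: it answers metadata queries, while block data flows client-to-DataNode and then DataNode-to-DataNode along the pipeline.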
File block and replication
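Block splitting and replication can be made concrete with a small calculation, assuming the Hadoop 2 default 128 MB block size and replication factor 3 (both are standard defaults, used here as assumptions):

```python
import math

block_size_mb = 128   # Hadoop 2 default block size
replication = 3       # default replication factor
file_size_mb = 1024   # an example 1 GB file

blocks = math.ceil(file_size_mb / block_size_mb)  # blocks the file splits into
replicas = blocks * replication                   # block copies cluster-wide
raw_mb = file_size_mb * replication               # raw disk consumed

print(blocks, replicas, raw_mb)  # 8 24 3072
```

So a 1 GB file becomes 8 blocks, stored as 24 block replicas spread across DataNodes, consuming about 3 GB of raw disk.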
Thank You

BIG DATA Session 6