Introduction to Hadoop 2.0 & YARN | Hadoop 2.0 & YARN Fundamentals | Hadoop 2.0 & YARN Architecture
© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Session Objectives
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding Hadoop 2.0 and its features
ᗍ Understanding the differences between Hadoop 1.x and 2.x
ᗍ Understanding YARN
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
It is very difficult to manage data at this scale.
Get Started with BIG Data & Hadoop
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing such big data is becoming a problem for all of us.
Hadoop can be used to process data at this scale easily.
We will see how shortly.
Before that, let's understand what Hadoop is.
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
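The programming model Hadoop is best known for is MapReduce. As a rough illustration only, the plain-Python sketch below imitates its three phases (map, shuffle, reduce) on a tiny in-memory word count. It does not use Hadoop itself, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster, different lines (or blocks) would be mapped on different nodes.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data and Hadoop"]))
```

The point of the model is that `map_phase` and `reduce_phase` carry no shared state, which is what lets a framework spread them across many commodity machines.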
Hadoop Characteristics
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
Hadoop Ecosystem
ᗍ Flume (unstructured or semi-structured data) and Sqoop (structured data): import or export
ᗍ Apache Oozie (Workflow)
ᗍ Pig Latin (Data Analysis)
ᗍ Hive (DW System)
ᗍ MapReduce Framework
ᗍ HBase
ᗍ Other YARN Frameworks (MPI, GIRAPH)
ᗍ YARN (Cluster Resource Management)
ᗍ HDFS (Hadoop Distributed File System)
Next Generation Hadoop
Hadoop 1.x
[Diagram: Hadoop 1.x architecture]
ᗍ Client
ᗍ HDFS: NameNode, Secondary NameNode, and Data Nodes holding the data blocks
ᗍ MapReduce: Job Tracker, plus a Task Tracker on each Data Node running Map and Reduce tasks
Challenges for Hadoop 1.x
Problem | Description
NameNode – No horizontal scalability | Single NameNode and single namespace, limited by NameNode RAM
NameNode – No high availability (HA) | The NameNode is a single point of failure; recovery after a failure is manual, using the Secondary NameNode
Job Tracker – Overburdened | Spends a significant amount of time and effort managing the life cycle of applications
MRv1 – Only Map and Reduce tasks | Large amounts of data stored in HDFS remain unutilized and cannot be used for other workloads such as graph processing
Hadoop 2.x Features
Property | Hadoop 1.0 | Hadoop 2.0
Federation | One NameNode and namespace | Multiple NameNodes and namespaces
High Availability | Not present | Highly available
YARN – Processing control and multi-tenancy | JobTracker, TaskTracker | Resource Manager, Node Manager, App Master, Capacity Scheduler
Other Important Hadoop 2.0 Features
ᗍ HDFS Snapshots
ᗍ NFSv3 access to data in HDFS
ᗍ Support for running Hadoop on MS Windows
ᗍ Binary Compatibility for MapReduce applications built on Hadoop 1.0
ᗍ Substantial integration testing with the rest of the projects in the Hadoop ecosystem (such as Pig and Hive)
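HDFS 1.x Vs 2.x
[Diagram: HDFS namespace and block storage layers in Hadoop 1.0 and Hadoop 2.0]
Hadoop 1.0
ᗍ Namespace: a single NameNode holding one namespace (NS) and block management
ᗍ Block storage: Datanodes providing storage
Hadoop 2.0
ᗍ Namespace: multiple NameNodes (NN-1 … NN-k … NN-n), each with its own namespace (NS 1 … NS k … NS n)
ᗍ Block storage: one block pool per namespace (Pool 1 … Pool k … Pool n), kept on common storage across DataNode 1, DataNode 2 … DataNode m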
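One way to picture the Hadoop 2.0 side of this comparison: each NameNode owns an independent namespace and its own block pool, while every DataNode stores blocks for all pools. The sketch below models that with plain Python objects. The class names, the path-prefix routing table and the round-robin block placement are illustrative assumptions, not HDFS APIs or HDFS placement policy.

```python
class NameNode:
    """Owns one namespace and records which blocks make up each file."""

    def __init__(self, name, pool_id):
        self.name = name
        self.pool_id = pool_id
        self.namespace = {}  # file path -> list of block ids

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)


class DataNode:
    """Common storage: holds blocks for every block pool, keyed by pool id."""

    def __init__(self, name):
        self.name = name
        self.pools = {}  # pool id -> set of block ids

    def store(self, pool_id, block_id):
        self.pools.setdefault(pool_id, set()).add(block_id)


class FederatedCluster:
    def __init__(self, mounts, datanodes):
        self.mounts = mounts  # path prefix -> NameNode (hypothetical routing table)
        self.datanodes = datanodes

    def namenode_for(self, path):
        # Pick the longest matching prefix so a more specific mount wins.
        prefix = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[prefix]

    def write(self, path, block_ids):
        nn = self.namenode_for(path)
        nn.add_file(path, block_ids)
        for i, block_id in enumerate(block_ids):
            # Spread blocks round-robin over the shared DataNodes (simplification).
            self.datanodes[i % len(self.datanodes)].store(nn.pool_id, block_id)
        return nn.name
```

Because each NameNode only tracks its own namespace, adding another NameNode adds namespace capacity without touching the existing ones, which is the horizontal scalability Hadoop 1.0 lacked.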
Hadoop 2.x – High Availability
[Diagram: Hadoop 2.x cluster showing NameNode High Availability (HDFS) and Next Generation MapReduce (YARN)]
HDFS: NameNode High Availability
ᗍ Active NameNode: all namespace edits are logged to shared NFS storage; single writer (fencing)
ᗍ Standby NameNode: reads the shared edit logs and applies them to its own namespace
ᗍ Secondary NameNode: not necessary to configure
ᗍ Data Nodes
YARN: Next Generation MapReduce
ᗍ Client
ᗍ Resource Manager
ᗍ Node Managers, each hosting Containers and App Masters
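The active/standby relationship on this slide can be sketched as follows. This is a toy model in plain Python, not HDFS code: the shared edit log is a list, and fencing is approximated with an epoch number so that only the current writer's edits are accepted. All class and method names are hypothetical.

```python
class SharedEditLog:
    """Stands in for the shared storage that holds namespace edits."""

    def __init__(self):
        self.edits = []
        self.writer_epoch = 0  # only the holder of this epoch may write

    def new_writer(self):
        # Granting a new epoch fences off any previous writer.
        self.writer_epoch += 1
        return self.writer_epoch

    def append(self, epoch, edit):
        if epoch != self.writer_epoch:
            raise PermissionError("fenced: stale writer")
        self.edits.append(edit)


class NameNode:
    def __init__(self, log):
        self.log = log
        self.namespace = set()
        self.applied = 0   # number of edits already applied locally
        self.epoch = None  # set only once this node has become active

    def become_active(self):
        self.catch_up()
        self.epoch = self.log.new_writer()

    def create(self, path):
        # Active role: log the edit first, then apply it locally.
        self.log.append(self.epoch, ("create", path))
        self.catch_up()

    def catch_up(self):
        # Standby role: read new edits and apply them to this node's namespace.
        for op, path in self.log.edits[self.applied:]:
            if op == "create":
                self.namespace.add(path)
        self.applied = len(self.log.edits)
```

After a failover, the old active node still holds its stale epoch, so any further `create` call from it raises `PermissionError` instead of corrupting the shared log. That is the "single writer" guarantee in miniature.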
Hadoop 1.x Vs 2.x Ecosystem
Hadoop 1.x
ᗍ Apache Oozie (Workflow)
ᗍ Hive (DW System), Pig Latin (Data Analysis)
ᗍ MapReduce Framework, HBase
ᗍ HDFS (Hadoop Distributed File System)
Hadoop 2.x
ᗍ Apache Oozie (Workflow)
ᗍ Hive (DW System), Pig Latin (Data Analysis)
ᗍ MapReduce Framework, HBase, Other YARN Frameworks (MPI, GIRAPH)
ᗍ YARN (Cluster Resource Management)
ᗍ HDFS (Hadoop Distributed File System)
YARN Flow
YARN = Yet Another Resource Negotiator
[Diagram: YARN flow between Clients, the Resource Manager, the JobHistory Server, and Node Managers hosting Containers and App Masters]
Message types shown: MapReduce Status, Job Submission, Node Status, Resource Request
Resource Manager
ᗍ Cluster Level Resource Manager
ᗍ Long life, High Quality Hardware
Node Manager
ᗍ One per Data Node
ᗍ Monitors Resources on Data Node
Application Master
ᗍ One per application
ᗍ Short life
ᗍ Manages task/scheduling
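The division of labour between these three components can be sketched roughly as below. It is a toy, single-process Python model of the flow (job submission, resource request, container launch). The class names, method names and the one-slot-per-container accounting are assumptions made for illustration and are not the YARN API.

```python
class NodeManager:
    """One per data node; tracks free container slots on that node."""

    def __init__(self, name, slots):
        self.name = name
        self.free = slots

    def launch(self, what):
        self.free -= 1
        return f"{what}@{self.name}"


class ResourceManager:
    """Cluster-level: hands out containers, knows nothing about tasks."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, what):
        # Choose the node reporting the most free capacity (node status).
        node = max(self.nodes, key=lambda n: n.free)
        if node.free == 0:
            return None  # cluster is full
        return node.launch(what)

    def submit(self, app_name, num_tasks):
        # Job submission: the first container runs the Application Master.
        am_container = self.allocate(f"{app_name}-AM")
        if am_container is None:
            raise RuntimeError("no capacity for Application Master")
        master = ApplicationMaster(app_name, self)
        return am_container, master.run(num_tasks)


class ApplicationMaster:
    """One per application; requests containers and schedules its tasks."""

    def __init__(self, app_name, rm):
        self.app_name = app_name
        self.rm = rm

    def run(self, num_tasks):
        containers = []
        for i in range(num_tasks):
            c = self.rm.allocate(f"{self.app_name}-task{i}")  # resource request
            if c is not None:
                containers.append(c)
        return containers
```

Note that `ResourceManager` never looks inside a job; per-application scheduling lives in the short-lived `ApplicationMaster`, which is how YARN relieves the burden the Job Tracker carried in Hadoop 1.x.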
Job Trends – Hadoop
Course Topics
ᗍ Module 1: Introduction to Big Data and Hadoop
ᗍ Module 2: HDFS Internals, Hadoop Configurations and Data Loading
ᗍ Module 3: Introduction to Map Reduce
ᗍ Module 4: Advanced Map Reduce Concepts
ᗍ Module 5: Introduction to Pig
ᗍ Module 6: Advanced Pig and Introduction to Hive
ᗍ Module 7: Advanced Hive Concepts
ᗍ Module 8: Extending Hive and HBase Introduction
ᗍ Module 9: Advanced HBase and Oozie Introduction
ᗍ Module 10: Project Set-up Discussion
Why SkillSpeed?
ᗍ Course Curriculum from Industry Experts
ᗍ Instructor Led Live Virtual Sessions
ᗍ Lifetime access to Course Content via LMS
ᗍ 100% Placement Assistance
ᗍ 24x7 Support
Corporate Partners
Contact us
Lines open 24/7
To know more about the course, please contact:
IND +91-90660-20904 USA 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com
Editor's Notes
SkillSpeed offers virtual instructor-led courses designed to bridge the time-to-competency gap experienced by technology companies. The USP of SkillSpeed is the subject matter expert (SME). SMEs are industry experts with a good understanding of the technology and hands-on industry experience.
This industry expert designs, develops, and delivers the course.
SkillSpeed provides you:
- Course curriculum from industry experts
- Instructor-led live virtual sessions
- Real-life industry case studies
- Live virtual interaction with industry experts
- Lifetime access to all course content via the LMS
- 24x7 support
- 100% placement assistance