Hadoop online training overview
Newyorksys.com
An Overview of Hadoop

Hadoop is an open-source tool that can be used effectively to process huge volumes of data. It works in a distributed computing scenario, and it is one of the best-known solutions for addressing the problem of big data. Newyorksys has trainers who provide online training for Hadoop using state-of-the-art training methodologies.
Agenda
 What is Hadoop
 Why do we need Hadoop
 How Hadoop Works
 HDFS Architecture
 What is MapReduce
 Hadoop Cluster
 Hadoop Processes
 Topology of a Hadoop Cluster
 Distinction of the Hadoop Framework
 Prerequisites to learn Hadoop
What is Hadoop
 Hadoop is an open-source framework.
 Developed by the Apache Software Foundation.
 Used for distributed processing of large data sets.
 It works across clusters of computers using a simple programming model (MapReduce).
Why do we need Hadoop
 Data is growing ever faster.
 We need to process multiple petabytes of data.
 The performance of traditional applications is decreasing.
 The number of machines in a cluster is not constant.
 Failure is expected, rather than exceptional.
How Hadoop Works
 The Hadoop core consists of two modules:
 Hadoop Distributed File System (HDFS) [storage]
 MapReduce [processing]
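The storage half of that split can be sketched in a few lines. The block size, replication factor, and helper names below are illustrative stand-ins, not Hadoop's actual API: the point is only that HDFS splits a file into fixed-size blocks and replicates each block across several datanodes so that one failed machine loses no data.

```python
# Illustrative sketch of HDFS-style block storage (NOT Hadoop's real API).
# A file is split into fixed-size blocks; each block is stored on several
# datanodes so the cluster tolerates the loss of individual machines.

BLOCK_SIZE = 8     # bytes, tiny for demonstration (real HDFS uses megabytes)
REPLICATION = 3    # copies kept of each block

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a byte string into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, datanodes, replication: int = REPLICATION):
    """Assign each block to `replication` distinct datanodes, round-robin."""
    return {
        i: [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        for i in range(len(blocks))
    }

data = b"hello hadoop distributed file system"
blocks = split_into_blocks(data)
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
print(len(blocks), placement[0])  # 5 ['node1', 'node2', 'node3']
```

Joining the blocks back together reproduces the original file, and every block lives on three distinct nodes — which is why, as the next slides note, failure can be treated as expected rather than exceptional.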
HDFS Architecture
What is MapReduce
 MapReduce plays a key role in the Hadoop framework.
 MapReduce is a programming model for writing applications that rapidly process large amounts of data.
 Mapper: a function that processes input data to generate intermediate output data.
 Reducer: merges all intermediate data from all mappers and generates the final output data.
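The mapper and reducer roles above can be shown with word count, the canonical MapReduce example. This is a plain-Python simulation of the data flow only: in a real Hadoop job the mappers and reducers run in parallel across the cluster, and the framework itself performs the shuffle step sketched here.

```python
# Word count expressed as a MapReduce data flow, simulated sequentially.
from collections import defaultdict

def mapper(line):
    """Emit an intermediate (word, 1) pair for every word in a line."""
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group intermediate pairs by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(word, counts):
    """Merge all intermediate values for one key into the final output."""
    return (word, sum(counts))

lines = ["Hadoop stores data", "Hadoop processes data"]
intermediate = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(w, c) for w, c in shuffle(intermediate).items())
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because each mapper sees only its own lines and each reducer sees only one key's values, both phases can be spread over many machines — that independence is what lets Hadoop process petabyte-scale inputs.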
Hadoop Cluster
 A Hadoop cluster consists of multiple machines, which can be classified into 3 types:
 Namenode
 Secondary Namenode
 Datanode
Hadoop Processes
 Below are the daemons (processes) which run in a cluster:
 Namenode (runs on the master machine)
 JobTracker (runs on the master machine)
 Datanode (runs on slave machines)
 TaskTracker (runs on slave machines)
Topology of a Hadoop Cluster
Distinction
 Simple: Hadoop allows users to quickly write efficient parallel code.
 Reliable: because Hadoop runs on commodity hardware, failures are frequent, and Hadoop is designed to detect and automatically handle such failures.
 Scalable: we can increase or decrease the number of nodes (machines) in a Hadoop cluster.
Prerequisites
 A Linux-based operating system (Mac OS, Red Hat, Ubuntu)
 Java 1.6 or a higher version
 Disk space (to hold HDFS data and its replicas)
 RAM (2 GB recommended)
 A cluster of computers (you can even install Hadoop on a single machine)
Newyorksys.com
 NewyorkSys is one of the leading training and consulting companies in the US. We have certified trainers. We provide online training and fast-track online training, with job assistance. We provide excellent training in all courses. We also help you with resume preparation and provide job assistance until you get a job.

For more details visit: http://www.newyorksys.com
15 Roaring Brook Rd, Chappaqua, NY 10514
USA: +1-718-313-0499 & 718-305-1757
E: enquiry@newyorksys.us
Newyorksys.com

The End