Introduction To Hadoop Administration - SpringPeople

© SpringPeople Software Private Limited, All Rights Reserved.
Introduction to Hadoop
Administration

What is Hadoop?
• Hadoop is a free, open-source, Java-based
framework that supports the processing of
large data sets in a distributed computing
environment. It is a top-level project of
the Apache Software Foundation.

What is HDFS?
The Hadoop Distributed File System (HDFS) is a distributed file
system designed to run on commodity hardware. It has many
similarities with existing distributed file systems. However, the
differences from other distributed file systems are significant. HDFS
is highly fault-tolerant and is designed to be deployed on low-cost
hardware. HDFS provides high throughput access to application
data and is suitable for applications that have large data sets. HDFS
relaxes a few POSIX requirements to enable streaming access to file
system data. HDFS was originally built as infrastructure for the
Apache Nutch web search engine project. HDFS is now an Apache
Hadoop subproject.

HDFS Architecture

What is a Hadoop Cluster?
• A Hadoop cluster is a special type of computational cluster
designed specifically for storing and analyzing huge amounts
of unstructured data in a distributed computing environment.
• Such clusters run Hadoop's open source distributed
processing software on low-cost commodity computers.
• Hadoop clusters are known for boosting the speed of data
analysis applications. They are also highly scalable.
• Hadoop clusters are also highly resistant to failure because
each piece of data is copied onto other cluster nodes, which
ensures that the data is not lost if one node fails.
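
The replication idea in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the round-robin placement, the node and block names, and the helper functions `place_replicas` and `surviving_blocks` are all hypothetical, and real HDFS uses its own placement policy with a configurable replication factor (3 is the usual default).

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for a placement policy; not how HDFS actually
    chooses nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive indices modulo len(nodes) are distinct as long as
        # replication <= len(nodes).
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have at least one replica elsewhere."""
    return {
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


# Hypothetical cluster of four nodes holding five blocks.
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3", "blk_4"]
placement = place_replicas(blocks, nodes, replication=3)

# Losing any single node leaves every block readable from another replica.
still_available = surviving_blocks(placement, failed_node="node1")
```

With a replication factor of 1, the same failure would make every block stored only on `node1` unavailable, which is why clusters keep more than one copy.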

What is MapReduce?
• Hadoop MapReduce (Hadoop Map/Reduce) is a
software framework for distributed processing of
large data sets on compute clusters of commodity
hardware. It is a sub-project of the Apache Hadoop
project. The framework takes care of scheduling
tasks, monitoring them, and re-executing any
failed tasks.
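
To make the map and reduce steps concrete, here is a minimal single-process word count in plain Python. It is a sketch of the programming model only and does not use the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are assumptions made for illustration, and a real cluster would run these phases in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


result = run_job(["Hadoop stores data", "Hadoop processes data"])
# result == {"hadoop": 2, "stores": 1, "data": 2, "processes": 1}
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, the work can be split across machines and a failed task can simply be run again.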

• MapReduce underwent a complete overhaul in hadoop-0.23,
resulting in what is now called MapReduce 2.0 (MRv2),
or YARN.
• Apache™ Hadoop® YARN is a sub-project of Hadoop at the
Apache Software Foundation. Introduced in Hadoop 2.0, it
separates the resource management and processing
components. YARN was born of a need to enable a broader
array of interaction patterns for data stored in HDFS beyond
MapReduce. The YARN-based architecture of Hadoop 2.0
provides a more general processing platform that is not
constrained to MapReduce.
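
The separation of resource management from processing can be illustrated with a toy model. This is a rough sketch, not the YARN API: the `ResourceManager` class here tracks only a single memory figure, and `run_in_container`, the application names, and the memory sizes are all hypothetical.

```python
class ResourceManager:
    """Toy cluster-wide resource tracker; it knows nothing about the work itself."""

    def __init__(self, total_memory_mb):
        self.free_mb = total_memory_mb

    def allocate(self, app_name, memory_mb):
        """Grant a container if enough memory is free, otherwise return None."""
        if memory_mb > self.free_mb:
            return None
        self.free_mb -= memory_mb
        return {"app": app_name, "memory_mb": memory_mb}

    def release(self, container):
        """Return a container's memory to the free pool."""
        self.free_mb += container["memory_mb"]


def run_in_container(rm, app_name, memory_mb, work):
    """Ask for resources, run any callable inside them, then give them back."""
    container = rm.allocate(app_name, memory_mb)
    if container is None:
        raise RuntimeError(f"not enough memory for {app_name}")
    try:
        return work()
    finally:
        rm.release(container)


rm = ResourceManager(total_memory_mb=4096)

# The same resource manager serves a batch-style job and an unrelated job.
batch_result = run_in_container(rm, "batch job", 2048, lambda: sum(range(10)))
other_result = run_in_container(rm, "other job", 1024, lambda: "hello".upper())
```

The point of the design is that the resource manager never inspects `work`, so any processing model can run on the same cluster resources.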
Apache Hadoop NextGen MapReduce (YARN)

How can you master Hadoop
Administration?
Become an expert in 2 days.
World-class Hadoop Administration training by
industry experts.

Suggested Audience & Other Details
• Prerequisites: Basic knowledge of Unix and system
administration. Prior knowledge of Hadoop is not required.
• Suggested Audience:
– Developers
– Architects
• Duration – 2 Days

For further info/assistance contact
training@springpeople.com
+91 80 656 79700
www.springpeople.com
