HADOOP
TRAINING
HISTORY OF HADOOP
Hadoop was created by Doug Cutting, the creator
of Apache Lucene, the widely used text search
library. Hadoop has its origins in Apache Nutch, an
open source web search engine, itself a part of the
Lucene project.
 The name Hadoop is not an acronym; it’s a made-up name. The project’s
creator, Doug Cutting, explains how the name came about:
The name my kid gave a stuffed yellow elephant. Short,
relatively easy to spell and pronounce, meaningless, and not used
elsewhere: those are my naming criteria. Kids are good at generating
such. Googol is a kid’s term.
 Subprojects and “contrib” modules in Hadoop also tend to have names
that are unrelated to their function, often with an elephant or other animal
theme (“Pig,” for example). Smaller components are given more
descriptive (and therefore more mundane) names. This is a good
principle, as it means you can generally work out what something does
from its name. For example, the jobtracker keeps track of MapReduce
jobs.
INTRODUCTION TO HADOOP
Hadoop is an open source software framework that supports data-
intensive distributed applications. It is licensed under the Apache v2
license, and generally known as Apache Hadoop.
Hadoop was developed based on a paper originally written by
Google on the MapReduce system and applies concepts of functional
programming. It is written in the Java programming language and is a
top-level Apache project, built and used by a global
community of contributors.
INTRODUCTION TO HADOOP
Giants like Yahoo! and Facebook use Hadoop as an
integral part of their operations. In 2008, Yahoo! Inc. launched what was
then the world’s largest Hadoop production application: the Yahoo! Search
Webmap, a Hadoop application that runs on a Linux cluster of more than
10,000 cores and generates data that is now used in every Yahoo! Web
search query. Facebook, meanwhile, uses Apache Hadoop to
keep track of its billions of user profiles as well as all the data related to
them, such as images, posts, comments, and videos.
Hadoop is not a database:
Hadoop is an efficient distributed file system, not a
database. It is designed for information that comes
in many forms, such as server log files or personal productivity
documents. Anything that can be stored as a file can be placed
in a Hadoop repository.
Hadoop is used for:
 Search - Yahoo, Amazon, Zvents
 Log processing - Facebook, Yahoo
 Data Warehouse - Facebook, AOL
 Video and Image Analysis - New York Times, Eyealike
WHY HADOOP ?
Hadoop is a free, Java-based programming framework
that supports the processing of large data sets in a distributed
computing environment.
Because Hadoop is open source and can run on commodity
hardware, the initial cost savings are dramatic and continue to
grow as your organizational data grows.
It is part of the Apache project sponsored by the Apache
Software Foundation.
WHY HADOOP ?
Single Source of Truth:-
With the enterprise data warehouse approach,
organizations find their data scattered across many systems and
silos. This decentralized environment can result in slow processing
and inefficient data analysis. Hadoop makes it possible to
consolidate your data and business intelligence capabilities within
an Enterprise Data Hub. The ability to save all organizational data
at its lowest level of granularity and bring all archive data into an
Enterprise Data Hub gives business users greater and faster
access to data.
WHY HADOOP ?
Faster Data Processing:-
In legacy environments, traditional ETL and batch
processes can take hours, days, or even weeks, while
businesses require access to data in minutes, seconds, or even
sub-seconds. Hadoop excels at high-volume batch processing.
Because it processes data in parallel, Hadoop can run batch
jobs up to 10 times faster than a single-threaded server or a
mainframe.
WHY HADOOP ?
Get More for Less:-
The true beauty of Hadoop is its ability to scale cost-effectively to
rapidly growing data demands. With its distributed computing power,
Hadoop is deployed across a cluster of commodity servers, or nodes. By
augmenting its EDW environment with Hadoop, an enterprise can
decrease its cost per terabyte of storage. With cheaper storage,
organizations can keep data that was previously too expensive to
warehouse. This allows data to be captured and stored from any
source within the organization while reducing the amount of data that
is “thrown away” during data cleansing.
HADOOP INTERNAL SOFTWARE ARCHITECTURE
COMPONENTS OF HADOOP
The current Apache Hadoop ecosystem consists of
the Hadoop kernel, MapReduce, the Hadoop distributed file
system (HDFS) and a number of related projects such as
Apache Hive, HBase and ZooKeeper. MapReduce and
the Hadoop Distributed File System (HDFS) are the main
components of Hadoop.
MapReduce:
The framework that understands and assigns work to
the nodes in a cluster.
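As an illustration of how MapReduce divides work, the classic word-count job can be sketched in miniature in Python. This is a toy model of the map, shuffle, and reduce phases, not the actual Hadoop Java API; all function names here are illustrative only.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return key, sum(values)

lines = ["hadoop stores data", "hadoop processes data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster the map calls run in parallel on the nodes holding the input splits, and the shuffle moves intermediate pairs across the network to the reducers.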
COMPONENTS OF HADOOP
Hadoop distributed file system (HDFS):
HDFS is the file system that spans all the nodes in a Hadoop
cluster for data storage. It links together the file systems on many
local nodes to make them into one big file system. HDFS assumes
nodes will fail, so it achieves reliability by replicating data across
multiple nodes.
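The replication idea above can be sketched as a toy model in Python. This is not real HDFS internals, only an illustration of why replicas make a single node failure harmless; the replication factor of 3 shown is HDFS's default.

```python
import random

REPLICATION_FACTOR = 3  # HDFS defaults to three replicas per block

def place_block(block_id, nodes):
    # Place replicas of one block on distinct nodes.
    return {block_id: random.sample(nodes, REPLICATION_FACTOR)}

def readable_after_failure(placement, failed_node):
    # A block survives as long as at least one replica
    # lives on a node that has not failed.
    return all(any(n != failed_node for n in replicas)
               for replicas in placement.values())

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = {}
for block in ["blk_0", "blk_1", "blk_2"]:
    placement.update(place_block(block, nodes))

# With 3 replicas on distinct nodes, losing any one node
# still leaves every block readable.
print(readable_after_failure(placement, "node3"))  # True
```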
HADOOP ECOSYSTEM
ADVANTAGE OF HADOOP
 Hadoop is Scalable
 Hadoop is Cost effective
 Hadoop is Flexible
 Hadoop is Fault tolerant
PREREQUISITE TO LEARN HADOOP ?
There is no strict prerequisite to start learning
Hadoop.
However, if you want to become an expert in
Hadoop and build an excellent career, you should
have at least basic knowledge of Java and Linux.
IS JAVA REQUIRED TO LEARN HADOOP?
Knowing Java is an added advantage, but Java is not
strictly a prerequisite for working with Hadoop.
Why Java is not strictly a prerequisite:
Tools like Hive and Pig that are built on top of Hadoop offer
their own high-level languages for working with data on your cluster. If
you want to write your own MapReduce code, you can do so in any
language (e.g. Perl, Python, Ruby, C, etc.) that supports reading from
standard input and writing to standard output, using Hadoop Streaming.
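A minimal word-count example in the style Hadoop Streaming expects: a mapper and a reducer that each read lines and emit tab-separated key/value lines. In a real job these would be two separate scripts reading sys.stdin, launched via the hadoop-streaming jar; here they are run in-process purely for illustration.

```python
def mapper(lines):
    # Mapper: emit "word<TAB>1" for every word seen.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Streaming delivers the reducer's input sorted by key, so equal
    # keys arrive consecutively and can be summed in a single pass.
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

mapped = sorted(mapper(["to be or not to be"]))
print(list(reducer(mapped)))  # ['be\t2', 'not\t1', 'or\t1', 'to\t2']
```

The `sorted()` call stands in for the framework's shuffle-and-sort step between the two scripts.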
IS JAVA REQUIRED TO LEARN HADOOP?
Added advantage of Java in Hadoop:
Although you can use Streaming to write your map
and reduce functions in the language of your choice, there
are some advanced features that are (at present) only
available via the Java API.
IS LINUX AN EXTRA BENEFIT WHILE LEARNING HADOOP?
Although Hadoop can run on Windows, it was initially built on Linux,
and Linux is the preferred platform for both installing and managing
Hadoop.
Having a solid understanding of getting around in a Linux
shell will also help you tremendously in digesting Hadoop,
especially with regard to many of the HDFS command-line
parameters.
COURSE CONTENT
Hadoop Introduction and Overview:
• What is Hadoop?
• History of Hadoop
• Building Blocks – Hadoop Eco-System
• Who is behind Hadoop?
• What Hadoop is good for and what it is not
Hadoop Distributed File System (HDFS):
• HDFS Overview and Architecture
• HDFS Installation
• Hadoop File System Shell
• File System Java API
COURSE CONTENT
Map/Reduce:
• Map/Reduce Overview and Architecture
• Installation
• Developing Map/Reduce Jobs
• Input and Output Formats
• Job Configuration
• Job Submission
• HDFS as a Source and Sink
• HBase as a Source and Sink
• Hadoop Streaming
COURSE CONTENT
HBase:
• HBase Overview and Architecture
• HBase Installation
• HBase Shell
• CRUD operations
• Scanning and Batching
• Filters
• HBase Key Design
COURSE CONTENT
Pig:
• Pig Overview
• Installation
• Pig Latin
• Pig with HDFS
Hive:
• Hive Overview
• Installation
• Hive QL
COURSE CONTENT
Sqoop:
• Sqoop Overview
• Installation
• Imports and Exports
ZooKeeper:
• ZooKeeper Overview
• Installation
• Server Maintenance
Putting it all together:
• Distributed installations
PLEASE CHECK THE LINK
http://www.keylabstraining.com/hadoop-online-
training-hyderabad-bangalore
PLEASE CONTACT:
 +91-9550-645-679 (India)
 +1-908-366-7933 (USA)
 Skype id : keylabstraining
 Email id : info@keylabstraining.com