Hadoop Tutorial

Big Data and Hadoop
By Ujjwal Kumar Gupta

Contents
Why Big Data & Hadoop
Drawbacks of Traditional Database
Hadoop History
What is Hadoop & How it Works
Hadoop Cluster
Hadoop Ecosystem

Why Big Data & Hadoop?
The following are the reasons why Big Data is needed:
● 90% of the data in the world today has been created in the last two years alone.
● 80% of the data is unstructured or exists in widely varying structures, which are difficult to analyze.
● Structured formats have some limitations with respect to handling large quantities of data.
● It is difficult to integrate information distributed across multiple systems.
● Most business users do not know what should be analyzed.
● Potentially valuable data is dormant or discarded.
● It is too expensive to justify the integration of large volumes of unstructured data.
● A lot of information has a short, useful lifespan.
● Context adds meaning to the existing information.

Drawbacks of Traditional Database
● Expensive: out of reach for small and mid-size companies.
● Scalability: as data grows, expanding the system is a challenging task.
● Time-consuming: storing and processing data takes a long time.

What is Hadoop
● Open-source framework designed for the storage and processing of large-scale data on clusters of commodity hardware.
● Created by Doug Cutting and Mike Cafarella; it was split out of the Nutch project in 2006.
● Cutting named the project after his son's toy elephant.

How Hadoop Works
● When data is loaded onto the system, it is divided into blocks, typically 64 MB or 128 MB each.
● Tasks are divided into two phases:
  ● Map tasks, which run on small portions of the data, on the nodes where that data is stored.
  ● Reduce tasks, which combine the map results to produce the final output.
● A master program allocates work to the individual nodes.
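
The flow above can be illustrated with a small word-count sketch in plain Python. It is not the Hadoop API: the function names (split_into_blocks, map_task, shuffle, reduce_task, run_job) are hypothetical, everything runs in a single process, and blocks are measured in words rather than megabytes so the example stays small. The run_job function loosely plays the role of the master program that hands work out.

```python
from collections import defaultdict


def split_into_blocks(data, words_per_block):
    """Divide the input into fixed-size blocks.

    Assumption: block size is counted in words here; HDFS counts bytes
    (typically 64 MB or 128 MB per block).
    """
    words = data.split()
    return [
        " ".join(words[i:i + words_per_block])
        for i in range(0, len(words), words_per_block)
    ]


def map_task(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_outputs):
    """Group the values emitted by all map tasks under their key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce phase: combine all values for one key into a final count."""
    return key, sum(values)


def run_job(data, words_per_block=4):
    """Stand-in for the master program: split, map, group, then reduce."""
    blocks = split_into_blocks(data, words_per_block)
    mapped = [map_task(block) for block in blocks]  # one map task per block
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    text = "big data needs hadoop and hadoop needs big clusters"
    print(run_job(text))
```

In a real cluster each map task would run on the node that already stores its block, so the data does not have to travel over the network before it is processed.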

Big Data Sources
The sources of Big Data are:
● web logs;
● sensor networks;
● social media;
● internet text and documents;
● internet pages;
● search index data;
● atmospheric science, astronomy, biochemical and medical records;
● scientific research;
● military surveillance; and
● photography archives.
