2. Contents
Why Big Data & Hadoop
Drawbacks of Traditional Database
Hadoop History
What is Hadoop & How it Works
Hadoop Cluster
Hadoop Ecosystem
3. Why Big Data & Hadoop?
Following are the reasons why Big Data is needed:
● 90% of the data in the world today has been created in the last two years alone.
● 80% of the data is unstructured or exists in widely varying structures, which are difficult to analyze.
● Structured formats have some limitations with respect to handling large quantities of data.
● It is difficult to integrate information distributed across multiple systems.
● Most business users do not know what should be analyzed.
● Potentially valuable data is dormant or discarded.
● It is too expensive to justify the integration of large volumes of unstructured data.
● A lot of information has a short, useful lifespan.
● Context adds meaning to the existing information.
5. Drawbacks of Traditional Database
Expensive - Out of reach for small and mid-size companies
Scalability - As data grows, expanding the system is a challenging task
Time Consuming - It takes a lot of time to store and process data
7. What is Hadoop
Open-source framework designed for storage and processing of large-scale data on clusters of commodity hardware
Created by Doug Cutting and Mike Cafarella; it became a standalone Apache project in 2006.
Cutting named the project after his son's toy elephant.
8. How Hadoop Works
When data is loaded onto the system, it is divided into blocks
Typically 64 MB (the default in Hadoop 1.x) or 128 MB (the default in Hadoop 2.x and later)
Processing is divided into two phases
Map tasks, which run on small portions of the data, on the nodes where that data is stored
Reduce tasks, which combine the map outputs to produce the final output
A master program allocates work to individual nodes
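The flow above can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code and uses no Hadoop API: the function names (`split_into_blocks`, `map_task`, `shuffle`, `reduce_task`, `run_job`), the tiny block size, and the word-count job are all assumptions chosen for illustration. Real HDFS splits files by bytes rather than by whole lines, and the grouping step between map and reduce (shown here as `shuffle`) is not described on the slide but is needed for the reduce phase to see all values for a key.

```python
from collections import defaultdict

# Assumption: a tiny block size so the example splits; HDFS uses 64 MB / 128 MB.
BLOCK_SIZE = 16


def split_into_blocks(lines, block_size=BLOCK_SIZE):
    """Group whole lines into blocks of roughly block_size characters."""
    blocks, current, used = [], [], 0
    for line in lines:
        # Start a new block when adding this line would exceed the limit.
        if current and used + len(line) > block_size:
            blocks.append(current)
            current, used = [], 0
        current.append(line)
        used += len(line)
    if current:
        blocks.append(current)
    return blocks


def map_task(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_outputs):
    """Group values by key so each reduce call sees all counts for one word."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce phase: combine the partial counts for one word."""
    return key, sum(values)


def run_job(lines):
    """Stand-in for the master: one map task per block, then reduce per key."""
    blocks = split_into_blocks(lines)
    mapped = [map_task(block) for block in blocks]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["big data", "hadoop stores big data", "hadoop processes data"]
    print(run_job(data))
```

In a real cluster, each `map_task` call would run on the node that already holds its block, so the computation moves to the data rather than the data moving across the network.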
11. Big Data Sources
The sources of Big Data are:
● web logs;
● sensor networks;
● social media;
● internet text and documents;
● internet pages;
● search index data;
● atmospheric science, astronomy, biochemical and medical records;
● scientific research;
● military surveillance; and
● photography archives.