An Overview of Big Data and Hadoop: the architecture it uses and the way it works on data sets. The slides also show the various fields where they are most widely used and implemented.
Presented By :- Rahul Sharma
B-Tech (Cloud Technology & Information Security)
2nd Year 4th Sem.
Poornima University (I.Nurture), Jaipur
www.facebook.com/rahulsharmarh18
Big Data raises challenges about how to process such a vast pool of raw data and how to extract value from it for our lives. To address these demands, an ecosystem of tools named Hadoop was conceived.
2. CONTENT:
• Introduction
  – What is Big-Data?
  – Why Big-Data?
  – When is Big-Data really a problem?
• Hadoop – the Big-Data solution
• Architecture of Hadoop
  – MapReduce
  – HDFS
  – YARN Framework and Common Utilities
• Hadoop in Industry
• Conclusion
3. What is Big-Data?
• Big Data is a collection of large datasets that cannot be processed using traditional computing techniques.
• It is not a technique or a tool but involves many areas of business and technology.
Why Big-Data?
"90% of the world's data was generated in the last few years."
Due to new technologies, devices, and communication means like social networking sites, the amount of data produced by mankind is growing rapidly.
4. The big question: "When is BIG-DATA really a problem?"
Big-Data is really a problem when the following operations have to be performed on it:
> STORAGE > TRANSFER
> ANALYSIS > PRESENTATION
> SEARCHING > SHARING
5. HADOOP – the solution to Big-Data handling
• It is an open-source framework written in Java.
• It is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
6. HADOOP ARCHITECTURE:
Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
At the core, Hadoop has two major layers, namely:
• MapReduce (the processing/computation layer)
• Hadoop Distributed File System (the storage layer)
7. MapReduce:
MapReduce is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce.
• Map: takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
• Reduce: takes the output from a map as its input and combines those data tuples into a smaller set of tuples.
As the name MapReduce implies, the reduce task is always performed after the map job.
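WordCount is the canonical illustration of these two tasks. Below is a minimal Java sketch of its mapper and reducer, modelled on the well-known example from the Hadoop documentation; the class names are illustrative.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map: converts each input line into (word, 1) key/value pairs.
    class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    // Reduce: combines all (word, 1) tuples for a word into (word, count).
    class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);     // emit (word, total count)
        }
    }

For the input line "to be or not to be", the map task emits (to,1), (be,1), (or,1), (not,1), (to,1), (be,1); the reduce task then combines these into (be,2), (not,1), (or,1), (to,2).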
8. Hadoop Distributed File System (HDFS) Overview
HDFS holds very large amounts of data and provides easy access. To store such huge data, files are stored across multiple machines in a redundant fashion to rescue the system from possible data loss in case of failure. HDFS also makes applications available for parallel processing.
Features of HDFS:
• It is suitable for distributed storage and processing.
• Hadoop provides a command interface to interact with HDFS.
• The built-in servers of the namenode and datanodes help users easily check the status of the cluster.
• Streaming access to file system data.
• HDFS provides file permissions and authentication.
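To make the interface concrete, here is a small sketch that writes a file to HDFS and streams it back through the org.apache.hadoop.fs.FileSystem API; the namenode address and paths are illustrative assumptions, not part of the original slides.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000");  // illustrative address

            FileSystem fs = FileSystem.get(conf);

            // Write a file; HDFS transparently splits it into blocks and
            // replicates each block across datanodes.
            Path path = new Path("/user/demo/hello.txt");
            try (FSDataOutputStream out = fs.create(path)) {
                out.writeBytes("Hello, HDFS!\n");
            }

            // Stream the file back; reads are served from whichever
            // datanodes hold replicas of the blocks.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(path)))) {
                System.out.println(in.readLine());
            }
        }
    }

The equivalent shell interface is the hdfs dfs command family (for example, hdfs dfs -put to upload a file and hdfs dfs -cat to read it back).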
9. Architecture of HDFS:
HDFS follows the master-slave architecture and has the following elements:
• Namenode: manages the filesystem namespace and regulates clients' access to files.
• Datanode: performs read-write operations on the file system as per the client's request.
• Block: user data is stored in the files of HDFS. A file is divided into one or more segments, which are stored on individual datanodes. These file segments are called blocks.
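As a worked example of blocks: with the default 128 MB block size of Hadoop 2.x, a 300 MB file is stored as three blocks (128 MB + 128 MB + 44 MB), each replicated across datanodes. The sketch below asks the namenode where a file's blocks live; the file path is an illustrative assumption.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/user/demo/large.log"));

            // The namenode returns, for each block of the file, the
            // datanodes that hold a replica of that block.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }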
10. YARN Framework:
The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons.
COMMON UTILITIES:
Hadoop Common refers to the collection of common utilities and libraries that support the other Hadoop modules. It is an essential module of the Apache Hadoop framework, along with the Hadoop Distributed File System (HDFS), Hadoop YARN and Hadoop MapReduce. Hadoop Common is also known as Hadoop Core.
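A minimal driver sketch, assuming the TokenizerMapper and IntSumReducer classes from the MapReduce slide, shows how a job is submitted; on a YARN cluster, the ResourceManager then allocates containers for the map and reduce tasks. The input and output paths are illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
            FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));
            // Blocks until the job finishes, printing progress along the way.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar and submitted with the hadoop jar command, this driver hands the job to the cluster's scheduler and waits for completion.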
11. HADOOP IN INDUSTRY:
• Prominent users of Hadoop include:
Amazon
Facebook
Adobe
eBay
Yahoo
IIIT Hyderabad
• Apache Hadoop took the top prize at the MediaGuardian Innovation Awards in March 2011.
• Hadoop won the Terabyte Sort Benchmark in July 2008.
12. "You can have data without information, but you cannot have information without data."
– Daniel Keys Moran
13. ~Conclusion~
• Hadoop reduces the burden of capturing, storing, searching, sharing, analysing and visualizing huge data sets.
• A huge amount of data can be stored, and large computations performed, on a single cluster with safety and security at low cost.
• Big-Data and Big-Data solutions are among the hottest topics in the present IT industry, so working on them will surely make us more valuable to the industry.