Big data(hadoop)

A
Presentation
On
Big Data
&
Hadoop
Submitted To:-
Mrs. Sonika Narang
Mrs. Poonam Beri
Submitted By:-
Ms. Shabnam
34633

Big data means really a big data, it is a collection of
large & complex data that it becomes difficult to
process using traditional data processing
applications.

Black Box Data
Social Media Data
Stock Exchange Data
Power Grid Data
Transport Data

3Vs /Characterizing BIG
DATA
Volume
Variety
Velocity

TYPES OF BIG DATA
 Structured Data:-Relational Data
 Semi-Structured Data:-XML Data
 Unstructured Data:-PDF ,Word ,Text ,Media Logs etc.

 Daily, updation of 0.5 PBs on FACEBOOK including 40 millions PHOTOS.
 Daily ,videos uploading on YOUTUBE that can be watched for 1 year
continously.
 Also affect INTERNET SEARCH,FINANCE & BUSINESS INFORMATION
 Challenge include in CAPTURE,SEARCHING,SHARING,ANALY-
SIS,STORAGE & VISUALIZATION of data.

LIMITATION
Can’t Deal With Huge Amount of Data
SO TRADITIONAL APPROACH FAILS

Then the
ACTUAL SOLUTION
of
BIG DATA IS NAMED

 A software framework for distributed processing of large datasets
across large clusters of computers
 Large datasets  Terabytes or petabytes of data
 Large clusters  hundreds or thousands of nodes
 Open-source implementation for Google MAPREDUCE
 Based on a simple data model, anydatawillfit

 2005: Doug Cutting and Michael J. Cafarella and team developed Hadoop
to support distribution for the Nutch search engine project.
 Doug named it after his son's toy elephant
 The project was funded by YAHOO
 2006: Yahoo gave the project to APACHE SOFTWARE FOUNDATION.

Architecture of hdoop
MapReduce
HDFS
Hdoop Common

 A software frameawork for distributing computation of
huge data.
 Consists of two main phases
◦ Map
◦ Reduce
 The Map Task: converts input into individually broken
elements.
 The Reduce Task: takes the output from a map task as
input and combines.

How MapReduce Works??
We Love India We 1 Love 1
Love 1 India 1
India 1 We 2
We Play Cricket We 1 Tennis 1
Play 1 Play 1
Tennis
MAP REDUCE
We Love India
We Play Cricket

HDFS
Distributed File system used by Hadoop is (HDFS).
Based on the Google File System (GFS).
Designed to run on thousands of clusters of small
computers.
HDFS uses a MASTERSLAVE ARCHITECTURE

 Master node is called namenode.
 Slave node is called datanode.
 Master (Name Node) manages the file system metadata.
 Slave( DataNodes) store the actual data.
 A file in an HDFS is split into several blocks
 Blocks are stored in a set of DataNodes.
 NameNode the maps blocks to the DataNodes.
 The DataNodes takes care of read, write, creation and deletion
operatons based on instruction given by NameNode.

Provides access to HDFS.
Contains Java libraries and utilities
Contains the necessary java files &
scripts to start HADOOP.

ADVANTAGES OF HADOOP
Designed to detect & handle
failures.
• Automation distribution of data across
the machines.
Doesn’t rely on hardware for fault
tolerance.
• Servers can be added or removed
dynamically.

Big data(hadoop)

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Big data(hadoop)

Similar to Big data(hadoop) (20)

Recently uploaded

Recently uploaded (20)

Big data(hadoop)