PRESENTED TO: DR. AJEET SINGH POONIA
PRESENTED BY: CHUNKY KUMAR
INTRODUCTION
• Hadoop is a framework that allows for the distributed processing of
large data sets across clusters of commodity computers using a simple
programming model.
• It is an open-source data management framework with scale-out storage
and distributed processing.
• Its objective is to support running applications on big data.
• It is an open-source set of tools distributed under the Apache license.
BigData
• Big data is a term used to describe the voluminous amount of unstructured
and semi-structured data a company creates.
• It is data that would take too much time and cost too much money to load
into a relational database for analysis.
• Big data doesn't refer to any specific quantity, but the term is often used
when speaking about petabytes and exabytes of data.
Characteristics of Big Data
• Volume: data quantity
• Velocity: data speed
• Variety: data types
What Caused The Problem?
Year    Standard Hard Drive Size (MB)    Data Transfer Rate (MB/s)
1990    1,370                            4.4
2010    1,000,000                        100
Traditional approach
So, What Is The Problem?
• The transfer speed is around 100 MB/s.
• A standard disk is 1 terabyte.
• Time to read the entire disk = 10,000 seconds, or nearly 3 hours!
• An increase in processing speed may not help as much because
   • Network bandwidth is now more of a limiting factor
   • Physical limits of processor chips have been reached
So What Do We Do?
• The obvious solution is to use multiple processors to solve the same
problem by fragmenting it into pieces.
• Imagine if we had 100 drives, each holding one hundredth of the data.
Working in parallel, we could read the data in under two minutes.
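The arithmetic behind these two slides can be checked with a short sketch. It uses only the figures quoted above (1 terabyte, 100 MB/s, 100 drives) and assumes perfectly even data distribution with no coordination overhead, which a real cluster would not achieve. The function name is illustrative.

```python
# Back-of-the-envelope read times, using the figures from the slides.
DISK_SIZE_MB = 1_000_000      # 1 terabyte, expressed in megabytes
TRANSFER_RATE_MB_S = 100      # sequential transfer speed of one drive


def read_time_seconds(total_mb, rate_mb_s, drives=1):
    """Time to read total_mb spread evenly over `drives` drives read in parallel.

    Assumes ideal parallelism: no network, seek, or scheduling overhead.
    """
    return (total_mb / drives) / rate_mb_s


single = read_time_seconds(DISK_SIZE_MB, TRANSFER_RATE_MB_S)
parallel = read_time_seconds(DISK_SIZE_MB, TRANSFER_RATE_MB_S, drives=100)

print(f"1 drive:    {single:.0f} s (about {single / 3600:.1f} hours)")
print(f"100 drives: {parallel:.0f} s (about {parallel / 60:.1f} minutes)")
```

Under these assumptions a single drive needs 10,000 seconds, while 100 drives working in parallel need 100 seconds, which matches the "under two minutes" estimate.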
Hadoop approach
Hadoop core components
There are two parts of Hadoop:
• HDFS (Hadoop Distributed File System) for storage
• MapReduce for processing
MapReduce
• Hadoop limits the amount of communication that can be performed between
processes: each individual record is processed by a task in isolation from
the others.
• By restricting the communication between nodes, Hadoop makes the
distributed system much more reliable. Individual node failures can be
worked around by restarting tasks on other machines.
• The other workers continue to operate as though nothing went wrong,
leaving the challenging aspects of partially restarting the program to the
underlying Hadoop layer.
Map: (in_key, in_value) → list of (out_key, intermediate_value)
Reduce: (out_key, list of intermediate_value) → list of out_value
What is MapReduce?
• MapReduce is a programming model and an associated implementation for
processing and generating large data sets.
• Programs written in this functional style are automatically parallelized
and executed on a large cluster of commodity machines.
• Users specify two functions:
   • MAP: a map function that processes a key/value pair to generate a set
   of intermediate key/value pairs
   • REDUCE: a reduce function that merges all intermediate values
   associated with the same intermediate key
The Programming Model Of MapReduce
• Map, written by the user, takes an input pair and produces a set of
intermediate key/value pairs. The MapReduce library groups together
all intermediate values associated with the same intermediate key I and
passes them to the Reduce function.
• The Reduce function, also written by the user, accepts an intermediate
key I and a set of values for that key. It merges these values together to
form a possibly smaller set of values.
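The programming model above can be illustrated with a word count, sketched here in plain Python rather than with the Hadoop API. The names map_fn, reduce_fn and run_mapreduce are illustrative, and the in-memory grouping step only stands in for what the MapReduce library does across machines.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """Map: take one input pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all values for one intermediate key into a smaller set."""
    return [sum(counts)]


def run_mapreduce(inputs, mapper, reducer):
    """Single-process stand-in for the library: map, group by key, reduce."""
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, intermediate_value in mapper(in_key, in_value):
            groups[out_key].append(intermediate_value)
    return {key: reducer(key, values) for key, values in groups.items()}


docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog the end")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result["the"])   # [3]
```

The user writes only map_fn and reduce_fn; everything inside run_mapreduce is the part a framework such as Hadoop takes over.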
How MapReduce Works
• A MapReduce job usually splits the input data set into independent chunks,
which are processed by the map tasks in a completely parallel manner.
• The framework sorts the outputs of the maps, which are then input to the
reduce tasks.
• Typically both the input and the output of the job are stored in a
filesystem. The framework takes care of scheduling tasks, monitoring them
and re-executing failed tasks.
• A MapReduce job is a unit of work that the client wants to be performed: it
consists of the input data, the MapReduce program, and configuration
information. Hadoop runs the job by dividing it into tasks, of which there
are two types: map tasks and reduce tasks.
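The stages listed above (split, parallel map, sort, reduce) can be sketched in one process. This is a simplified illustration, not Hadoop code: the "year,temperature" records are made-up sample data, a thread pool stands in for map tasks running on separate machines, and all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter

# Hypothetical input records in "year,temperature" form.
RECORDS = ["1949,111", "1950,0", "1950,22", "1949,78", "1950,-11", "1951,30"]


def split_input(records, chunk_size):
    """Split the input data set into independent chunks, one per map task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_task(chunk):
    """Map task: parse each record into a (year, temperature) pair."""
    pairs = []
    for line in chunk:
        year, temp = line.split(",")
        pairs.append((year, int(temp)))
    return pairs


def reduce_task(year, temps):
    """Reduce task: keep the maximum temperature seen for one year."""
    return year, max(temps)


def run_job(records, chunk_size=2):
    chunks = split_input(records, chunk_size)
    # Map tasks are independent, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(map_task, chunks))
    # The framework sorts map output by key before handing it to the reducers.
    intermediate = sorted(
        (pair for output in map_outputs for pair in output), key=itemgetter(0)
    )
    return [
        reduce_task(year, [temp for _, temp in group])
        for year, group in groupby(intermediate, key=itemgetter(0))
    ]


output = run_job(RECORDS)
print(output)   # [('1949', 111), ('1950', 22), ('1951', 30)]
```

Because each chunk is processed without reference to any other chunk, changing chunk_size changes how many map tasks run but not the result.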
Fault Tolerance
• Two types of nodes control the job execution process: a jobtracker and a
number of tasktrackers.
• The jobtracker coordinates all the jobs run on the system by scheduling
tasks to run on tasktrackers.
• Tasktrackers run tasks and send progress reports to the jobtracker, which
keeps a record of the overall progress of each job.
• If a task fails, the jobtracker can reschedule it on a different tasktracker.
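The rescheduling behaviour described above can be sketched with a toy coordinator. The classes below are hypothetical and only borrow the names jobtracker and tasktracker; real Hadoop detects failures through progress reports and timeouts rather than Python exceptions.

```python
class TaskTracker:
    """Toy worker node. healthy=False simulates a node whose tasks fail."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, task):
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed while running {task}")
        return f"{task} done on {self.name}"


class JobTracker:
    """Toy coordinator: schedules tasks and reschedules the ones that fail."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.progress = {}   # task -> list of (tasktracker name, outcome)

    def run_task(self, task):
        history = self.progress.setdefault(task, [])
        for tracker in self.trackers:
            try:
                result = tracker.run(task)
            except RuntimeError:
                history.append((tracker.name, "failed"))
                continue   # reschedule on a different tasktracker
            history.append((tracker.name, "succeeded"))
            return result
        raise RuntimeError(f"{task} failed on every tasktracker")


jobtracker = JobTracker([TaskTracker("tt1", healthy=False), TaskTracker("tt2")])
print(jobtracker.run_task("map-0"))      # map-0 done on tt2
print(jobtracker.progress["map-0"])      # [('tt1', 'failed'), ('tt2', 'succeeded')]
```

The caller of run_task never sees the failure on tt1, which mirrors the idea that recovery is left to the framework rather than to the job's own code.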
MapReduce data flow with multiple reduce tasks
MapReduce data flow with no reduce task
Combiner Functions
• Many MapReduce jobs are limited by the bandwidth available on the
cluster.
• In order to minimize the data transferred between the map and reduce tasks,
combiner functions are introduced.
• Hadoop allows the user to specify a combiner function to be run on the map
output—the combiner function’s output forms the input to the reduce
function.
• Combiner functions can help cut down the amount of data shuffled between
the maps and the reduces.
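A small sketch can show why a combiner saves bandwidth. The map outputs below are made-up word count data, and combine and reduce_all are illustrative names. The example relies on summation being associative and commutative, so pre-summing on the map side does not change the final answer; not every reduce function has that property.

```python
from collections import Counter

# Hypothetical output of two map tasks in a word count job.
map_outputs = [
    [("the", 1), ("cat", 1), ("the", 1), ("the", 1)],
    [("the", 1), ("dog", 1), ("dog", 1)],
]


def combine(pairs):
    """Combiner: sum the counts per key within one map task's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_all(shuffled):
    """Reduce: sum the counts per key across everything that was shuffled."""
    totals = Counter()
    for pairs in shuffled:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)


combined_outputs = [combine(pairs) for pairs in map_outputs]

records_without = sum(len(pairs) for pairs in map_outputs)
records_with = sum(len(pairs) for pairs in combined_outputs)

print(records_without, records_with)    # 7 4
print(reduce_all(combined_outputs))
```

Here the reducers receive 4 records instead of 7, and the final counts are the same either way.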
Hadoop Streaming:
• Hadoop provides an API to MapReduce that allows you to write your
map and reduce functions in languages other than Java.
• Hadoop Streaming uses Unix standard streams as the interface
between Hadoop and your program, so you can use any language
that can read standard input and write to standard output to write
your MapReduce program.
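A Streaming word count might look like the sketch below. It assumes the usual Streaming convention of one record per line with key and value separated by a tab, and reducer input already sorted by key. To keep it runnable locally, the functions take any iterable of lines, and sorted() stands in for the sort Hadoop performs between the two stages; in an actual Streaming script each function would read sys.stdin and print its results.

```python
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word. Expects input lines sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation of the pipeline: map, sort, reduce.
demo = list(reducer(sorted(mapper(["to be or not", "to be"]))))
print(demo)   # ['be\t2', 'not\t1', 'or\t1', 'to\t2']
```

Because the only interface is lines of text, the same two stages could be written in any language that reads standard input and writes standard output.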
Hadoop Pipes:
• Hadoop Pipes is the name of the C++ interface to Hadoop MapReduce.
• Unlike Streaming, which uses standard input and output to
communicate with the map and reduce code, Pipes uses sockets as the
channel over which the tasktracker communicates with the process
running the C++ map or reduce function. JNI is not used.
HADOOP DISTRIBUTED FILESYSTEM (HDFS)
• Filesystems that manage storage across a network of machines are
called distributed filesystems.
• Hadoop comes with a distributed filesystem called HDFS, which stands for
Hadoop Distributed Filesystem.
• HDFS is designed to hold very large amounts of data (terabytes or even
petabytes) and to provide high-throughput access to this information.
Namenodes and Datanodes
• An HDFS cluster has two types of node operating in a master-worker
pattern: a namenode (the master) and a number of datanodes
(workers).
• The namenode manages the filesystem namespace. It maintains the
filesystem tree and the metadata for all the files and directories in the
tree.
• Datanodes are the workhorses of the filesystem. They store and
retrieve blocks when they are told to (by clients or the namenode), and
they report back to the namenode periodically with lists of blocks that
they are storing.
• Without the namenode, the filesystem cannot be used. In fact, if the
machine running the namenode were obliterated, all the files on the
filesystem would be lost, since there would be no way of knowing how
to reconstruct the files from the blocks on the datanodes.
• It is therefore important to make the namenode resilient to failure, and
Hadoop provides two mechanisms for this:
   1. The first is to back up the files that make up the persistent state of
   the filesystem metadata. Hadoop can be configured so that the namenode
   writes its persistent state to multiple filesystems.
   2. The second is to run a secondary namenode, which periodically merges
   the namespace image with the edit log. It usually runs on a separate
   physical machine, since it requires plenty of CPU and as much memory as
   the namenode to perform the merge. It keeps a copy of the merged
   namespace image, which can be used in the event of the namenode failing.
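The division of labour between the namenode and the datanodes can be sketched with two toy classes. These are hypothetical, in-memory stand-ins and not the HDFS API. To stay short, the sketch keeps a single location per block and ignores replication.

```python
class DataNode:
    """Toy datanode: stores and retrieves blocks, and reports what it holds."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}            # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def retrieve(self, block_id):
        return self.blocks[block_id]

    def block_report(self):
        return sorted(self.blocks)


class NameNode:
    """Toy namenode: holds only metadata, never the file contents."""

    def __init__(self):
        self.namespace = {}         # file path -> ordered list of block ids
        self.locations = {}         # block id -> name of the datanode holding it

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def receive_report(self, datanode):
        for block_id in datanode.block_report():
            self.locations[block_id] = datanode.name


def read_file(namenode, datanodes, path):
    """Client read: ask the namenode for the blocks, then fetch each one."""
    parts = []
    for block_id in namenode.namespace[path]:
        node = datanodes[namenode.locations[block_id]]
        parts.append(node.retrieve(block_id))
    return b"".join(parts)


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.store("blk_1", b"hello ")
dn2.store("blk_2", b"world")

nn = NameNode()
nn.add_file("/greeting.txt", ["blk_1", "blk_2"])
for dn in (dn1, dn2):
    nn.receive_report(dn)

content = read_file(nn, {"dn1": dn1, "dn2": dn2}, "/greeting.txt")
print(content)   # b'hello world'
```

If the nn object were lost, dn1 and dn2 would still hold their bytes, but nothing would record which blocks belong to /greeting.txt or in what order, which is the failure the two resilience mechanisms guard against.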
File System Namespace
• HDFS supports a traditional hierarchical file organization. A user or an
application can create and remove files, move a file from one directory
to another, rename a file, create directories and store files inside these
directories.
• Early versions of HDFS did not implement user quotas or access
permissions; current releases support both. HDFS does not support hard
links or soft links. However, the HDFS architecture does not preclude
implementing these features.
• The namenode maintains the file system namespace. Any change to
the file system namespace or its properties is recorded by the
namenode. An application can specify the number of replicas of a file
that should be maintained by HDFS. The number of copies of a file is
called the replication factor of that file. This information is stored by
the namenode.
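The replication factor can be illustrated with a toy namespace that records the factor per file and picks that many datanodes for each block. The class and its round-robin placement are assumptions made for this sketch; actual HDFS placement also takes rack location into account.

```python
from itertools import cycle, islice


class Namespace:
    """Toy namenode state: per-file replication factor and replica placement."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.replication = {}   # file path -> replication factor
        self.placement = {}     # (file path, block index) -> datanode names

    def create(self, path, num_blocks, replication):
        if replication > len(self.datanodes):
            raise ValueError("not enough datanodes for this replication factor")
        self.replication[path] = replication
        for index in range(num_blocks):
            # Round-robin placement (an assumption of this sketch): start at a
            # different datanode for each block and take `replication` nodes.
            start = index % len(self.datanodes)
            ring = islice(cycle(self.datanodes), start, start + replication)
            self.placement[(path, index)] = list(ring)


ns = Namespace(["dn1", "dn2", "dn3", "dn4"])
ns.create("/logs/a", num_blocks=2, replication=3)

print(ns.replication["/logs/a"])        # 3
print(ns.placement[("/logs/a", 0)])     # ['dn1', 'dn2', 'dn3']
print(ns.placement[("/logs/a", 1)])     # ['dn2', 'dn3', 'dn4']
```

Each block ends up on three distinct datanodes, so losing any single datanode still leaves two copies of every block.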

More Related Content

What's hot

Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
Abhishek Mukherjee
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Bhavesh Padharia
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
Nalini Mehta
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
veeracynixit
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
rohitraj268
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
IIIT-H
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
Jazan University
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
Konstantin V. Shvachko
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
Varun Narang
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
kapa rohit
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Cognizant
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution Times
David Tjahjono,MD,MBA(UK)
 
MapReduce
MapReduceMapReduce
MapReduce
KavyaGo
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
KavyaGo
 

What's hot (16)

Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution Times
 
MapReduce
MapReduceMapReduce
MapReduce
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
 
Understanding hdfs
Understanding hdfsUnderstanding hdfs
Understanding hdfs
 

Viewers also liked

Frank111
Frank111Frank111
Frank111
Frank Frank Tfm
 
Tareaa
TareaaTareaa
CNN - Socially responsible IT has soul
CNN - Socially responsible IT has soulCNN - Socially responsible IT has soul
CNN - Socially responsible IT has soul
abandonedregion54
 
Общенациональная система принятия решений «Вече»
Общенациональная система принятия решений «Вече»Общенациональная система принятия решений «Вече»
Общенациональная система принятия решений «Вече»
mike3b
 
This is Responster
This is ResponsterThis is Responster
This is Responster
Responster
 
Cppt
CpptCppt
Prez, Salider
Prez, SaliderPrez, Salider
Prez, Salider
Frank Frank Tfm
 
Game sense approach
Game sense approachGame sense approach
Game sense approach
ashleighmay21
 
Frank111
Frank111Frank111
Frank111
Frank Frank Tfm
 
Pictures portfolioy45wy456y6rue
Pictures portfolioy45wy456y6ruePictures portfolioy45wy456y6rue
Pictures portfolioy45wy456y6rue
tgrwnhyrten
 
Bp you tourist
Bp you touristBp you tourist
Bp you tourist
Alessio Bonelli
 
Teddy Roosevelt - Progressive President
Teddy Roosevelt - Progressive PresidentTeddy Roosevelt - Progressive President
Teddy Roosevelt - Progressive President
mrsdanielslh
 
Frank Prezzy
Frank PrezzyFrank Prezzy
Frank Prezzy
Frank Frank Tfm
 
Internal audit ท่าเรือแหลมฉบัง
Internal audit ท่าเรือแหลมฉบังInternal audit ท่าเรือแหลมฉบัง
Internal audit ท่าเรือแหลมฉบัง
Apisit Kulapant
 
Carte desen tehnic & geometrie descriptiva
Carte desen tehnic & geometrie descriptivaCarte desen tehnic & geometrie descriptiva
Carte desen tehnic & geometrie descriptiva
Adrian Ionescu
 
Chirurgia degli elementi dentari inclusi
Chirurgia degli elementi dentari inclusiChirurgia degli elementi dentari inclusi
Chirurgia degli elementi dentari inclusi
Dental Evo
 
Boomeon Investor Pitch Deck
Boomeon Investor Pitch DeckBoomeon Investor Pitch Deck
Boomeon Investor Pitch Deck
David Dewhirst
 
Minimal intervention in Dentistry
Minimal intervention in Dentistry Minimal intervention in Dentistry
Minimal intervention in Dentistry
Dental Evo
 

Viewers also liked (18)

Frank111
Frank111Frank111
Frank111
 
Tareaa
TareaaTareaa
Tareaa
 
CNN - Socially responsible IT has soul
CNN - Socially responsible IT has soulCNN - Socially responsible IT has soul
CNN - Socially responsible IT has soul
 
Общенациональная система принятия решений «Вече»
Общенациональная система принятия решений «Вече»Общенациональная система принятия решений «Вече»
Общенациональная система принятия решений «Вече»
 
This is Responster
This is ResponsterThis is Responster
This is Responster
 
Cppt
CpptCppt
Cppt
 
Prez, Salider
Prez, SaliderPrez, Salider
Prez, Salider
 
Game sense approach
Game sense approachGame sense approach
Game sense approach
 
Frank111
Frank111Frank111
Frank111
 
Pictures portfolioy45wy456y6rue
Pictures portfolioy45wy456y6ruePictures portfolioy45wy456y6rue
Pictures portfolioy45wy456y6rue
 
Bp you tourist
Bp you touristBp you tourist
Bp you tourist
 
Teddy Roosevelt - Progressive President
Teddy Roosevelt - Progressive PresidentTeddy Roosevelt - Progressive President
Teddy Roosevelt - Progressive President
 
Frank Prezzy
Frank PrezzyFrank Prezzy
Frank Prezzy
 
Internal audit ท่าเรือแหลมฉบัง
Internal audit ท่าเรือแหลมฉบังInternal audit ท่าเรือแหลมฉบัง
Internal audit ท่าเรือแหลมฉบัง
 
Carte desen tehnic & geometrie descriptiva
Carte desen tehnic & geometrie descriptivaCarte desen tehnic & geometrie descriptiva
Carte desen tehnic & geometrie descriptiva
 
Chirurgia degli elementi dentari inclusi
Chirurgia degli elementi dentari inclusiChirurgia degli elementi dentari inclusi
Chirurgia degli elementi dentari inclusi
 
Boomeon Investor Pitch Deck
Boomeon Investor Pitch DeckBoomeon Investor Pitch Deck
Boomeon Investor Pitch Deck
 
Minimal intervention in Dentistry
Minimal intervention in Dentistry Minimal intervention in Dentistry
Minimal intervention in Dentistry
 

Similar to Cppt

Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptx
Uttara University
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
Manoj Jangalva
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
Sunil D Patil
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
MarianJRuben
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
Hitendra Kumar
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
Mr. Ankit
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
veeracynixit
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Roushan Sinha
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
VijayMohan Vasu
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
Simplilearn
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewNisanth Simon
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
sreehari orienit
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
Atul Kushwaha
 
BIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdfBIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdf
DIVYA370851
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
SANTOSH WAYAL
 

Similar to Cppt (20)

Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptx
 
hadoop
hadoophadoop
hadoop
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Anju
AnjuAnju
Anju
 
Big data overview of apache hadoop
Big data overview of apache hadoopBig data overview of apache hadoop
Big data overview of apache hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
BIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdfBIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdf
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 

Recently uploaded

The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
EduSkills OECD
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 

Recently uploaded (20)

The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 

Cppt

  • 1. PRESENTED TO: PRESENTED BY: DR. AJEET SINGH POONIA CHUNKY KUMAR
  • 2. INTRODUCTION  Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computer using a simple programming model.  It is an open-source data management with scale-out storage & distributed processing.  The objective of this tool is to support running applications on BigData.  It is an open-source set of tools and distributed under Apache license.
  • 3. BigData • Big data is a term used to describe the voluminous amount of unstructured and semi-structured data a company creates. • Data that would take too much time and cost too much money to load into a relational database for analysis. • Big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data.
  • 4. Characteristics of Big Data Volume • Data quantity Velocity • Data Speed Variety • Data Types
  • 5. What Caused The Problem? 1 2 1 2 Year Standard Hard Drive Size (in Mb) 1990 1370 2010 1000000 Year Data Transfer Rate (Mbps) 1990 4.4 2010 100
  • 7. So,What Is The Problem?  The transfer speed is around 100 MB/s  A standard disk is 1 Terabyte  Time to read entire disk= 10000 seconds or 3 Hours!  Increase in processing time may not be as helpful because • Network bandwidth is now more of a limiting factor • Physical limits of processor chips have been reached
  • 8. So What do We Do? •The obvious solution is that we use multiple processors to solve the same problem by fragmenting it into pieces. •Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.
  • 10.
  • 11. Hadoop core component There are two parts of Hadoop:-  HDFS (Hadoop distributed file system)  Mapreduce (Processing)
  • 12. MapReduce  Hadoop limits the amount of communication which can be performed by the processes, as each individual record is processed by a task in isolation from one another  By restricting the communication between nodes, Hadoop makes the distributed system much more reliable. Individual node failures can be worked around by restarting tasks on other machines.  The other workers continue to operate as though nothing went wrong, leaving the challenging aspects of partially restarting the program to the underlying Hadoop layer. Map : (in_value,in_key)(out_key, intermediate_value) Reduce: (out_key, intermediate_value) (out_value list)
  • 13. What is MapReduce? • MapReduce is a programming model and an associated implementation for processing and generating large data sets. • Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. • MAP: a map function processes a key/value pair to generate a set of intermediate key/value pairs. • REDUCE: a reduce function merges all intermediate values associated with the same intermediate key.
  • 14. The Programming Model Of MapReduce • Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
  • 15. • The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values.
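The Map and Reduce roles described above can be sketched with the classic word-count example. This is a minimal single-process illustration, not Hadoop's API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the grouping step stands in for what the MapReduce library does between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Map: (line number, line text) -> list of (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key: str, values: Iterable[int]) -> int:
    """Reduce: (word, list of counts) -> total count for that word."""
    return sum(values)


def run_job(records: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Toy driver: run map on every record, group by intermediate key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for inter_key, inter_value in map_fn(key, value):
            groups[inter_key].append(inter_value)   # group values by intermediate key
    # Keys are visited in sorted order, mirroring the framework's sort step.
    return {k: reduce_fn(k, groups[k]) for k in sorted(groups)}


lines = ["the quick brown fox", "the lazy dog", "The fox"]
counts = run_job(enumerate(lines))
print(counts)
```

In a real cluster the map calls run on many machines and the grouping happens over the network; only the two user-written functions keep this shape.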
  • 16. How MapReduce Works • A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. • The framework sorts the outputs of the maps, which are then input to the reduce tasks. • Typically both the input and the output of the job are stored in a filesystem. The framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks. • A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information. Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.
  • 18. Fault Tolerance • There are two types of nodes that control the job execution process: a jobtracker and a number of tasktrackers. • The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers. • Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job. • If a task fails, the jobtracker can reschedule it on a different tasktracker.
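The rescheduling behaviour described above can be illustrated with a toy simulation. Everything here is hypothetical (`make_tracker`, `schedule`, the tracker names, the round-robin policy); it only shows the idea of retrying a failed task on a different tasktracker and is not how Hadoop's scheduler is implemented.

```python
from typing import Callable, Dict, List


class TaskFailed(Exception):
    """Raised by a simulated tasktracker that cannot complete a task."""


def make_tracker(name: str, healthy: bool) -> Callable[[int], str]:
    """Build a fake tasktracker; an unhealthy one fails every task it is given."""
    def run(task_id: int) -> str:
        if not healthy:
            raise TaskFailed(f"{name} failed task {task_id}")
        return f"task {task_id} done on {name}"
    return run


def schedule(task_ids: List[int], trackers: Dict[str, Callable[[int], str]]) -> Dict[int, str]:
    """Toy jobtracker: assign tasks round-robin and retry failures elsewhere."""
    names = list(trackers)
    results: Dict[int, str] = {}
    for i, task_id in enumerate(task_ids):
        for attempt in range(len(names)):
            name = names[(i + attempt) % len(names)]   # next tracker on each retry
            try:
                results[task_id] = trackers[name](task_id)
                break
            except TaskFailed:
                continue                               # reschedule on another tracker
        else:
            raise RuntimeError(f"task {task_id} failed on every tasktracker")
    return results


trackers = {
    "tt1": make_tracker("tt1", healthy=True),
    "tt2": make_tracker("tt2", healthy=False),   # simulated node failure
    "tt3": make_tracker("tt3", healthy=True),
}
results = schedule([0, 1, 2], trackers)
for task_id, outcome in results.items():
    print(outcome)
```

Task 1 is first offered to the failing tracker and then completes elsewhere, while the other tasks are unaffected.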
  • 21. MapReduce data flow with multiple reduce tasks
  • 22. MapReduce data flow with no reduce task
  • 24. Combiner Functions • Many MapReduce jobs are limited by the bandwidth available on the cluster. • Combiner functions are introduced to minimize the data transferred between the map and reduce tasks. • Hadoop allows the user to specify a combiner function to be run on the map output; the combiner function's output forms the input to the reduce function. • Combiner functions can help cut down the amount of data shuffled between the maps and the reduces.
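The effect of a combiner can be sketched by counting how many key/value pairs leave each map task with and without local aggregation. The function names and the sample splits below are hypothetical; the sketch assumes a word-count job, where the combiner can reuse the reduce logic because addition is associative and commutative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def map_task(lines: List[str]) -> List[Pair]:
    """One map task: emit (word, 1) for every word in its input split."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs: List[Pair]) -> List[Pair]:
    """Combiner: sum counts per word locally, on the map side."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_all(shuffled: List[Pair]) -> Dict[str, int]:
    """Reduce side: sum whatever arrives for each word."""
    totals: Counter = Counter()
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)


splits = [["a b a a", "b a"], ["c a c", "c c"]]      # two input splits (made-up data)
raw = [map_task(split) for split in splits]
combined = [combine(pairs) for pairs in raw]

pairs_without = sum(len(p) for p in raw)
pairs_with = sum(len(p) for p in combined)
result_without = reduce_all([pair for p in raw for pair in p])
result_with = reduce_all([pair for p in combined for pair in p])

print(f"pairs shuffled without combiner: {pairs_without}")
print(f"pairs shuffled with combiner:    {pairs_with}")
print(result_with)
```

The final counts are identical either way; only the volume of shuffled data changes. A combiner is not safe for every job: computing a mean, for instance, cannot simply reuse the reducer.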
  • 25. Hadoop Streaming: • Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java. • Hadoop Streaming uses Unix standard streams as the interface between Hadoop and your program, so you can use any language that can read standard input and write to standard output to write your MapReduce program.
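A Streaming-style word count might look like the sketch below. It assumes the common convention of tab-separated `key<TAB>value` lines and that the reducer receives its input sorted by key; the function names and sample data are hypothetical. To keep it testable without a cluster, the mapper and reducer take any iterable of lines and the sort step is simulated locally.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines that share a key (input must be sorted)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation of map -> sort -> reduce on made-up input.
sample = ["to be or not", "to be"]
result = list(reducer(sorted(mapper(sample))))
for line in result:
    print(line)
```

In an actual Streaming job the two functions would live in separate scripts, each reading `sys.stdin` and printing to standard output, with Hadoop performing the sort between them.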
  • 26. Hadoop Pipes: • Hadoop Pipes is the name of the C++ interface to Hadoop MapReduce. • Unlike Streaming, which uses standard input and output to communicate with the map and reduce code, Pipes uses sockets as the channel over which the tasktracker communicates with the process running the C++ map or reduce function. JNI is not used.
  • 27. HADOOP DISTRIBUTED FILESYSTEM (HDFS) • Filesystems that manage the storage across a network of machines are called distributed filesystems. • Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop Distributed Filesystem. • HDFS is designed to hold very large amounts of data (terabytes or even petabytes) and to provide high-throughput access to this information.
  • 28. Namenodes and Datanodes • An HDFS cluster has two types of node operating in a master-worker pattern: a namenode (the master) and a number of datanodes (workers). • The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. • Datanodes are the workhorses of the filesystem. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of blocks that they are storing.
  • 29. • Without the namenode, the filesystem cannot be used. In fact, if the machine running the namenode were obliterated, all the files on the filesystem would be lost, since there would be no way of knowing how to reconstruct the files from the blocks on the datanodes. • It is therefore important to make the namenode resilient to failure, and Hadoop provides two mechanisms for this: 1. The first is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. 2. The second is to run a secondary namenode, which periodically merges the namespace image with the edit log. It usually runs on a separate physical machine, since it requires plenty of CPU and as much memory as the namenode to perform the merge. It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing.
  • 30. File System Namespace • HDFS supports a traditional hierarchical file organization. A user or an application can create and remove files, move a file from one directory to another, rename a file, create directories and store files inside these directories. • HDFS does not yet implement user quotas or access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features. • The Namenode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the Namenode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the Namenode.
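The division of labour between namenode and datanodes, and the per-file replication factor, can be illustrated with a toy in-memory model. This is a heavily simplified sketch under stated assumptions: the class and method names are hypothetical, the block size is a tiny made-up value, replicas are placed round-robin, and the toy namenode relays data itself, whereas in HDFS clients exchange block data with datanodes directly.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # bytes; deliberately tiny toy value (assumption)


class DataNode:
    """Worker: stores block contents and knows nothing about files."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def block_report(self) -> List[str]:
        """List of block ids held, as reported periodically to the namenode."""
        return sorted(self.blocks)


class NameNode:
    """Master: keeps the namespace and metadata only (file -> blocks -> locations)."""

    def __init__(self, datanodes: List[DataNode]) -> None:
        self.datanodes = datanodes
        self.files: Dict[str, List[str]] = {}        # path -> ordered block ids
        self.locations: Dict[str, List[str]] = {}    # block id -> datanode names
        self.replication: Dict[str, int] = {}        # path -> replication factor

    def write(self, path: str, data: bytes, replication: int = 2) -> None:
        if not 1 <= replication <= len(self.datanodes):
            raise ValueError("replication factor must fit the number of datanodes")
        block_ids = []
        for n, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
            block_id = f"{path}#{n}"
            # Round-robin placement of each replica (toy policy).
            chosen = [self.datanodes[(n + r) % len(self.datanodes)]
                      for r in range(replication)]
            for node in chosen:
                node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            self.locations[block_id] = [node.name for node in chosen]
            block_ids.append(block_id)
        self.files[path] = block_ids
        self.replication[path] = replication

    def read(self, path: str) -> bytes:
        by_name = {node.name: node for node in self.datanodes}
        # Reassemble the file from the first replica of each block, in order.
        return b"".join(by_name[self.locations[b][0]].blocks[b]
                        for b in self.files[path])


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("/a.txt", b"hello hdfs!", replication=2)

print(namenode.read("/a.txt"))
print(namenode.locations)
print({node.name: node.block_report() for node in nodes})
```

The datanodes hold only anonymous blocks, so without the namenode's `files` and `locations` tables there is no way to reassemble `/a.txt`, which is why the namenode's metadata must be protected.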