BY ANKIT PRASAD
CSE 3RD YEAR
NSEC
What is Big Data?
Big Data is a collection of large datasets that
cannot be processed using traditional
computing techniques.
Big Data is characterized by huge volume, high velocity, and a wide variety of data.
Classification of Big Data
Big Data comes in three types:
Structured data: relational data.
Semi-structured data: XML data.
Unstructured data: Word documents, PDFs, text, media, logs.
Big Data Challenges
The major challenges associated with big data:
Capturing data
Storage
Searching
Sharing
Transfer
Analysis
Presentation
Google's Solution
MapReduce
It is a parallel programming model for writing distributed applications.
It can efficiently process multi-terabyte datasets.
It runs on large clusters of commodity hardware in a reliable, fault-tolerant manner.
Introduction to Hadoop
Hadoop was developed by Doug Cutting and Mike Cafarella.
Hadoop is an Apache open-source framework written in Java.
Hadoop allows distributed storage and processing of large datasets across clusters of computers.
Hadoop Architecture
Hadoop has two major layers:
Processing/Computation layer (MapReduce)
Storage layer (Hadoop Distributed File System)
Other modules of the Hadoop framework include:
Hadoop Common
Hadoop YARN (Yet Another Resource Negotiator)
What is MapReduce?
The MapReduce algorithm contains two important tasks, namely Map and Reduce.
Map takes a set of data and converts it into another set of data, in which individual elements are broken down into tuples (key/value pairs).
Reduce takes the output of Map as its input and combines those tuples into a smaller set of tuples.
Under the MapReduce model, the data
processing primitives are called mappers and
reducers.
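Word count is a common way to illustrate these two primitives. The following is a minimal Python sketch, not Hadoop's Java API: `word_count_map` and `word_count_reduce` are hypothetical names, and lowercasing plus whitespace splitting is an assumed tokenization rule.

```python
from typing import Iterator, List, Tuple


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: break one input line into (word, 1) key/value pairs."""
    # Assumption: words are lowercased and separated by whitespace.
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single tuple."""
    return word, sum(counts)


if __name__ == "__main__":
    print(list(word_count_map("Deer Bear deer")))  # [('deer', 1), ('bear', 1), ('deer', 1)]
    print(word_count_reduce("deer", [1, 1]))  # ('deer', 2)
```

The mapper emits many small pairs, and the reducer collapses all values for one key into a single, smaller result.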
MapReduce Algorithm
Hadoop initiates the Map stage by issuing mapping tasks to the appropriate servers in the cluster.
Map stage:
The input file or directory, stored in HDFS, is passed to the mapper function line by line.
The mapper processes the data and creates several small chunks of data (key/value pairs).
Hadoop monitors for task completion and then initiates the shuffle stage.
Shuffle stage:
The framework groups data from all mappers
by the keys and splits them among the
appropriate servers for the reduce stage.
Reduce stage:
The reducer processes the data coming from the mappers and produces a new set of output, which is stored in HDFS.
The framework manages all the details of
data-passing and copying between the
nodes in the cluster.
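The three stages can be simulated in a single process to show how data flows between them. This is a sketch under simplifying assumptions: the function name `run_mapreduce` is hypothetical, `zlib.crc32` stands in for Hadoop's key partitioner, and the reduce "servers" are just in-memory partitions.

```python
import zlib
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List, Tuple

KV = Tuple[str, int]


def run_mapreduce(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[KV]],
    reducer: Callable[[str, List[int]], KV],
    num_reducers: int = 2,
) -> List[KV]:
    # Map stage: each input line is passed to the mapper.
    mapped: List[KV] = []
    for line in lines:
        mapped.extend(mapper(line))

    # Shuffle stage: group values by key and send every key to exactly one
    # reducer partition, chosen by hashing the key (stand-in partitioner).
    partitions: List[DefaultDict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for key, value in mapped:
        target = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[target][key].append(value)

    # Reduce stage: each partition is reduced independently, visiting its
    # keys in sorted order (mirroring the sort between map and reduce).
    output: List[KV] = []
    for partition in partitions:
        for key in sorted(partition):
            output.append(reducer(key, partition[key]))
    return output


if __name__ == "__main__":
    text = ["deer bear river", "car car river", "deer car bear"]
    counts = run_mapreduce(
        text,
        mapper=lambda line: ((word, 1) for word in line.split()),
        reducer=lambda key, values: (key, sum(values)),
    )
    print(sorted(counts))  # [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

Because every occurrence of a key hashes to the same partition, each reducer sees all values for the keys it owns, which is the guarantee the shuffle stage provides.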
Hadoop Distributed File System
HDFS is based on the Google File System.
It is highly fault-tolerant and is designed to be
deployed on low-cost hardware.
It is suitable for applications with large datasets.
Files are stored redundantly to protect the system from data loss in case of failure.
HDFS Architecture
Namenode:
It acts as a master server that manages the
file system namespace.
It regulates clients' access to files.
Datanode:
These nodes manage the storage attached to the machines they run on.
They serve read/write requests from clients and perform block operations such as creation, deletion, and replication as instructed by the namenode.
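The division of work between the namenode and the datanodes can be sketched as a toy in-memory model. This is not the HDFS client API: the classes `NameNode` and `DataNode`, the helper `client_read`, and the block IDs are all hypothetical. The point it illustrates is that the namenode answers only metadata queries, while the bytes themselves come from datanodes.

```python
from typing import Dict, List, Tuple


class DataNode:
    """Toy datanode: stores raw block contents keyed by block ID."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def write_block(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


class NameNode:
    """Toy namenode: keeps only metadata, never file contents."""

    def __init__(self) -> None:
        self.namespace: Dict[str, List[str]] = {}        # path -> ordered block IDs
        self.block_locations: Dict[str, List[str]] = {}  # block ID -> datanode names

    def register_file(self, path: str, blocks: List[Tuple[str, List[str]]]) -> None:
        self.namespace[path] = [block_id for block_id, _ in blocks]
        for block_id, nodes in blocks:
            self.block_locations[block_id] = nodes

    def get_block_locations(self, path: str) -> List[Tuple[str, List[str]]]:
        # A real namenode would also check the client's permissions here.
        if path not in self.namespace:
            raise FileNotFoundError(path)
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


def client_read(namenode: NameNode, datanodes: Dict[str, DataNode], path: str) -> bytes:
    """Ask the namenode where blocks live, then fetch bytes from datanodes."""
    chunks = []
    for block_id, nodes in namenode.get_block_locations(path):
        chunks.append(datanodes[nodes[0]].read_block(block_id))  # first listed replica
    return b"".join(chunks)


if __name__ == "__main__":
    dns = {name: DataNode(name) for name in ("dn1", "dn2")}
    dns["dn1"].write_block("blk_1", b"hello ")
    dns["dn2"].write_block("blk_2", b"hdfs")
    nn = NameNode()
    nn.register_file("/data/greeting.txt", [("blk_1", ["dn1"]), ("blk_2", ["dn2"])])
    print(client_read(nn, dns, "/data/greeting.txt"))  # b'hello hdfs'
```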
Block:
It is the basic unit in which HDFS stores file data.
Each file is divided into one or more blocks.
Blocks are stored on individual datanodes.
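The arithmetic of splitting a file into blocks is simple. The sketch below assumes a 128 MB block size (in HDFS this is configurable, for example through the `dfs.blocksize` property); the function name `split_into_blocks` is hypothetical.

```python
from typing import List

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size; HDFS lets you configure this


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block a file of file_size bytes occupies."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


if __name__ == "__main__":
    print([size // MB for size in split_into_blocks(300 * MB)])  # [128, 128, 44]
```

A 300 MB file therefore occupies two full blocks and one 44 MB block; HDFS does not pad the last block out to the full block size on disk.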
Hadoop Common
It provides essential services and basic
processes such as abstraction of the
underlying operating system and its file
system.
It assumes that hardware failures are common and should be handled automatically by the framework.
It also contains the necessary Java Archive
(JAR) files and scripts required to start
Hadoop.
Hadoop YARN
ResourceManager:
It is the cluster-wide master that manages and allocates resources to applications and schedules tasks.
ApplicationMasters:
Responsible for negotiating resources with the ResourceManager and for working with the NodeManagers to execute and monitor the tasks.
NodeManager:
Takes instructions from the ResourceManager and manages resources on its own node.
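The interaction between the three YARN components can be sketched as a toy simulation. Everything here is hypothetical and simplified: the class and function names are invented, resources are modelled as memory only, and the ResourceManager uses first-fit placement, whereas real YARN uses pluggable schedulers such as the Capacity Scheduler or the Fair Scheduler.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    """Hypothetical model of one node: tracks free memory for containers."""
    name: str
    free_mb: int
    containers: List[str] = field(default_factory=list)

    def launch(self, container_id: str, mb: int) -> None:
        # The NodeManager only manages resources on its own node.
        self.free_mb -= mb
        self.containers.append(container_id)


class ResourceManager:
    """Toy cluster-wide allocator using first-fit placement."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, app: str, mb: int) -> Optional[str]:
        """Grant one container of `mb` MB to `app`, or None if no node has room."""
        for node in self.nodes:
            if node.free_mb >= mb:
                container_id = f"{app}_c{self._next_id}"
                self._next_id += 1
                node.launch(container_id, mb)  # instruct the chosen NodeManager
                return f"{container_id}@{node.name}"
        return None


def application_master(rm: ResourceManager, app: str, tasks: int, mb: int) -> List[str]:
    """Negotiate one container per task with the ResourceManager."""
    granted: List[str] = []
    for _ in range(tasks):
        grant = rm.allocate(app, mb)
        if grant is None:
            break  # this toy gives up; a real ApplicationMaster keeps asking
        granted.append(grant)
    return granted


if __name__ == "__main__":
    nodes = [NodeManager("node1", 4096), NodeManager("node2", 2048)]
    rm = ResourceManager(nodes)
    print(application_master(rm, "job1", tasks=4, mb=2048))
    # ['job1_c0@node1', 'job1_c1@node1', 'job1_c2@node2']
```

The ApplicationMaster never touches nodes directly: it asks the ResourceManager for containers, and the ResourceManager tells individual NodeManagers what to launch.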
How Does Hadoop Work?
Data is initially divided into directories and files. Files are divided into uniform-sized blocks, typically 128 MB (the default in Hadoop 2.x) or 64 MB (the default in Hadoop 1.x); only the last block of a file may be smaller.
These files are then distributed across various cluster nodes for further processing, supervised by HDFS.
Blocks are replicated to handle hardware failure.
Checking that the code was executed
successfully.
Performing the sort that takes place between
the map and reduce stages.
Sending the sorted data to a certain
computer.
Writing the debugging logs for each job.
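Block replication can be illustrated with a simple placement rule. This sketch assumes a replication factor of 3 (the usual HDFS default, set by `dfs.replication`) and places replicas on consecutive datanodes in round-robin order; real HDFS uses a rack-aware placement policy instead. The functions `place_replicas` and `readable_after_failure` are hypothetical.

```python
from itertools import cycle, islice
from typing import Dict, List


def place_replicas(
    num_blocks: int, datanodes: List[str], replication: int = 3
) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct datanodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of datanodes")
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # Start each block on a different node so replicas spread evenly.
        start = block % len(datanodes)
        placement[block] = list(islice(cycle(datanodes), start, start + replication))
    return placement


def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node != failed for node in nodes) for nodes in placement.values())


if __name__ == "__main__":
    layout = place_replicas(3, ["dn1", "dn2", "dn3", "dn4"])
    print(layout)
    # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
    print(readable_after_failure(layout, "dn3"))  # True
```

With three replicas, losing any single datanode still leaves at least two copies of every block, which is what lets the cluster keep serving data through a hardware failure.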
Applications of Hadoop
Black Box Data
Social Media Data
Stock Exchange Data
Transport Data
Search Engine Data
Prominent users of Hadoop
The Yahoo! Search Webmap is a Hadoop application that runs on a large Linux cluster.
In 2010, Facebook claimed that it had the largest Hadoop cluster in the world.
The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4 TB of data into 11 million PDFs in a day, at a computation cost of about $240.
Advantages of Hadoop
Hadoop is open source and portable across platforms because it is Java-based.
Hadoop does not rely on hardware to provide fault tolerance and high availability; the Hadoop library itself detects and handles failures at the application layer.
Servers can be added to or removed from the cluster dynamically without interruption.
Hadoop efficiently utilizes the underlying parallelism of the CPU cores in distributed systems.