HADOOP
DEVELOPED BY: JAYDEEP PATEL (13MCA63)
KULDEEP PATEL (13MCA64)
WHAT IS BIG DATA?
THE TERM BIG DATA REFERS TO A COLLECTION OF DATA SETS THAT ARE SO LARGE AND COMPLEX THAT THEY ARE DIFFICULT TO CAPTURE, STORE, SEARCH AND ANALYZE USING TRADITIONAL DATA PROCESSING APPLICATIONS.
• BIG DATA = STRUCTURED (SORTED) DATA + UNSTRUCTURED (UNSORTED) DATA
• STRUCTURED (SORTED) DATA
• UNSTRUCTURED (UNSORTED) DATA
CHARACTERISTICS OF BIG DATA
THE 3 VS (VOLUME, VARIETY AND VELOCITY) ARE THE DEFINING PROPERTIES, OR DIMENSIONS, OF BIG DATA.
• VOLUME REFERS TO THE AMOUNT OF DATA.
• VARIETY REFERS TO THE NUMBER OF DIFFERENT TYPES OF DATA.
• VELOCITY REFERS TO THE SPEED AT WHICH DATA IS GENERATED AND PROCESSED.
[DIAGRAM: SIX SERVERS, SERVER1 TO SERVER6]
SO HADOOP IS...
• A PRODUCT OF THE APACHE SOFTWARE FOUNDATION.
• A SOFTWARE FRAMEWORK WRITTEN IN JAVA.
• CROSS-PLATFORM.
• OPEN SOURCE.
THE HADOOP FRAMEWORK CONSISTS OF:
1. HADOOP COMMON
2. HDFS (HADOOP DISTRIBUTED FILE SYSTEM)
3. HADOOP YARN
4. HADOOP MAPREDUCE
HDFS
HDFS IS A SPECIALLY DESIGNED FILE SYSTEM FOR STORING HUGE DATA SETS ON A CLUSTER OF COMMODITY HARDWARE, WITH STREAMING DATA ACCESS. ITS KEY ASPECTS ARE:
• CLUSTER
• COMMODITY HARDWARE
• STREAMING DATA ACCESS
• SPECIALLY DESIGNED FILE SYSTEM
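5 SERVICES (DAEMONS) OF A HADOOP 1.X CLUSTER
• NAME NODE (HDFS MASTER)
• SECONDARY NAME NODE (HDFS)
• JOB TRACKER (MAPREDUCE MASTER)
• DATA NODE (HDFS SLAVE)
• TASK TRACKER (MAPREDUCE SLAVE)
THE NAME NODE, SECONDARY NAME NODE AND DATA NODES BELONG TO HDFS, WHILE THE JOB TRACKER AND TASK TRACKERS BELONG TO MAPREDUCE. IN HADOOP 2.X, YARN REPLACES THE JOB TRACKER AND TASK TRACKERS WITH A RESOURCE MANAGER AND NODE MANAGERS.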
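[DIAGRAM: THE CLIENT REQUESTS FILE A.TEXT FROM THE NAME NODE. FILES A.TEXT, B.TEXT AND C.TEXT ARE STORED ON SIX DATA NODES (DN 1 TO 6), AND THE NAME NODE REPLIES THAT A.TEXT IS AVAILABLE ON DATA NODES 1, 2 AND 6.]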
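The lookup in this diagram can be modelled in a few lines. The Java sketch below is a hypothetical, in-memory stand-in for the name node's metadata. It is not the Hadoop API. It records which data nodes hold a file and answers a client's "where is this file?" request. Real HDFS tracks locations per block rather than per file, and data nodes report their blocks to the name node. Those details are simplified away here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the name node's metadata (not the Hadoop API):
// it maps each file name to the IDs of the data nodes that store it.
public class NameNodeSketch {
    private final Map<String, List<Integer>> fileToDataNodes = new HashMap<>();

    // Record which data nodes store a file.
    public void register(String fileName, List<Integer> dataNodeIds) {
        fileToDataNodes.put(fileName, List.copyOf(dataNodeIds));
    }

    // Answer a client's request: on which data nodes is this file available?
    public List<Integer> locate(String fileName) {
        List<Integer> nodes = fileToDataNodes.get(fileName);
        if (nodes == null) {
            throw new IllegalArgumentException("File not found: " + fileName);
        }
        return nodes;
    }

    public static void main(String[] args) {
        NameNodeSketch nameNode = new NameNodeSketch();
        nameNode.register("A.Text", List.of(1, 2, 6));
        // The name node only returns locations; the client then reads the data from those data nodes.
        System.out.println("A.Text available on data nodes " + nameNode.locate("A.Text"));
    }
}
```

The name node never serves file contents itself, which is why the model only stores locations.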
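[DIAGRAM: THE CLIENT SUBMITS THE MAP LOGIC TO THE JOB TRACKER, WHICH RUNS IT ON THE TASK TRACKERS (TT) OF NODES 1, 2 AND 6, WHERE A.TEXT IS STORED. B.TEXT AND C.TEXT ARE STORED ON THE OTHER NODES.]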
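The diagram shows the map logic moving to the nodes that already hold the data, which is called data locality. The Java sketch below is a hypothetical and heavily simplified model of that choice. It is not the job tracker's real scheduling code. It assumes A.Text consists of three blocks stored on nodes 1, 2 and 6. It ignores task slots and rack awareness, and it falls back to any free task tracker when no data-local one is available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of data-local task assignment (not the Hadoop API).
public class JobTrackerSketch {
    // For each block (given as the list of nodes holding a replica), pick a free task tracker
    // on one of those nodes; otherwise fall back to the lowest-numbered free tracker.
    // Slot limits are ignored, so one tracker may receive several tasks.
    public static List<Integer> assign(List<List<Integer>> blockReplicas, Set<Integer> freeTaskTrackers) {
        if (freeTaskTrackers.isEmpty()) {
            throw new IllegalStateException("no free task trackers");
        }
        TreeSet<Integer> free = new TreeSet<>(freeTaskTrackers);
        List<Integer> assignment = new ArrayList<>();
        for (List<Integer> replicas : blockReplicas) {
            Integer chosen = null;
            for (Integer node : replicas) {
                if (free.contains(node)) { // data-local: the code goes to the data
                    chosen = node;
                    break;
                }
            }
            if (chosen == null) {
                chosen = free.first(); // non-local: the data must travel over the network
            }
            assignment.add(chosen);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Assumed layout: A.Text has three blocks, on nodes 1, 2 and 6.
        List<List<Integer>> aText = List.of(List.of(1), List.of(2), List.of(6));
        System.out.println(assign(aText, Set.of(1, 2, 3, 4, 5, 6))); // [1, 2, 6]
        System.out.println(assign(List.of(List.of(1, 2, 6)), Set.of(3, 4))); // [3]
    }
}
```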
INSTALLATION
REQUIREMENTS FOR INSTALLATION
o JAVA 1.6.X, PREFERABLY FROM SUN, MUST BE INSTALLED.
o SSH MUST BE INSTALLED AND SSHD MUST BE RUNNING TO USE THE HADOOP SCRIPTS THAT MANAGE REMOTE HADOOP DAEMONS.
o INSTALL HADOOP-2.3.0 AND HADOOP-2.3-CONFIG-MASTER.
o WWW.HADOOP.APACHE.ORG
1. INSTALL JAVA.
2. SET THE JAVA PATH IN THE ENVIRONMENT VARIABLES.
3. REPLACE YARN.CMD IN THE BIN FOLDER OF THE EXTRACTED HADOOP-2.3.0.TAR.GZ.
4. REPLACE THE WHOLE HADOOP FOLDER IN THE EXTRACTED TAR.GZ FOLDER WITH THE ONE FROM THE CONFIG MASTER.
5. SET THE HADOOP PATH IN THE ENVIRONMENT VARIABLES.
6. OPEN CMD AND RUN HADOOP.
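Before running Hadoop from CMD, it can help to confirm that the Java and Hadoop path variables from the installation steps are actually set. The Java sketch below is a hypothetical pre-flight check and is not part of Hadoop. The variable names JAVA_HOME and HADOOP_HOME are an assumption: they are the usual choices, but the slides do not name them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check (not part of Hadoop): reports which required
// environment variables are missing or empty.
public class EnvCheckSketch {
    // Assumed variable names; adjust them to match your own setup.
    static final List<String> REQUIRED = List.of("JAVA_HOME", "HADOOP_HOME");

    public static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        System.out.println(missing.isEmpty() ? "Environment looks ready" : "Missing: " + missing);
    }
}
```

Taking the environment as a `Map` parameter, rather than calling `System.getenv()` inside `missing`, keeps the check easy to test.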
FLOW CHART OF WORD COUNT JOB
INPUT FILE: FILE.TXT (200 MB)
• THE INPUT FILE IS DIVIDED INTO FOUR INPUT SPLITS OF 64 MB, 64 MB, 64 MB AND 8 MB (3 × 64 + 8 = 200 MB), AND EACH SPLIT IS PROCESSED BY ITS OWN MAPPER.
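The split arithmetic in the flow chart can be checked with a short calculation. The Java sketch below is a simplified model: it cuts a file into fixed-size pieces, and the last piece holds whatever remains. Real Hadoop derives the split size from the HDFS block size and job settings, and it adds details such as some slack before creating a small final split. Here the 64 MB size from the flow chart is simply passed in.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of dividing a file into fixed-size input splits (sizes in MB).
public class InputSplitSketch {
    public static List<Long> splitSizes(long fileSizeMb, long splitSizeMb) {
        if (splitSizeMb <= 0) {
            throw new IllegalArgumentException("split size must be positive");
        }
        List<Long> sizes = new ArrayList<>();
        for (long offset = 0; offset < fileSizeMb; offset += splitSizeMb) {
            // Every split is full-size except possibly the last one.
            sizes.add(Math.min(splitSizeMb, fileSizeMb - offset));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(splitSizes(200, 64)); // [64, 64, 64, 8]
    }
}
```

Each entry in the result corresponds to one input split, and therefore to one mapper.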
• FOR EACH SPLIT, A RECORD READER TURNS THE TEXT INTO (BYTE OFFSET, ENTIRE LINE) PAIRS, FOR EXAMPLE:
(0, hi how are you?)
(17, how is your job?)
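The offset 17 in the example is consistent with Windows-style CRLF line endings. The first line is 15 characters long, and its terminator adds 2 bytes, so the second line starts at byte 17. The Java sketch below is a simplified line-oriented record reader. It produces the same kind of (byte offset, line) pairs as Hadoop's default TextInputFormat, but it is written from scratch and is not Hadoop's LineRecordReader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified line record reader: each record is (byte offset of the line's first byte,
// line text without its line terminator). Handles both LF and CRLF endings.
public class RecordReaderSketch {
    public static List<Map.Entry<Long, String>> read(byte[] data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = i == data.length;
            if (atEnd || data[i] == '\n') {
                if (atEnd && start == data.length) {
                    break; // file ended with a newline: no empty trailing record
                }
                int end = i;
                if (end > start && data[end - 1] == '\r') {
                    end--; // drop the CR of a CRLF terminator
                }
                String line = new String(data, start, end - start, StandardCharsets.UTF_8);
                records.add(Map.entry((long) start, line));
                start = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] text = "hi how are you?\r\nhow is your job?\r\n".getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> record : read(text)) {
            System.out.println("(" + record.getKey() + ", " + record.getValue() + ")");
        }
        // prints (0, hi how are you?) and then (17, how is your job?)
    }
}
```

With Unix LF endings, the same input would give an offset of 16 for the second line.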
• EACH MAPPER EMITS A (WORD, 1) PAIR FOR EVERY WORD IN ITS LINES, FOR EXAMPLE:
(how,1) (what,1) (is,1) (your,1) (how,1) (is,1) (brother,1) (now,1)
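The map step of word count can be written as a plain function over one (byte offset, line) record. In the Java sketch below, the offset key is ignored and a (word, 1) pair is emitted for every word. The sketch makes two assumptions to match the slides, where "you?" appears as (you,1): it lower-cases the text and strips punctuation. The well-known WordCount example in the Hadoop tutorial instead splits on whitespace with a StringTokenizer. This sketch does not use the Hadoop Mapper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified word-count map function (not the Hadoop Mapper API).
public class WordCountMapSketch {
    // Input: one record from the record reader. Output: (word, 1) for each word.
    public static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // Assumption: words are runs of letters/digits; everything else separates them.
        for (String word : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : map(0, "hi how are you?")) {
            System.out.print("(" + pair.getKey() + "," + pair.getValue() + ")");
        }
        System.out.println(); // (hi,1)(how,1)(are,1)(you,1)
    }
}
```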
INTERMEDIATE DATA
THE OUTPUT OF THE FOUR MAPPERS:
(what,1) (is,1) (your,1) (how,1) (is,1) (brother,1) (now,1) (time,1) (is,1) (the,1)
(how,1) (hi,1) (is,1) (how,1) (are,1) (your,1) (you,1) (job,1)
(how,1) (what,1) (is,1) (your,1) (how,1) (is,1) (sister,1) (family,1)
(what,1) (is,1) (use,1) (of,1) (hadoop,1)
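INTERMEDIATE DATA -> SHUFFLING -> SORTING -> REDUCER
SHUFFLING AND SORTING GROUP ALL VALUES OF THE SAME KEY TOGETHER, E.G. (how,1,1,1,1,1), AND THE REDUCER ADDS THEM UP: (how,5).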
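Shuffling, sorting and reducing can be modelled in a single process to show the data movement. The Java sketch below groups the mappers' (word, 1) pairs by word. A `TreeMap` keeps the keys sorted, as the framework does before handing each key to a reducer. The reduce function then sums each group, so (how,[1,1,1,1,1]) becomes (how,5). The sketch is illustrative only: in a real cluster these pairs are partitioned and moved over the network between nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, single-process model of shuffle/sort and reduce for word count.
public class ShuffleReduceSketch {
    // Shuffle + sort: collect all values for each key, with keys in sorted order.
    public static TreeMap<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: add up the counts for one word.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("how", 1), Map.entry("is", 1), Map.entry("how", 1),
                Map.entry("how", 1), Map.entry("what", 1), Map.entry("how", 1), Map.entry("how", 1));
        shuffleAndSort(mapOutput).forEach((word, counts) ->
                System.out.println("(" + word + "," + counts + ") -> (" + word + "," + reduce(word, counts) + ")"));
    }
}
```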
COMPLETE FLOW
INPUT FILE (FILE.TXT) -> INPUT SPLIT -> RECORD READER -> MAPPER -> SHUFFLING AND SORTING -> REDUCER -> RECORD WRITER -> OUTPUT FILE
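The last step of the flow, the record writer, turns each reducer result into a line of the output file. The slides show results as (word,count). Hadoop's default TextOutputFormat instead writes each pair as the key, a tab, and the value. The Java sketch below is written from scratch, not with the Hadoop API, and mimics that layout. A `StringWriter` stands in for the output file.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Simplified record writer: one "word<TAB>count" line per result, in sorted key order.
public class RecordWriterSketch {
    public static void write(Map<String, Integer> results, Writer out) throws IOException {
        for (Map.Entry<String, Integer> e : new TreeMap<>(results).entrySet()) {
            out.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter file = new StringWriter(); // stands in for the output file
        write(Map.of("how", 5, "hadoop", 1), file);
        System.out.print(file); // "hadoop\t1" then "how\t5"
    }
}
```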
OUTPUT
(are,1)
(brother,1)
(family,1)
(hadoop,1)
(hi,1)
(how,4)
(is,6)
(job,1)
(now,1)
(of,1)
(sister,1)
(the,2)
(time,1)
(use,1)
(what,2)
(you,1)
(your,4)
THANK YOU!!!

Hadoop MapReduce Word Count Project
