Hadoop MapReduce Word Count Project

Jaydeep Patel

The need for Hadoop, installing Hadoop, the Hadoop framework, and a sample Hadoop program.

HADOOP
DEVELOPED BY: JAYDEEP PATEL (13MCA63)
KULDEEP PATEL (13MCA64)

WHAT IS BIG DATA?
THE TERM BIG DATA REFERS TO A COLLECTION OF DATA SETS THAT ARE SO LARGE
AND COMPLEX THAT THEY ARE DIFFICULT TO CAPTURE, STORE, SEARCH AND
ANALYZE USING TRADITIONAL DATA PROCESSING APPLICATIONS.
• BIG DATA = STRUCTURED DATA + UNSTRUCTURED DATA
• STRUCTURED DATA
• UNSTRUCTURED DATA

CHARACTERISTICS OF BIG DATA
THE 3 VS (VOLUME, VARIETY AND VELOCITY) ARE THE DEFINING PROPERTIES, OR
DIMENSIONS, OF BIG DATA.
VOLUME REFERS TO THE AMOUNT OF DATA.
VARIETY REFERS TO THE NUMBER OF TYPES OF DATA.
VELOCITY REFERS TO THE SPEED AT WHICH DATA IS GENERATED AND PROCESSED.

[FIGURE: A CLUSTER OF SIX SERVERS, SERVER1 TO SERVER6]

SO HADOOP IS..
• A PROJECT OF THE APACHE SOFTWARE FOUNDATION.
• A SOFTWARE FRAMEWORK WRITTEN IN JAVA.
• CROSS-PLATFORM.
• OPEN SOURCE.
THE HADOOP FRAMEWORK IS BUILT FROM FOUR MODULES:
1. HADOOP COMMON
2. HDFS
3. HADOOP YARN
4. MAPREDUCE

HDFS
IT IS A SPECIALLY DESIGNED FILE SYSTEM FOR STORING HUGE DATA SETS ON A
CLUSTER OF COMMODITY HARDWARE WITH A STREAMING ACCESS PATTERN.
• CLUSTER
• COMMODITY HARDWARE
• STREAMING ACCESS PATTERN
• SPECIALLY DESIGNED FILE SYSTEM

5 SERVICES (DAEMONS) OF A HADOOP CLUSTER
HDFS (STORAGE):
• NAME NODE
• SECONDARY NAME NODE
• DATA NODE
MAPREDUCE (PROCESSING):
• JOB TRACKER
• TASK TRACKER
THE JOB TRACKER AND TASK TRACKER BELONG TO MAPREDUCE IN HADOOP 1.X; IN HADOOP 2.X THEIR WORK IS DONE BY YARN.

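[FIGURE: HDFS READ. THE CLIENT ASKS THE NAME NODE FOR FILE A.Text. THE NAME NODE TRACKS THE FILES A.Text, B.Text AND C.Text ACROSS SIX DATA NODES (DN 1 TO 6) AND REPLIES THAT A.Text IS AVAILABLE ON DATA NODES 1, 2 AND 6.]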
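
The lookup shown in this slide can be sketched in a few lines of Python. This is a toy model and not the HDFS client API: the `NameNode` class, its `block_map` table and the `register` and `locate` methods are hypothetical names chosen for illustration. Only the file name A.Text and the DataNode numbers 1, 2 and 6 come from the slide; B.Text and C.Text are left out because the slide gives no locations for them.

```python
class NameNode:
    """Toy stand-in for the NameNode: it keeps metadata only, never file data."""

    def __init__(self):
        # file name -> numbers of the DataNodes that hold the file
        self.block_map = {}

    def register(self, filename, datanodes):
        """Record which DataNodes hold a file."""
        self.block_map[filename] = sorted(datanodes)

    def locate(self, filename):
        """Answer a client request with the DataNodes to read from."""
        if filename not in self.block_map:
            raise FileNotFoundError(filename)
        return list(self.block_map[filename])


namenode = NameNode()
namenode.register("A.Text", [1, 2, 6])  # locations taken from the slide

locations = namenode.locate("A.Text")
print(f"A.Text available on DataNodes {locations}")
```

The point of the sketch is that the client only asks the NameNode where the data is; the data itself would then be read from the DataNodes directly.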

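[FIGURE: MAPREDUCE JOB. THE CLIENT SUBMITS ITS MAP LOGIC TO THE JOB TRACKER. A.Text IS STORED ON NODES 1, 2 AND 6, SO THE LOGIC IS SENT TO THE TASK TRACKERS (TT) ON THOSE NODES, OUT OF SIX TASK TRACKERS (TT 1 TO 6). B.Text AND C.Text ARE ALSO SHOWN.]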
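
The idea in this slide, sending the logic to the nodes that already hold the data, can be sketched as below. This is a simplified illustration and not the real JobTracker scheduler: the function `assign_map_tasks` and its fallback rule for nodes without a live TaskTracker are assumptions made for the example. The node numbers 1, 2 and 6 come from the slide.

```python
def assign_map_tasks(data_nodes, live_trackers):
    """Choose a TaskTracker for each node that holds data, preferring the local one.

    Returns a dict: data-holding node -> TaskTracker that runs the map logic.
    """
    assignments = {}
    # Trackers on nodes without the data, usable only as a fallback.
    spare = [tt for tt in live_trackers if tt not in data_nodes]
    for node in data_nodes:
        if node in live_trackers:
            assignments[node] = node  # data-local: the logic moves to the data
        elif spare:
            # Fallback (assumption): a non-local tracker that must fetch the data.
            assignments[node] = spare.pop(0)
        else:
            raise RuntimeError("no TaskTracker available")
    return assignments


plan = assign_map_tasks([1, 2, 6], live_trackers=[1, 2, 3, 4, 5, 6])
print(plan)  # {1: 1, 2: 2, 6: 6}
```

With all six TaskTrackers alive, every task runs on the node that stores its data, so no input has to cross the network.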

REQUIREMENT FOR INSTALLATION
o JAVA 1.6.X, PREFERABLY FROM SUN, MUST BE INSTALLED.
o SSH MUST BE INSTALLED AND SSHD MUST BE RUNNING TO USE THE HADOOP SCRIPTS THAT
MANAGE REMOTE HADOOP DAEMONS.
o INSTALL HADOOP-2.3.0 AND HADOOP-2.3-CONFIG-MASTER.
o WWW.HADOOP.APACHE.ORG

INSTALL JAVA

SET THE JAVA PATH IN THE ENVIRONMENT VARIABLES

REPLACE YARN.CMD IN THE BIN FOLDER OF THE EXTRACTED HADOOP-2.3.0.TAR.GZ

REPLACE THE WHOLE HADOOP FOLDER IN THE EXTRACTED TAR.GZ FOLDER WITH THE ONE FROM CONFIG MASTER

SET THE HADOOP PATH IN THE ENVIRONMENT VARIABLES

OPEN CMD AND RUN HADOOP
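
The two environment-variable steps can be checked with a short script before Hadoop is run. This is a sketch under assumptions: the slides do not name the variables, so `JAVA_HOME` and `HADOOP_HOME` are the conventional names assumed here, and the helper `missing_hadoop_settings` is hypothetical. The check compares path strings exactly, so differences in letter case or trailing separators are not handled.

```python
import os


def missing_hadoop_settings(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    path_entries = env.get("PATH", "").split(os.pathsep)
    for var in ("JAVA_HOME", "HADOOP_HOME"):  # assumed variable names
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
            continue
        bin_dir = os.path.join(home, "bin")
        if bin_dir not in path_entries:
            problems.append(f"{bin_dir} is not on PATH")
    return problems


# Report problems in the current environment, if any.
for problem in missing_hadoop_settings(dict(os.environ)):
    print(problem)
```

An empty result means both variables are set and both `bin` folders are on the path.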

FLOW CHART OF WORD COUNT JOB
INPUT FILE: FILE.TXT (200 MB)
• THE FILE IS DIVIDED INTO 4 INPUT SPLITS OF 64 MB, 64 MB, 64 MB AND 8 MB.
• EACH INPUT SPLIT HAS ITS OWN RECORD READER AND ITS OWN MAPPER.
• THE RECORD READER PASSES EACH LINE TO THE MAPPER AS (BYTE OFFSET, ENTIRE LINE):
(0 , hi how are you?)
(17 , how is your job?)
• THE MAPPER EMITS A (WORD, 1) PAIR FOR EACH WORD:
(how,1)(what,1)
(is,1)(your,1)
(how,1)(is,1)
(brother,1)(now,1)
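
The first half of the flow chart can be reproduced with plain Python. This is a sketch of the idea and not Hadoop code: `split_sizes`, `record_reader` and `mapper` are hypothetical helper names, and the rule for what counts as a word (runs of letters, lowercased) is an assumption. The sample lines use Windows line endings, which is one way to arrive at the byte offset 17 shown in the slide.

```python
import re


def split_sizes(file_size, split_size):
    """Sizes of the input splits for a file, in the same unit as the arguments."""
    full, rest = divmod(file_size, split_size)
    return [split_size] * full + ([rest] if rest else [])


def record_reader(data):
    """Yield (byte offset, line) pairs for the raw bytes of one split."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the line ending counts toward the next offset


def mapper(offset, line):
    """Emit (word, 1) for every word in the line; the offset is not used."""
    for word in re.findall(r"[a-z]+", line.lower()):
        yield word, 1


print(split_sizes(200, 64))  # [64, 64, 64, 8]

text = b"hi how are you?\r\nhow is your job?\r\n"
for offset, line in record_reader(text):
    print((offset, line), list(mapper(offset, line)))
```

The mapper receives the byte offset as its key but ignores it, because word counting only needs the text of the line.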

INTERMEDIATE DATA
OUTPUT OF THE FOUR MAPPERS:
• (what,1) (is,1) (your,1) (how,1) (is,1) (brother,1) (now,1) (time,1) (is,1) (the,1)
• (how,1) (hi,1) (is,1) (how,1) (are,1) (your,1) (you,1) (job,1)
• (how,1) (what,1) (is,1) (your,1) (how,1) (is,1) (sister,1) (family,1)
• (what,1) (is,1) (use,1) (of,1) (hadoop,1)
SHUFFLING AND SORTING GROUP THE INTERMEDIATE DATA BY KEY: (how,1,1,1,1,1)
THE REDUCER ADDS UP EACH GROUP: (how,5)
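
The shuffle, sort and reduce steps can be sketched as follows. This is a minimal single-process illustration, not the Hadoop implementation: `shuffle_and_sort` and `reducer` are hypothetical names, and the sample pairs are a small selection in the style of the slide rather than the full intermediate data.

```python
from collections import defaultdict


def shuffle_and_sort(pairs):
    """Group intermediate (key, value) pairs by key, returned in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Add up the values collected for one key."""
    return key, sum(values)


# Sample pairs as they might arrive from several mappers.
intermediate = [
    ("how", 1), ("is", 1), ("how", 1), ("hi", 1),
    ("how", 1), ("how", 1), ("is", 1), ("how", 1),
]

for key, values in shuffle_and_sort(intermediate):
    print(key, values, "->", reducer(key, values))
```

For the key `how` the grouped values are five ones, which the reducer adds up to `("how", 5)`.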

COMPLETE FLOW
INPUT FILE (FILE.TXT) → INPUT SPLIT → RECORD READER → MAPPER → REDUCER → RECORD WRITER → OUTPUT FILE
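
The whole chain, from input file to output file, fits into one small function when run in a single process. This is a sketch under assumptions and not a Hadoop job: `word_count` is a hypothetical name, in-memory text streams stand in for the input and output files, and words are taken to be runs of letters. The record writer here emits one tab-separated line per word.

```python
import io
import re
from itertools import groupby


def word_count(input_file, output_file):
    """Run reader, mapper, sort, reducer and writer over one text stream."""
    # Record reader: one record per line (byte offsets are not needed here).
    records = (line for line in input_file)
    # Mapper: emit (word, 1) for every word.
    pairs = [(w, 1) for line in records for w in re.findall(r"[a-z]+", line.lower())]
    # Shuffle and sort: bring equal keys next to each other.
    pairs.sort(key=lambda kv: kv[0])
    # Reducer and record writer: add up each group, write one line per word.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(count for _, count in group)
        output_file.write(f"{word}\t{total}\n")


source = io.StringIO("hi how are you?\nhow is your job?\n")
result = io.StringIO()
word_count(source, result)
print(result.getvalue())
```

Because the pairs are sorted before reduction, the output lines come out in alphabetical order of the words.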

OUTPUT
(are,1)
(brother,1)
(family,1)
(hadoop,1)
(hi,1)
(how,4)
(is,6)
(job,1)
(now,1)
(of,1)
(sister,1)
(the,2)
(time,1)
(use,1)
(what,2)
(you,1)
(your,4)
