HADOOP
DEVELOPED BY: JAYDEEP PATEL (13MCA63)
KULDEEP PATEL (13MCA64)
WHAT IS BIG DATA?
THE TERM BIG DATA REFERS TO COLLECTIONS OF DATA SETS THAT ARE SO LARGE AND COMPLEX THAT THEY ARE DIFFICULT TO CAPTURE, STORE, SEARCH AND ANALYZE USING TRADITIONAL DATA PROCESSING APPLICATIONS.
• BIG DATA = STRUCTURED (SORTED) DATA + UNSTRUCTURED (UNSORTED) DATA
CHARACTERISTICS OF BIG DATA
THE 3 VS (VOLUME, VARIETY AND VELOCITY) ARE THE DEFINING PROPERTIES, OR DIMENSIONS, OF BIG DATA.
VOLUME REFERS TO THE AMOUNT OF DATA.
VARIETY REFERS TO THE NUMBER OF TYPES OF DATA.
VELOCITY REFERS TO THE SPEED AT WHICH DATA IS GENERATED AND PROCESSED.
[FIGURE: SIX SERVERS, SERVER1 TO SERVER6]
SO, HADOOP IS:
• A PRODUCT OF THE APACHE SOFTWARE FOUNDATION.
• A SOFTWARE FRAMEWORK WRITTEN IN JAVA.
• CROSS-PLATFORM.
• OPEN SOURCE.
THE HADOOP FRAMEWORK CONSISTS OF:
1. HADOOP COMMON
2. HDFS
3. HADOOP YARN
4. MAPREDUCE
HDFS
HDFS (HADOOP DISTRIBUTED FILE SYSTEM) IS A SPECIALLY DESIGNED FILE SYSTEM FOR STORING HUGE DATA SETS ON A CLUSTER OF COMMODITY HARDWARE, WITH A STREAMING ACCESS PATTERN (WRITE ONCE, READ MANY TIMES).
• CLUSTER
• COMMODITY HARDWARE
• STREAMING ACCESS PATTERN
• SPECIALLY DESIGNED FILE SYSTEM
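5 SERVICES (DAEMONS) OF HADOOP
• NAME NODE
• SECONDARY NAME NODE
• JOB TRACKER
• DATA NODE
• TASK TRACKER
NAME NODE, SECONDARY NAME NODE AND DATA NODE BELONG TO HDFS; JOB TRACKER AND TASK TRACKER BELONG TO MAPREDUCE (HADOOP 1.X). IN HADOOP 2.X, YARN REPLACES THE JOB TRACKER AND TASK TRACKER WITH THE RESOURCEMANAGER AND NODEMANAGERS.
MASTER SERVICES: NAME NODE, SECONDARY NAME NODE, JOB TRACKER
SLAVE SERVICES: DATA NODE, TASK TRACKER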
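[FIGURE: CLIENT, NAME NODE AND SIX DATA NODES (DN 1 TO 6) STORING A.Text, B.Text AND C.Text]
• THE CLIENT REQUESTS FILE A.Text FROM THE NAME NODE.
• THE NAME NODE REPLIES THAT A.Text IS AVAILABLE ON DATA NODES 1, 2 AND 6.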
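The read request above can be modelled as a toy in Java. This is a sketch, not Hadoop's API: `NameNodeSketch`, `locate` and `clientRead` are hypothetical names, and it treats (1, 2, 6) simply as the list of DataNodes holding A.Text, ignoring how HDFS divides files into blocks. What it shows is the division of labour: the NameNode answers with metadata only, and the client then fetches the data from the DataNodes themselves, moving on to the next listed node if one does not have it.

```java
import java.util.List;
import java.util.Map;

/** Toy model of an HDFS read; hypothetical names, not Hadoop's API. */
final class NameNodeSketch {
    // Metadata only: file name -> IDs of the DataNodes that hold it.
    // The file contents never pass through the NameNode.
    private final Map<String, List<Integer>> fileToDataNodes;

    NameNodeSketch(Map<String, List<Integer>> fileToDataNodes) {
        this.fileToDataNodes = Map.copyOf(fileToDataNodes);
    }

    /** NameNode side: which DataNodes hold this file? Empty list if unknown. */
    List<Integer> locate(String fileName) {
        return fileToDataNodes.getOrDefault(fileName, List.of());
    }

    /**
     * Client side: ask the NameNode for locations, then read directly from
     * the DataNodes, trying each listed node in turn.
     * dataNodes models each DataNode's local storage: node ID -> (file -> contents).
     */
    static String clientRead(NameNodeSketch nameNode,
                             Map<Integer, Map<String, String>> dataNodes,
                             String fileName) {
        for (int nodeId : nameNode.locate(fileName)) {
            String contents = dataNodes.getOrDefault(nodeId, Map.of()).get(fileName);
            if (contents != null) {
                return contents; // first DataNode that actually has the file wins
            }
        }
        throw new IllegalStateException(fileName + " is not available on any listed DataNode");
    }
}
```

For example, a NameNode built from `Map.of("A.Text", List.of(1, 2, 6))` answers `locate("A.Text")` with `[1, 2, 6]`, and `locate` on a file it has never heard of returns an empty list.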
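[FIGURE: CLIENT, JOB TRACKER AND SIX TASK TRACKERS (TT 1 TO 6)]
• THE CLIENT SUBMITS THE MAP LOGIC FOR A.Text TO THE JOB TRACKER.
• THE JOB TRACKER SENDS THAT LOGIC TO THE TASK TRACKERS ON NODES 1, 2 AND 6, WHERE A.Text IS STORED, SO THE DATA IS PROCESSED WHERE IT LIVES.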
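The idea in this diagram, shipping the program to the nodes that already hold the data, can be sketched as a small assignment routine. This is a toy under stated assumptions, not the JobTracker's real scheduling algorithm: the block names (such as `A.Text#blk1`), the replica lists and the `-1` "no local node" marker are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy data-local task assignment; hypothetical names, not the JobTracker's real algorithm. */
final class LocalitySchedulerSketch {
    /** Marker meaning "no live TaskTracker runs on a node that holds this block". */
    static final int NO_LOCAL_NODE = -1;

    /**
     * @param blockReplicas block ID -> nodes that store a replica of the block
     * @param liveTrackers  nodes whose TaskTracker can accept a map task
     * @return block ID -> node chosen to run that block's map task
     */
    static Map<String, Integer> assign(Map<String, List<Integer>> blockReplicas,
                                       Set<Integer> liveTrackers) {
        Map<String, Integer> plan = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : blockReplicas.entrySet()) {
            int chosen = NO_LOCAL_NODE;
            for (int node : entry.getValue()) {
                if (liveTrackers.contains(node)) { // ship the logic to the data
                    chosen = node;
                    break;
                }
            }
            // A real scheduler would fall back to a non-local node here;
            // this sketch only records that no local node was found.
            plan.put(entry.getKey(), chosen);
        }
        return plan;
    }
}
```

With blocks of A.Text stored on nodes 1, 2 and 6 and all six TaskTrackers live, the map tasks land on nodes 1, 2 and 6, matching the diagram.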
INSTALLATION
REQUIREMENTS FOR INSTALLATION
o JAVA 1.6.X, PREFERABLY FROM SUN, MUST BE INSTALLED.
o SSH MUST BE INSTALLED AND SSHD MUST BE RUNNING TO USE THE HADOOP SCRIPTS THAT MANAGE REMOTE HADOOP DAEMONS.
o INSTALL HADOOP-2.3.0 AND HADOOP-2.3-CONFIG-MASTER.
o WWW.HADOOP.APACHE.ORG
1. INSTALL JAVA.
2. SET THE JAVA PATH IN THE ENVIRONMENT VARIABLES.
3. REPLACE YARN.CMD IN THE BIN FOLDER OF THE EXTRACTED HADOOP-2.3.0.TAR.GZ.
4. COPY THE WHOLE HADOOP FOLDER FROM CONFIG-MASTER INTO THE EXTRACTED TAR.GZ FOLDER, REPLACING THE EXISTING ONE.
5. SET THE HADOOP PATH IN THE ENVIRONMENT VARIABLES.
6. OPEN A COMMAND PROMPT (CMD) AND RUN HADOOP.
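A quick way to confirm that the environment-variable steps took effect is a small check like the one below. It assumes the conventional variable names `JAVA_HOME` and `HADOOP_HOME`, and that each one's `bin` folder was added to `PATH` exactly as written; the installation steps do not name the variables, so adjust the names to your own setup.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Checks the environment-variable steps. ASSUMPTION: the variables are named
 * JAVA_HOME and HADOOP_HOME and each "<home>/bin" was added to PATH exactly
 * as written (no trailing separator).
 */
final class EnvCheckSketch {
    /** Returns a list of problems; an empty list means every check passed. */
    static List<String> problems(Map<String, String> env) {
        List<String> found = new ArrayList<>();
        List<String> pathEntries =
                Arrays.asList(env.getOrDefault("PATH", "").split(File.pathSeparator));
        for (String name : List.of("JAVA_HOME", "HADOOP_HOME")) {
            String home = env.get(name);
            if (home == null || home.isBlank()) {
                found.add(name + " is not set");
                continue;
            }
            String bin = home + File.separator + "bin";
            if (!pathEntries.contains(bin)) {
                found.add(bin + " is not on PATH");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> issues = problems(System.getenv());
        System.out.println(issues.isEmpty()
                ? "Environment looks OK"
                : String.join(System.lineSeparator(), issues));
    }
}
```

Running `main` from the same command prompt that will run Hadoop checks the variables that Hadoop will actually see.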
FLOW CHART OF WORD COUNT JOB
INPUT FILE: FILE.TXT (200 MB)
• THE INPUT FILE IS DIVIDED INTO FOUR INPUT SPLITS OF 64 MB, 64 MB, 64 MB AND 8 MB.
• EACH INPUT SPLIT IS PROCESSED BY ITS OWN MAPPER.
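The 64 + 64 + 64 + 8 = 200 MB division can be reproduced with a few lines of Java. This sketch assumes one input split per 64 MB block with the remainder in the last split, as in the flow chart; Hadoop's `FileInputFormat` actually derives the split size from the block size and configurable minimum and maximum split sizes. Note also that 64 MB was the default HDFS block size in Hadoop 1.x, while Hadoop 2.x defaults to 128 MB.

```java
import java.util.ArrayList;
import java.util.List;

/** Input split sizes for a file, assuming fixed-size splits (simplified, not FileInputFormat). */
final class InputSplitSketch {
    static final long MB = 1024L * 1024L;

    /** Returns the size in bytes of each split; the last split holds the remainder. */
    static List<Long> splitSizes(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> sizes = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += splitSize) {
            sizes.add(Math.min(splitSize, fileSize - offset));
        }
        return sizes;
    }
}
```

`splitSizes(200 * MB, 64 * MB)` returns four sizes, 64 MB, 64 MB, 64 MB and 8 MB, expressed in bytes, so four mappers are started.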
RECORD READER (ONE PER INPUT SPLIT)
• PASSES EACH LINE TO THE MAPPER AS A (BYTE OFFSET, ENTIRE LINE) PAIR, E.G.:
(0, hi how are you?)
(17, how is your job?)
MAPPER
• EMITS A (WORD, 1) PAIR FOR EVERY WORD, E.G.:
(how,1) (what,1) (is,1) (your,1) (how,1) (is,1) (brother,1) (now,1)
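The record reader and mapper can be modelled in plain Java as follows. It is a sketch rather than Hadoop code (in Hadoop, `TextInputFormat` supplies the byte-offset key and the line value, and a `Mapper` subclass emits the pairs); lower-casing and splitting on non-alphanumeric characters is an assumed tokenization that drops the `?` seen in the sample lines. It also explains the second offset: "hi how are you?" is 15 bytes, so the next line starts at byte 17 only if the first line ends with a Windows CRLF pair; with a Unix LF ending it would start at 16.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** In-memory record reader and word-count mapper; assumes single-byte characters. */
final class RecordReaderMapperSketch {
    /** Record reader: yields (byte offset of the line, line without its line ending). */
    static List<Map.Entry<Long, String>> readRecords(String split) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        while (start < split.length()) {
            int newline = split.indexOf('\n', start);
            int end = (newline < 0) ? split.length() : newline;
            String line = split.substring(start, end);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1); // drop the CR of a CRLF ending
            }
            records.add(new SimpleEntry<>((long) start, line));
            start = (newline < 0) ? split.length() : newline + 1;
        }
        return records;
    }

    /** Mapper: ignores the offset key and emits (word, 1) for every word in the line. */
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }
}
```

`readRecords("hi how are you?\r\nhow is your job?\r\n")` yields `(0, hi how are you?)` and `(17, how is your job?)`, and mapping the second record emits `(how,1) (is,1) (your,1) (job,1)`.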
INTERMEDIATE DATA
OUTPUT OF THE FOUR MAPPERS:
Mapper 1: (what,1) (is,1) (your,1) (how,1) (is,1) (brother,1) (now,1) (time,1) (is,1) (the,1)
Mapper 2: (how,1) (hi,1) (is,1) (how,1) (are,1) (your,1) (you,1) (job,1)
Mapper 3: (how,1) (what,1) (is,1) (your,1) (how,1) (is,1) (sister,1) (family,1)
Mapper 4: (what,1) (is,1) (use,1) (of,1) (hadoop,1)
Intermediate Data → Shuffling → Sorting
(how,1,1,1,1,1) → Reducer → (how,5)
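Shuffling, sorting and reducing can be sketched the same way. Here a `TreeMap` stands in for Hadoop's sort-and-group step, so each key reaches the reducer once, with all of its values, in sorted key order; `ShuffleReduceSketch` and its methods are hypothetical names, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory shuffle/sort and word-count reducer (not Hadoop's API). */
final class ShuffleReduceSketch {
    /** Shuffle + sort: collect every value for a key into one list, keys in sorted order. */
    static SortedMap<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> mapOutput) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reducer: receives one word and all its 1s; the word itself is not needed to sum. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs the reducer once per key, preserving the sorted key order. */
    static SortedMap<String, Integer> reduceAll(SortedMap<String, List<Integer>> grouped) {
        SortedMap<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }
}
```

Five `(how,1)` pairs become `(how,[1,1,1,1,1])` after shuffling and sorting, and the reducer turns that into `(how,5)`.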
COMPLETE FLOW
Input File (File.txt) → Input Split → Record Reader → Mapper → Shuffling & Sorting → Reducer → Record Writer → Output File
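Putting the stages together gives a small end-to-end word count in plain Java. It is a simplified sketch, not the Hadoop `WordCount` program: splits are groups of lines rather than byte ranges, everything runs in one process, and the record writer simply formats one `word<TAB>count` line per key, which is the layout Hadoop's default `TextOutputFormat` produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Simplified in-memory word count following the complete flow; not Hadoop's WordCount. */
final class WordCountFlowSketch {
    static String run(String inputFile, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        // Records: one per line (CRLF or LF endings). In this sketch a "split"
        // is simply a group of linesPerSplit consecutive records.
        String[] records = inputFile.split("\\r?\\n");

        // Mappers: each split emits (word, 1) pairs into the intermediate data.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (int start = 0; start < records.length; start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, records.length);
            for (int i = start; i < end; i++) {
                for (String word : records[i].toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                    if (!word.isEmpty()) {
                        intermediate.add(Map.entry(word, 1));
                    }
                }
            }
        }

        // Shuffling and sorting: group the 1s by word, keys in sorted order.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reducer + record writer: sum each group and write one "word<TAB>count" line.
        StringJoiner output = new StringJoiner("\n");
        grouped.forEach((word, ones) ->
                output.add(word + "\t" + ones.stream().mapToInt(Integer::intValue).sum()));
        return output.toString();
    }
}
```

Run on the two sample lines "hi how are you?" and "how is your job?", it counts `how` twice and every other word once, listing the words in alphabetical order.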
OUTPUT
(are,1)
(brother,1)
(family,1)
(hadoop,1)
(hi,1)
(how,4)
(is,6)
(job,1)
(now,1)
(of,1)
(sister,1)
(the,2)
(time,1)
(use,1)
(what,2)
(you,1)
(your,4)
THANK YOU!!!
