By: Deep P Mehta (1498016)
Project Guide: Manish R Solanki
Topics To Be Covered
• Various Types of Computing
• What is Hadoop?
• Disadvantages of Distributed Systems & How Hadoop
Overcomes Them
• Hadoop Architecture
• HDFS (Hadoop Distributed File System)
• MapReduce
• Example MapReduce Program (Word Count)
Today's Data Scenario
• The New York Stock Exchange generates about 4−5
terabytes of data per day.
• Facebook hosts more than 240 billion photos, growing
at 7 petabytes per month.
• Ancestry.com, the genealogy site, stores around 10
petabytes of data.
• The Internet Archive stores around 18.5 petabytes of
data.
Distributed Computing
• Distributed computing is a
model in which components
located on networked
computers communicate and
coordinate their actions by
passing messages.
Parallel Computing
•Parallel computing is a type of
computation in which many
calculations or the execution of
processes are carried out
simultaneously
Grid Computing
•Grid computing combines
computers to reach a common
goal, to solve a single task at a time
Volunteer Computing
• Volunteer computing is a
type of distributed computing
in which computer owners donate
their computing resources
(such as processing power and
storage) to "projects".
Cloud Computing
• Cloud computing is a type of
Internet-based computing that
provides shared computer processing
resources and data to computers and
other devices on demand.
What Causes the Problem in a
Distributed System?
• The transfer speed of a disk is around 100 MB/s
• Consider a disk of 1 terabyte
• Time to read the whole disk = 10,000 seconds, or around 3 hours
• Simply allowing more time may not help, because of:
  • The network bandwidth bottleneck
  • Processor limits that have already been reached
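The arithmetic on the slide above can be checked with a short sketch. It takes the transfer rate as 100 megabytes per second, which is the rate that makes a 1 TB read come out at 10,000 seconds. The class and method names are hypothetical, and the 100-disk case is an assumption added only to illustrate why spreading data across machines helps.

```java
// Back-of-the-envelope read time for one disk, using the slide's figures.
class DiskReadTime {
    // Seconds needed to read `bytes` from one disk at `bytesPerSecond`.
    static double secondsToRead(double bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        double oneTerabyte = 1e12;   // 1 TB (decimal)
        double transferRate = 100e6; // 100 MB/s, the rate consistent with 10,000 s

        double seconds = secondsToRead(oneTerabyte, transferRate);
        System.out.printf("one disk: %.0f s (about %.1f hours)%n",
                seconds, seconds / 3600);

        // Assumption for illustration: the same data spread evenly over
        // 100 disks that are all read at the same time.
        int disks = 100;
        double parallel = secondsToRead(oneTerabyte / disks, transferRate);
        System.out.printf("%d disks in parallel: %.0f s%n", disks, parallel);
    }
}
```

Reading in parallel shortens the wall-clock time, but it is also what introduces the hardware-failure and result-combining issues discussed next.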
Issues Involved in Distributed
Systems
• Hardware problems
As we start using more hardware, the chance that
some component fails becomes very high.
• Combining data after analysis
Combining the analysed data from one disk with
that from other disks can cause failures or data loss.
Hadoop
• Hadoop is a software
framework for distributed
storage and distributed
processing.
• It runs on clusters built from
commodity hardware.
• Hadoop is designed with two
fundamental assumptions:
• Hardware failure
• Large data processing
Hadoop
• Doug Cutting and Michael J. Cafarella developed
Hadoop in 2005.
Hadoop's Approach to Distributed Systems
• Hadoop provides a simplified programming model
that lets users quickly write and test distributed
systems. It automatically and efficiently distributes data
and work across machines, and in turn uses the
underlying parallelism of the CPU cores.
Advantages Of Hadoop
• High scalability and
availability
• Use commodity (cheap!)
hardware with little
redundancy
• Fault-tolerance
• Move computation rather
than data
Move Computation Rather than Data
Hadoop Architecture
HDFS Architecture
HDFS Architecture
• NameNode
It runs as the master server.
Application
• Manages the file system namespace.
• Regulates clients' access to files.
• Executes file system operations such as
renaming, closing, and opening files and directories.
HDFS Architecture
• DataNode
These nodes run as slaves and manage the data storage of
their own systems.
Application
• DataNodes perform read and write operations on the file
system, as per client requests.
• They also perform operations such as block creation,
deletion, and replication according to the instructions
of the NameNode.
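The split of responsibilities described above (the NameNode keeps metadata, the DataNodes hold the blocks) can be pictured with a toy model. This is only a sketch: `ToyNameNode` and its methods are hypothetical and are not the HDFS API, and replicas are placed round-robin for simplicity, whereas real HDFS placement is rack-aware.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, not part of Hadoop.
class ToyNameNode {
    // Namespace metadata: file path -> ordered list of block ids.
    private final Map<String, List<Long>> fileToBlocks = new LinkedHashMap<>();
    // Block metadata: block id -> names of the DataNodes holding a replica.
    private final Map<Long, List<String>> blockToDataNodes = new LinkedHashMap<>();
    private final List<String> dataNodes;
    private final int replication;
    private long nextBlockId = 0;

    ToyNameNode(List<String> dataNodes, int replication) {
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("not enough DataNodes");
        }
        this.dataNodes = dataNodes;
        this.replication = replication;
    }

    // Records a new file and decides which DataNodes store each replica.
    void createFile(String path, int numBlocks) {
        List<Long> blocks = new ArrayList<>();
        for (int b = 0; b < numBlocks; b++) {
            long id = nextBlockId++;
            List<String> holders = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Round-robin keeps the replicas of one block on distinct nodes.
                holders.add(dataNodes.get((int) ((id + r) % dataNodes.size())));
            }
            blockToDataNodes.put(id, holders);
            blocks.add(id);
        }
        fileToBlocks.put(path, blocks);
    }

    // Namespace operation: a rename changes metadata only; no block moves.
    void rename(String from, String to) {
        List<Long> blocks = fileToBlocks.remove(from);
        if (blocks == null) {
            throw new IllegalArgumentException("no such file: " + from);
        }
        fileToBlocks.put(to, blocks);
    }

    List<Long> blocksOf(String path) { return fileToBlocks.get(path); }

    List<String> locationsOf(long blockId) { return blockToDataNodes.get(blockId); }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(List.of("dn1", "dn2", "dn3", "dn4"), 3);
        nn.createFile("/logs/a.txt", 2);
        nn.rename("/logs/a.txt", "/logs/b.txt");
        for (long id : nn.blocksOf("/logs/b.txt")) {
            System.out.println("block " + id + " -> " + nn.locationsOf(id));
        }
    }
}
```

In this model a client would ask the NameNode where a block lives and then read the bytes from one of the listed DataNodes.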
HDFS Architecture
• Block
The minimum amount of data that HDFS can read
or write is called a block. The default block size is
64 MB (128 MB from Hadoop 2.x onward), and it can be
increased as needed by changing the HDFS configuration.
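To make the block size concrete, the following sketch works out how many blocks a file occupies. It is plain arithmetic rather than a call into HDFS; the class name, method names, and the 200 MB file size are hypothetical.

```java
// How many blocks does a file need at a given block size?
class BlockCount {
    static final long MB = 1024L * 1024L;

    // Ceiling of fileSize / blockSize; an empty file needs no blocks.
    static long blocksFor(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes == 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Size of the final block, which may be smaller than a full block.
    static long lastBlockSize(long fileSizeBytes, long blockSizeBytes) {
        long remainder = fileSizeBytes % blockSizeBytes;
        return (remainder == 0 && fileSizeBytes > 0) ? blockSizeBytes : remainder;
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB;  // block size used on the slide
        long fileSize = 200 * MB;  // hypothetical file
        System.out.println(blocksFor(fileSize, blockSize) + " blocks, last block "
                + lastBlockSize(fileSize, blockSize) / MB + " MB");
    }
}
```

A 200 MB file at a 64 MB block size therefore needs three full blocks plus one 8 MB block; the last block only takes up as much space as the data it holds.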
Map Reduce Architecture
Map Reduce Architecture
• Map Function
The map phase begins by extracting the input data from
the splits. For each record parsed by the “InputFormat”, it
invokes the user-provided “map” function, which emits a
number of key/value pairs into the memory buffer.
Example
The input is “bhaghubhai”.
The output (once the counts for each key are added up) will be
b-2 g-1
h-3 u-1
a-2 i-1
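The character-count example above can be reproduced without a cluster. The sketch below is plain Java, not Hadoop code: `map` emits one (character, 1) pair per character, and a local `combine` step adds the ones up per key. Both method names and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CharCountMap {
    // "Map" step: emit one (character, 1) pair per character of the record.
    static List<Map.Entry<Character, Integer>> map(String record) {
        List<Map.Entry<Character, Integer>> pairs = new ArrayList<>();
        for (char c : record.toCharArray()) {
            pairs.add(Map.entry(c, 1));
        }
        return pairs;
    }

    // Local combine: add up the emitted values for each key.
    static Map<Character, Integer> combine(List<Map.Entry<Character, Integer>> pairs) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (Map.Entry<Character, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, g=1, h=3, i=1, u=1}
        System.out.println(combine(map("bhaghubhai")));
    }
}
```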
Map Reduce Architecture
• Reduce Function
The TaskTracker reads the region files remotely. It sorts the
key/value pairs and, for each key, invokes the “reduce”
function, which collects the key/aggregated value into the
output file (one per reducer node).
Example
There are two input splits, and both contain “bhaghubhai”.
After merging them, the output will be
b-4 g-2
h-6 u-2
a-4 i-2
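The merge in the example above can be sketched in plain Java as well. This is not Hadoop code: `shuffleAndReduce` stands in for the sort-and-group step, `reduce` sums the values for one key, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CharCountReduce {
    // "Reduce" step for one key: sum every value that arrived for that key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Group the values from all map outputs by key (sorted), then reduce each group.
    static Map<Character, Integer> shuffleAndReduce(List<Map<Character, Integer>> mapOutputs) {
        Map<Character, List<Integer>> grouped = new TreeMap<>();
        for (Map<Character, Integer> output : mapOutputs) {
            for (Map.Entry<Character, Integer> e : output.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<Character, Integer> result = new TreeMap<>();
        for (Map.Entry<Character, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Per-split counts for "bhaghubhai", as produced by the map phase.
        Map<Character, Integer> split = Map.of('b', 2, 'h', 3, 'a', 2, 'g', 1, 'u', 1, 'i', 1);
        // prints {a=4, b=4, g=2, h=6, i=2, u=2}
        System.out.println(shuffleAndReduce(List.of(split, split)));
    }
}
```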
Word Count Example Program
Mapper class
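import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Emits (word, 1) for every whitespace-separated token in an input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }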
Word Count Example Program
Reducer class
  // Sums the counts received for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
Word Count Example Program
Driver class
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The reducer doubles as the combiner because summing is associative and commutative.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] is the input path; args[1] is the output path, which must not already exist.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}