INTRODUCTION TO HADOOP
Presented By
www.zenithit.co.uk
WHAT IS HADOOP?
• A distributed computing framework
• For clusters of computers
• Thousands of compute nodes
• Petabytes of data
• Open source, written in Java
• Inspired by Google's MapReduce; much of Hadoop was developed at Yahoo
• Now an Apache Software Foundation project
WHAT IS HADOOP?
• The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop and its related Apache projects include:
  - Hadoop Common: utilities that support the other Hadoop modules
  - Avro: a data serialization system that integrates dynamically with scripting languages
  - Chukwa: a data collection system for managing large distributed systems
  - HBase: a scalable, distributed database for large tables
  - HDFS: a distributed file system
  - Hive: a data warehouse infrastructure for data summarization and ad hoc querying
  - MapReduce: a framework for distributed processing of large data sets on compute clusters
  - Pig: a high-level data-flow language for parallel computation
  - ZooKeeper: a coordination service for distributed applications
THE IDEA OF MAP REDUCE
MAP AND REDUCE
• The ideas of map and reduce are more than 40 years old
• They are present in virtually all functional programming languages
  - see, e.g., APL, Lisp and ML
• Alternate name for map: apply-all
• Higher-order functions
  - take functions as arguments, or
  - return a function as output
• Map and reduce are higher-order functions.
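The following Haskell sketch shows both kinds of higher-order function named above. `applyTwice` and `adder` are hypothetical example names, not library functions.

```haskell
-- applyTwice takes a function as an argument (first kind of higher-order function).
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)

-- adder returns a function as its output (second kind).
adder :: Int -> (Int -> Int)
adder n = \x -> x + n

main :: IO ()
main = do
  print (applyTwice (* 2) (5 :: Int))   -- 20
  print (map (adder 10) [1, 2, 3])      -- [11,12,13]
  -- The Prelude's map and foldr are themselves higher-order functions.
  print (foldr (+) 0 [1, 2, 3 :: Int])  -- 6
```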
MAP: A HIGHER-ORDER FUNCTION
• F(x: int) returns r: int
• Let V be an array of integers.
• W = map(F, V)
• W[i] = F(V[i]) for all i
• i.e., apply F to every element of V
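Haskell's Prelude already provides `map`. The sketch below re-derives it for lists under the hypothetical name `myMap`, then checks the property W[i] = F(V[i]) index by index. Squaring is an arbitrary choice of F.

```haskell
-- A from-scratch map over lists; myMap avoids clashing with Prelude's map.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs

-- F from the slide: any int -> int function; squaring is just an example.
f :: Int -> Int
f x = x * x

main :: IO ()
main = do
  let v = [1, 2, 3, 4] :: [Int]
      w = myMap f v
  print w                                                        -- [1,4,9,16]
  -- Check W[i] == F(V[i]) for every index i.
  print (and [w !! i == f (v !! i) | i <- [0 .. length v - 1]])  -- True
```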
MAP EXAMPLES IN HASKELL
• map (+1) [1,2,3,4,5]
  == [2,3,4,5,6]
• map toLower "abcDEFG12!@#"   (toLower is imported from Data.Char)
  == "abcdefg12!@#"
• map (`mod` 3) [1..10]
  == [1,2,0,1,2,0,1,2,0,1]
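Here are the three examples above as one runnable program. The only addition is the import that `toLower` needs. Non-letters pass through `toLower` unchanged, so the digits and punctuation survive.

```haskell
import Data.Char (toLower) -- toLower lives in Data.Char, not the Prelude

main :: IO ()
main = do
  print (map (+ 1) [1, 2, 3, 4, 5 :: Int])  -- [2,3,4,5,6]
  putStrLn (map toLower "abcDEFG12!@#")     -- abcdefg12!@#
  print (map (`mod` 3) [1 .. 10 :: Int])    -- [1,2,0,1,2,0,1,2,0,1]
```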
REDUCE: A HIGHER-ORDER FUNCTION
• Reduce is also known as fold, accumulate, compress or inject
• Reduce/fold takes a binary function and an initial value, and folds the function in between the elements of a list.
FOLD-LEFT IN HASKELL
• Definition
  - foldl f z [] = z
  - foldl f z (x:xs) = foldl f (f z x) xs
• Examples
  - foldl (+) 0 [1..5] == 15
  - foldl (+) 10 [1..5] == 25
  - foldl div 7 [34,56,12,4,23] == 0
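The sketch below transcribes the left-fold definition above under the hypothetical name `myFoldl`. It then uses the Prelude's `scanl` to print every intermediate accumulator, which shows why the `div` example collapses to 0.

```haskell
-- Left fold written out by hand, following the slide's definition.
-- myFoldl avoids clashing with Prelude's foldl.
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ z []       = z
myFoldl f z (x : xs) = myFoldl f (f z x) xs

main :: IO ()
main = do
  print (myFoldl (+) 0 [1 .. 5 :: Int])             -- 15
  print (myFoldl div 7 [34, 56, 12, 4, 23 :: Int])  -- 0
  -- scanl lists each accumulator of ((((0+1)+2)+3)+4)+5.
  print (scanl (+) 0 [1 .. 5 :: Int])               -- [0,1,3,6,10,15]
  -- 7 `div` 34 is already 0, and 0 `div` any positive number stays 0.
  print (scanl div 7 [34, 56, 12, 4, 23 :: Int])    -- [7,0,0,0,0,0]
```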
FOLD-RIGHT IN HASKELL
• Definition
  - foldr f z [] = z
  - foldr f z (x:xs) = f x (foldr f z xs)
• Example
  - foldr div 7 [34,56,12,4,23] == 8
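The right-fold definition can be checked the same way. `myFoldr` is a hypothetical name. `scanr` prints the fold of every suffix, so the result 8 can be traced from the seed 7 outward. The last line contrasts it with the left fold on the same inputs.

```haskell
-- Right fold written out by hand, following the slide's definition.
-- myFoldr avoids clashing with Prelude's foldr.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []       = z
myFoldr f z (x : xs) = f x (myFoldr f z xs)

nums :: [Int]
nums = [34, 56, 12, 4, 23]

main :: IO ()
main = do
  -- 34 `div` (56 `div` (12 `div` (4 `div` (23 `div` 7))))
  print (myFoldr div 7 nums)                 -- 8
  -- scanr shows the fold of each suffix, ending with the seed 7.
  print (scanr div 7 nums)                   -- [8,4,12,1,3,7]
  -- Same function and seed as the foldl example, different bracketing.
  print (foldl div 7 nums, foldr div 7 nums) -- (0,8)
```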
EXAMPLES OF THE MAP REDUCE IDEA
WORD COUNT EXAMPLE
• Read text files and count how often each word occurs.
  - The input is text files
  - The output is a text file; each line: word, tab, count
• Map: for each word occurrence, emit the pair (word, 1)
• Reduce: for each word, sum up the counts
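Here is a single-machine sketch of the word-count job. The grouping step, which a cluster framework would perform, is simulated with `Data.Map.Strict` from the `containers` package that ships with GHC. The function names are hypothetical. Splitting on whitespace only, with no punctuation or case handling, is a simplifying assumption.

```haskell
import qualified Data.Map.Strict as Map

-- Map: emit (word, 1) for every word occurrence in one input text.
wcMap :: String -> [(String, Int)]
wcMap text = [(w, 1) | w <- words text]

-- Reduce: sum every count emitted for one word.
wcReduce :: String -> [Int] -> (String, Int)
wcReduce w counts = (w, sum counts)

-- Group values by key (the framework's job on a real cluster).
-- Map.fromListWith also sorts the keys, like the sort before reduce.
wordCount :: [String] -> [(String, Int)]
wordCount texts = [wcReduce w counts | (w, counts) <- Map.toList grouped]
  where
    grouped = Map.fromListWith (++) [(k, [v]) | t <- texts, (k, v) <- wcMap t]

main :: IO ()
main =
  -- One output line per word: word, tab, count.
  mapM_ (\(w, c) -> putStrLn (w ++ "\t" ++ show c))
        (wordCount ["the cat sat", "the cat ran"])
```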
GREP EXAMPLE
• Search input files for a given pattern
• Map: emits a line if it matches the pattern
• Reduce: identity function; copies the intermediate results to the output
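Here is a minimal grep job in the same style. It makes two assumptions that are not in the slide: the "pattern" is a plain substring tested with `isInfixOf` (not a regular expression), and each match is keyed by its file name. `grepMap`, `grepReduce` and `grep` are hypothetical names.

```haskell
import Data.List (isInfixOf)

-- Map: emit (file, line) for every line of the file that contains the pattern.
grepMap :: String -> (FilePath, String) -> [(FilePath, String)]
grepMap pat (file, contents) =
  [(file, line) | line <- lines contents, pat `isInfixOf` line]

-- Reduce: the identity; matches are copied through unchanged.
grepReduce :: (FilePath, String) -> (FilePath, String)
grepReduce = id

grep :: String -> [(FilePath, String)] -> [(FilePath, String)]
grep pat files = map grepReduce (concatMap (grepMap pat) files)

main :: IO ()
main = mapM_ (\(f, l) -> putStrLn (f ++ ": " ++ l))
             (grep "map" [ ("a.txt", "map and reduce\nfold")
                         , ("b.txt", "no match\nheap map") ])
```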
INVERTED INDEX EXAMPLE
• Generate an inverted index of words from a given set of files
• Map: parses a document and emits <word, docId> pairs
• Reduce: takes all pairs for a given word, sorts the docId values, and emits a <word, list(docId)> pair
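The inverted-index job can be sketched the same way. Document ids are plain `Int`s here, and the reducer drops duplicate ids, which happen when a word repeats inside one document. Both are assumptions beyond the slide, and all names are hypothetical.

```haskell
import Data.List (nub, sort)
import qualified Data.Map.Strict as Map

type DocId = Int

-- Map: parse one document and emit a <word, docId> pair per word.
indexMap :: (DocId, String) -> [(String, DocId)]
indexMap (doc, text) = [(w, doc) | w <- words text]

-- Reduce: sort the docIds for one word and emit <word, list(docId)>.
-- Duplicates (a word repeated within one document) are dropped here.
indexReduce :: String -> [DocId] -> (String, [DocId])
indexReduce w ids = (w, nub (sort ids))

invertedIndex :: [(DocId, String)] -> [(String, [DocId])]
invertedIndex docs = [indexReduce w ids | (w, ids) <- Map.toList grouped]
  where
    grouped = Map.fromListWith (++) [(k, [d]) | doc <- docs, (k, d) <- indexMap doc]

main :: IO ()
main = mapM_ print (invertedIndex [ (1, "hadoop map reduce")
                                  , (2, "map fold")
                                  , (3, "reduce map map") ])
```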
MAP/REDUCE IMPLEMENTATION IDEA
EXECUTION ON CLUSTERS
1. Input files are split (M splits)
2. Master and workers are assigned
3. Map tasks run
4. Intermediate data is written to local disk (R regions)
5. Intermediate data is read and sorted
6. Reduce tasks run
7. Control returns to the user program
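The steps above can be simulated sequentially on one machine. In the sketch below, `runMapReduce`, `splitInto` and `hashKey` are hypothetical names. `hashKey` is an ad hoc string hash standing in for a real one. Step 2 (scheduling tasks on workers) is not modelled, and it assumes M ≥ 1 and R ≥ 1.

```haskell
import Data.Char (ord)
import Data.Function (on)
import Data.List (groupBy, sort, sortOn)

-- Hypothetical stand-in for a real hash function (always non-negative).
hashKey :: String -> Int
hashKey = foldl (\h c -> (h * 31 + ord c) `mod` 1000003) 7

-- Step 1: cut the input into m contiguous splits of nearly equal size
-- (some splits are empty if there are fewer records than m; assumes m >= 1).
splitInto :: Int -> [a] -> [[a]]
splitInto m = go m
  where
    go 0 _  = []
    go k ys = let s = (length ys + k - 1) `div` k
                  (a, b) = splitAt s ys
              in a : go (k - 1) b

runMapReduce
  :: Int                      -- M: number of input splits (map tasks)
  -> Int                      -- R: number of regions (reduce tasks)
  -> (i -> [(String, v)])     -- user map function
  -> (String -> [v] -> o)     -- user reduce function
  -> [i]                      -- input records
  -> [[o]]                    -- one output list per reduce task
runMapReduce m r mapF reduceF input =
  [ [ reduceF k (map snd grp)                                        -- step 6: reduce
    | grp@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst region) ]  -- step 5: sort
  | p <- [0 .. r - 1]
  , let region = [kv | kv@(k, _) <- intermediate, hashKey k `mod` r == p] ]  -- step 4
  where
    intermediate = concatMap (concatMap mapF) (splitInto m input)    -- step 3: map tasks

main :: IO ()
main = do
  let outputs = runMapReduce 3 2
                  (\line -> [(w, 1 :: Int) | w <- words line])
                  (\k vs -> (k, sum vs))
                  ["a b", "b c", "c a", "a"]
  mapM_ print outputs            -- which word lands in which output depends on hashKey
  print (sort (concat outputs))  -- [("a",3),("b",2),("c",2)]
```

Whatever the hash does, every occurrence of a key ends up in exactly one region. The union of the R outputs therefore matches a single-machine word count.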
MAP/REDUCE CLUSTER IMPLEMENTATION
[Figure: input files (split 0 to split 4) are read by M map tasks, which write intermediate files; R reduce tasks read them and write the output files (Output 0, Output 1).]
• Several map or reduce tasks can run on a single computer
• Each intermediate file is divided into R partitions by a partitioning function
• Each reduce task corresponds to one partition
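In Hadoop, the default partitioner (`HashPartitioner`) computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Haskell sketch below mirrors that idea under stated assumptions. `hashKey` is an ad hoc stand-in for `hashCode()`. `partitionOutput` and `reducerInput` are hypothetical names. The sketch shows one map task's output divided into R partitions, with reduce task p collecting partition p from every map task.

```haskell
import Data.Char (ord)

-- Hypothetical deterministic string hash (a stand-in for Java's hashCode()).
hashKey :: String -> Int
hashKey = foldl (\h c -> (h * 31 + ord c) `mod` 1000003) 7

-- Partitioning function: the index of the reduce task that owns a key.
partitionOf :: Int -> String -> Int
partitionOf r k = hashKey k `mod` r

-- One map task's intermediate output, divided into r partitions.
partitionOutput :: Int -> [(String, v)] -> [[(String, v)]]
partitionOutput r kvs =
  [ [kv | kv@(k, _) <- kvs, partitionOf r k == p] | p <- [0 .. r - 1] ]

-- Reduce task p reads partition p from every map task's intermediate file.
reducerInput :: Int -> [[[(String, v)]]] -> [(String, v)]
reducerInput p = concatMap (!! p)

main :: IO ()
main = do
  let r = 2
      task1 = partitionOutput r [("cat", 1 :: Int), ("dog", 1)]
      task2 = partitionOutput r [("dog", 1), ("emu", 1)]
  -- Both ("dog", 1) pairs arrive at the same reduce task.
  mapM_ (\p -> print (p, reducerInput p [task1, task2])) [0 .. r - 1]
```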
EXECUTION
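FAULT RECOVERY
• The master pings workers periodically
• Non-responsive workers are marked as failed
• Tasks in progress on a failed worker, and map tasks it has completed, become eligible for rescheduling (completed map output sits on the failed worker's local disk; completed reduce output is already in the global file system)
• The master could periodically checkpoint its state
• The original Google implementation aborts the computation if the master fails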
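The master's bookkeeping can be sketched as two small functions. The first marks workers whose last ping reply is older than a timeout. The second resets a failed worker's tasks to idle, following the rule above: in-progress tasks and completed map tasks are redone, and completed reduce tasks are kept. The integer timestamps, the timeout, the string worker ids, and all names are hypothetical.

```haskell
-- Task kinds and states; worker ids are plain strings (hypothetical).
data Kind = MapTask | ReduceTask deriving (Eq, Show)

data State = Idle | InProgress String | Completed String
  deriving (Eq, Show)

data Task = Task { kind :: Kind, state :: State } deriving (Eq, Show)

-- Mark as failed every worker whose last ping reply is older than the timeout.
failedWorkers :: Int -> Int -> [(String, Int)] -> [String]
failedWorkers now timeout lastReply =
  [w | (w, t) <- lastReply, now - t > timeout]

-- Reset the tasks a failed worker was responsible for back to Idle.
onWorkerFailure :: String -> [Task] -> [Task]
onWorkerFailure w = map reset
  where
    reset t = case state t of
      InProgress w' | w' == w -> t { state = Idle }
      -- Completed map output lived on the failed worker's local disk.
      Completed w' | w' == w, kind t == MapTask -> t { state = Idle }
      -- Completed reduce output is already in the global file system.
      _ -> t

main :: IO ()
main = do
  let tasks = [ Task MapTask (Completed "w1")
              , Task MapTask (InProgress "w2")
              , Task ReduceTask (Completed "w1")
              , Task ReduceTask (InProgress "w1") ]
      dead = failedWorkers 100 30 [("w1", 50), ("w2", 90)]
  print dead                                      -- ["w1"]
  mapM_ print (foldr onWorkerFailure tasks dead)
```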

