INTRODUCTION TO
HADOOP
Presented By
www.kellytechno.com
ACK
• Thanks to all the authors who left their slides on the Web.
• I own the errors, of course.
WHAT IS HADOOP?
• Distributed computing framework
• For clusters of computers
• Thousands of compute nodes
• Petabytes of data
• Open source, written in Java
• Google's MapReduce inspired Yahoo's Hadoop.
• Now an Apache project
WHAT IS HADOOP?
• The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes:
  ◦ Hadoop Common: the common utilities.
  ◦ Avro: a data serialization system that integrates with scripting languages.
  ◦ Chukwa: a data collection system for managing large distributed systems.
  ◦ HBase: a scalable, distributed database for large tables.
  ◦ HDFS: a distributed file system.
  ◦ Hive: data summarization and ad hoc querying.
  ◦ MapReduce: distributed processing on compute clusters.
  ◦ Pig: a high-level data-flow language for parallel computation.
  ◦ ZooKeeper: a coordination service for distributed applications.
THE IDEA OF MAP REDUCE
MAP AND REDUCE
• The idea of map and reduce is 40+ years old
  ◦ Present in all functional programming languages.
  ◦ See, e.g., APL, Lisp and ML
• Alternate name for map: apply-all
• Higher-order functions
  ◦ take function definitions as arguments, or return a function as output
• Map and reduce are higher-order functions.
MAP: A HIGHER-ORDER FUNCTION
• F(x: int) returns r: int
• Let V be an array of integers.
• W = map(F, V)
• W[i] = F(V[i]) for all i
• i.e., apply F to every element of V
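The definition above can be sketched in Haskell, the language used for the examples in these slides. This is only an illustration: `mapInts`, `f` and `w` are made-up names, a list stands in for the array V, and the Prelude's own `map` is more general than this Int-only version.

```haskell
-- A minimal sketch of map restricted to Ints (mapInts is a hypothetical
-- name; the Prelude's `map` works for any element types).
mapInts :: (Int -> Int) -> [Int] -> [Int]
mapInts _ []       = []
mapInts g (v : vs) = g v : mapInts g vs

-- An illustrative F: doubles its argument.
f :: Int -> Int
f x = 2 * x

-- W = map(F, V), with V = [1, 2, 3]
w :: [Int]
w = mapInts f [1, 2, 3]
```

Here `w` evaluates to `[2, 4, 6]`, the same result as `map f [1, 2, 3]`.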
MAP EXAMPLES IN HASKELL
• map (+1) [1,2,3,4,5]
  == [2, 3, 4, 5, 6]
• map toLower "abcDEFG12!@#"   -- toLower comes from Data.Char
  == "abcdefg12!@#"
• map (`mod` 3) [1..10]
  == [1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
REDUCE: A HIGHER-ORDER FUNCTION
• Reduce is also known as fold, accumulate, compress or inject
• Reduce/fold takes in a function and folds it in between the elements of a list.
FOLD-LEFT IN HASKELL
• Definition
  ◦ foldl f z [] = z
  ◦ foldl f z (x:xs) = foldl f (f z x) xs
• Examples
  ◦ foldl (+) 0 [1..5] == 15
  ◦ foldl (+) 10 [1..5] == 25
  ◦ foldl div 7 [34,56,12,4,23] == 0
FOLD-RIGHT IN HASKELL
• Definition
  ◦ foldr f z [] = z
  ◦ foldr f z (x:xs) = f x (foldr f z xs)
• Example
  ◦ foldr div 7 [34,56,12,4,23] == 8
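The two `div` examples give different answers because the folds nest in opposite directions. One way to see this is to look at the intermediate results with the Prelude's `scanl` and `scanr`; the names `leftSteps` and `rightSteps` below are chosen only for this sketch.

```haskell
-- foldl nests to the left, starting from the seed 7:
--   ((((7 `div` 34) `div` 56) `div` 12) `div` 4) `div` 23
-- scanl lists every intermediate accumulator, seed first.
leftSteps :: [Int]
leftSteps = scanl div 7 [34, 56, 12, 4, 23]

-- foldr nests to the right, with the seed 7 innermost:
--   34 `div` (56 `div` (12 `div` (4 `div` (23 `div` 7))))
-- scanr lists every intermediate result, seed last.
rightSteps :: [Int]
rightSteps = scanr div 7 [34, 56, 12, 4, 23]
```

`leftSteps` is `[7, 0, 0, 0, 0, 0]`, so the left fold collapses to 0 at the first step (7 `div` 34). `rightSteps` is `[8, 4, 12, 1, 3, 7]`, and its head, 8, is the value of the right fold.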
EXAMPLES OF THE
MAP REDUCE IDEA
WORD COUNT EXAMPLE
• Read text files and count how often words occur.
  ◦ The input is text files
  ◦ The output is a text file, each line: word, tab, count
• Map: produce pairs of (word, count)
• Reduce: for each word, sum up the counts.
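A single-machine sketch of this word count, written as ordinary Haskell functions, may make the two phases concrete. It is not Hadoop code: the names `mapper`, `reducer`, `wordCount` and `render` are invented here, input lines stand in for files, lower-casing the words is an added assumption, and `Data.Map.Strict` comes from the `containers` package that ships with GHC.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- Map phase: emit a (word, 1) pair for every word in a line of text.
-- Lower-casing is an assumption of this sketch, not part of the slide.
mapper :: String -> [(String, Int)]
mapper line = [(map toLower word, 1) | word <- words line]

-- Reduce phase: for each word, sum up the counts.
reducer :: [(String, Int)] -> Map.Map String Int
reducer = Map.fromListWith (+)

-- Whole job: map every line, then reduce all emitted pairs.
wordCount :: [String] -> [(String, Int)]
wordCount = Map.toList . reducer . concatMap mapper

-- Output format from the slide: word, tab, count, one pair per line.
render :: [(String, Int)] -> String
render pairs = unlines [word ++ "\t" ++ show n | (word, n) <- pairs]
```

For example, `wordCount ["the cat", "The dog"]` evaluates to `[("cat",1),("dog",1),("the",2)]`.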
GREP EXAMPLE
• Search input files for a given pattern
• Map: emits a line if the pattern is matched
• Reduce: copies the results to the output
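The grep job can be sketched the same way. In this illustration a plain substring test (`isInfixOf` from `Data.List`) stands in for real pattern matching, and `grepMapper`, `grepReducer` and `grep` are hypothetical names.

```haskell
import Data.List (isInfixOf)

-- Map phase: emit the line if it contains the pattern, otherwise nothing.
-- Substring search is an assumption; a real grep would match a regex.
grepMapper :: String -> String -> [String]
grepMapper pat line = [line | pat `isInfixOf` line]

-- Reduce phase: the identity function; it only copies results to the output.
grepReducer :: [String] -> [String]
grepReducer = id

-- Whole job over a list of input lines.
grep :: String -> [String] -> [String]
grep pat = grepReducer . concatMap (grepMapper pat)
```

For example, `grep "map" ["map task", "reduce task", "a map"]` evaluates to `["map task", "a map"]`.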
INVERTED INDEX EXAMPLE
• Generate an inverted index of words from a given set of files
• Map: parses a document and emits <word, docId> pairs
• Reduce: takes all pairs for a given word, sorts the docId values, and emits a <word, list(docId)> pair
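A small in-memory sketch of the inverted index follows. The names `DocId`, `indexMapper`, `indexReducer` and `invertedIndex` are invented for illustration, documents are given as (docId, text) pairs, and removing duplicate docIds after sorting is an extra choice of this sketch that the slide does not mention.

```haskell
import Data.List (nub, sort)
import qualified Data.Map.Strict as Map

type DocId = Int

-- Map phase: parse a document and emit (word, docId) pairs.
indexMapper :: (DocId, String) -> [(String, DocId)]
indexMapper (docId, text) = [(word, docId) | word <- words text]

-- Reduce phase: gather all docIds for a word, sort them, and drop
-- duplicates (the deduplication is an assumption of this sketch).
indexReducer :: [(String, DocId)] -> Map.Map String [DocId]
indexReducer pairs =
  Map.map (nub . sort) (Map.fromListWith (++) [(word, [d]) | (word, d) <- pairs])

-- Whole job: one (word, list of docIds) entry per distinct word.
invertedIndex :: [(DocId, String)] -> [(String, [DocId])]
invertedIndex = Map.toList . indexReducer . concatMap indexMapper
```

For example, `invertedIndex [(2, "map reduce"), (1, "map map")]` evaluates to `[("map",[1,2]),("reduce",[2])]`.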
MAP/REDUCE
IMPLEMENTATION IDEA
EXECUTION ON CLUSTERS
1. Split the input files (M splits)
2. Assign the master and the workers
3. Run the map tasks
4. Write the intermediate data to disk (R regions)
5. Read and sort the intermediate data
6. Run the reduce tasks
7. Return
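The data flow of steps 1 and 3 to 6 can be imitated in a single process. The sketch below is only a model under stated assumptions: everything stays in memory, there is no master, no workers and no disk, and the names `splitInput`, `mapPhase`, `shuffle`, `reducePhase` and `runJob` are invented for this illustration.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Step 1: split the input into chunks of at most n records (the M splits).
splitInput :: Int -> [a] -> [[a]]
splitInput _ [] = []
splitInput n xs = chunk : splitInput n rest
  where
    (chunk, rest) = splitAt (max 1 n) xs

-- Steps 3 and 4: apply the map function to every record of every split,
-- collecting the intermediate (key, value) pairs.
mapPhase :: (a -> [(k, v)]) -> [[a]] -> [(k, v)]
mapPhase mapFn = concatMap (concatMap mapFn)

-- Step 5: sort the intermediate pairs by key and group equal keys.
shuffle :: Ord k => [(k, v)] -> [(k, [v])]
shuffle pairs =
  [ (k, map snd grp)
  | grp@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst pairs)
  ]

-- Step 6: call the reduce function once per key.
reducePhase :: (k -> [v] -> r) -> [(k, [v])] -> [(k, r)]
reducePhase reduceFn groups = [(k, reduceFn k vs) | (k, vs) <- groups]

-- Steps 1 and 3 to 6 chained together.
runJob :: Ord k => Int -> (a -> [(k, v)]) -> (k -> [v] -> r) -> [a] -> [(k, r)]
runJob n mapFn reduceFn =
  reducePhase reduceFn . shuffle . mapPhase mapFn . splitInput n
```

With a word-count mapper and a summing reducer, `runJob 2 (\line -> [(w, 1 :: Int) | w <- words line]) (\_ vs -> sum vs) ["a b", "b c", "a"]` evaluates to `[("a",2),("b",2),("c",1)]`.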
MAP/REDUCE CLUSTER IMPLEMENTATION
[Diagram: input files (split 0 to split 4) feed M map tasks, which write intermediate files; R reduce tasks read them and write the output files (Output 0, Output 1).]
• Several map or reduce tasks can run on a single computer.
• Each intermediate file is divided into R partitions by the partitioning function.
• Each reduce task corresponds to one partition.
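A common choice of partitioning function is a hash of the key taken modulo R, so that all pairs with the same key reach the same reduce task. The sketch below assumes string keys; `hashKey` is a toy hash made up for this example (real systems use their own), and `partitionOf` and `partitionPairs` are likewise hypothetical names.

```haskell
import Data.Char (ord)

-- A toy string hash built with foldl; always non-negative.
-- This is an illustrative hash, not the one any real system uses.
hashKey :: String -> Int
hashKey = foldl (\h c -> (31 * h + ord c) `mod` 1000003) 7

-- Partitioning function: maps a key to a partition number in [0, r - 1].
partitionOf :: Int -> String -> Int
partitionOf r key = hashKey key `mod` r

-- Divide one map task's intermediate pairs into r partitions.
-- Partition i is the input of reduce task i.
partitionPairs :: Int -> [(String, v)] -> [[(String, v)]]
partitionPairs r pairs =
  [ [p | p@(k, _) <- pairs, partitionOf r k == i]
  | i <- [0 .. r - 1]
  ]
```

Because the partition depends only on the key, every pair for a given word lands in exactly one of the R partitions, whichever map task produced it.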
EXECUTION
FAULT RECOVERY
• Workers are pinged by the master periodically
  ◦ Non-responsive workers are marked as failed
  ◦ All tasks in progress on, or completed by, a failed worker become eligible for rescheduling
• The master could periodically checkpoint its state
  ◦ Current implementations abort on master failure
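The master's bookkeeping for worker failure can be sketched as a pure function over a task table. The types and names here (`TaskState`, `Task`, `rescheduleFailed`) are assumptions made for illustration, not an actual Hadoop or MapReduce API, and the list of workers that answered the last ping is simply passed in.

```haskell
-- Hypothetical task table kept by the master.
data TaskState = Idle | InProgress | Completed
  deriving (Eq, Show)

type WorkerId = Int

data Task = Task
  { taskId   :: Int
  , state    :: TaskState
  , assigned :: Maybe WorkerId  -- worker that runs or ran the task
  } deriving (Eq, Show)

-- Given the workers that answered the latest ping, reset every task that
-- is in progress on, or was completed by, a non-responsive worker.
-- A reset task goes back to Idle and can be scheduled again.
rescheduleFailed :: [WorkerId] -> [Task] -> [Task]
rescheduleFailed alive = map reset
  where
    reset t = case assigned t of
      Just worker
        | worker `notElem` alive -> t { state = Idle, assigned = Nothing }
      _ -> t
```

If only worker 1 answers the ping, a task completed by worker 2 returns to `Idle` with no worker assigned, while tasks held by worker 1 are left untouched.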