SlideShare a Scribd company logo
1 of 27
SEMINAR ONSEMINAR ON
Android App DevelopmentAndroid App Development
Trained by-Trained by-
Hewlett-Packard Education Services,Hewlett-Packard Education Services,
MumbaiMumbai
Presented to-
Mr. R.K. Banyal By-
Mr. Hukum Chand Saini Urvashi Kataria
About HPES:About HPES:
• American global IT company headquartered in Palo-
Alto, California, US.
• Provider of products, soft wares, technologies,
solutions and services to individual as well as small
& medium sized business.
• Major operations include- HP Software, HP Financial
Services & Corporate Investments
• Provides practical training in fields like Big Data,
Android App Dev, Embedded Systems etc.
An android application that allows you to enjoy your as well as
your dear ones birthday.
Save the days, get reminded of them, capture moments on the
day itself, get greeted by the app, and celebrate!!
About Birthday Bash:About Birthday Bash:
The home screen:The home screen:
Calculating age and further:Calculating age and further:
Saving name for specified date:Saving name for specified date:
Happy Birthday!Happy Birthday!
Hadoop Map Reduce
(Map + reduce)
Presentation on:Presentation on:
Why MapReduce?Why MapReduce?
• Large scale data processing was difficult!
 Managing hundreds or thousands of processors
 Managing parallelization and distribution
 Reliable execution with easy data access
MapReduce provides all of these, easily!
What is Hadoop MapReduce?What is Hadoop MapReduce?
Hadoop ClusterHadoop Cluster HDFS (Physical)HDFS (Physical) StorageStorage
MapReduce ObjectsMapReduce Objects
How Map and Reduce WorkHow Map and Reduce Work
TogetherTogether
Hadoop MapReduce: A Closer LookHadoop MapReduce: A Closer Look
file
file
InputFormat
Split Split Split
RR RR RR
Map Map Map
Input (K, V) pairs
Partitioner
Intermediate (K, V) pairs
Sort
Reduce
OutputFormat
Files loaded from local HDFS store
RecordReaders
Final (K, V) pairs
Writeback to local
HDFS store
file
file
InputFormat
Split Split Split
RR RR RR
Map Map Map
Input (K, V) pairs
Partitioner
Intermediate (K, V) pairs
Sort
Reduce
OutputFormat
Files loaded from local HDFS store
RecordReaders
Final (K, V) pairs
Writeback to local
HDFS store
Node 1 Node 2
Shuffling
Process
Intermediate
(K,V) pairs
exchanged by
all nodes
AlgorithmAlgorithm
map(key, value):
// key: document name; value: text of document
for each word w in value:
emit(w, 1)
reduce(key, values):
// key: a word; values: an iterator over counts
result = 0
for each count v in values:
result += v
emit(key,result)
map(key=url, val=contents):
for each word w in contents:
emit (w, “1”)
reduce(key=word, values=uniq_counts):
//Sum all “1”s in values list
emit result “(word, sum)”
The very famous:The very famous:
Word Count ExampleWord Count Example
Ways to MapReduceWays to MapReduce
Libraries Languages
Note: Java is most common, but other languages can be used
Common Data Sources forCommon Data Sources for
MapReduce JobsMapReduce Jobs
Service ProvidersService Providers
• Open Source
o Apache
• Commercial
o Cloudera
o Hortonworks
o MapR
o AWS MapReduce
o Microsoft HDInsight (Beta)
Advancements:Advancements:
MRV1 & MRV2MRV1 & MRV2
MRV2 (MAPREDUCE VERSION 2)
•Splits the existing JobTracker’s roles
o Resource management
o Job lifecycle management
•MapReduce 2.0 provides many benefits over the existing
MapReduce framework:
o Better scalability
o Through distributed job lifecycle management
o Support for multiple Hadoop MapReduce API versions in a
single cluster
Better MapReduce - OptimizationsBetter MapReduce - Optimizations
Advantages of MapReduceAdvantages of MapReduce
• Distributed data and computation.
• Tasks are independent. Entire nodes can fail and restart.
• Linear scaling in the idle case. It’s used to design cheap
commodity, hardware.
• Simple programming model. The end-user programmer
only writes map reduce task.
Disadvantages/ Cases where MR isn’tDisadvantages/ Cases where MR isn’t
a suitable choice:a suitable choice:
• Real time processing
• It is not always very easy to implement each and every
thing as a map reduce program
• When your intermediate processes need to talk to each
other
• When your processing requires lot of data to be shuffled
over the network
• When you need to handle streaming data. MR is best suited
to batch process huge amount of data which you already
have
Limitations of MapReduceLimitations of MapReduce
RDBMS vs. HadoopRDBMS vs. Hadoop
Traditional RDBMS Hadoop / MapReduce
Data Size Gigabytes (Terabytes) Petabytes (Hexabytes)
Access Interactive and Batch Batch – NOT Interactive
Updates Read / Write many times Write once, Read many times
Structure Static Schema Dynamic Schema
Integrity High (ACID) Low
Scaling Nonlinear Linear
Query Response
Time
Can be near immediate Has latency (due to batch
processing)
ReferencesReferences
• J. Dean and S. Ghemawat. “MapReduce: Simplified Data
Processing on Large Clusters.” Proceedings of the 6th
Symposium on Operating System Design and Implementation
(OSDI 2004), pages 137-150. 2004.
• S. Ghemawat, H. Gobioff, and S.-T. Leung. “The Google File
System.” OSDI 200?
• http://hadoop.apache.org/common/docs/current/mapred_tutori
al.html. “Map/Reduce Tutorial”. Fetched January 21, 2010.
• Tom White. Hadoop: The Definitive Guide. O'Reilly Media.
June 5, 2009
• http://developer.yahoo.com/hadoop/tutorial/module4.html
• J. Lin and C. Dyer. Data-Intensive Text Processing with
MapReduce, Book Draft. February 7, 2010.
Thank You!!Thank You!!

More Related Content

What's hot

Geek camp
Geek campGeek camp
Geek campjdhok
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Senthil Kumar
 
Dumbo Hadoop Streaming Made Elegant And Easy Klaas Bosteels
Dumbo Hadoop Streaming Made Elegant And Easy Klaas BosteelsDumbo Hadoop Streaming Made Elegant And Easy Klaas Bosteels
Dumbo Hadoop Streaming Made Elegant And Easy Klaas BosteelsGeorge Ang
 
Hadoop course curriculm
Hadoop course curriculm Hadoop course curriculm
Hadoop course curriculm alogarg
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Yahoo Developer Network
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Hadoop & distributed cloud computing
Hadoop & distributed cloud computingHadoop & distributed cloud computing
Hadoop & distributed cloud computingRajan Kumar Upadhyay
 
writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programsjani shaik
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reducePaladion Networks
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewNisanth Simon
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Scalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worldsScalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worldsDataWorks Summit
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceBhupesh Chawda
 

What's hot (19)

MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
Big data analytics using R
Big data analytics using RBig data analytics using R
Big data analytics using R
 
Geek camp
Geek campGeek camp
Geek camp
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
 
Dumbo Hadoop Streaming Made Elegant And Easy Klaas Bosteels
Dumbo Hadoop Streaming Made Elegant And Easy Klaas BosteelsDumbo Hadoop Streaming Made Elegant And Easy Klaas Bosteels
Dumbo Hadoop Streaming Made Elegant And Easy Klaas Bosteels
 
Hadoop course curriculm
Hadoop course curriculm Hadoop course curriculm
Hadoop course curriculm
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Non-Relational Revolution
Non-Relational RevolutionNon-Relational Revolution
Non-Relational Revolution
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Hadoop & distributed cloud computing
Hadoop & distributed cloud computingHadoop & distributed cloud computing
Hadoop & distributed cloud computing
 
writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programs
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Scalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worldsScalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worlds
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
Data scientist a perfect job
Data scientist a perfect jobData scientist a perfect job
Data scientist a perfect job
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 

Similar to Hadoop MapReduce

Integrating R & Hadoop - Text Mining & Sentiment Analysis
Integrating R & Hadoop - Text Mining & Sentiment AnalysisIntegrating R & Hadoop - Text Mining & Sentiment Analysis
Integrating R & Hadoop - Text Mining & Sentiment AnalysisAravind Babu
 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Cloudera, Inc.
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn HadoopSilicon Halton
 
The Powerful Marriage of Hadoop and R (David Champagne)
The Powerful Marriage of Hadoop and R (David Champagne)The Powerful Marriage of Hadoop and R (David Champagne)
The Powerful Marriage of Hadoop and R (David Champagne)Revolution Analytics
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on HadoopPaco Nathan
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Andrey Vykhodtsev
 
Python in big data world
Python in big data worldPython in big data world
Python in big data worldRohit
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopVictoria López
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightPaco Nathan
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BIPrasad Prabhu (PP)
 

Similar to Hadoop MapReduce (20)

Integrating R & Hadoop - Text Mining & Sentiment Analysis
Integrating R & Hadoop - Text Mining & Sentiment AnalysisIntegrating R & Hadoop - Text Mining & Sentiment Analysis
Integrating R & Hadoop - Text Mining & Sentiment Analysis
 
Big data
Big dataBig data
Big data
 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn Hadoop
 
The Powerful Marriage of Hadoop and R (David Champagne)
The Powerful Marriage of Hadoop and R (David Champagne)The Powerful Marriage of Hadoop and R (David Champagne)
The Powerful Marriage of Hadoop and R (David Champagne)
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on Hadoop
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
Python in big data world
Python in big data worldPython in big data world
Python in big data world
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoop
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
 
Apache Hadoop: DFS and Map Reduce
Apache Hadoop: DFS and Map ReduceApache Hadoop: DFS and Map Reduce
Apache Hadoop: DFS and Map Reduce
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 

Recently uploaded

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Hadoop MapReduce

  • 1. SEMINAR ONSEMINAR ON Android App DevelopmentAndroid App Development Trained by-Trained by- Hewlett-Packard Education Services,Hewlett-Packard Education Services, MumbaiMumbai Presented to- Mr. R.K. Banyal By- Mr. Hukum Chand Saini Urvashi Kataria
  • 2. About HPES:About HPES: • American global IT company headquartered in Palo- Alto, California, US. • Provider of products, soft wares, technologies, solutions and services to individual as well as small & medium sized business. • Major operations include- HP Software, HP Financial Services & Corporate Investments • Provides practical training in fields like Big Data, Android App Dev, Embedded Systems etc.
  • 3. An android application that allows you to enjoy your as well as your dear ones birthday. Save the days, get reminded of them, capture moments on the day itself, get greeted by the app, and celebrate!! About Birthday Bash:About Birthday Bash:
  • 4. The home screen:The home screen:
  • 5. Calculating age and further:Calculating age and further:
  • 6. Saving name for specified date:Saving name for specified date:
  • 8. Hadoop Map Reduce (Map + reduce) Presentation on:Presentation on:
  • 9. Why MapReduce?Why MapReduce? • Large scale data processing was difficult!  Managing hundreds or thousands of processors  Managing parallelization and distribution  Reliable execution with easy data access MapReduce provides all of these, easily!
  • 10. What is Hadoop MapReduce?What is Hadoop MapReduce?
  • 11. Hadoop ClusterHadoop Cluster HDFS (Physical)HDFS (Physical) StorageStorage
  • 13. How Map and Reduce WorkHow Map and Reduce Work TogetherTogether
  • 14. Hadoop MapReduce: A Closer LookHadoop MapReduce: A Closer Look file file InputFormat Split Split Split RR RR RR Map Map Map Input (K, V) pairs Partitioner Intermediate (K, V) pairs Sort Reduce OutputFormat Files loaded from local HDFS store RecordReaders Final (K, V) pairs Writeback to local HDFS store file file InputFormat Split Split Split RR RR RR Map Map Map Input (K, V) pairs Partitioner Intermediate (K, V) pairs Sort Reduce OutputFormat Files loaded from local HDFS store RecordReaders Final (K, V) pairs Writeback to local HDFS store Node 1 Node 2 Shuffling Process Intermediate (K,V) pairs exchanged by all nodes
  • 15. AlgorithmAlgorithm map(key, value): // key: document name; value: text of document for each word w in value: emit(w, 1) reduce(key, values): // key: a word; values: an iterator over counts result = 0 for each count v in values: result += v emit(key,result) map(key=url, val=contents): for each word w in contents: emit (w, “1”) reduce(key=word, values=uniq_counts): //Sum all “1”s in values list emit result “(word, sum)”
  • 16. The very famous:The very famous: Word Count ExampleWord Count Example
  • 17. Ways to MapReduceWays to MapReduce Libraries Languages Note: Java is most common, but other languages can be used
  • 18. Common Data Sources forCommon Data Sources for MapReduce JobsMapReduce Jobs
  • 19. Service ProvidersService Providers • Open Source o Apache • Commercial o Cloudera o Hortonworks o MapR o AWS MapReduce o Microsoft HDInsight (Beta)
  • 20. Advancements:Advancements: MRV1 & MRV2MRV1 & MRV2 MRV2 (MAPREDUCE VERSION 2) •Splits the existing JobTracker’s roles o Resource management o Job lifecycle management •MapReduce 2.0 provides many benefits over the existing MapReduce framework: o Better scalability o Through distributed job lifecycle management o Support for multiple Hadoop MapReduce API versions in a single cluster
  • 21. Better MapReduce - OptimizationsBetter MapReduce - Optimizations
  • 22. Advantages of MapReduceAdvantages of MapReduce • Distributed data and computation. • Tasks are independent. Entire nodes can fail and restart. • Linear scaling in the idle case. It’s used to design cheap commodity, hardware. • Simple programming model. The end-user programmer only writes map reduce task.
  • 23. Disadvantages/ Cases where MR isn’tDisadvantages/ Cases where MR isn’t a suitable choice:a suitable choice: • Real time processing • It is not always very easy to implement each and every thing as a map reduce program • When your intermediate processes need to talk to each other • When your processing requires lot of data to be shuffled over the network • When you need to handle streaming data. MR is best suited to batch process huge amount of data which you already have
  • 25. RDBMS vs. HadoopRDBMS vs. Hadoop Traditional RDBMS Hadoop / MapReduce Data Size Gigabytes (Terabytes) Petabytes (Hexabytes) Access Interactive and Batch Batch – NOT Interactive Updates Read / Write many times Write once, Read many times Structure Static Schema Dynamic Schema Integrity High (ACID) Low Scaling Nonlinear Linear Query Response Time Can be near immediate Has latency (due to batch processing)
  • 26. ReferencesReferences • J. Dean and S. Ghemawat. “MapReduce: Simplified Data Processing on Large Clusters.” Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI 2004), pages 137-150. 2004. • S. Ghemawat, H. Gobioff, and S.-T. Leung. “The Google File System.” OSDI 200? • http://hadoop.apache.org/common/docs/current/mapred_tutori al.html. “Map/Reduce Tutorial”. Fetched January 21, 2010. • Tom White. Hadoop: The Definitive Guide. O'Reilly Media. June 5, 2009 • http://developer.yahoo.com/hadoop/tutorial/module4.html • J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce, Book Draft. February 7, 2010.

Editor's Notes

  1. http://www.dataspora.com/2011/04/pigs-bees-and-elephants-a-comparison-of-eight-mapreduce-languages/
  2. Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)