SEMINAR ON
Android App Development
Trained by: Hewlett-Packard Education Services, Mumbai
Presented to: Mr. R.K. Banyal, Mr. Hukum Chand Saini
By: Urvashi Kataria
About HPES:
• American global IT company headquartered in Palo Alto, California, US.
• Provider of products, software, technologies, solutions and services to individuals as well as small and medium-sized businesses.
• Major operations include HP Software, HP Financial Services and Corporate Investments.
• Provides practical training in fields like Big Data, Android App Development, Embedded Systems, etc.
About Birthday Bash:
An Android application that lets you enjoy your own birthday as well as those of your dear ones.
Save the dates, get reminded of them, capture moments on the day itself, get greeted by the app, and celebrate!
The home screen:
Calculating age and further:
Saving name for specified date:
Happy Birthday!
Presentation on:
Hadoop MapReduce
(Map + Reduce)
Why MapReduce?
• Large-scale data processing was difficult!
  o Managing hundreds or thousands of processors
  o Managing parallelization and distribution
  o Reliable execution with easy data access
MapReduce provides all of these, easily!
What is Hadoop MapReduce?
Hadoop Cluster
HDFS (Physical) Storage
MapReduce Objects
How Map and Reduce Work Together
Hadoop MapReduce: A Closer Look
[Figure: data flow on two nodes (Node 1, Node 2). On each node, files are loaded from the local HDFS store; the InputFormat divides them into splits; RecordReaders (RR) turn each split into input (K, V) pairs for the Map tasks; the Partitioner assigns the intermediate (K, V) pairs to reducers; in the shuffling process the intermediate (K, V) pairs are exchanged by all nodes; each node then sorts and reduces its pairs; the OutputFormat writes the final (K, V) pairs back to the local HDFS store.]
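The stages in the figure can be sketched in a single process. This is a minimal illustration, not Hadoop's implementation: `run_job`, `partition` and the `num_reducers` parameter are hypothetical names, each inner list of `splits` stands in for one input split, and `zlib.crc32` is used only as a stable stand-in for a hash partitioner.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick the reducer that owns a key; crc32 keeps the choice stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def run_job(splits, map_fn, reduce_fn, num_reducers=2):
    """Run map, partition/shuffle, sort and reduce over in-memory splits."""
    # One bucket per reducer: key -> list of intermediate values.
    buckets = [defaultdict(list) for _ in range(num_reducers)]

    # Map phase: every split is processed independently of the others.
    for split in splits:
        for record_key, record_value in split:
            for key, value in map_fn(record_key, record_value):
                # Shuffle: route the intermediate pair to the reducer owning its key.
                buckets[partition(key, num_reducers)][key].append(value)

    # Sort and reduce: each reducer visits its keys in sorted order,
    # with all values for one key grouped together.
    output = []
    for bucket in buckets:
        part = []
        for key in sorted(bucket):
            part.extend(reduce_fn(key, bucket[key]))
        output.append(part)
    return output  # one list of final (K, V) pairs per reducer
```

The point of the partitioner is that every occurrence of a key, no matter which split produced it, ends up at the same reducer, so each reducer can finish its keys without talking to the others.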
Algorithm
map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)

reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)

map(key=url, val=contents):
    for each word w in contents:
        emit(w, "1")

reduce(key=word, values=uniq_counts):
    // sum all "1"s in the values list
    emit(word, sum)
The very famous: Word Count Example
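A runnable version of the word count pseudocode might look like the sketch below. The names `map_words`, `reduce_counts` and `word_count` are illustrative, the grouping step only imitates the shuffle and sort that the framework would do, and lowercasing plus the simple regular expression are assumptions about what counts as a word.

```python
import re
from collections import defaultdict


def map_words(doc_name, text):
    """map(key, value): emit (word, 1) for every word in the document text."""
    # Assumption: words are lowercase runs of letters, digits and apostrophes.
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        yield word, 1


def reduce_counts(word, counts):
    """reduce(key, values): sum the counts emitted for one word."""
    result = 0
    for count in counts:
        result += count
    yield word, result


def word_count(documents):
    """Count words across a dict of {document name: text}."""
    # Group intermediate values by key; this stands in for the shuffle and
    # sort performed between the map and reduce phases.
    grouped = defaultdict(list)
    for doc_name, text in documents.items():
        for word, one in map_words(doc_name, text):
            grouped[word].append(one)
    return dict(
        pair
        for word in sorted(grouped)
        for pair in reduce_counts(word, grouped[word])
    )


counts = word_count({
    "doc1.txt": "the quick brown fox",
    "doc2.txt": "the lazy dog and the fox",
})
print(counts)
```

With these two sample documents, "the" is counted three times and "fox" twice; every other word appears once.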
Ways to MapReduce
Libraries Languages
Note: Java is most common, but other languages can be used.
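One common route for non-Java code is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The slide does not say which mechanism it had in mind, so treat the sketch below as an assumption; `mapper` and `reducer` are hypothetical names, and they take any iterable of lines so they can be tried locally without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Turn raw input lines into tab-separated "word<TAB>1" records."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local dry run: sorted() plays the role of the framework's sort between phases.
for record in reducer(sorted(mapper(["b a", "a"]))):
    print(record)
```

In a real streaming job each function would sit in its own script and iterate over `sys.stdin`; the framework takes care of sorting the mapper output by key before the reducer sees it.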
Common Data Sources for MapReduce Jobs
Service Providers
• Open Source
  o Apache
• Commercial
  o Cloudera
  o Hortonworks
  o MapR
  o Amazon Elastic MapReduce (AWS)
  o Microsoft HDInsight (Beta)
Advancements: MRv1 & MRv2
MRv2 (MapReduce Version 2)
• Splits the existing JobTracker’s roles into:
  o Resource management
  o Job lifecycle management
• MapReduce 2.0 provides many benefits over the existing MapReduce framework:
  o Better scalability, through distributed job lifecycle management
  o Support for multiple Hadoop MapReduce API versions in a single cluster
Better MapReduce - Optimizations
Advantages of MapReduce
• Distributed data and computation.
• Tasks are independent. Entire nodes can fail and restart.
• Linear scaling in the ideal case. It is designed to run on cheap commodity hardware.
• Simple programming model. The end-user programmer only writes the map and reduce tasks.
Disadvantages / Cases where MR isn’t a suitable choice:
• Real-time processing.
• It is not always easy to express every problem as a MapReduce program.
• When your intermediate processes need to talk to each other.
• When your processing requires a lot of data to be shuffled over the network.
• When you need to handle streaming data. MR is best suited to batch processing of huge amounts of data that you already have.
Limitations of MapReduce
RDBMS vs. Hadoop
                     Traditional RDBMS          Hadoop / MapReduce
Data Size            Gigabytes (Terabytes)      Petabytes (Exabytes)
Access               Interactive and Batch      Batch, NOT interactive
Updates              Read / Write many times    Write once, Read many times
Structure            Static Schema              Dynamic Schema
Integrity            High (ACID)                Low
Scaling              Nonlinear                  Linear
Query Response Time  Can be near immediate      Has latency (due to batch processing)
References
• J. Dean and S. Ghemawat. “MapReduce: Simplified Data Processing on Large Clusters.” Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI 2004), pages 137-150. 2004.
• S. Ghemawat, H. Gobioff, and S.-T. Leung. “The Google File System.” OSDI 200?
• “Map/Reduce Tutorial.” http://hadoop.apache.org/common/docs/current/mapred_tutorial.html. Fetched January 21, 2010.
• Tom White. Hadoop: The Definitive Guide. O'Reilly Media. June 5, 2009.
• http://developer.yahoo.com/hadoop/tutorial/module4.html
• J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce, Book Draft. February 7, 2010.
Thank You!


Editor's Notes

  1. http://www.dataspora.com/2011/04/pigs-bees-and-elephants-a-comparison-of-eight-mapreduce-languages/
  2. Original Reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)