Running MapReduce
Programs in Clouds
-Anshul Aggarwal
Cisco Systems
Cloud Computing … MapReduce … Hadoop
What is MapReduce?
• Simple data-parallel programming model designed for
scalability and fault-tolerance
• Pioneered by Google
• Processes 20 petabytes of data per day
• Popularized by open-source Hadoop project
• Used at Yahoo!, Facebook, Amazon, …
Why MapReduce Optimization
Outline
• Cloud And MapReduce
• MapReduce architecture
• Example applications
• Getting started with Hadoop
• Tuning MapReduce
Cloud Computing
• The emergence of cloud computing has had a tremendous impact on
the Information Technology (IT) industry
• Cloud computing moves workloads away from personal computers and
individual enterprise application servers to services provided by
a cloud of computers
• Resources such as CPU and storage are provided to users as general
utilities, on demand, over the internet
• Cloud computing is still in its early stages, with many issues yet to
be addressed
CLOUD COMPUTING SERVICES
MapReduce
Framework
MapReduce History
• Historically, data processing was completely done using
database technologies. Most of the data had a well-defined
structure and was often stored in relational databases
• Data soon reached terabytes and then petabytes
• Google developed a new programming model called
MapReduce to handle large-scale data analysis, and later
introduced the model in its seminal paper
MapReduce: Simplified Data Processing on Large Clusters.
What the paper says
Example: Facebook Lexicon
www.facebook.com/lexicon
What is MapReduce used for?
• At Google:
• Index construction for Google Search
• Article clustering for Google News
• Statistical machine translation
• At Yahoo!:
• “Web map” powering Yahoo! Search
• Spam detection for Yahoo! Mail
• At Facebook:
• Data mining
• Ad optimization
• Spam detection
MapReduce Framework
• Computing paradigm for processing data that resides on hundreds of
computers
• Popularized recently by Google, Hadoop, and many others
• More of a framework
• Makes problem solving both easier and harder; the developer still has to consider:
• inter-cluster network utilization
• performance of a job that will be distributed
• Published by Google as a paper, without any actual source code
MapReduce Terminology
Outline
• Cloud And MapReduce
• MapReduce Basics
• Example applications
• Getting started with Hadoop
• Tuning MapReduce
Word Count - the "Hello World" of the
MapReduce world
• The word count job accepts an input directory, a mapper
function, and a reducer function as inputs.
• We use the mapper function to process the data in parallel,
and we use the reducer function to collect results of the
mapper and produce the final results.
• Mapper sends its results to reducer using a key-value based
model.
• $ bin/hadoop jar hadoop-microbook.jar
microbook.wordcount.WordCount amazon-meta.txt
wordcount-output1
WorkFlow
Example: Word Count
• Job: Count the occurrences of each word in a data set
[Figure: map tasks feeding reduce tasks]
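The word count job described above can be sketched in plain Java, without Hadoop, to show how the mapper's key-value pairs reach the reducer. This is only an in-memory illustration: the class name `WordCountSketch`, its method names, and the sample sentences are assumptions made for this example and are not part of the hadoop-microbook code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of word count as map, shuffle, and reduce phases (no Hadoop). */
class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: group all emitted values by key, sorted by key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs the three phases over all input lines and returns word -> count. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(emitted).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input: each string stands in for one line of an input file.
        System.out.println(wordCount(List.of("the quick brown fox", "the lazy dog", "the fox")));
    }
}
```

In a real cluster the map calls run in parallel on different machines and the shuffle moves data over the network; here all three phases run in one process so the data flow is easy to follow.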
Outline
• Cloud And MapReduce
• MapReduce Basics
• Example applications
• MapReduce Architecture
• Getting started with Hadoop
• Tuning MapReduce
How MapReduce Works
At the highest level, there are four independent entities:
• The client, which submits the MapReduce job.
• The jobtracker, which coordinates the job run. The jobtracker
is a Java application whose main class is JobTracker.
• The tasktrackers, which run the tasks that the job has been
split into.
• The distributed filesystem (normally HDFS), which is used
for sharing job files between the other entities.
Anatomy of a MapReduce Job
Developing a MapReduce Application
• The Configuration API
Configuration conf = new Configuration();
conf.addResource("configuration-1.xml");
conf.addResource("configuration-2.xml");
• GenericOptionsParser, Tool, and ToolRunner
• Writing a Unit Test
• Testing the Driver
• Launching a Job
% hadoop jar hadoop-examples.jar v3.MaxTemperatureDriver -conf conf/hadoop-cluster.xml Input/ncdc/all max-temp
• Retrieving the Results
This is where the Magic Happens
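import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // Expect exactly two arguments: the input path and the output path.
    if (args.length != 2) {
      System.err.printf("Usage: %s [generic options] <input> <output>%n", getClass().getSimpleName());
      return -1;
    }
    Job job = new Job(getConf(), "Max temperature");
    job.setJarByClass(getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(MaxTemperatureMapper.class);
    // The reducer doubles as the combiner because taking a maximum can be applied in stages.
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Block until the job finishes; exit code 0 on success, 1 on failure.
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
    System.exit(exitCode);
  }
}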
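The driver registers the same class as combiner and reducer. A minimal plain-Java sketch (no Hadoop) of why that is safe for a maximum: reducing each map task's output locally and then reducing the partial results gives the same answer as reducing everything at once. The class `MaxCombinerSketch`, its method names, and the temperature values are hypothetical and only stand in for the MaxTemperatureReducer logic, which the slides do not show.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: why a max reducer can double as a combiner (plain Java, no Hadoop). */
class MaxCombinerSketch {

    /** The reduce function: maximum of all values seen for one key. */
    static int reduceMax(List<Integer> values) {
        return Collections.max(values);
    }

    /** Without a combiner: every map output value is shipped to the reducer. */
    static int withoutCombiner(List<List<Integer>> mapOutputs) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            all.addAll(output);
        }
        return reduceMax(all);
    }

    /** With a combiner: each map task pre-reduces locally, then the reducer merges. */
    static int withCombiner(List<List<Integer>> mapOutputs) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            partials.add(reduceMax(output)); // one value per map task crosses the network
        }
        return reduceMax(partials);
    }

    public static void main(String[] args) {
        // Hypothetical temperatures for one year, emitted by two map tasks.
        List<List<Integer>> mapOutputs = List.of(List.of(0, 20, 10), List.of(25, 15));
        System.out.println(withoutCombiner(mapOutputs));
        System.out.println(withCombiner(mapOutputs));
    }
}
```

The same trick would not work for a function such as the mean, where the average of partial averages generally differs from the overall average.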
Configuring Map Reduce params
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>MASTER_NODE:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>HADOOP_DATA_DIR/local</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
</configuration>
$ bin/hadoop jar hadoop-microbook.jar microbook.wordcount.WordCount amazon-meta.txt wordcount-output1
Q & A
Hadoop Clusters
In pioneer days they used oxen for heavy pulling,
and when one ox couldn’t budge a log,
they didn’t try to grow a larger ox. We shouldn’t be
trying for bigger computers, but for
more systems of computers.
- Grace Hopper
Why Hadoop is able to compete?
Hadoop:
• Scalability (petabytes of data, thousands of machines)
• Flexibility in accepting all data formats (no schema)
• Commodity inexpensive hardware
• Efficient and simple fault-tolerant mechanism
vs.
Database:
• Performance (tons of indexing, tuning, data organization techniques)
• Features:
- Provenance tracking
- Annotation management
- ….
What is Hadoop
• Hadoop is a software framework for distributed processing of large
datasets across large clusters of computers
• Large datasets: terabytes or petabytes of data
• Large clusters: hundreds or thousands of nodes
• Hadoop is an open-source implementation of Google's MapReduce
• HDFS is a filesystem designed for storing very large files with
streaming data access patterns, running on clusters of commodity
hardware
What is Hadoop (Cont’d)
• The Hadoop framework consists of two main layers
• Distributed file system (HDFS)
• Execution engine (MapReduce)
• Hadoop is designed as a master-slave shared-nothing architecture
Design Principles of Hadoop
• Automatic parallelization & distribution
• Computation is spread across thousands of nodes, and this is hidden from the end user
• Fault tolerance and automatic recovery
• Nodes/tasks will fail and will recover automatically
• Clean and simple programming abstraction
• Users only provide two functions “map” and “reduce”
• Need to process big data
• Commodity hardware
• Large number of low-end cheap machines working in parallel to solve a
computing problem
Hardware Specs
• Memory
• RAM
• Total tasks
• No RAID required
• No blade servers
• Dedicated Switch
• Dedicated 1GB line
Who Uses MapReduce/Hadoop
• Google: Inventors of MapReduce computing paradigm
• Yahoo: Developing Hadoop, the open-source implementation of MapReduce
• IBM, Microsoft, Oracle
• Facebook, Amazon, AOL, Netflix
• Many others + universities and research labs
• Many enterprises are turning to Hadoop
• Especially applications generating big data
• Web applications, social networks, scientific applications
Hadoop: How it Works
• Hadoop implements Google’s MapReduce, using HDFS
• MapReduce divides applications into many small blocks of work.
• HDFS creates multiple replicas of data blocks for reliability, placing them
on compute nodes around the cluster.
• MapReduce can then process the data where it is located.
• Hadoop's target is to run on clusters on the order of 10,000 nodes.
Sathya Sai University, Prashanti Nilayam
WorkFlow
Hadoop: Assumptions
It is written with large clusters of computers in mind and is built
around the following assumptions:
• Hardware will fail.
• Processing will be run in batches.
• Applications that run on HDFS have large data sets.
• It should provide high aggregate data bandwidth
• Applications need a write-once-read-many access model.
• Moving Computation is Cheaper than Moving Data.
• Portability is important.
Complete Overview
Hadoop Distributed File System (HDFS)
Centralized namenode
- Maintains metadata info about files
Many datanodes (1000s)
- Store the actual data
- Files are divided into blocks
- Each block is replicated N times (default = 3)
[Figure: file F divided into blocks 1 to 5, 64 MB each]
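The block and replication figures above translate into simple arithmetic. The sketch below works it through for a hypothetical 300 MB file, which is an assumed size chosen only because it yields five 64 MB blocks like the figure; the class and method names are illustrative and do not correspond to any HDFS API.

```java
/** Back-of-the-envelope HDFS storage arithmetic (illustrative, not an HDFS API). */
class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    /** Number of blocks needed for a file: ceiling of fileBytes / blockBytes. */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Total block replicas stored across the datanodes. */
    static long replicaCount(long fileBytes, long blockBytes, int replication) {
        return blockCount(fileBytes, blockBytes) * replication;
    }

    /** Raw bytes stored once every byte of the file is replicated. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 300 * MB;   // hypothetical file size
        long block = 64 * MB;   // block size from the slide
        int replication = 3;    // default replication from the slide
        System.out.println("blocks   = " + blockCount(file, block));
        System.out.println("replicas = " + replicaCount(file, block, replication));
        System.out.println("raw MB   = " + rawBytes(file, replication) / MB);
    }
}
```

For this assumed file the result is 5 blocks, 15 block replicas, and 900 MB of raw storage, which is the cost paid for tolerating the loss of any two copies of a block.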
Main Properties of HDFS
• Large: An HDFS instance may consist of thousands of server
machines, each storing part of the file system’s data
• Replication: Each data block is replicated many times
(default is 3)
• Failure: Failure is the norm rather than exception
• Fault Tolerance: Detection of faults and quick, automatic
recovery from them is a core architectural goal of HDFS
• The namenode constantly checks the datanodes
Tuning Parameters
Mapping workers to
Processors
• The input data (on HDFS) is stored on the local disks of the machines
in the cluster. HDFS divides each file into 64 MB blocks, and stores
several copies of each block (typically 3 copies) on different
machines.
• The MapReduce master takes the location information of the input
files into account and attempts to schedule a map task on a machine
that contains a replica of the corresponding input data. Failing that, it
attempts to schedule a map task near a replica of that task's input
data. When running large MapReduce operations on a significant
fraction of the workers in a cluster, most input data is read locally and
consumes no network bandwidth.
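The placement preference described above can be sketched as a small ranking function. This is an illustration only, not Hadoop's or Google's scheduler: the class `LocalityScheduler`, the host and rack names, and the reading of "near a replica" as "on the same rack" are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of locality-aware map task placement (illustrative, not a Hadoop API). */
class LocalityScheduler {

    /** Locality levels, best first; the declaration order is used for comparison. */
    enum Locality { NODE_LOCAL, RACK_LOCAL, REMOTE }

    /** Classify how close a free worker is to any replica of the task's input. */
    static Locality classify(String worker, Set<String> replicaHosts, Map<String, String> rackOf) {
        if (replicaHosts.contains(worker)) {
            return Locality.NODE_LOCAL;
        }
        String workerRack = rackOf.get(worker);
        for (String host : replicaHosts) {
            // Assumption: "near" means sharing a rack with a machine that holds a replica.
            if (workerRack != null && workerRack.equals(rackOf.get(host))) {
                return Locality.RACK_LOCAL;
            }
        }
        return Locality.REMOTE;
    }

    /** Pick the free worker with the best locality; ties keep the earlier worker. */
    static String pickWorker(List<String> freeWorkers, Set<String> replicaHosts,
                             Map<String, String> rackOf) {
        String best = null;
        Locality bestLocality = null;
        for (String worker : freeWorkers) {
            Locality locality = classify(worker, replicaHosts, rackOf);
            if (bestLocality == null || locality.compareTo(bestLocality) < 0) {
                best = worker;
                bestLocality = locality;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical four-node cluster on two racks; the input block lives on n1.
        Map<String, String> rackOf = Map.of("n1", "r1", "n2", "r1", "n3", "r2", "n4", "r2");
        Set<String> replicas = Set.of("n1");
        System.out.println(pickWorker(List.of("n4", "n2", "n1"), replicas, rackOf));
        System.out.println(pickWorker(List.of("n4", "n2"), replicas, rackOf));
    }
}
```

With three replicas per block, a large cluster usually has a free worker at the first or second level, which is why most input ends up being read without crossing the network.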
Task Granularity
• The map phase has M pieces and the reduce phase has R pieces.
• M and R should be much larger than the number of worker
machines.
• Having each worker perform many different tasks improves dynamic
load balancing, and also speeds up recovery when a worker fails.
• The larger M and R are, the more scheduling decisions the master must make
• R is often constrained by users because the output of each reduce task
ends up in a separate output file.
• Typically (at Google), M = 200,000 and R = 5,000, using 2,000
worker machines.
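The figures quoted above can be turned into a quick calculation of how many tasks each worker handles and how much bookkeeping the master takes on. The O(M + R) scheduling decisions and O(M × R) state come from the MapReduce paper by Dean and Ghemawat; the class and method names below are only illustrative.

```java
/** Arithmetic on task granularity using the figures quoted for Google (illustrative). */
class TaskGranularity {

    /** Average number of tasks each worker processes over the job. */
    static double tasksPerWorker(long tasks, long workers) {
        return (double) tasks / workers;
    }

    /** The master makes on the order of M + R scheduling decisions. */
    static long schedulingDecisions(long m, long r) {
        return m + r;
    }

    /** The master tracks state for on the order of M * R map/reduce task pairs. */
    static long statePairs(long m, long r) {
        return m * r;
    }

    public static void main(String[] args) {
        long m = 200_000;      // map tasks
        long r = 5_000;        // reduce tasks
        long workers = 2_000;  // worker machines
        System.out.println("map tasks per worker    = " + tasksPerWorker(m, workers));
        System.out.println("reduce tasks per worker = " + tasksPerWorker(r, workers));
        System.out.println("scheduling decisions    = " + schedulingDecisions(m, r));
        System.out.println("map/reduce state pairs  = " + statePairs(m, r));
    }
}
```

With about 100 map tasks per worker, a failed machine's work can be spread across many others in small pieces, which is the load-balancing and recovery benefit the slide refers to.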
Speculative Execution – One approach
• Tasks may be slow for various reasons, including hardware
degradation or software misconfiguration, but the causes
may be hard to detect since the tasks still complete
successfully, albeit after a longer time than expected.
• Hadoop doesn’t try to diagnose and fix slow-running tasks;
instead, it tries to detect when a task is running slower than
expected and launches another, equivalent task as a backup.
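One simple way to "detect when a task is running slower than expected" is to compare each task's progress with the average of its peers. The sketch below does exactly that; it is not Hadoop's actual heuristic, and the class `SpeculationSketch`, the `gap` parameter, and the sample progress values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch of straggler detection for speculative execution (illustrative only). */
class SpeculationSketch {

    /**
     * Returns the ids of tasks whose progress (0.0 to 1.0) trails the mean
     * progress of all running tasks by more than {@code gap}, sorted by id.
     */
    static List<String> stragglers(Map<String, Double> progressByTask, double gap) {
        double mean = progressByTask.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, Double> entry : progressByTask.entrySet()) {
            if (entry.getValue() < mean - gap) {
                slow.add(entry.getKey()); // candidate for a backup (speculative) attempt
            }
        }
        Collections.sort(slow);
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical progress snapshot of four map tasks from the same job.
        Map<String, Double> progress = Map.of("t1", 0.90, "t2", 0.85, "t3", 0.30, "t4", 0.95);
        System.out.println(stragglers(progress, 0.2));
    }
}
```

A scheduler built on this idea would launch a duplicate attempt for each flagged task, keep whichever attempt finishes first, and kill the other.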
Problem Statement
The problem at hand is defining a resource provisioning
framework for MapReduce jobs running in a cloud keeping in
mind performance goals such as
Resource utilization with
- an optimal number of map and reduce slots
- improvements in execution time
- a highly scalable solution
References
[1] E. Bortnikov, A. Frank, E. Hillel, and S. Rao, “Predicting execution bottlenecks in map-
reduce clusters” In Proc. of the 4th USENIX conference on Hot Topics in Cloud computing,
2012.
[2] R. Buyya, S. K. Garg, and R. N. Calheiros, “SLA-Oriented Resource Provisioning for Cloud
Computing: Challenges, Architecture, and Solutions” In International Conference on Cloud and
Service Computing, 2011.
[3] S. Chaisiri, Bu-Sung Lee, and D. Niyato, “Optimization of Resource Provisioning Cost in
Cloud Computing” in Transactions On Service Computing, Vol. 5, No. 2, IEEE, April-June 2012
[4] L Cherkasova and R.H. Campbell, “Resource Provisioning Framework for MapReduce Jobs
with Performance Goals”, in Middleware 2011, LNCS 7049, pp. 165–186, 2011
[5] J. Dean, and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”,
Communications of the ACM, Jan 2008
[6] Y. Hu, J. Wong, G. Iszlai, and M. Litoiu, “Resource Provisioning for Cloud Computing” In
Proc. of the 2009 Conference of the Center for Advanced Studies on Collaborative Research,
2009.
[7] K. Kambatla, A. Pathak, and H. Pucha, “Towards optimizing hadoop provisioning in the
cloud” in Proc. of the First Workshop on Hot Topics in Cloud Computing, 2009
[8] Kuyoro S. O., Ibikunle F. and Awodele O., “Cloud Computing Security Issues and
Challenges” in International Journal of Computer Networks (IJCN), Vol. 3, Issue 5, 2011
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015

Agility and Cloud Computing - Voices 2015
Deanna Kosaraju
 
The Confidence Gap: Igniting Brilliance through Feminine Leadership - Voices...
The Confidence Gap:  Igniting Brilliance through Feminine Leadership - Voices...The Confidence Gap:  Igniting Brilliance through Feminine Leadership - Voices...
The Confidence Gap: Igniting Brilliance through Feminine Leadership - Voices...
Deanna Kosaraju
 
Business Intelligence Engineering - Voices 2015
Business Intelligence Engineering - Voices 2015Business Intelligence Engineering - Voices 2015
Business Intelligence Engineering - Voices 2015
Deanna Kosaraju
 
Agility and cloud computing
Agility and cloud computingAgility and cloud computing
Agility and cloud computing
Deanna Kosaraju
 
J johnson global tech draft size reduce
J johnson global tech draft size reduceJ johnson global tech draft size reduce
J johnson global tech draft size reduce
Deanna Kosaraju
 

More from Deanna Kosaraju (20)

Speak Out and Change the World! Voices 2015
Speak Out and Change the World!   Voices 2015Speak Out and Change the World!   Voices 2015
Speak Out and Change the World! Voices 2015
 
Breaking the Code of Interview Implicit Bias to Value Different Gender Compet...
Breaking the Code of Interview Implicit Bias to Value Different Gender Compet...Breaking the Code of Interview Implicit Bias to Value Different Gender Compet...
Breaking the Code of Interview Implicit Bias to Value Different Gender Compet...
 
Change IT! Voices 2015
Change IT! Voices 2015Change IT! Voices 2015
Change IT! Voices 2015
 
How Can We Make Interacting With Technology and Science Exciting and Fun Expe...
How Can We Make Interacting With Technology and Science Exciting and Fun Expe...How Can We Make Interacting With Technology and Science Exciting and Fun Expe...
How Can We Make Interacting With Technology and Science Exciting and Fun Expe...
 
Measure Impact, Not Activity - Voices 2015
Measure Impact, Not Activity - Voices 2015Measure Impact, Not Activity - Voices 2015
Measure Impact, Not Activity - Voices 2015
 
Women’s INpowerment: The First-ever Global Survey to Hear Voice, Value and Vi...
Women’s INpowerment: The First-ever Global Survey to Hear Voice, Value and Vi...Women’s INpowerment: The First-ever Global Survey to Hear Voice, Value and Vi...
Women’s INpowerment: The First-ever Global Survey to Hear Voice, Value and Vi...
 
The Language of Leadership - Voices 2015
The Language of Leadership - Voices 2015The Language of Leadership - Voices 2015
The Language of Leadership - Voices 2015
 
Mentors and Role Models - Best Practices in Many Cultures - Voices 2015
Mentors and Role Models - Best Practices in Many Cultures - Voices 2015Mentors and Role Models - Best Practices in Many Cultures - Voices 2015
Mentors and Role Models - Best Practices in Many Cultures - Voices 2015
 
Panel: Cracking the Glass Ceiling: Growing Female Technology Professionals - ...
Panel: Cracking the Glass Ceiling: Growing Female Technology Professionals - ...Panel: Cracking the Glass Ceiling: Growing Female Technology Professionals - ...
Panel: Cracking the Glass Ceiling: Growing Female Technology Professionals - ...
 
Heart Rate Variability and the Digital Health Revolution - Voices 2015
Heart Rate Variability and the Digital Health Revolution - Voices 2015Heart Rate Variability and the Digital Health Revolution - Voices 2015
Heart Rate Variability and the Digital Health Revolution - Voices 2015
 
Women and CS, Lessons Learned From Turkey - Voices 2015
Women and CS, Lessons Learned From Turkey - Voices 2015Women and CS, Lessons Learned From Turkey - Voices 2015
Women and CS, Lessons Learned From Turkey - Voices 2015
 
Communications Platform Provides "Your School at your Fingertips" for Busy Pa...
Communications Platform Provides "Your School at your Fingertips" for Busy Pa...Communications Platform Provides "Your School at your Fingertips" for Busy Pa...
Communications Platform Provides "Your School at your Fingertips" for Busy Pa...
 
ASEAN Women in Tech - Voices 2015
ASEAN Women in Tech - Voices 2015ASEAN Women in Tech - Voices 2015
ASEAN Women in Tech - Voices 2015
 
Empowering Women Technology Startup Founders to Succeed - Voices 2015
Empowering Women Technology Startup Founders to Succeed - Voices 2015Empowering Women Technology Startup Founders to Succeed - Voices 2015
Empowering Women Technology Startup Founders to Succeed - Voices 2015
 
Innovation a Destination and a Journey - Voices 2015
Innovation a Destination and a Journey - Voices 2015Innovation a Destination and a Journey - Voices 2015
Innovation a Destination and a Journey - Voices 2015
 
Agility and Cloud Computing - Voices 2015
Agility and Cloud Computing - Voices 2015Agility and Cloud Computing - Voices 2015
Agility and Cloud Computing - Voices 2015
 
The Confidence Gap: Igniting Brilliance through Feminine Leadership - Voices...
The Confidence Gap:  Igniting Brilliance through Feminine Leadership - Voices...The Confidence Gap:  Igniting Brilliance through Feminine Leadership - Voices...
The Confidence Gap: Igniting Brilliance through Feminine Leadership - Voices...
 
Business Intelligence Engineering - Voices 2015
Business Intelligence Engineering - Voices 2015Business Intelligence Engineering - Voices 2015
Business Intelligence Engineering - Voices 2015
 
Agility and cloud computing
Agility and cloud computingAgility and cloud computing
Agility and cloud computing
 
J johnson global tech draft size reduce
J johnson global tech draft size reduceJ johnson global tech draft size reduce
J johnson global tech draft size reduce
 

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015

  • 1. Running MapReduce Programs in Clouds - Anshul Aggarwal, Cisco Systems
  • 3. What is MapReduce? • Simple data-parallel programming model designed for scalability and fault-tolerance • Pioneered by Google • Processes 20 petabytes of data per day • Popularized by open-source Hadoop project • Used at Yahoo!, Facebook, Amazon, …
  • 5. Outline • Cloud And MapReduce • MapReduce architecture • Example applications • Getting started with Hadoop • Tuning MapReduce
  • 6. Cloud Computing • The emergence of cloud computing has made a tremendous impact on the Information Technology (IT) industry • Cloud computing has moved computing away from personal computers and individual enterprise application servers to services provided by a cloud of computers • Resources such as CPU and storage are provided to users as general utilities, on demand, over the internet • Cloud computing is still in its initial stages, with many issues yet to be addressed.
  • 8. Outline • Cloud And MapReduce • MapReduce architecture • Example applications • Getting started with Hadoop • Tuning MapReduce
  • 10. MapReduce History • Historically, data processing was done entirely with database technologies. Most data had a well-defined structure and was often stored in relational databases • Data volumes soon reached terabytes and then petabytes • Google developed a new programming model called MapReduce to handle large-scale data analysis, and later introduced the model in its seminal paper MapReduce: Simplified Data Processing on Large Clusters.
  • 13. What is MapReduce used for? • At Google: • Index construction for Google Search • Article clustering for Google News • Statistical machine translation • At Yahoo!: • “Web map” powering Yahoo! Search • Spam detection for Yahoo! Mail • At Facebook: • Data mining • Ad optimization • Spam detection
  • 14. MapReduce Framework • computing paradigm for processing data that resides on hundreds of computers • popularized recently by Google, Hadoop, and many others • more of a framework • makes problem solving easier and harder • inter-cluster network utilization • performance of a job that will be distributed • published by Google without any actual source code
  • 16. Outline • Cloud And MapReduce • MapReduce Basics • Example applications • Getting started with Hadoop • Tuning MapReduce
  • 17. Word Count: the "Hello World" of the MapReduce world • The word count job accepts an input directory, a mapper function, and a reducer function as inputs. • The mapper function processes the data in parallel, and the reducer function collects the mappers' results and produces the final results. • The mapper sends its results to the reducer using a key-value model. • $ bin/hadoop jar hadoop-microbook.jar microbook.wordcount.WordCount amazon-meta.txt wordcount-output1
  • 19. Example: Word Count • Job: Count the occurrences of each word in a data set • [Figure: map tasks feeding reduce tasks]
  • 20. Outline • Cloud And MapReduce • MapReduce Basics • Example applications • MapReduce Architecture • Getting started with Hadoop • Tuning MapReduce
  • 21. How MapReduce Works • At the highest level, there are four independent entities: • The client, which submits the MapReduce job. • The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker. • The tasktrackers, which run the tasks that the job has been split into. • The distributed filesystem (normally HDFS), which is used for sharing job files between the other entities.
  • 22. Anatomy of a MapReduce Job
  • 23. Developing a MapReduce Application • The Configuration API: Configuration conf = new Configuration(); conf.addResource("configuration-1.xml"); conf.addResource("configuration-2.xml"); • GenericOptionsParser, Tool, and ToolRunner • Writing a Unit Test • Testing the Driver • Launching a Job: % hadoop jar hadoop-examples.jar v3.MaxTemperatureDriver -conf conf/hadoop-cluster.xml input/ncdc/all max-temp • Retrieving the Results
  • 24. This is where the Magic Happens public class MaxTemperatureDriver extends Configured implements Tool { @Override public int run(String[] args) throws Exception { Job job = new Job(getConf(), "Max temperature"); job.setJarByClass(getClass()); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setMapperClass(MaxTemperatureMapper.class); job.setCombinerClass(MaxTemperatureReducer.class); job.setReducerClass(MaxTemperatureReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); return job.waitForCompletion(true) ? 0 : 1; } public static void main(String[] args) throws Exception { int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args); System.exit(exitCode); } }
  • 25. Configuring MapReduce Parameters • <configuration> <property> <name>mapred.job.tracker</name> <value>MASTER_NODE:9001</value> </property> <property> <name>mapred.local.dir</name> <value>HADOOP_DATA_DIR/local</value> </property> <property> <name>mapred.tasktracker.map.tasks.maximum</name> <value>8</value> </property> </configuration> • $ bin/hadoop jar hadoop-microbook.jar microbook.wordcount.WordCount amazon-meta.txt wordcount-output1
  • 26. Q & A
  • 27. Outline • Cloud And MapReduce • MapReduce architecture • Example applications • Getting started with Hadoop • Tuning MapReduce
  • 30. Why is Hadoop able to compete? • Hadoop: scalability (petabytes of data, thousands of machines); flexibility in accepting all data formats (no schema); inexpensive commodity hardware; efficient and simple fault-tolerance mechanism • Database: performance (tons of indexing, tuning, data organization tech.); features such as provenance tracking, annotation management, ….
  • 31. What is Hadoop • Hadoop is a software framework for distributed processing of large datasets across large clusters of computers • Large datasets: terabytes or petabytes of data • Large clusters: hundreds or thousands of nodes • Hadoop is an open-source implementation of Google's MapReduce • HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware
  • 32. What is Hadoop (Cont’d) • The Hadoop framework consists of two main layers • Distributed file system (HDFS) • Execution engine (MapReduce) • Hadoop is designed as a master-slave, shared-nothing architecture
  • 33. Design Principles of Hadoop • Automatic parallelization & distribution • Computation is spread across thousands of nodes and hidden from the end user • Fault tolerance and automatic recovery • Nodes/tasks will fail and will recover automatically • Clean and simple programming abstraction • Users only provide two functions, “map” and “reduce” • Need to process big data • Commodity hardware • Large number of low-end cheap machines working in parallel to solve a computing problem
  • 34. Hardware Specs • Memory • RAM • Total tasks • No Raid required • No Blade server • Dedicated Switch • Dedicated 1GB line
  • 35. Who Uses MapReduce/Hadoop • Google: inventors of the MapReduce computing paradigm • Yahoo: developing Hadoop, an open-source implementation of MapReduce • IBM, Microsoft, Oracle • Facebook, Amazon, AOL, Netflix • Many others, plus universities and research labs • Many enterprises are turning to Hadoop • Especially applications generating big data • Web applications, social networks, scientific applications
  • 36. Hadoop: How it Works • Hadoop implements Google’s MapReduce, using HDFS • MapReduce divides applications into many small blocks of work. • HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. • MapReduce can then process the data where it is located. • Hadoop’s target is to run on clusters on the order of 10,000 nodes. SathyaSaiUniversity,Prashanti Nilayam
  • 38. Hadoop: Assumptions It is written with large clusters of computers in mind and is built around the following assumptions: • Hardware will fail. • Processing will be run in batches. • Applications that run on HDFS have large data sets. • It should provide high aggregate data bandwidth • Applications need a write-once-read-many access model. • Moving Computation is Cheaper than Moving Data. • Portability is important.
  • 40. Hadoop Distributed File System (HDFS) • Centralized namenode: maintains metadata about files • Many datanodes (1000s): store the actual data; files are divided into blocks; each block is replicated N times (default = 3) • [Figure: file F divided into blocks 1 to 5, each 64 MB]
  • 41. Main Properties of HDFS • Large: An HDFS instance may consist of thousands of server machines, each storing part of the file system’s data • Replication: Each data block is replicated several times (default is 3) • Failure: Failure is the norm rather than the exception • Fault Tolerance: Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS • The namenode constantly checks the datanodes
  • 42. Outline • Cloud And MapReduce • MapReduce architecture • Example applications • Getting started with Hadoop • Tuning MapReduce
  • 44. Mapping workers to Processors • The input data (on HDFS) is stored on the local disks of the machines in the cluster. HDFS divides each file into 64 MB blocks, and stores several copies of each block (typically 3 copies) on different machines. • The MapReduce master takes the location information of the input files into account and attempts to schedule a map task on a machine that contains a replica of the corresponding input data. Failing that, it attempts to schedule a map task near a replica of that task's input data. When running large MapReduce operations on a significant fraction of the workers in a cluster, most input data is read locally and consumes no network bandwidth.
  • 45. Task Granularity • The map phase has M pieces and the reduce phase has R pieces. • M and R should be much larger than the number of worker machines. • Having each worker perform many different tasks improves dynamic load balancing, and also speeds up recovery when a worker fails. • The larger M and R are, the more scheduling decisions the master must make. • R is often constrained by users because the output of each reduce task ends up in a separate output file. • Typically (at Google), M = 200,000 and R = 5,000, using 2,000 worker machines.
  • 46. Speculative Execution: One approach • Tasks may be slow for various reasons, including hardware degradation or software misconfiguration, but the causes may be hard to detect since the tasks still complete successfully, albeit after a longer time than expected. • Hadoop doesn’t try to diagnose and fix slow-running tasks; instead, it tries to detect when a task is running slower than expected and launches another, equivalent, task as a backup.
  • 47. Problem Statement • The problem at hand is defining a resource provisioning framework for MapReduce jobs running in a cloud, keeping in mind performance goals such as resource utilization with: • an optimal number of map and reduce slots • improvements in execution time • a highly scalable solution
  • 48. References [1] E. Bortnikov, A. Frank, E. Hillel, and S. Rao, “Predicting execution bottlenecks in map-reduce clusters”, in Proc. of the 4th USENIX Conference on Hot Topics in Cloud Computing, 2012. [2] R. Buyya, S. K. Garg, and R. N. Calheiros, “SLA-Oriented Resource Provisioning for Cloud Computing: Challenges, Architecture, and Solutions”, in International Conference on Cloud and Service Computing, 2011. [3] S. Chaisiri, Bu-Sung Lee, and D. Niyato, “Optimization of Resource Provisioning Cost in Cloud Computing”, in Transactions on Services Computing, Vol. 5, No. 2, IEEE, April-June 2012. [4] L. Cherkasova and R. H. Campbell, “Resource Provisioning Framework for MapReduce Jobs with Performance Goals”, in Middleware 2011, LNCS 7049, pp. 165–186, 2011. [5] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Communications of the ACM, Jan 2008. [6] Y. Hu, J. Wong, G. Iszlai, and M. Litoiu, “Resource Provisioning for Cloud Computing”, in Proc. of the 2009 Conference of the Center for Advanced Studies on Collaborative Research, 2009. [7] K. Kambatla, A. Pathak, and H. Pucha, “Towards optimizing hadoop provisioning in the cloud”, in Proc. of the First Workshop on Hot Topics in Cloud Computing, 2009. [8] Kuyoro S. O., Ibikunle F., and Awodele O., “Cloud Computing Security Issues and Challenges”, in International Journal of Computer Networks (IJCN), Vol. 3, Issue 5, 2011.
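
The word count flow described on slide 17 can be sketched in plain Java without a cluster. This is only an in-memory illustration under stated assumptions: the class name WordCountSketch and its methods are hypothetical, nothing here uses the Hadoop API, and the grouping step stands in for the shuffle that Hadoop performs between mappers and reducers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of word count; it does not use the Hadoop API.
class WordCountSketch {

    // Map step: emit one (word, 1) pair for every word found in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts that were emitted for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), then reduce each group.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {hadoop=1, hello=2, mapreduce=1, world=1}
        System.out.println(countWords(List.of("Hello Hadoop", "hello MapReduce world")));
    }
}
```

In a real job the map calls would run in parallel on different machines and the grouping would happen over the network; the sketch keeps everything in one process so the key-value hand-off is easy to follow.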

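The figures on slides 40, 44, and 45 (64 MB blocks, 3 replicas, M = 200,000, R = 5,000, 2,000 workers) lend themselves to some back-of-the-envelope arithmetic. The sketch below is illustrative only: the class ClusterSizingSketch, its method names, and the 1000 MB example file are assumptions, not part of Hadoop or of the original slides.

```java
// Back-of-the-envelope sizing; class and method names are hypothetical.
class ClusterSizingSketch {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed for a file; the last block may be only partly filled.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw bytes stored across the cluster when every block is replicated.
    static long rawBytesStored(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    // Average number of tasks each worker handles over the life of a job.
    static double tasksPerWorker(long tasks, long workers) {
        return (double) tasks / workers;
    }

    public static void main(String[] args) {
        long fileBytes = 1000 * MB;   // assumed example file size
        long blockBytes = 64 * MB;    // block size from slide 40

        System.out.println("blocks: " + blockCount(fileBytes, blockBytes));
        System.out.println("raw MB stored: " + rawBytesStored(fileBytes, 3) / MB);
        System.out.println("map tasks per worker: " + tasksPerWorker(200_000, 2_000));
        System.out.println("reduce tasks per worker: " + tasksPerWorker(5_000, 2_000));
    }
}
```

With these numbers a 1000 MB file occupies 16 blocks and 3000 MB of raw storage, and each worker handles about 100 map tasks on average, which is the "M much larger than the number of workers" property that slide 45 relies on for load balancing.
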
Editor's Notes

  1. When you run the MapReduce job, Hadoop first reads the input files from the input directory line by line. Hadoop then invokes the mapper once for each line, passing the line as the argument. Each mapper parses the line it received and extracts the words it contains. After processing, the mapper sends the word counts to the reducer by emitting each word and its count as a name-value pair.
  2. Writing a program in MapReduce has a certain flow to it. You start by writing your map and reduce functions, ideally with unit tests to make sure they do what you expect. Then you write a driver program to run a job, which can run from your IDE using a small subset of the data to check that it is working. If it fails, then you can use your IDE’s debugger to find the source of the problem. With this information, you can expand your unit tests to cover this case and improve your mapper or reducer as appropriate to handle such input correctly. When the program runs as expected against the small dataset, you are ready to unleash it on a cluster. Running against the full dataset is likely to expose some more issues, which you can fix as before, by expanding your tests and mapper or reducer to handle the new cases. Debugging failing programs in the cluster is a challenge, so we look at some common techniques to make it easier.
  3. We solve problems involving large datasets by using many computers to process the dataset in parallel. However, writing a program that processes a dataset in a distributed setup is a heavy undertaking. The challenges of such a program are shown as follows: Although it is possible to write such a program, it is wasteful to write such programs again and again. MapReduce-based frameworks like Hadoop let users write only the