An Introduction to Apache Hadoop MapReduce

An introduction to Apache Hadoop MapReduce: what is it and how does
it work? What is the MapReduce cycle and how are jobs managed?
Why should it be used, and who are the big users and providers?

Published in: Technology, Business

  1. Apache Hadoop MapReduce
     ● What is it?
     ● Why use it?
     ● How does it work?
     ● Some examples
     ● Big users
  2. MapReduce – What is it?
     ● Processing engine of Hadoop
     ● Developers create Map and Reduce jobs
     ● Used for big data batch processing
     ● Parallel processing of huge data volumes
     ● Fault tolerant
     ● Scalable
  3. MapReduce – Why use it?
     ● Your data is in the Terabyte / Petabyte range
     ● You have huge I/O
     ● Hadoop framework takes care of
       – Job and task management
       – Failures
       – Storage
       – Replication
     ● You just write Map and Reduce jobs
  4. MapReduce – How does it work?
     Take word counting as an example, something that Google does all of the time.
  5. MapReduce – How does it work?
     ● Input data is split into shards
     ● Split data is mapped to key,value pairs, e.g. Bear,1
     ● Mapped data is shuffled/sorted by key, e.g. Bear
     ● Sorted data is reduced, e.g. Bear,2
     ● Final data is stored on HDFS
     ● There might be an extra map layer before the shuffle
     ● JobTracker controls all tasks in a job
     ● TaskTracker controls map and reduce tasks
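
The word-count cycle on slide 5 can be sketched in a few lines of plain Python. This is only an in-memory illustration of Split -> Map -> Shuffle -> Reduce, not Hadoop code: it does not use the Hadoop API, HDFS, or the JobTracker/TaskTracker. The function names (`split_input`, `map_shard`, `shuffle`, `reduce_groups`, `word_count`) and the sample text are assumptions made up for this example.

```python
from collections import defaultdict


def split_input(text, num_shards):
    """Split: deal the input lines out into num_shards shards."""
    lines = text.strip().splitlines()
    return [lines[i::num_shards] for i in range(num_shards)]


def map_shard(lines):
    """Map: emit a (word, 1) pair for every word in one shard."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_shards):
    """Shuffle/sort: group the values from all shards by key, sorted by key."""
    groups = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_groups(groups):
    """Reduce: sum the list of values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text, num_shards=3):
    """Run the whole cycle in one process (Hadoop would distribute it)."""
    shards = split_input(text, num_shards)
    mapped = [map_shard(shard) for shard in shards]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, chosen so that Bear reduces to 2 as on the slide.
    sample = "Deer Bear River\nCar Car River\nDeer Car Bear"
    print(word_count(sample))
    # prints {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

In a real Hadoop job the map and reduce steps would run as separate tasks on different nodes, and the framework would handle the split, the shuffle, and the final write to HDFS.
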
  6. MapReduce – Some examples
     A visual example with colours to show you the cycle:
     Split -> Map -> Shuffle -> Reduce
  7. MapReduce – Some examples
     A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
  8. Hadoop MapReduce – Big users
     ● Users
       – Facebook
       – Yahoo
       – Amazon
       – eBay
     ● Providers
       – Amazon
       – Cloudera
       – Hortonworks
       – MapR
  9. Contact Us
     ● Feel free to contact us at
       – www.semtech-solutions.co.nz
       – info@semtech-solutions.co.nz
     ● We offer IT project consultancy
     ● We are happy to hear about your problems
     ● You pay only for the hours that you need to solve your problems