An Introduction to Apache Hadoop MapReduce
 

An introduction to Apache Hadoop MapReduce: what is it and how does
it work? What is the MapReduce cycle and how are jobs managed?
Why should it be used, and who are the big users and providers?

    An Introduction to Apache Hadoop MapReduce: Presentation Transcript

    • Apache Hadoop MapReduce
      ● What is it?
      ● Why use it?
      ● How does it work?
      ● Some examples
      ● Big users
    • MapReduce – What is it?
      ● Processing engine of Hadoop
      ● Developers create Map and Reduce jobs
      ● Used for big data batch processing
      ● Parallel processing of huge data volumes
      ● Fault tolerant
      ● Scalable
    • MapReduce – Why use it?
      ● Your data is in the terabyte / petabyte range
      ● You have huge I/O
      ● The Hadoop framework takes care of
        – Job and task management
        – Failures
        – Storage
        – Replication
      ● You just write Map and Reduce jobs
    • MapReduce – How does it work? Take word counting as an example, something that Google does all of the time.
    • MapReduce – How does it work?
      ● Input data is split into shards
      ● Split data is mapped to key,value pairs, e.g. Bear,1
      ● Mapped data is shuffled/sorted by key, e.g. Bear
      ● Sorted data is reduced, e.g. Bear,2
      ● Final data is stored on HDFS
      ● There might be an extra map layer before the shuffle
      ● JobTracker controls all tasks in a job
      ● TaskTracker controls map and reduce tasks
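The word-count cycle described above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the Split -> Map -> Shuffle -> Reduce steps, not Hadoop code: the function names, the three-line sample text and the shard count are all assumptions made for the example, and nothing here touches HDFS or a cluster.

```python
from collections import defaultdict

def split_input(text, num_shards):
    """Split: deal the input lines out into num_shards shards."""
    lines = text.strip().splitlines()
    return [lines[i::num_shards] for i in range(num_shards)]

def map_shard(lines):
    """Map: emit a (word, 1) pair for every word in the shard."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(mapped_shards):
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_counts(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(text, num_shards=3):
    shards = split_input(text, num_shards)
    mapped = [map_shard(shard) for shard in shards]  # one map call per shard
    return reduce_counts(shuffle(mapped))

# Hypothetical sample input, chosen so that "Bear" reduces to 2.
SAMPLE = """Deer Bear River
Car Car River
Deer Car Bear"""

if __name__ == "__main__":
    print(word_count(SAMPLE))
```

In a real Hadoop job each map call would run as a separate task on a different node, and the shuffle would move data across the network. Here the same steps simply run one after another in a single process.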
    • MapReduce – Some examples: a visual example with colours to show you the cycle Split -> Map -> Shuffle -> Reduce.
    • MapReduce – Some examples: a visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
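The split of responsibilities between a job tracker and task trackers, together with the fault tolerance mentioned earlier, can be illustrated with a toy scheduler. This is a heavily simplified sketch under stated assumptions: the classes `ToyJobTracker` and `ToyTaskTracker`, the round-robin assignment and the simulated failure are all invented for illustration and do not reflect Hadoop's actual scheduling, which is considerably more involved.

```python
from collections import deque

class ToyTaskTracker:
    """Stands in for a worker node that runs tasks handed to it."""
    def __init__(self, name, fail_on=()):
        self.name = name
        self.fail_on = set(fail_on)  # task ids this worker fails (simulated)

    def run(self, task_id, func, data):
        if task_id in self.fail_on:
            raise RuntimeError(f"{self.name} failed task {task_id}")
        return func(data)

class ToyJobTracker:
    """Hands tasks to trackers in turn and re-queues any task that fails."""
    def __init__(self, trackers, max_attempts=3):
        self.trackers = trackers
        self.max_attempts = max_attempts
        self.log = []  # (task_id, tracker_name, outcome) records

    def run_tasks(self, func, inputs):
        queue = deque((task_id, 1) for task_id in range(len(inputs)))
        results = {}
        turn = 0
        while queue:
            task_id, attempt = queue.popleft()
            tracker = self.trackers[turn % len(self.trackers)]  # round-robin
            turn += 1
            try:
                results[task_id] = tracker.run(task_id, func, inputs[task_id])
                self.log.append((task_id, tracker.name, "ok"))
            except RuntimeError:
                self.log.append((task_id, tracker.name, "failed"))
                if attempt >= self.max_attempts:
                    raise  # give up on the whole job
                queue.append((task_id, attempt + 1))  # retry later
        return [results[i] for i in range(len(inputs))]

if __name__ == "__main__":
    trackers = [ToyTaskTracker("tt1", fail_on={0}), ToyTaskTracker("tt2")]
    job = ToyJobTracker(trackers)
    print(job.run_tasks(len, [["a", "b"], ["c"], ["d", "e", "f"]]))
    print(job.log)
```

The point of the sketch is that the code submitting the job never handles the failure itself: the failed task is put back on the queue and completes on another tracker.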
    • Hadoop MapReduce – Big users
      ● Users
        – Facebook
        – Yahoo
        – Amazon
        – eBay
      ● Providers
        – Amazon
        – Cloudera
        – Hortonworks
        – MapR
    • Contact Us
      ● Feel free to contact us at
        – www.semtech-solutions.co.nz
        – info@semtech-solutions.co.nz
      ● We offer IT project consultancy
      ● We are happy to hear about your problems
      ● You pay only for the hours you need to solve your problems