Hands-on MapReduce Programming

Varad Meru, an Orzota engineer, explains the basics of MapReduce using a non-WordCount example.

Published in: Technology

Transcript

  • 1. © Orzota, Inc. 2012
  • 2. Hands-on MapReduce Programming
      Varad Meru
      SDE - Orzota, Inc.
      varad@orzota.com
  • 3. About Orzota
      • Mission: Make big data easy for consumption
      • Big data professional services
      • Founded in March 2012
      • Headquartered in Silicon Valley, California
      • Offshore offices in Chennai, India
      • Strong engineering team
          • Founders worked in Yahoo!, Netflix, Sun
  • 4. About Me
      • Orzota, Inc.
          • Currently working with Hadoop, Mahout, Hive.
      • Past Work Experience
          • Persistent Systems – Search (Nutch, Solr, Hadoop) and Recommendation Engines (Mahout, Data Clustering), User Behavior Analytics.
      • Area of Interest
          • Data Science, Distributed Systems, Information Retrieval
      linkedin.com/in/vmeru
      twitter.com/vrdmr
  • 5. Agenda
      • Introduction
      • Programming MapReduce
      • map() method
      • reduce() method
      • Driver Class
      • Running MapReduce Job
  • 6. Introduction
      MapReduce in 2 minutes
      • Problem statement: sum of the doubles of a set of numbers.
      • Input array: 1 3 4 5 6 8 9 11 17 21
      • The intermediate array after processing (each x mapped to 2x): 2 6 8 10 12 16 18 22 34 42
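The problem statement on this slide can be sketched in plain Java, without Hadoop. This is only an illustration under stated assumptions: the class name `SumOfDoubles` and its method names are hypothetical and do not come from the talk. `doubleIt` plays the role of f(x), and `reduceSum` plays the role of the final summing step.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the original slides.
class SumOfDoubles {
    // f(x): the per-element "map" step, doubling one number.
    static int doubleIt(int x) {
        return 2 * x;
    }

    // Apply f(x) to every entry, producing the intermediate array.
    static int[] mapAll(int[] input) {
        return Arrays.stream(input).map(SumOfDoubles::doubleIt).toArray();
    }

    // The "reduce" step: sum the intermediate array into one value.
    static int reduceSum(int[] intermediate) {
        return Arrays.stream(intermediate).sum();
    }

    public static void main(String[] args) {
        int[] input = {1, 3, 4, 5, 6, 8, 9, 11, 17, 21}; // input array from this slide
        int[] intermediate = mapAll(input);
        System.out.println(Arrays.toString(intermediate));
        System.out.println(reduceSum(intermediate));
    }
}
```

For the ten numbers on this slide the result is 170. The phase slides that follow list a slightly different set of entries (1 appears twice and 5 is absent), which sum to 81 and double to 162, the g(x) value shown on the reduce slide.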
  • 7. Introduction – contd.
      Mapping Phase
      • Splitting the input
      • Sending slaves (datanodes) the mapping code - f(x).
      • Apply the f(x) method on the data split
      [Figure: Mapping Phase. The Master Node contains the code of the function to be applied on individual entries of the array, written in the map() method in Hadoop. The code f(x) is sent to the slave nodes for applying the logic on the data piece; in our case the data piece is an entry from the array. Entries shown on the slave nodes: 9, 17, 8, 6, 1, 11, 1, 4, 21, 3.]
  • 8. Introduction – contd.
      Spill Phase
      • The master node directs the Mappers to send the processed f(x) output data to an intermediate location.
      • Shuffle and Sorting
      [Figure: Spill Phase, Shuffle and Sort. The results of the processed data from the slave nodes are given to a specific node where the reducer function runs. Values shown on the slave nodes: 18, 34, 16, 12, 2, 22, 2, 8, 42, 6.]
  • 9. Introduction – contd.
      Reduce Phase
      • The master node (JobTracker) invokes the Reduce task once the spilling is over.
      • Get the location of the spill output from the master node (Namenode).
      [Figure: Reducer Phase. The results of the processed data from the slave nodes are given to a specific node where the reducer function runs. g(x)=162.]
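The three phases described in the introduction slides (mapping, spill with shuffle and sort, reduce) can be simulated in a single JVM. The sketch below is an assumption-laden illustration, not Hadoop code: `PhasesSketch`, its method names, and the split size of 3 are all hypothetical, and each "slave node" is just a loop iteration. The input is the set of entries shown in the mapping-phase figure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical single-process simulation of the phases; not part of the original slides.
class PhasesSketch {
    // Splitting the input: cut the array into fixed-size splits, one per simulated slave.
    static List<int[]> split(int[] data, int splitSize) {
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            splits.add(Arrays.copyOfRange(data, start, end));
        }
        return splits;
    }

    // Mapping phase: a simulated slave applies f(x) = 2x to each entry of its split.
    static List<Integer> mapSplit(int[] split) {
        List<Integer> out = new ArrayList<>();
        for (int x : split) {
            out.add(2 * x);
        }
        return out;
    }

    // Spill phase: gather all mapper outputs in one intermediate list and sort it.
    static List<Integer> shuffleAndSort(List<List<Integer>> mapperOutputs) {
        List<Integer> intermediate = new ArrayList<>();
        for (List<Integer> output : mapperOutputs) {
            intermediate.addAll(output);
        }
        Collections.sort(intermediate);
        return intermediate;
    }

    // Reduce phase: g(x) sums the intermediate values.
    static int reduce(List<Integer> intermediate) {
        int sum = 0;
        for (int v : intermediate) {
            sum += v;
        }
        return sum;
    }

    // Run all phases in order and return the reduced value.
    static int run(int[] data, int splitSize) {
        List<List<Integer>> outputs = new ArrayList<>();
        for (int[] s : split(data, splitSize)) {
            outputs.add(mapSplit(s));
        }
        return reduce(shuffleAndSort(outputs));
    }

    public static void main(String[] args) {
        int[] data = {9, 17, 8, 6, 1, 11, 1, 4, 21, 3}; // entries from the mapping-phase figure
        System.out.println(run(data, 3)); // prints 162
    }
}
```

The result does not depend on the split size, because doubling is applied per entry and addition is order-independent; that independence is what lets the map work be spread across nodes.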
  • 10. MapReduce Programming
      Steps involved in writing a MapReduce program
      • Write the Mapper
      • Write the Reducer
      • Write the Driver
      Life's simple until you start customizing and work on data cleansing
  • 11. map() Method
      • Internally called
      • Contains the logic which runs on HDFS splits
      • Takes the following parameters
          • Key
          • Value
          • OutputCollector
          • Reporter - facility for Map-Reduce applications to report progress.
      • Collector collects the mapper's output key and value
  • 12. map() Method – contd.
      • Overriding the map() method of the Mapper class
          /* (non-Javadoc)
           * @see org.apache.hadoop.mapred.Mapper#map(java.lang.Object, java.lang.Object,
           *      org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
           */
          @Override
          public void map(LongWritable _key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
                  throws IOException {
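The slide shows only the signature of map(), so the body below is a guess at what such a method might do for the sum-of-doubles problem, written without any Hadoop dependency so that it runs on its own. Everything here is an assumption: `MapSketch` and its nested `Collector` interface are hypothetical stand-ins (the latter imitating the `collect(key, value)` call of Hadoop's `OutputCollector`), the Reporter parameter is dropped, the input is assumed to hold one integer per line, and the constant output key "sum" is invented for illustration. It is not the mapper from the talk's demo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of map logic; not the author's demo code.
class MapSketch {
    // Stand-in for Hadoop's OutputCollector: receives each emitted (key, value) pair.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Same shape as map(key, value, output, reporter) on the slide, minus the Reporter.
    // key: position of the line in the input (unused here); value: one line of text.
    static void map(long key, String value, Collector<String, Integer> output) {
        String line = value.trim();
        if (line.isEmpty()) {
            return; // skip blank lines instead of failing on them
        }
        int n = Integer.parseInt(line);
        // Emit f(x) = 2x under a single assumed key so one reduce call sees every value.
        output.collect("sum", 2 * n);
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        Collector<String, Integer> collector = (k, v) -> emitted.add(k + "\t" + v);
        map(0L, "9", collector);
        map(2L, "17", collector);
        System.out.println(emitted);
    }
}
```

In a real Hadoop job the same logic would sit inside a class implementing `org.apache.hadoop.mapred.Mapper`, with `LongWritable`, `Text`, and `IntWritable` in place of the plain Java types used here.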
  • 13. reduce() Method
      • The output of the Mappers is shuffled and sorted on the Mapper output keys.
      • Contains the logic which runs on the HDFS temporary files generated.
      • Takes the following parameters
          • Key
          • Value
          • OutputCollector
          • Reporter - facility for Map-Reduce applications to report progress.
  • 14. reduce() Method
      • Overriding the reduce() method of the Reducer class
          /* (non-Javadoc)
           * @see org.apache.hadoop.mapred.Reducer#reduce(java.lang.Object, java.util.Iterator,
           *      org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
           */
          @Override
          public void reduce(Text _key, Iterator<IntWritable> values,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
                  throws IOException {
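As with map(), the slide stops at the signature of reduce(). The sketch below shows one plausible body for the sum-of-doubles problem in plain Java. It is an illustration under assumptions, not the reducer from the talk: `ReduceSketch` and its nested `Collector` interface are hypothetical stand-ins for the Hadoop types, the Reporter parameter is omitted, and the key "sum" is invented. The values passed in `main` are the doubled entries shown in the spill-phase figure.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical plain-Java sketch of reduce logic; not the author's demo code.
class ReduceSketch {
    // Stand-in for Hadoop's OutputCollector: receives each emitted (key, value) pair.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Same shape as reduce(key, values, output, reporter) on the slide, minus the Reporter.
    // Called once per key with an iterator over every value the mappers emitted for it.
    static void reduce(String key, Iterator<Integer> values, Collector<String, Integer> output) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        output.collect(key, sum); // one output pair per key
    }

    public static void main(String[] args) {
        Iterator<Integer> values =
                Arrays.asList(18, 34, 16, 12, 2, 22, 2, 8, 42, 6).iterator();
        reduce("sum", values, (k, v) -> System.out.println(k + "\t" + v));
    }
}
```

The values arrive as an `Iterator` rather than a list, so the method reads them once, in order, without holding them all in memory.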
  • 15. Driver Class
      • Responsible for setting up the JobClient object to keep the details about the Hadoop Job.
  • 16. Demo
  • 17. Flow
      Source: http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/
  • 18. Executing MapReduce Program
      Eclipse
      • For eclipse-plugin mode only. Refer to the Eclipse Setup blog (Ref. 3)
      Cluster
      • Start the Cluster (size does not matter for running the hadoop job)
      • $ hadoop jar <jarPath>.jar <arguments>
  • 19. Demo
  • 20. Flow
      Source: http://developer.yahoo.com/hadoop/tutorial/module4.html#dataflow
  • 21. Questions?
  • 22. For more information
      • hadoop.apache.org
      • orzota.com/blog/single-node-hadoop-setup-2
      • orzota.com/blog/eclipse-setup-for-hadoop-development
      • orzota.com/blog/step-by-step-mapreduce-programming
      • developer.yahoo.com/hadoop/tutorial
  • 23. Thank You & Happy Hadooping :)
      Contact Us at – info@orzota.com
      www.linkedin.com/company/orzota-inc-
      www.twitter.com/orzota
