
Hands-on MapReduce Programming


Varad Meru, an Orzota engineer, explains the basics of MapReduce using a non-WordCount example!

Published in: Technology

  1. © Orzota, Inc. 2012
  2. Hands-on MapReduce Programming
     Varad Meru, SDE - Orzota, Inc.
     varad@orzota.com
  3. About Orzota
     •  Mission: Make big data easy for consumption
     •  Big data professional services
     •  Founded in March 2012
     •  Headquartered in Silicon Valley, California
     •  Offshore offices in Chennai, India
     •  Strong engineering team
        o  Founders worked at Yahoo!, Netflix, Sun
  4. About Me
     •  Orzota, Inc.
        •  Currently working with Hadoop, Mahout, Hive.
     •  Past Work Experience
        •  Persistent Systems – Search (Nutch, Solr, Hadoop) and Recommendation Engines (Mahout, Data Clustering), User Behavior Analytics.
     •  Area of Interest
        •  Data Science, Distributed Systems, Information Retrieval
     linkedin.com/in/vmeru
     twitter.com/vrdmr
  5. Agenda
     •  Introduction
     •  Programming MapReduce
     •  map() method
     •  reduce() method
     •  Driver Class
     •  Running MapReduce Job
  6. Introduction
     MapReduce in 2 minutes
     Problem statement: compute the sum of the doubles of a set of numbers.
     Input array: 1 3 4 5 6 8 9 11 17 21
     The intermediate array after processing:
        Input   Doubled
        1       2
        3       6
        4       8
        5       10
        6       12
        8       16
        9       18
        11      22
        17      34
        21      42
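The problem statement above can be tried out in plain Java before any cluster is involved. This is only a minimal single-process sketch, not Hadoop code: the class name `SumOfDoubles` and the helper names `f`, `doubled` and `g` are assumptions chosen to echo the f(x) and g(x) labels used in the slides.

```java
import java.util.Arrays;

// Plain-Java sketch (no Hadoop): double each number, then add up the results.
class SumOfDoubles {
    // f(x): the per-element step that the map phase would apply.
    static int f(int x) {
        return 2 * x;
    }

    // Builds the intermediate array by applying f to every input entry.
    static int[] doubled(int[] input) {
        return Arrays.stream(input).map(SumOfDoubles::f).toArray();
    }

    // g: the aggregation step that the reduce phase would apply.
    static int g(int[] intermediate) {
        return Arrays.stream(intermediate).sum();
    }

    public static void main(String[] args) {
        int[] input = {1, 3, 4, 5, 6, 8, 9, 11, 17, 21};
        int[] intermediate = doubled(input);
        System.out.println(Arrays.toString(intermediate));
        System.out.println(g(intermediate));
    }
}
```

For the ten numbers listed on this slide, the intermediate array matches the doubled column and the total is 170.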
  7. Introduction – contd.
     Mapping Phase
     •  Splitting the input
     •  Sending the slaves (datanodes) the mapping code, f(x).
     •  Applying the f(x) method on the data split
     Diagram notes:
     •  The Master Node: this node contains the code of the function to be applied on individual entries of the array, written in the map() method in Hadoop.
     •  The code f(x) is sent to the slave node for applying the logic on the data piece. In our case the data piece is an entry from the array.
     •  Slave Nodes: the entries shown are 9, 17, 8, 6, 1, 11, 1, 4, 21, 3.
     Figure: Mapping Phase
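The three mapping steps can be imitated in one JVM to see what each slave node receives and produces. This is a rough sketch under stated assumptions: the class `MappingPhaseSketch`, its methods, and the split size of 4 are all hypothetical, and real Hadoop splits are chosen by the framework from HDFS blocks, not by entry count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-process imitation of the mapping phase; nothing here touches Hadoop.
class MappingPhaseSketch {
    // f(x): the code the master would ship to every slave node.
    static int f(int x) {
        return 2 * x;
    }

    // Cuts the input into consecutive splits of at most splitSize entries.
    static List<List<Integer>> split(List<Integer> input, int splitSize) {
        List<List<Integer>> splits = new ArrayList<>();
        for (int start = 0; start < input.size(); start += splitSize) {
            int end = Math.min(start + splitSize, input.size());
            splits.add(new ArrayList<>(input.subList(start, end)));
        }
        return splits;
    }

    // Applies f to every entry of one split, as a single slave node would.
    static List<Integer> mapSplit(List<Integer> split) {
        List<Integer> out = new ArrayList<>();
        for (int x : split) {
            out.add(f(x));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(9, 17, 8, 6, 1, 11, 1, 4, 21, 3);
        for (List<Integer> s : split(input, 4)) {
            System.out.println(s + " -> " + mapSplit(s));
        }
    }
}
```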
  8. Introduction – contd.
     Spill Phase
     •  The master node directs the Mappers to send the processed f(x) output data to an intermediate location.
     •  Shuffle and Sorting
     Diagram notes:
     •  The Master Node: the results of the processed data from the slave nodes are given to a specific node where the reducer function runs.
     •  Slave Nodes: the mapped values shown are 18, 34, 16, 12, 2, 22, 2, 8, 42, 6.
     Figure: Spill Phase (Shuffle and Sort)
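Shuffle and sort amounts to grouping the mappers' key/value pairs by key, with the keys in sorted order. The sketch below imitates that in memory with a `TreeMap`; it is not how Hadoop implements the spill, and the class name `ShuffleSortSketch` and the constant key `"sum"` are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of shuffle and sort over (key, value) pairs.
class ShuffleSortSketch {
    // Groups mapper output by key; TreeMap keeps the keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Every value is emitted under one assumed key so that a single
        // reducer call would see all of them.
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (int v : new int[] {18, 34, 16, 12, 2, 22, 2, 8, 42, 6}) {
            out.add(new SimpleEntry<String, Integer>("sum", v));
        }
        System.out.println(shuffle(out));
    }
}
```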
  9. Introduction – contd.
     Reduce Phase
     •  The MasterNode (JobTracker) invokes the Reduce task once the spilling is over.
     •  Get the location of the Spill output from the MasterNode (Namenode).
     Diagram notes:
     •  The Master Node: the results of the processed data from the slave nodes are given to a specific node where the reducer function runs.
     •  g(x) = 162
     Figure: Reducer Phase
  10. MapReduce Programming
      Steps involved in writing a MapReduce program
      •  Write the Mapper
      •  Write the Reducer
      •  Write the Driver
      Life is simple until you start customizing and working on data cleansing.
  11. map() Method
      •  Called internally by the framework
      •  Contains the logic which runs on HDFS splits
      •  Takes the following parameters
         •  Key
         •  Value
         •  OutputCollector
         •  Reporter - facility for Map-Reduce applications to report progress.
      •  The collector collects the mapper's output key and value
  12. map() Method – contd.
      •  Overriding the map() method of the Mapper class

      /* (non-Javadoc)
       * @see org.apache.hadoop.mapred.Mapper#map(java.lang.Object, java.lang.Object,
       *      org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
       */
      @Override
      public void map(LongWritable _key, Text value,
              OutputCollector<Text, IntWritable> output, Reporter reporter)
              throws IOException {
          // method body not shown on the slide
      }
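The slide shows only the signature, so here is one guess at what a map() body for the sum-of-doubles problem might do. To keep it runnable without Hadoop on the classpath, `Collector` is a hypothetical local stand-in for `org.apache.hadoop.mapred.OutputCollector`, `String` and `Integer` stand in for `Text` and `IntWritable`, and the `Reporter` argument is dropped. The one-integer-per-line input format and the constant key `"sum"` are assumptions too.

```java
import java.util.ArrayList;
import java.util.List;

// Mapper logic sketched outside Hadoop; names and input format are assumptions.
class DoubleMapperSketch {
    // Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Assumed constant key so that one reducer call sees every value.
    static final String KEY = "sum";

    // Mirrors map(key, value, output, reporter) without the Reporter argument.
    // offset plays the role of the LongWritable key and is not used.
    static void map(long offset, String line, Collector<String, Integer> output) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return; // skip blank lines instead of failing on them
        }
        output.collect(KEY, 2 * Integer.parseInt(trimmed));
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        Collector<String, Integer> recorder = (k, v) -> emitted.add(k + "\t" + v);
        map(0L, "9", recorder);
        map(2L, "17", recorder);
        System.out.println(emitted);
    }
}
```

In a real job the same two lines of logic would sit inside the overridden map() method, with the result wrapped in `Text` and `IntWritable` before being passed to `output.collect`.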
  13. reduce() Method
      •  The output of the Mappers is shuffled and sorted on the Mapper output keys.
      •  Contains the logic which runs on the temporary HDFS files generated.
      •  Takes the following parameters
         •  Key
         •  Values (an Iterator over the values for that key)
         •  OutputCollector
         •  Reporter - facility for Map-Reduce applications to report progress.
  14. reduce() Method – contd.
      •  Overriding the reduce() method of the Reducer class

      /* (non-Javadoc)
       * @see org.apache.hadoop.mapred.Reducer#reduce(java.lang.Object, java.util.Iterator,
       *      org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
       */
      @Override
      public void reduce(Text _key, Iterator<IntWritable> values,
              OutputCollector<Text, IntWritable> output, Reporter reporter)
              throws IOException {
          // method body not shown on the slide
      }
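As with the mapper, only the reduce() signature appears on the slide. A plausible body for the sum-of-doubles problem walks the iterator and emits one total per key. The sketch below is an assumption, not the presenter's code: `Collector` is a hypothetical local stand-in for `org.apache.hadoop.mapred.OutputCollector`, `String` and `Integer` replace `Text` and `IntWritable`, and the `Reporter` argument is left out so the example runs without Hadoop.

```java
import java.util.Arrays;
import java.util.Iterator;

// Reducer logic sketched outside Hadoop; names are assumptions.
class SumReducerSketch {
    // Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Mirrors reduce(key, values, output, reporter) without the Reporter argument.
    static void reduce(String key, Iterator<Integer> values, Collector<String, Integer> output) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        output.collect(key, sum); // one output record per key
    }

    public static void main(String[] args) {
        Iterator<Integer> values =
                Arrays.asList(18, 34, 16, 12, 2, 22, 2, 8, 42, 6).iterator();
        reduce("sum", values, (k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Fed the ten mapped values from the spill-phase diagram, the reducer emits a total of 162, which agrees with the g(x) value shown in the reduce-phase diagram.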
  15. Driver Class
      •  Responsible for setting up the JobClient object to keep the details about the Hadoop job.
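The driver's role, holding the job's details and then launching it, can be pictured with a small in-process stand-in. Everything in this sketch is hypothetical (`LocalJobSketch`, its fields, and the single output key `"sum"`); it does not use the Hadoop job API and only shows the kind of wiring a driver performs between the mapper, the reducer and the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical in-process "job": holds the job details, then runs map and reduce.
class LocalJobSketch {
    String jobName = "unnamed";
    Function<String, Integer> mapper;   // turns one input line into one value
    BinaryOperator<Integer> reducer;    // folds two values of the same key
    String outputKey = "sum";           // assumed single output key

    Map<String, Integer> run(List<String> inputLines) {
        // Map phase: apply the configured mapper to every input line.
        List<Integer> mapped = new ArrayList<>();
        for (String line : inputLines) {
            mapped.add(mapper.apply(line));
        }
        // Reduce phase: fold all mapped values under the single key.
        Map<String, Integer> result = new TreeMap<>();
        for (Integer value : mapped) {
            result.merge(outputKey, value, reducer);
        }
        return result;
    }

    public static void main(String[] args) {
        // The "driver": fill in the job details, then start the job.
        LocalJobSketch job = new LocalJobSketch();
        job.jobName = "sum-of-doubles";
        job.mapper = line -> 2 * Integer.parseInt(line.trim());
        job.reducer = Integer::sum;
        System.out.println(job.jobName + ": " + job.run(Arrays.asList("9", "17", "8")));
    }
}
```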
  16. Demo
  17. Flow
      Source: http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/
  18. Executing a MapReduce Program
      Eclipse
      •  For eclipse-plugin mode only. Refer to the Eclipse Setup blog (Ref. 3).
      Cluster
      •  Start the cluster (size does not matter for running the Hadoop job)
      •  $ hadoop jar <jarPath>.jar <arguments>
  19. Demo
  20. Flow
      Source: http://developer.yahoo.com/hadoop/tutorial/module4.html#dataflow
  21. Questions?
  22. For more information
      •  hadoop.apache.org
      •  orzota.com/blog/single-node-hadoop-setup-2
      •  orzota.com/blog/eclipse-setup-for-hadoop-development
      •  orzota.com/blog/step-by-step-mapreduce-programming
      •  developer.yahoo.com/hadoop/tutorial
  23. Thank You & Happy Hadooping :)
      Contact us at:
      info@orzota.com
      www.linkedin.com/company/orzota-inc-
      www.twitter.com/orzota
