Lightweight wrapper for Hive on Amazon EMR

Published in: Technology


  1. Lightweight wrapper for Hive on Amazon EMR (@stanaka)
  2. […]
       • CTO
  3. Hive on Amazon EMR
       • Amazon Elastic MapReduce
       • Hive
       • HiveQL
  4. EMR @ […]
       • Hadoop
       • 20 […]
       • MapReduce
       • Hadoop Streaming
       • Perl Mapper, Reducer
  5. EMR […]
       • S3
  6. 1: S3
       • S3
       • log2s3.pl
       • S3
       • 1 1 (logrotate)
  7. Hive […]
       • Hive SerDe
       • label:value
       • LogFormat "time:%t\thost:%h\treq:%r\tstatus:%>s ...
       • time:[25/Sep/2011:12:17:51 +0900]	host:11.22.33.44
  8. 2: […]
       • Hive
       • Wrapper
       • Perl
       • Net::Amazon::EMR
       • ruby elastic-mapreduce
       • Net::Amazon::EMR::Wrapper
  9. […]
       • SerDe
       • HiveQL
       • Perl
       • !!!
  10.
          use Net::Amazon::EMR::Wrapper;
          use Log::Dispatch::Config;   # provides the logger passed as "log"

          # Describe the cluster to start and the SerDe jar to load.
          my $emr = Net::Amazon::EMR::Wrapper->new({
              name                 => 'testcluster',
              start_cluster        => 1,
              num_instances        => 5,
              slave_instance_type  => 'm1.small',
              master_instance_type => 'm1.small',
              alive                => 0,
              log                  => Log::Dispatch::Config->instance,
              jar                  => 's3://somelogs/_lib/hadoop/hive/serde/hatena-serde.jar',
          });
  11.
          $emr->create_table('diary');
          $emr->add_partition('diary',
              [ DateTime->new(year => 2011, month => 5, day => 19),
                DateTime->new(year => 2011, month => 5, day => 20) ],
          );

          $emr->do_select(
              "select * from logs limit 10",   # HiveQL
              "line_sample");                  # step name
          $emr->do_select(
              "select count(1) from logs",
              "page view");
          $emr->do_select(
              "select count(distinct referer) from logs",
              "unique referer");

          $emr->get_results;
  12.
          % perl bin/sample.pl
          [Sun Sep 25 11:30:57 2011] [notice] j-1IT82USV5OM0J is initiating
          [Sun Sep 25 11:37:00 2011] [notice] testcluster-20449-1317004256 (j-1IT82USV5OM0J) is ready.
          [Sun Sep 25 11:37:49 2011] [notice] Step create_table is finished. (2 steps left)
          [Sun Sep 25 11:47:20 2011] [notice] Step page view is finished. (1 steps left)
          [Sun Sep 25 11:56:54 2011] [notice] Step unique referer is finished. (0 steps left)
          result:$VAR1 = {
                    'unique referer' => '24235538',
                    'page view' => '3596154323',
                  };
          [Sun Sep 25 11:56:57 2011] [notice] Finished.
  13. Wrapper #1
       • Perl
       • cron
       • HiveQL
  14. Wrapper #2
       • ..
       • S3
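
The label:value log format on slide 7 can be illustrated with a short Perl sketch. This is not the Hive SerDe from the talk (that is a Java jar whose source is not shown here); it only shows how a tab-separated line of label:value fields might be split into a record. The function name `parse_labeled_line` and the `status:200` field in the sample line are assumptions made for illustration.

```perl
use strict;
use warnings;

# Parse one tab-separated "label:value" log line into a hash reference.
# Only the first colon in each field separates label from value, so
# values such as timestamps may themselves contain colons.
# (parse_labeled_line is a hypothetical helper, not part of the wrapper.)
sub parse_labeled_line {
    my ($line) = @_;
    chomp $line;
    my %record;
    for my $field (split /\t/, $line) {
        my ($label, $value) = split /:/, $field, 2;
        # Skip fields with no label or no colon at all.
        next unless defined $label && length $label && defined $value;
        $record{$label} = $value;
    }
    return \%record;
}

# Sample line modelled on the slide; the status field is an assumed value.
my $line = "time:[25/Sep/2011:12:17:51 +0900]\thost:11.22.33.44\tstatus:200";
my $rec  = parse_labeled_line($line);
print "$_ => $rec->{$_}\n" for sort keys %$rec;
```

Limiting the split to two parts is the design point here: the `time` value contains several colons, and a plain `split /:/` would cut it apart.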
