Lightweight wrapper for Hive on Amazon EMR

  1. Lightweight wrapper for Hive on Amazon EMR @stanaka
  2. CTO
  3. Hive on Amazon EMR: Amazon Elastic MapReduce, Hive, HiveQL
  4. EMR @: Hadoop, 20, MapReduce, Hadoop Streaming, Perl Mapper, Reducer
  5. EMR: S3
  6. 1: S3: log2s3.pl, logrotate
  7. Hive: Hive SerDe, label:value
      LogFormat "time:%t\thost:%h\treq:%r\tstatus:%>s ...
      time:[25/Sep/2011:12:17:51 +0900]    host:11.22.33.44
  8. 2: Hive, Wrapper, Perl, Net::Amazon::EMR, ruby elastic-mapreduce, Net::Amazon::EMR::Wrapper
  9. SerDe, HiveQL, Perl
  10. use Net::Amazon::EMR::Wrapper;
      my $emr = Net::Amazon::EMR::Wrapper->new({
          name                 => 'testcluster',
          start_cluster        => 1,
          num_instances        => 5,
          slave_instance_type  => 'm1.small',
          master_instance_type => 'm1.small',
          alive                => 0,
          log                  => Log::Dispatch::Config->instance,
          jar                  => 's3://somelogs/_lib/hadoop/hive/serde/hatena-serde.jar',
      });
  11. $emr->create_table('diary');
      $emr->add_partition('diary',
          [ DateTime->new(year => 2011, month => 5, day => 19),
            DateTime->new(year => 2011, month => 5, day => 20) ],
      );
      $emr->do_select(
          "select * from logs limit 10",    # HiveQL
          "line_sample");
      $emr->do_select(
          "select count(1) from logs",
          "page view");
      $emr->do_select(
          "select count(distinct referer) from logs",
          "unique referer");
      $emr->get_results;
  12. % perl bin/sample.pl
      [Sun Sep 25 11:30:57 2011] [notice] j-1IT82USV5OM0J is initiating
      [Sun Sep 25 11:37:00 2011] [notice] testcluster-20449-1317004256 (j-1IT82USV5OM0J) is ready.
      [Sun Sep 25 11:37:49 2011] [notice] Step 'create_table' is finished. (2 steps left)
      [Sun Sep 25 11:47:20 2011] [notice] Step 'page view' is finished. (1 steps left)
      [Sun Sep 25 11:56:54 2011] [notice] Step 'unique referer' is finished. (0 steps left)
      result:
      $VAR1 = {
          'unique referer' => '24235538',
          'page view' => '3596154323',
      };
      [Sun Sep 25 11:56:57 2011] [notice] Finished.
  13. Wrapper #1: Perl, cron, HiveQL
  14. Wrapper #2: S3
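
The label:value log format shown on slide 7 can be illustrated with a few lines of Perl. This is only a minimal sketch, assuming the fields are separated by tabs and that the first colon in each field ends the label. `parse_labeled_line` is a hypothetical helper name; it is not part of Net::Amazon::EMR::Wrapper or of the SerDe used in the talk, whose implementation is not shown in the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: split one label:value log line into a hash reference.
# Assumption: fields are tab-separated, and the first colon in a field ends
# the label, so values such as timestamps may themselves contain colons.
sub parse_labeled_line {
    my ($line) = @_;
    chomp $line;
    my %record;
    for my $field (split /\t/, $line) {
        my ($label, $value) = split /:/, $field, 2;
        next unless defined $value;    # skip fields that have no colon
        $record{$label} = $value;
    }
    return \%record;
}

# Sample line built from the two fields shown on slide 7.
my $line   = "time:[25/Sep/2011:12:17:51 +0900]\thost:11.22.33.44\n";
my $record = parse_labeled_line($line);
print "$record->{host}\n";
print "$record->{time}\n";
```

Limiting the inner `split` to two parts keeps the colons inside the timestamp intact, which is the main reason each field is split only on its first colon.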