Lightweight wrapper for Hive on Amazon EMR

  • 1. Lightweight wrapper for Hive on Amazon EMR @stanaka
  • 2. n n  CTO n 
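  • 3. Hive on Amazon EMR
      - Amazon Elastic MapReduce (EMR): Hadoop as a hosted AWS service
      - Hive: a data warehouse layer on top of Hadoop
      - HiveQL: Hive's SQL-like query language, compiled into MapReduce jobs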
  • 4. How we use EMR
      - Hadoop: 20 …
      - MapReduce via Hadoop Streaming
      - Mapper and Reducer written in Perl
  • 5. EMR
      - S3?
  • 6. Part 1: S3
      - Logs are shipped to S3 with log2s3.pl
      - Uploads run on each log rotation (logrotate)
  • 7. Hive
      - Custom Hive SerDe
      - Log format: label:value fields separated by tabs
      - LogFormat "time:%t\thost:%h\treq:%r\tstatus:%>s ...
      - Sample line: time:[25/Sep/2011:12:17:51 +0900]<TAB>host:11.22.33.44
  • 8. Part 2: a wrapper for Hive
      - Written in Perl
      - Existing tools: Net::Amazon::EMR (Perl), the Ruby elastic-mapreduce CLI
      - New: Net::Amazon::EMR::Wrapper
  • 9. n n  n  n  SerDen  HiveQLn  n  Perln  !!!
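  • 10.
        use strict;
        use warnings;
        use Log::Dispatch::Config;    # assumed configured elsewhere (configure() must run before instance())
        use Net::Amazon::EMR::Wrapper;

        my $emr = Net::Amazon::EMR::Wrapper->new({
            name                 => 'testcluster',
            start_cluster        => 1,
            num_instances        => 5,
            slave_instance_type  => 'm1.small',
            master_instance_type => 'm1.small',
            alive                => 0,    # presumably: do not keep the cluster running after the last step
            log                  => Log::Dispatch::Config->instance,
            jar                  => 's3://somelogs/_lib/hadoop/hive/serde/hatena-serde.jar',    # custom SerDe (slide 7)
        });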
  • 11.
        use DateTime;

        $emr->create_table('diary');
        $emr->add_partition('diary', [
            DateTime->new(year => 2011, month => 5, day => 19),
            DateTime->new(year => 2011, month => 5, day => 20),
        ]);

        $emr->do_select(
            "select * from logs limit 10",    # HiveQL
            "line_sample",                    # step name, also the key of its result
        );
        $emr->do_select("select count(1) from logs", "page view");
        $emr->do_select("select count(distinct referer) from logs", "unique referer");

        $emr->get_results;    # wait for the steps and collect results keyed by step name (slide 12)
  • 12.
        % perl bin/sample.pl
        [Sun Sep 25 11:30:57 2011] [notice] j-1IT82USV5OM0J is initiating
        [Sun Sep 25 11:37:00 2011] [notice] testcluster-20449-1317004256 (j-1IT82USV5OM0J) is ready.
        [Sun Sep 25 11:37:49 2011] [notice] Step create_table is finished. (2 steps left)
        [Sun Sep 25 11:47:20 2011] [notice] Step page view is finished. (1 steps left)
        [Sun Sep 25 11:56:54 2011] [notice] Step unique referer is finished. (0 steps left)
        result:
        $VAR1 = {
                  'unique referer' => '24235538',
                  'page view' => '3596154323'
                };
        [Sun Sep 25 11:56:57 2011] [notice] Finished.
  • 13. Wrapper #1
      - Perl
      - cron
      - HiveQL
  • 14. Wrapper #2
      - ..
      - S3
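Slide 6 names log2s3.pl but not the S3 layout it writes. Slide 11 adds Hive partitions one day at a time, so a date-based prefix is a natural fit. The sketch below works under that assumption and is not log2s3.pl itself. The `s3_key_for` helper and the `logs/YYYY/MM/DD/<host>.log.gz` layout are hypothetical, and the actual upload is left out.

```perl
use strict;
use warnings;

# Hypothetical key layout for rotated access logs: one object per host per day,
# under a date prefix so that a Hive partition can point at a single day.
sub s3_key_for {
    my ($prefix, $host, $year, $month, $day) = @_;
    die "bad host name: $host\n" unless $host =~ /\A[\w.-]+\z/;
    return sprintf '%s/%04d/%02d/%02d/%s.log.gz', $prefix, $year, $month, $day, $host;
}

my $key = s3_key_for('logs', 'web01', 2011, 5, 19);
print "$key\n";    # prints "logs/2011/05/19/web01.log.gz"
```

The host-name check keeps a stray `/` from silently creating an extra level in the key.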
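Slide 7's log lines consist of tab-separated `label:value` fields. In the talk, the parsing is done by a custom Hive SerDe: `hatena-serde.jar` on slide 10 is a jar, so it contains JVM code. The Perl below therefore only illustrates the format, for example for checking logs locally or in a Hadoop Streaming mapper. `parse_labeled_line` is a hypothetical name, and the `status:200` field in the sample is made up.

```perl
use strict;
use warnings;

# Split one label:value log line (fields separated by tabs) into a hash.
# Only the first ':' separates label from value, because values such as
# the time field contain colons themselves.
sub parse_labeled_line {
    my ($line) = @_;
    chomp $line;
    my %row;
    for my $field (split /\t/, $line) {
        my ($label, $value) = split /:/, $field, 2;
        next unless defined $value;    # skip fields without a label
        $row{$label} = $value;
    }
    return \%row;
}

my $row = parse_labeled_line(
    "time:[25/Sep/2011:12:17:51 +0900]\thost:11.22.33.44\tstatus:200\n");
print "$row->{host} $row->{status}\n";    # prints "11.22.33.44 200"
```

Because every field carries its own label, adding or reordering fields in the LogFormat does not break a parser like this one.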
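Slide 11 adds one Hive partition per day with `add_partition`, but the HiveQL the wrapper generates is not shown. The sketch below shows what such a helper might emit, using Hive's `ALTER TABLE ... ADD PARTITION ... LOCATION` statement. The partition column `dt`, the `s3://somelogs/diary/YYYY/MM/DD/` layout and the `add_partition_sql` name are all assumptions. Dates are plain `[year, month, day]` arrays rather than DateTime objects so that the sketch has no non-core dependencies.

```perl
use strict;
use warnings;

# Hypothetical: build the HiveQL a helper like add_partition might run,
# one ALTER TABLE statement per day. Column 'dt' and the S3 layout are assumed.
sub add_partition_sql {
    my ($table, $location_prefix, @dates) = @_;    # each date is [year, month, day]
    my @sql;
    for my $date (@dates) {
        my ($y, $m, $d) = @$date;
        my $dt   = sprintf '%04d-%02d-%02d', $y, $m, $d;
        my $path = sprintf '%s/%04d/%02d/%02d/', $location_prefix, $y, $m, $d;
        push @sql, "ALTER TABLE $table ADD PARTITION (dt='$dt') LOCATION '$path';";
    }
    return @sql;
}

print "$_\n"
    for add_partition_sql('diary', 's3://somelogs/diary', [2011, 5, 19], [2011, 5, 20]);
```

When the partitions are registered explicitly like this, queries that filter on `dt` only read the matching S3 prefixes.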
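Slide 12's log shows the wrapper waiting for each step and reporting how many steps remain. The wrapper's polling code is not shown, so the following is only a hypothetical sketch of such a loop. `wait_for_steps` and its `get_state` callback are assumptions that stand in for a real status query to EMR. The state names follow EMR's step states (PENDING, RUNNING, COMPLETED, FAILED, CANCELLED, INTERRUPTED).

```perl
use strict;
use warnings;

my %FAILED = map { $_ => 1 } qw(FAILED CANCELLED INTERRUPTED);

# Hypothetical sketch: wait until each named step completes, in order,
# logging progress in the same shape as the slide-12 output.
sub wait_for_steps {
    my (%args) = @_;
    my @pending = @{ $args{steps} };
    while (@pending) {
        my $state = $args{get_state}->($pending[0]);
        die "no state for step $pending[0]\n" unless defined $state;
        if ($state eq 'COMPLETED') {
            my $done = shift @pending;
            $args{log}->(sprintf 'Step %s is finished. (%d steps left)',
                $done, scalar @pending);
        }
        elsif ($FAILED{$state}) {
            die "Step $pending[0] ended as $state\n";
        }
        else {
            $args{sleep}->();    # e.g. sub { sleep 30 } against a real cluster
        }
    }
    return;
}

# Usage with a canned sequence of states instead of a live cluster.
my %script = (
    'page view'      => [qw(RUNNING COMPLETED)],
    'unique referer' => [qw(PENDING RUNNING COMPLETED)],
);
my @log;
wait_for_steps(
    steps     => ['page view', 'unique referer'],
    get_state => sub { shift @{ $script{ $_[0] } } },
    log       => sub { push @log, $_[0] },
    sleep     => sub { },
);
print "$_\n" for @log;
```

The loop checks only the head of the queue because the slide-12 log shows steps finishing one after another in submission order.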
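Slide 13 lists cron. Assuming a daily cron job, the script would typically process the day that just ended, for example by passing yesterday's date to `add_partition`. The sketch below uses only core modules. `previous_day` is a hypothetical helper; the slides themselves use DateTime for dates.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# Return the previous calendar day as YYYY-MM-DD. Going through local noon
# keeps the result correct on 23- or 25-hour days (DST changes).
sub previous_day {
    my ($epoch) = @_;
    my ($d, $m, $y) = (localtime $epoch)[3, 4, 5];
    my $noon = timelocal(0, 0, 12, $d, $m, $y + 1900);
    return strftime('%Y-%m-%d', localtime($noon - 24 * 60 * 60));
}

my $yesterday = previous_day(time);
print "$yesterday\n";
```

Subtracting a fixed 86400 seconds from the current time would be simpler, but it can land on the wrong day when the clock shifts for DST.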