Huahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter

  1. Hadoop Conference Japan 2013 Winter: Huahin Framework for Hadoop. Jan 21, 2013, @ryu_kobayashi
  2. • Ryu Kobayashi (@ryu_kobayashi) • BrainPad Inc. • Hadoop, Cassandra, Machine Learning, ... AD: Now on sale!!!
  3. What is Huahin Framework?
  4. Huahin Framework http://huahinframework.org
  5. Hadoop Family: Logo is ...
  6. Huahin logo is ... Very very very cute!
  7. In June 2012 we released as OSS some software that had been developed in our office.
      * It is what was used in the panel log analysis.
      * Please refer to the slides of "Hadoop Conference Japan 2011 Fall" for more information. http://goo.gl/C9tzf
     Huahin Framework is a general term for multiple products.
  8. The origin of the name of Huahin Framework
     Our office has a custom of choosing code names from wine regions.
     Huahin = Hua Hin = tourist destination in Thailand = wine region
     When it comes to Thailand... it is the elephant! Hence the Huahin image.
  9. Huahin Framework Configuration
     It mainly consists of the following elements:
      • Huahin Core
      • Huahin Tools
      • Huahin Manager
  10. Huahin Core
      • Simplified MapReduce programs
      • You do not have to write Writables and Secondary Sort yourself
      • Basic grouping, sorting, etc., with the idea taken from SQL
      • If you want to, you can still write natural MapReduce
      • In the same way that C++ is a superset of C
      • Hive or Pig can do the same things; Huahin is for when you really want performance (parallel computation, etc.)
      • There is Huahin Unit as a test driver
      • It wraps MRUnit
      • Example of implementation
  11. Huahin Example
      • Page top 10 rank example
      First, natural MapReduce. Second, Huahin MapReduce.
  12. Data of page top 10 rank. Example: the format is tab delimited (date, user, page).

      Jan 21, 2013    user1     /index.html
      Jan 21, 2013    user1     /index2.html
      Jan 21, 2013    user2     /contents/foo.html
      Jan 21, 2013    user42    /bar.html
      Jan 21, 2013    user3     /index.html
      Jan 21, 2013    user7     /news/index.html
      Jan 21, 2013    user4     /release/2013.html
      Jan 21, 2013    user3     /index2.html
      Jan 21, 2013    user7     /download.html
      Jan 21, 2013    user5     /bar.html
      Jan 21, 2013    user12    /release/2012.html
      Jan 21, 2013    user5     /contents/foo.html
      Jan 21, 2013    user23    /page2.html
      Jan 21, 2013    user53    /news.html
      Jan 21, 2013    user6     /download.html
      Jan 21, 2013    user21    /bar.html
      Jan 21, 2013    user18    /index.html
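As a rough illustration of what the first job of this example computes, the following plain-Java sketch counts page views per date and page from tab-delimited lines. It involves neither Hadoop nor Huahin; the class name `PageViewCount` and the method `countPageViews` are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only: counts page views per (date, page) in memory. */
final class PageViewCount {

    /** Returns a map from "date<TAB>page" to the number of lines seen for that pair. */
    static Map<String, Integer> countPageViews(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Columns: date, user, page (tab delimited, as in the sample data).
            String[] columns = line.split("\t");
            if (columns.length < 3) {
                continue; // skip malformed lines
            }
            String key = columns[0] + "\t" + columns[2];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Jan 21, 2013\tuser1\t/index.html",
                "Jan 21, 2013\tuser3\t/index.html",
                "Jan 21, 2013\tuser42\t/bar.html");
        countPageViews(lines).forEach((key, pv) -> System.out.println(key + "\t" + pv));
    }
}
```

In the MapReduce versions, the map side emits a count of 1 per line and the reduce side sums the counts; the `merge` call here plays both roles in a single process.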
  13. Page top 10 rank of natural MapReduce: JobTools

      import org.apache.hadoop.conf.Configured;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
      import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
      import org.apache.hadoop.util.Tool;

      public class PathTop10RankJobTool extends Configured implements Tool {
          @Override
          public int run(String[] arg0) throws Exception {
              // First job: count page views per (date, page).
              Job firstJob = new Job(getConf(), "first");
              firstJob.setJarByClass(PathTop10RankJobTool.class);
              TextInputFormat.setInputPaths(firstJob, "input");
              firstJob.setInputFormatClass(TextInputFormat.class);
              firstJob.setMapperClass(PathTop10RankFirstMapper.class);
              firstJob.setMapOutputKeyClass(FirstKeyWritable.class);
              firstJob.setMapOutputValueClass(IntWritable.class);
              firstJob.setReducerClass(PathTop10RankFirstReducer.class);
              firstJob.setOutputKeyClass(SecondKeyWritable.class);
              firstJob.setOutputValueClass(IntWritable.class);
              SequenceFileOutputFormat.setOutputPath(firstJob, new Path("first"));
              firstJob.setOutputFormatClass(SequenceFileOutputFormat.class);
              if (!firstJob.waitForCompletion(true)) {
                  return -1;
              }

              // Second job: secondary sort by page views within each date, then take the top 10.
              Job secondJob = new Job(getConf(), "second");
              secondJob.setJarByClass(PathTop10RankJobTool.class);
              SequenceFileInputFormat.setInputPaths(secondJob, "first");
              secondJob.setInputFormatClass(SequenceFileInputFormat.class);
              secondJob.setMapperClass(Mapper.class); // identity mapper
              secondJob.setMapOutputKeyClass(SecondKeyWritable.class);
              secondJob.setMapOutputValueClass(IntWritable.class);
              secondJob.setGroupingComparatorClass(PathTop10RankGroupingComparatorClass.class);
              secondJob.setPartitionerClass(PathTop10RankPartitioner.class);
              secondJob.setSortComparatorClass(PathTop10RankingSortComparator.class);
              secondJob.setReducerClass(PathTop10RankSecondReducer.class);
              secondJob.setOutputKeyClass(SecondKeyWritable.class);
              secondJob.setOutputValueClass(IntWritable.class);
              TextOutputFormat.setOutputPath(secondJob, new Path("output"));
              secondJob.setOutputFormatClass(TextOutputFormat.class);
              return secondJob.waitForCompletion(true) ? 0 : -1;
          }
      }
  14. Page top 10 rank of natural MapReduce: FirstMapper, FirstReducer

      FirstMapper

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class PathTop10RankFirstMapper
              extends Mapper<LongWritable, Text, FirstKeyWritable, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);

          @Override
          protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              // Input columns: date, user, page (tab delimited).
              String[] s = value.toString().split("\t");
              context.write(new FirstKeyWritable(s[0], s[2]), ONE);
          }
      }

      FirstReducer

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.mapreduce.Reducer;

      public class PathTop10RankFirstReducer
              extends Reducer<FirstKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
          @Override
          protected void reduce(FirstKeyWritable key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              // Sum the page views for this (date, page) pair.
              int pv = 0;
              for (IntWritable i : values) {
                  pv += i.get();
              }
              context.write(
                      new SecondKeyWritable(key.getDate().toString(), key.getPage().toString(), pv),
                      new IntWritable(pv));
          }
      }
  15. Page top 10 rank of natural MapReduce: SecondReducer

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.mapreduce.Reducer;

      public class PathTop10RankSecondReducer
              extends Reducer<SecondKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
          @Override
          protected void reduce(SecondKeyWritable key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              int rank = 0;
              for (IntWritable i : values) {
                  // Stop after 10 records have been written.
                  if (rank >= 10) {
                      break;
                  }
                  context.write(key, i);
                  rank++;
              }
          }
      }
  16. Page top 10 rank of natural MapReduce: FirstKeyWritable, SecondKeyWritable

      FirstKeyWritable

      import java.io.DataInput;
      import java.io.DataOutput;
      import java.io.IOException;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.io.WritableComparable;

      public class FirstKeyWritable implements WritableComparable<FirstKeyWritable> {
          private Text date = new Text();
          private Text page = new Text();

          public FirstKeyWritable() {
          }

          public FirstKeyWritable(String date, String page) {
              this.date.set(date);
              this.page.set(page);
          }

          @Override
          public void readFields(DataInput in) throws IOException {
              this.date.readFields(in);
              this.page.readFields(in);
          }

          @Override
          public void write(DataOutput out) throws IOException {
              this.date.write(out);
              this.page.write(out);
          }

          @Override
          public int compareTo(FirstKeyWritable o) {
              int compare = this.date.toString().compareTo(o.date.toString());
              if (compare != 0) {
                  return compare;
              }
              return this.page.toString().compareTo(o.page.toString());
          }

          @Override
          public boolean equals(Object obj) {
              if (!(obj instanceof FirstKeyWritable)) {
                  return false;
              }
              FirstKeyWritable o = (FirstKeyWritable) obj;
              return this.date.equals(o.getDate()) && this.page.equals(o.getPage());
          }

          // Needed so that equal keys are sent to the same reducer by the default partitioner.
          @Override
          public int hashCode() {
              return 31 * this.date.hashCode() + this.page.hashCode();
          }

          /** @return the date */
          public Text getDate() {
              return date;
          }

          /** @param date the date to set */
          public void setDate(Text date) {
              this.date = date;
          }

          /** @return the page */
          public Text getPage() {
              return page;
          }

          /** @param page the page to set */
          public void setPage(Text page) {
              this.page = page;
          }
      }

      SecondKeyWritable

      import java.io.DataInput;
      import java.io.DataOutput;
      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.io.WritableComparable;

      public class SecondKeyWritable implements WritableComparable<SecondKeyWritable> {
          private Text date = new Text();
          private Text page = new Text();
          private IntWritable pv = new IntWritable();

          public SecondKeyWritable() {
          }

          public SecondKeyWritable(String date, String page, int pv) {
              this.date.set(date);
              this.page.set(page);
              this.pv.set(pv);
          }

          @Override
          public void readFields(DataInput in) throws IOException {
              this.date.readFields(in);
              this.page.readFields(in);
              this.pv.readFields(in);
          }

          @Override
          public void write(DataOutput out) throws IOException {
              this.date.write(out);
              this.page.write(out);
              this.pv.write(out);
          }

          // Natural order and equality use the date only; ordering by pv is done by the sort comparator.
          @Override
          public int compareTo(SecondKeyWritable o) {
              return this.date.toString().compareTo(o.date.toString());
          }

          @Override
          public boolean equals(Object obj) {
              if (!(obj instanceof SecondKeyWritable)) {
                  return false;
              }
              SecondKeyWritable o = (SecondKeyWritable) obj;
              return this.date.equals(o.getDate());
          }

          @Override
          public int hashCode() {
              return this.date.hashCode();
          }

          @Override
          public String toString() {
              return this.date + "\t" + this.page;
          }

          /** @return the date */
          public Text getDate() {
              return date;
          }

          /** @param date the date to set */
          public void setDate(Text date) {
              this.date = date;
          }

          /** @return the page */
          public Text getPage() {
              return page;
          }

          /** @param page the page to set */
          public void setPage(Text page) {
              this.page = page;
          }

          /** @return the pv */
          public IntWritable getPv() {
              return pv;
          }

          /** @param pv the pv to set */
          public void setPv(IntWritable pv) {
              this.pv = pv;
          }
      }
  17. Page top 10 rank of natural MapReduce: GroupingComparator, SortComparator, Partitioner

      GroupingComparator

      import org.apache.hadoop.io.WritableComparable;
      import org.apache.hadoop.io.WritableComparator;

      public class PathTop10RankGroupingComparatorClass extends WritableComparator {
          public PathTop10RankGroupingComparatorClass() {
              super(SecondKeyWritable.class, true);
          }

          // Group by date only, so that one reduce call sees every page of a date.
          @SuppressWarnings("rawtypes")
          @Override
          public int compare(WritableComparable a, WritableComparable b) {
              SecondKeyWritable one = (SecondKeyWritable) a;
              SecondKeyWritable another = (SecondKeyWritable) b;
              return one.getDate().compareTo(another.getDate());
          }
      }

      SortComparator

      import org.apache.hadoop.io.WritableComparable;
      import org.apache.hadoop.io.WritableComparator;

      public class PathTop10RankingSortComparator extends WritableComparator {
          public PathTop10RankingSortComparator() {
              super(SecondKeyWritable.class, true);
          }

          // Sort by date, then by page views in descending order so that the top pages come first.
          @SuppressWarnings("rawtypes")
          @Override
          public int compare(WritableComparable a, WritableComparable b) {
              SecondKeyWritable one = (SecondKeyWritable) a;
              SecondKeyWritable another = (SecondKeyWritable) b;
              int compare = one.getDate().compareTo(another.getDate());
              if (compare != 0) {
                  return compare;
              }
              return another.getPv().compareTo(one.getPv());
          }
      }

      Partitioner

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.mapreduce.Partitioner;

      public class PathTop10RankPartitioner extends Partitioner<SecondKeyWritable, IntWritable> {
          @Override
          public int getPartition(SecondKeyWritable key, IntWritable value, int numPartitions) {
              // Mask the sign bit instead of Math.abs, which stays negative for Integer.MIN_VALUE.
              return (key.getDate().hashCode() & Integer.MAX_VALUE) % numPartitions;
          }
      }
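The partitioner, sort comparator, and grouping comparator on this slide cooperate to implement a secondary sort. A minimal in-memory sketch of the same idea, in plain Java without Hadoop, might look like the following. The names `TopNPerDate`, `Row`, and `topN` are hypothetical and exist only for this illustration; ordering by page views in descending order is an assumption about what "top 10" means here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: emulates sort and group steps of a secondary sort in memory. */
final class TopNPerDate {

    /** One (date, page, pv) row, mirroring the fields of SecondKeyWritable. */
    static final class Row {
        final String date;
        final String page;
        final int pv;

        Row(String date, String page, int pv) {
            this.date = date;
            this.page = page;
            this.pv = pv;
        }
    }

    /** Sorts by date, then by pv descending, and keeps at most n rows for each date. */
    static Map<String, List<Row>> topN(List<Row> rows, int n) {
        List<Row> sorted = new ArrayList<>(rows);
        // Role of the sort comparator: date ascending, then page views descending.
        sorted.sort((a, b) -> {
            int byDate = a.date.compareTo(b.date);
            return byDate != 0 ? byDate : Integer.compare(b.pv, a.pv);
        });

        // Role of the grouping comparator: rows with the same date form one group.
        Map<String, List<Row>> grouped = new LinkedHashMap<>();
        for (Row row : sorted) {
            List<Row> group = grouped.computeIfAbsent(row.date, d -> new ArrayList<>());
            // Role of the second reducer: keep only the first n rows of each group.
            if (group.size() < n) {
                group.add(row);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("Jan 21, 2013", "/index.html", 3),
                new Row("Jan 21, 2013", "/bar.html", 5),
                new Row("Jan 21, 2013", "/news.html", 1));
        for (List<Row> group : topN(rows, 2).values()) {
            for (Row row : group) {
                System.out.println(row.date + "\t" + row.page + "\t" + row.pv);
            }
        }
    }
}
```

In the real job the partitioner additionally ensures that all keys with the same date reach the same reducer; a single-process sketch has no counterpart for that step.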
  18. Page top 10 rank of natural MapReduce
      • This is very long ...
      • About 307 lines
  19. Page top 10 rank of Huahin MapReduce: JobTools, FirstFilter, FirstSummarizer, SecondSummarizer

      JobTools

      public class PathRankingJobTool extends SimpleJobTool {
          @Override
          protected String setInputPath(String[] args) {
              return args[0];
          }

          @Override
          protected String setOutputPath(String[] args) {
              return args[1];
          }

          /* (non-Javadoc)
           * @see org.huahin.core.SimpleJobTool#setup()
           */
          @Override
          protected void setup() throws Exception {
              final String[] labels = new String[] { "DATE", "USER", "URL" };

              // First job: filter each line, then sum page views per (DATE, PATH).
              SimpleJob job1 = addJob(labels, StringUtil.TAB);
              job1.setFilter(FirstFilter.class);
              job1.setSummarizer(FirstSummarizer.class);

              // Second job: emit the top 10 of each DATE.
              SimpleJob job2 = addJob();
              job2.setSummarizer(SecondSummarizer.class);
          }
      }

      FirstFilter

      public class FirstFilter extends Filter {
          @Override
          public void init() {
          }

          @Override
          public void filter(Record record, Writer writer)
                  throws IOException, InterruptedException {
              Record emitRecord = new Record();
              emitRecord.addGrouping("DATE", record.getValueString("DATE"));
              emitRecord.addGrouping("PATH", record.getValueString("URL"));
              emitRecord.addValue("PV", 1);
              writer.write(emitRecord);
          }

          @Override
          public void filterSetup() {
          }
      }

      FirstSummarizer

      public class FirstSummarizer extends Summarizer {
          @Override
          public void init() {
          }

          @Override
          public void summarize(Writer writer)
                  throws IOException, InterruptedException {
              int pv = 0;
              while (hasNext()) {
                  Record record = next(writer);
                  pv += record.getValueInteger("PV");
              }

              Record emitRecord = new Record();
              emitRecord.addGrouping("DATE", getGroupingRecord().getGroupingString("DATE"));
              emitRecord.addSort(pv, Record.SORT_UPPER, 1);
              emitRecord.addValue("PATH", getGroupingRecord().getGroupingString("PATH"));
              emitRecord.addValue("PV", pv);
              writer.write(emitRecord);
          }

          @Override
          public void summarizerSetup() {
          }
      }

      SecondSummarizer

      public class SecondSummarizer extends Summarizer {
          @Override
          public void init() {
          }

          @Override
          public void summarize(Writer writer)
                  throws IOException, InterruptedException {
              int rank = 1;
              while (hasNext()) {
                  if (rank > 10) {
                      break;
                  }

                  Record record = next(writer);
                  Record emitRecord = new Record();
                  emitRecord.addValue("PATH", record.getValueString("PATH"));
                  // "PV" is the value label written by FirstSummarizer.
                  emitRecord.addValue("PV", record.getValueInteger("PV"));
                  writer.write(emitRecord);
                  rank++;
              }
          }

          @Override
          public void summarizerSetup() {
          }
      }
  20. Page top 10 rank of Huahin MapReduce
      • This is very short!!
      • About 100 lines
  21. Huahin Core
      • Other
          • Simple Join
          • Big Join
          • etc ...
  22. Huahin Tools
      • A collection of tools for generic operations.
          • Currently only Apache log formatting...
      • Operating environment
          • On-premises Hadoop
          • Standalone
              • Multi-threaded execution for small data
          • EMR
              • s3://huahin/tools/huahin-tools.0.1.0.jar
  23. Huahin Manager
      • Manages MapReduce Jobs
          • Get the Job list
          • Get the Job details
          • Kill a Job
      • Job execution
          • Run queue management
              • MapReduce JAR
              • Hive scripts
              • Pig scripts
          • Execute Hive queries
          • Execute Pig Latin
      • All operations are done through the REST API.
      • Supports Apache Hadoop 1.0.X and 2.0.2-alpha
      • Supports CDH3 and CDH4
  24. Huahin Manager
      • For 2.0.2-alpha and CDH4
          • Get the Application list
          • Get the Cluster info
          • Kill an Application
          • Proxy to the YARN APIs
  25. Huahin Manager
      • EMR support
          • Set the bootstrap action: s3://huahin/manager/configure
          • Configure the security group in order to access the REST API.
              • The security group to configure is created when EMR starts up: ElasticMapReduce-master
              • Values to set
                  • Port range: 9010
                  • Source: IP addresses that are allowed to connect
  26. Huahin Manager: Operating environment of Huahin Manager
      [Diagram labels: REST API, Huahin Manager, various operations, Hadoop Cluster, HiveServer (1 and 2)]
  27. Huahin EManager: a manager that specializes in EMR
      • Manages Job Flows
          • Get the Job Flow list
          • Get the Job Flow details
          • Kill a Job Flow Step
      • Job execution
          • Run queue management
              • Register a queue
              • Get the queue details
              • Remove a queue
  28. Huahin EManager
      • Register queue
          • The following job types supported by EMR can be assigned to the queue:
              • Hive
              • Pig
              • Streaming
              • Custom JAR
      • EManager can specify the size of the cluster to be started. EManager assigns a queue to a cluster that is free. (Being able to bring up multiple clusters is a good point of EMR!)
  29. Huahin EManager: Operating environment of Huahin EManager
      [Diagram labels: REST API; Huahin EManager (on premises or EC2 instance); various operations; Amazon Elastic MapReduce master nodes, where Huahin Manager will be started by the bootstrap]
      ※ NOTICE: Set up the security group
  30. Huahin EManager: Operating environment of Huahin EManager
      How starting a Job Flow from EManager differs from starting it from the Management Console and tools:
      • EManager reuses one Job Flow. It does not start and terminate EMR every time, in order to save cost and improve performance.
        ※ This currently cannot be done from the Management Console. However, it can be done from the command line and the SDK.
      • However, the Job Flow is restarted automatically when the number of Steps reaches the upper limit of 255.
  31. Huahin EManager: Operating environment of Huahin EManager
      How starting a Job Flow from EManager differs from starting it from the Management Console and tools:
      • The cluster stays up in units of one hour
          • for cost (billing) and performance
      • It shuts down automatically just before the next billing time.
      • However, if a Job is running, the shutdown is carried over to the next billing time.
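The hourly shutdown rule described on this slide can be sketched as a small decision function. This is only an illustration of the rule as stated, not EManager's actual implementation: the class `BillingHourPolicy`, its methods, and the safety margin before the billing boundary are all assumptions introduced here.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical illustration only: decides when an idle cluster should shut down. */
final class BillingHourPolicy {

    /** Billing is assumed to happen in one-hour units counted from the start time. */
    static final Duration BILLING_PERIOD = Duration.ofHours(1);

    /** Returns the first billing boundary strictly after {@code now} (requires now >= startedAt). */
    static Instant nextBillingBoundary(Instant startedAt, Instant now) {
        long elapsedMillis = Duration.between(startedAt, now).toMillis();
        long periods = elapsedMillis / BILLING_PERIOD.toMillis() + 1;
        return startedAt.plus(BILLING_PERIOD.multipliedBy(periods));
    }

    /**
     * Shut down only when no job is running and the next billing boundary is
     * closer than {@code margin}. A running job carries the cluster over to
     * the following billing period.
     */
    static boolean shouldShutDown(Instant startedAt, Instant now, boolean jobRunning, Duration margin) {
        if (jobRunning) {
            return false;
        }
        Instant boundary = nextBillingBoundary(startedAt, now);
        return !now.isBefore(boundary.minus(margin));
    }

    public static void main(String[] args) {
        Instant startedAt = Instant.parse("2013-01-21T00:00:00Z");
        Instant now = Instant.parse("2013-01-21T00:57:00Z");
        System.out.println(shouldShutDown(startedAt, now, false, Duration.ofMinutes(5)));
    }
}
```

Keeping the cluster alive until shortly before the boundary means an hour that has already been paid for can still serve newly queued work.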
  32. Huahin EManager: Register queue
      Queues are registered using the PUT or POST method.
      • PUT: If the script or JAR is already on S3, it only boots the Job Flow and executes the Step.
      • POST: Uploads a local JAR or script to S3, then boots the Job Flow and executes the Step. This is a feature that EMR itself does not have. There is also an option to remove the files that were POSTed.
      • All registration is done in JSON.
  33. Huahin EManager: Register queue
      Example of PUT for Hive:
        $ curl -X PUT http://localhost:9020/queue/register/hive -F 'ARGUMENTS={"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'
      The arguments in the JSON are optional.
      Example of POST for Hive:
        $ curl -X POST http://localhost:9020/queue/register/hive -F SCRIPT=@wordcount.hql -F 'ARGUMENTS={"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'
      The arguments in the JSON are optional.
      Setting "deleteOnExit" to "true" deletes the uploaded file after execution. By default it is not deleted.
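When the registration is issued from a program rather than from curl, the ARGUMENTS value has to be assembled as a JSON string. The following sketch builds the same payload as the examples above using only the Java standard library. The class `QueueArguments` and its methods are hypothetical helpers, not part of Huahin EManager, and the escaping covers only backslashes and double quotes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration only: builds the JSON passed in the ARGUMENTS form field. */
final class QueueArguments {

    /** Quotes a string as a JSON string literal (escapes backslash and double quote only). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds {"script":...,"arguments":[...]} with the keys used in the curl examples. */
    static String toJson(String script, List<String> arguments) {
        String args = arguments.stream()
                .map(QueueArguments::quote)
                .collect(Collectors.joining(","));
        return "{\"script\":" + quote(script) + ",\"arguments\":[" + args + "]}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("s3://huahin/wordcount.hql", Arrays.asList("arg1", "arg2")));
    }
}
```

A JSON library would be the more robust choice for arbitrary input; the hand-written version is used here only to keep the sketch dependency-free.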
  34. Huahin EManager: List of Job Flows
      Example of getting the list of all Job Flows:
        $ curl -X GET http://localhost:9020/jobflow/list
      Example of getting the list of running Job Flows:
        $ curl -X GET http://localhost:9020/jobflow/runnings
      Example of getting Job Flow details:
        $ curl -X GET http://localhost:9020/jobflow/describe/j-XXXXXXXXXXXX
  35. Huahin EManager: Queue API
      Example of getting the list of registered queues:
        $ curl -X GET http://localhost:9020/queue/list
      Example of getting the list of running queues:
        $ curl -X GET http://localhost:9020/queue/runnings
      Example of getting queue details:
        $ curl -X GET http://localhost:9020/queue/describe/S_XXXXXXXXXXXX
      Example of deleting a queue:
        $ curl -X DELETE http://localhost:9020/queue/kill/S_XXXXXXXXXXXX
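The same queue endpoints can be addressed from Java. The sketch below only builds the requests with the JDK's `java.net.http` API (Java 11 or later) and does not send them. The paths are taken from the curl examples above; the class `EManagerRequests` and its method names are hypothetical and not an official client.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical illustration only: builds requests for the queue endpoints shown in the slides. */
final class EManagerRequests {

    private final String baseUrl;

    /** @param baseUrl for example "http://localhost:9020", as in the curl examples */
    EManagerRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    HttpRequest listQueues() {
        return get("/queue/list");
    }

    HttpRequest listRunningQueues() {
        return get("/queue/runnings");
    }

    HttpRequest describeQueue(String queueId) {
        return get("/queue/describe/" + queueId);
    }

    /** Corresponds to the DELETE example. */
    HttpRequest killQueue(String queueId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/queue/kill/" + queueId))
                .DELETE()
                .build();
    }

    private HttpRequest get(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
    }

    public static void main(String[] args) {
        EManagerRequests requests = new EManagerRequests("http://localhost:9020");
        HttpRequest request = requests.describeQueue("S_XXXXXXXXXXXX");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running EManager instance.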
  36. Huahin EManager: Kill of Job
      Hadoop has a command to kill a running Job:
        hadoop job -kill job_XXXXXXXXXX
      However, EMR has no such function. If you start a Job by mistake, there is no choice but to terminate the Job Flow. Alternatively, you can kill the Job by connecting to the EMR master node over SSH and typing the above command. Troublesome...
  37. Huahin EManager: Kill of Job
      EManager (Manager) makes this possible through the Kill API!
      Example of Step kill:
        $ curl -X DELETE http://localhost:9020/jobflow/kill/step/S_XXXXXXXXXXXX
  38. Conclusion
      • Huahin Core
          • Unlike Hive and Pig
          • For when you want to stay reasonably close to natural MapReduce.
      • Huahin Tools
          • Still...
      • Huahin Manager
          • All operations via the REST API
          • Integration with other systems
      • Huahin EManager
          • Integration with other systems
          • Cost and performance management
          • Kill a Step of a Job Flow!
  39. The current version
      • Huahin Core 0.1.4
      • Huahin Unit 0.1.4
      • Huahin Tools 0.1.0
      • Huahin Manager
          • 0.1.4 for Apache Hadoop 1.0.4
          • 0.1.4 for CDH3
          • 0.2.1 for Apache Hadoop 2.0.2-alpha
          • 0.2.1 for CDH4
      • Huahin EManager 0.1.1
  40. Thanks!!!