Huahin Framework for Hadoop, Hadoop Conference Japan 2013 Winter
 


    Presentation Transcript

    • Hadoop Conference Japan 2013 Winter: Huahin Framework for Hadoop. Jan 21, 2013, @ryu_kobayashi
    • Ryu Kobayashi (@ryu_kobayashi)
      • BrainPad Inc.
      • Hadoop, Cassandra, Machine Learning, ...
      • AD: Now on sale!!!
    • What is Huahin Framework?
    • Huahin Framework http://huahinframework.org
    • Hadoop Family Logo is ...
    • Huahin logo is ... Very very very cute!
    • Huahin Framework
      • In June 2012 we released, as OSS, software that had been developed in-house.
        • It is what was used for the panel log analysis.
        • For more information, see the "Hadoop Conference Japan 2011 Fall" slides: http://goo.gl/C9tzf
      • Huahin Framework is a general term for multiple products.
    • The origin of the name of Huahin Framework
      • Our office has a custom of choosing code names from wine regions.
      • Huahin = Hua Hin = tourist destination in Thailand = wine region
      • And when it comes to Thailand... it is the elephant! Hence the Huahin image.
    • Huahin Framework Configuration
      • It mainly consists of the following elements:
        • Huahin Core
        • Huahin Tools
        • Huahin Manager
    • Huahin Core
      • Simplifies MapReduce programs
        • You do not have to write Writables or secondary sort yourself
        • Basic grouping, sorting, etc., with ideas borrowed from SQL
      • If you want to, you can still write natural (plain) MapReduce
        • In the same way that C++ is a superset of C
      • Hive or Pig can do the same work; Huahin Core is for when you really need the performance (parallel computation, etc.)
      • Huahin Unit is provided as a test driver
        • It wraps MRUnit
      • Example of implementation
    • Huahin Example
      • Page top 10 rank example
      • First, natural MapReduce. Second, Huahin MapReduce.
    • Data for the page top 10 rank example
      • The format is tab-delimited (the tabs between the three fields are shown here as spaces):

        Jan 21, 2013    user1     /index.html
        Jan 21, 2013    user1     /index2.html
        Jan 21, 2013    user2     /contents/foo.html
        Jan 21, 2013    user42    /bar.html
        Jan 21, 2013    user3     /index.html
        Jan 21, 2013    user7     /news/index.html
        Jan 21, 2013    user4     /release/2013.html
        Jan 21, 2013    user3     /index2.html
        Jan 21, 2013    user7     /download.html
        Jan 21, 2013    user5     /bar.html
        Jan 21, 2013    user12    /release/2012.html
        Jan 21, 2013    user5     /contents/foo.html
        Jan 21, 2013    user23    /page2.html
        Jan 21, 2013    user53    /news.html
        Jan 21, 2013    user6     /download.html
        Jan 21, 2013    user21    /bar.html
        Jan 21, 2013    user18    /index.html
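Before reading either implementation, it may help to see the computation itself with Hadoop taken out. The following is only a plain-Java sketch of the two stages (count views per date and page, then rank each date's pages and keep the first ten); the class name `PageRankSketch` and the method `topPages` are hypothetical and do not come from the slides or from Huahin.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (no Hadoop) of the two-stage page ranking. */
final class PageRankSketch {

    /** Returns, per date, up to {@code limit} (page, views) pairs, most viewed first. */
    static Map<String, List<Map.Entry<String, Integer>>> topPages(List<String> lines, int limit) {
        // Stage 1: count views per (date, page), like the first job.
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t"); // DATE, USER, URL
            counts.computeIfAbsent(fields[0], d -> new TreeMap<>())
                  .merge(fields[2], 1, Integer::sum);
        }
        // Stage 2: per date, sort by descending views and keep the first `limit`.
        Map<String, List<Map.Entry<String, Integer>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> byDate : counts.entrySet()) {
            List<Map.Entry<String, Integer>> ranked = new ArrayList<>();
            byDate.getValue().forEach((page, pv) -> ranked.add(Map.entry(page, pv)));
            // List.sort is stable, so pages with equal counts stay in path order.
            ranked.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
            result.put(byDate.getKey(),
                    new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Jan 21, 2013\tuser1\t/index.html",
                "Jan 21, 2013\tuser42\t/bar.html",
                "Jan 21, 2013\tuser3\t/index.html",
                "Jan 21, 2013\tuser5\t/bar.html",
                "Jan 21, 2013\tuser18\t/index.html");
        topPages(lines, 10).forEach((date, pages) ->
                pages.forEach(p -> System.out.println(date + "\t" + p.getKey() + "\t" + p.getValue())));
    }
}
```

In MapReduce terms, stage 1 corresponds to a job keyed on (date, page), and stage 2 to a second job that groups by date and relies on a secondary sort by page views.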
    • Page top 10 rank of natural MapReduce: JobTool

        import org.apache.hadoop.conf.Configured;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
        import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
        import org.apache.hadoop.util.Tool;

        public class PathTop10RankJobTool extends Configured implements Tool {
            @Override
            public int run(String[] arg0) throws Exception {
                // First job: count page views per (date, page).
                Job firstJob = new Job(getConf(), "first");
                firstJob.setJarByClass(PathTop10RankJobTool.class);
                TextInputFormat.setInputPaths(firstJob, "input");
                firstJob.setInputFormatClass(TextInputFormat.class);
                firstJob.setMapperClass(PathTop10RankFirstMapper.class);
                firstJob.setMapOutputKeyClass(FirstKeyWritable.class);
                firstJob.setMapOutputValueClass(IntWritable.class);
                firstJob.setReducerClass(PathTop10RankFirstReducer.class);
                firstJob.setOutputKeyClass(SecondKeyWritable.class);
                firstJob.setOutputValueClass(IntWritable.class);
                SequenceFileOutputFormat.setOutputPath(firstJob, new Path("first"));
                firstJob.setOutputFormatClass(SequenceFileOutputFormat.class);
                if (!firstJob.waitForCompletion(true)) {
                    return -1;
                }

                // Second job: group by date, sort by page views, keep the top 10.
                Job secondJob = new Job(getConf(), "second");
                secondJob.setJarByClass(PathTop10RankJobTool.class);
                SequenceFileInputFormat.setInputPaths(secondJob, "first");
                secondJob.setInputFormatClass(SequenceFileInputFormat.class);
                secondJob.setMapperClass(Mapper.class); // identity mapper
                secondJob.setMapOutputKeyClass(SecondKeyWritable.class);
                secondJob.setMapOutputValueClass(IntWritable.class);
                secondJob.setGroupingComparatorClass(PathTop10RankGroupingComparatorClass.class);
                secondJob.setPartitionerClass(PathTop10RankPartitioner.class);
                secondJob.setSortComparatorClass(PathTop10RankingSortComparator.class);
                secondJob.setReducerClass(PathTop10RankSecondReducer.class);
                secondJob.setOutputKeyClass(SecondKeyWritable.class);
                secondJob.setOutputValueClass(IntWritable.class);
                TextOutputFormat.setOutputPath(secondJob, new Path("output"));
                secondJob.setOutputFormatClass(TextOutputFormat.class);
                return secondJob.waitForCompletion(true) ? 0 : -1;
            }
        }
    • Page top 10 rank of natural MapReduce: FirstMapper and FirstReducer

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;

        public class PathTop10RankFirstMapper
                extends Mapper<LongWritable, Text, FirstKeyWritable, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Input line: DATE <tab> USER <tab> URL
                String[] s = value.toString().split("\t");
                context.write(new FirstKeyWritable(s[0], s[2]), ONE);
            }
        }

        public class PathTop10RankFirstReducer
                extends Reducer<FirstKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
            @Override
            protected void reduce(FirstKeyWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Sum the page views for one (date, page) pair.
                int pv = 0;
                for (IntWritable i : values) {
                    pv += i.get();
                }
                context.write(
                        new SecondKeyWritable(key.getDate().toString(), key.getPage().toString(), pv),
                        new IntWritable(pv));
            }
        }
    • Page top 10 rank of natural MapReduce: SecondReducer

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.mapreduce.Reducer;

        public class PathTop10RankSecondReducer
                extends Reducer<SecondKeyWritable, IntWritable, SecondKeyWritable, IntWritable> {
            @Override
            protected void reduce(SecondKeyWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Values arrive already sorted for this date; emit at most 10 of them.
                int rank = 0;
                for (IntWritable i : values) {
                    if (rank >= 10) {
                        break;
                    }
                    context.write(key, i);
                    rank++;
                }
            }
        }
    • Page top 10 rank of natural MapReduce: FirstKeyWritable and SecondKeyWritable

        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.WritableComparable;

        public class FirstKeyWritable implements WritableComparable<FirstKeyWritable> {
            private Text date = new Text();
            private Text page = new Text();

            public FirstKeyWritable() {
            }

            public FirstKeyWritable(String date, String page) {
                this.date.set(date);
                this.page.set(page);
            }

            @Override
            public void readFields(DataInput in) throws IOException {
                this.date.readFields(in);
                this.page.readFields(in);
            }

            @Override
            public void write(DataOutput out) throws IOException {
                this.date.write(out);
                this.page.write(out);
            }

            @Override
            public int compareTo(FirstKeyWritable o) {
                int compare = this.date.toString().compareTo(o.date.toString());
                if (compare != 0) {
                    return compare;
                }
                return this.page.toString().compareTo(o.page.toString());
            }

            @Override
            public boolean equals(Object obj) {
                if (obj == null) {
                    return false;
                }
                if (!(obj instanceof FirstKeyWritable)) {
                    return false;
                }
                FirstKeyWritable o = (FirstKeyWritable) obj;
                return this.date.equals(o.getDate()) &&
                       this.page.equals(o.getPage());
            }

            // The default HashPartitioner uses hashCode(), so equal keys must hash equally.
            @Override
            public int hashCode() {
                return 31 * this.date.hashCode() + this.page.hashCode();
            }

            @Override
            public String toString() {
                return this.date + "\t" + this.page;
            }

            public Text getDate() { return date; }
            public void setDate(Text date) { this.date = date; }
            public Text getPage() { return page; }
            public void setPage(Text page) { this.page = page; }
        }

        public class SecondKeyWritable implements WritableComparable<SecondKeyWritable> {
            private Text date = new Text();
            private Text page = new Text();
            private IntWritable pv = new IntWritable();

            public SecondKeyWritable() {
            }

            public SecondKeyWritable(String date, String page, int pv) {
                this.date.set(date);
                this.page.set(page);
                this.pv.set(pv);
            }

            @Override
            public void readFields(DataInput in) throws IOException {
                this.date.readFields(in);
                this.page.readFields(in);
                this.pv.readFields(in);
            }

            @Override
            public void write(DataOutput out) throws IOException {
                this.date.write(out);
                this.page.write(out);
                this.pv.write(out);
            }

            // Natural order and equality use the date only; the sort comparator adds pv.
            @Override
            public int compareTo(SecondKeyWritable o) {
                return this.date.toString().compareTo(o.date.toString());
            }

            @Override
            public boolean equals(Object obj) {
                if (obj == null) {
                    return false;
                }
                if (!(obj instanceof SecondKeyWritable)) {
                    return false;
                }
                SecondKeyWritable o = (SecondKeyWritable) obj;
                return this.date.equals(o.getDate());
            }

            public Text getDate() { return date; }
            public void setDate(Text date) { this.date = date; }
            public Text getPage() { return page; }
            public void setPage(Text page) { this.page = page; }
            public IntWritable getPv() { return pv; }
            public void setPv(IntWritable pv) { this.pv = pv; }
        }
    • Page top 10 rank of natural MapReduce: GroupingComparator, SortComparator and Partitioner

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.WritableComparator;
        import org.apache.hadoop.mapreduce.Partitioner;

        public class PathTop10RankGroupingComparatorClass extends WritableComparator {
            public PathTop10RankGroupingComparatorClass() {
                super(SecondKeyWritable.class, true);
            }

            // Group reducer input by date only.
            @SuppressWarnings({ "rawtypes", "unchecked" })
            @Override
            public int compare(Object a, Object b) {
                if (a instanceof SecondKeyWritable && b instanceof SecondKeyWritable) {
                    Comparable one = SecondKeyWritable.class.cast(a).getDate();
                    Comparable another = SecondKeyWritable.class.cast(b).getDate();
                    return one.compareTo(another);
                }
                return super.compare(a, b);
            }
        }

        public class PathTop10RankingSortComparator extends WritableComparator {
            public PathTop10RankingSortComparator() {
                super(SecondKeyWritable.class, true);
            }

            // Sort by date, then by page views in descending order so the top pages come first.
            @SuppressWarnings({ "rawtypes", "unchecked" })
            @Override
            public int compare(Object a, Object b) {
                if (a instanceof SecondKeyWritable && b instanceof SecondKeyWritable) {
                    Comparable one = SecondKeyWritable.class.cast(a).getDate();
                    Comparable another = SecondKeyWritable.class.cast(b).getDate();
                    int compare = one.compareTo(another);
                    if (compare != 0) {
                        return compare;
                    }
                    Comparable oneOrder = SecondKeyWritable.class.cast(a).getPv();
                    Comparable anotherOrder = SecondKeyWritable.class.cast(b).getPv();
                    return anotherOrder.compareTo(oneOrder);
                }
                return super.compare(a, b);
            }
        }

        public class PathTop10RankPartitioner extends Partitioner<SecondKeyWritable, IntWritable> {
            @Override
            public int getPartition(SecondKeyWritable key, IntWritable value, int numPartitioner) {
                // Mask the sign bit: Math.abs(Integer.MIN_VALUE) is still negative.
                return (key.getDate().hashCode() & Integer.MAX_VALUE) % numPartitioner;
            }
        }
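A partitioner that takes a hash modulo the number of partitions must never return a negative index. The sketch below contrasts two ways of making the hash non-negative: `Math.abs`, which fails for exactly one input, and masking the sign bit. It is plain Java with hypothetical names (`PartitionSketch`, `maskedPartition`, `absPartition`) and takes the hash as an `int` so the edge case can be shown directly.

```java
/** Sketch: turning a hash code into a partition index in [0, numPartitions). */
final class PartitionSketch {

    /** Clears the sign bit first, so the result is never negative. */
    static int maskedPartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    /** Math.abs variant: negative when hash is Integer.MIN_VALUE and the modulus is not a power of two. */
    static int absPartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    public static void main(String[] args) {
        int worstCase = Integer.MIN_VALUE; // Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE
        System.out.println("abs:    " + absPartition(worstCase, 3));
        System.out.println("masked: " + maskedPartition(worstCase, 3));
    }
}
```

Masking changes which partition a negative hash lands in compared with `Math.abs`, but all that matters for grouping is that equal keys always map to the same valid index.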
    • Page top 10 rank of natural MapReduce
      • This is very long...
      • About 307 lines
    • Page top 10 rank of Huahin MapReduce: JobTool, FirstFilter, FirstSummarizer and SecondSummarizer

        public class PathRankingJobTool extends SimpleJobTool {
            @Override
            protected String setInputPath(String[] args) {
                return args[0];
            }

            @Override
            protected String setOutputPath(String[] args) {
                return args[1];
            }

            /* (non-Javadoc)
             * @see org.huahin.core.SimpleJobTool#setup()
             */
            @Override
            protected void setup() throws Exception {
                final String[] labels = new String[] { "DATE", "USER", "URL" };

                // Job 1: count page views per (DATE, PATH).
                SimpleJob job1 = addJob(labels, StringUtil.TAB);
                job1.setFilter(FirstFilter.class);
                job1.setSummarizer(FirstSummarizer.class);

                // Job 2: keep the top 10 per DATE.
                SimpleJob job2 = addJob();
                job2.setSummarizer(SecondSummarizer.class);
            }
        }

        public class FirstFilter extends Filter {
            @Override
            public void init() {
            }

            @Override
            public void filter(Record record, Writer writer)
                    throws IOException, InterruptedException {
                Record emitRecord = new Record();
                emitRecord.addGrouping("DATE", record.getValueString("DATE"));
                emitRecord.addGrouping("PATH", record.getValueString("URL"));
                emitRecord.addValue("PV", 1);
                writer.write(emitRecord);
            }

            @Override
            public void filterSetup() {
            }
        }

        public class FirstSummarizer extends Summarizer {
            @Override
            public void init() {
            }

            @Override
            public void summarize(Writer writer)
                    throws IOException, InterruptedException {
                int pv = 0;
                while (hasNext()) {
                    Record record = next(writer);
                    pv += record.getValueInteger("PV");
                }

                Record emitRecord = new Record();
                emitRecord.addGrouping("DATE", getGroupingRecord().getGroupingString("DATE"));
                emitRecord.addSort(pv, Record.SORT_UPPER, 1);
                emitRecord.addValue("PATH", getGroupingRecord().getGroupingString("PATH"));
                emitRecord.addValue("PV", pv);
                writer.write(emitRecord);
            }

            @Override
            public void summarizerSetup() {
            }
        }

        public class SecondSummarizer extends Summarizer {
            @Override
            public void init() {
            }

            @Override
            public void summarize(Writer writer)
                    throws IOException, InterruptedException {
                int rank = 1;
                while (hasNext()) {
                    if (rank > 10) {
                        break;
                    }

                    Record record = next(writer);
                    Record emitRecord = new Record();
                    emitRecord.addValue("PATH", record.getValueString("PATH"));
                    emitRecord.addValue("PV", record.getValueInteger("PV"));
                    writer.write(emitRecord);
                    rank++;
                }
            }

            @Override
            public void summarizerSetup() {
            }
        }
    • Page top 10 rank of Huahin MapReduce
      • This is very short!!
      • About 100 lines
    • Huahin Core
      • Other
        • Simple Join
        • Big Join
        • etc ...
    • Huahin Tools
      • A collection of tools for generic operations.
      • Currently only Apache log formatting...
      • Operating environment
        • On-premises Hadoop
        • Stand-alone
          • Multi-threaded execution for small data
        • EMR
          • S3://huahin/tools/huahin-tools.0.1.0.jar
    • Huahin Manager
      • A manager for MapReduce jobs
        • Get the job list
        • Get the job detail
        • Kill a job
        • Execute a job
      • Run queue management
        • MapReduce JAR
        • Hive scripts
        • Pig scripts
      • Execute Hive queries
      • Execute Pig Latin
      • All operations are performed through the REST API.
      • Supports Apache Hadoop 1.0.X and 2.0.2-alpha
      • Supports CDH3 and CDH4
    • Huahin Manager: for 2.0.2-alpha and CDH4
      • Get the application list
      • Get the cluster info
      • Kill an application
      • Proxy to the YARN APIs
    • Huahin Manager: EMR support
      • Set the bootstrap action: s3://huahin/manager/configure
      • Configure the security group so that the REST API can be accessed.
        • The security group to configure is created when EMR starts: ElasticMapReduce-master
        • Values to set
          • Port range: 9010
          • Source: IP addresses that are allowed to connect
    • Huahin Manager: operating environment of Huahin Manager
      • [Diagram labels: Various operations, REST API, Huahin Manager, Hadoop Cluster, HiveServer (1 and 2)]
    • Huahin EManager: a manager that specializes in EMR
      • Manages Job Flows
        • Get the Job Flow list
        • Get the Job Flow detail
        • Kill a Job Flow Step
        • Execute a job
      • Run queue management
        • Register a queue
        • Get the queue detail
        • Remove a queue
    • Huahin EManager: Register queue
      • The following job types supported by EMR can be assigned to the queue:
        • Hive
        • Pig
        • Streaming
        • Custom JAR
      • EManager can specify the size of the cluster to start.
      • EManager assigns a queue to a cluster that is free. (Being able to bring up multiple clusters is a strength of EMR!)
    • Huahin EManager: operating environment of Huahin EManager
      • [Diagram labels: Various operations, REST API, Huahin EManager (on premises or on an EC2 instance), Amazon Elastic MapReduce master node]
      • Huahin Manager is started on the master node by the bootstrap action.
      • ※ NOTICE: Set up the security group.
    • Huahin EManager: how EManager differs from starting EMR from the Management Console or the tools
      • EManager reuses a single Job Flow.
        • It does not start and terminate EMR for every job, which saves cost and improves performance.
        • ※ This currently cannot be done from the Management Console, but it can be done from the command line and the SDK.
      • However, the Job Flow is restarted automatically when the number of Steps reaches the upper limit of 255.
    • Huahin EManager: how EManager differs from starting EMR from the Management Console or the tools
      • The Job Flow is kept running in units of one hour
        • for cost reasons (billing and performance)
      • It shuts down automatically just before the next billing boundary.
      • However, if a job is still running, the Job Flow is carried over to the next billing period.
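The hourly shutdown rule on this slide can be written down as a small decision function. This is only an illustrative sketch, not EManager's code: the class name `BillingHourSketch`, both method names, and the five-minute safety margin are assumptions made for the example; the slide only says that shutdown happens before the next charge and is deferred while a job is running.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of "shut down just before the next billed hour, unless a job is running". */
final class BillingHourSketch {
    static final Duration BILLING_UNIT = Duration.ofHours(1);
    // Assumed margin; the slides do not say how early EManager shuts down.
    static final Duration SAFETY_MARGIN = Duration.ofMinutes(5);

    /** End of the billing hour that contains {@code now}, counted from cluster start. */
    static Instant nextBillingBoundary(Instant startedAt, Instant now) {
        long completedHours = Duration.between(startedAt, now).toHours();
        return startedAt.plus(BILLING_UNIT.multipliedBy(completedHours + 1));
    }

    /** True when the cluster is idle and the next billing boundary is within the margin. */
    static boolean shouldShutDown(Instant startedAt, Instant now, boolean jobRunning) {
        if (jobRunning) {
            return false; // carried over to the next billing hour
        }
        Instant boundary = nextBillingBoundary(startedAt, now);
        return !now.isBefore(boundary.minus(SAFETY_MARGIN));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2013-01-21T00:00:00Z");
        Instant now = Instant.parse("2013-01-21T00:56:00Z");
        System.out.println("idle:    " + shouldShutDown(start, now, false));
        System.out.println("running: " + shouldShutDown(start, now, true));
    }
}
```

A scheduler would call `shouldShutDown` periodically; because the boundary is recomputed from the start time on every call, a cluster that was carried over is simply evaluated against the following hour.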
    • Huahin EManager: Register queue
      • A queue is registered with the PUT or POST method.
        • PUT: if the script or JAR is already on S3, EManager only starts the Job Flow or executes the Step.
        • POST: uploads a local script or JAR to S3, then starts the Job Flow and executes the Step. This feature is not in EMR itself. There is also an option to remove the files that were POSTed.
      • All registration is done in JSON.
    • Huahin EManager: Register queue
      • Example of PUT for Hive (the "arguments" key of the JSON is optional):

        $ curl -X PUT http://localhost:9020/queue/register/hive -F 'ARGUMENTS={"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'

      • Example of POST for Hive (the "arguments" key of the JSON is optional):

        $ curl -X POST http://localhost:9020/queue/register/hive -F SCRIPT=@wordcount.hql -F 'ARGUMENTS={"script":"s3://huahin/wordcount.hql","arguments":["arg1","arg2"]}'

      • Setting "deleteOnExit" to "true" deletes the uploaded file after execution. By default it is not deleted.
    • Huahin EManager: List of Job Flows
      • Example of getting the list of all Job Flows:

        $ curl -X GET http://localhost:9020/jobflow/list

      • Example of getting the list of running Job Flows:

        $ curl -X GET http://localhost:9020/jobflow/runnings

      • Example of getting a Job Flow's detail:

        $ curl -X GET http://localhost:9020/jobflow/describe/j-XXXXXXXXXXXX
    • Huahin EManager: Queue API
      • Example of getting the list of registered queues:

        $ curl -X GET http://localhost:9020/queue/list

      • Example of getting the list of running queues:

        $ curl -X GET http://localhost:9020/queue/runnings

      • Example of getting a queue's detail:

        $ curl -X GET http://localhost:9020/queue/describe/S_XXXXXXXXXXXX

      • Example of deleting a queue:

        $ curl -X DELETE http://localhost:9020/queue/kill/S_XXXXXXXXXXXX
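Because every operation is a plain HTTP call, the same requests can be issued from a program as well as from curl. The sketch below builds two of them with the JDK's `java.net.http` client (Java 11 or later). The base URL and paths are the ones shown on the slides; the class name `EManagerClientSketch` and its method names are hypothetical, and nothing is assumed about the response beyond reading it as text.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of calling the EManager REST endpoints shown on the slides. */
final class EManagerClientSketch {
    static final String BASE = "http://localhost:9020";

    /** GET /jobflow/list */
    static HttpRequest listJobFlows() {
        return HttpRequest.newBuilder(URI.create(BASE + "/jobflow/list")).GET().build();
    }

    /** DELETE /queue/kill/{queueId} */
    static HttpRequest killQueue(String queueId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/queue/kill/" + queueId)).DELETE().build();
    }

    /** Sends a request and returns the raw response body as text. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Only prints the requests; call send(...) when an EManager is actually listening.
        HttpRequest list = listJobFlows();
        HttpRequest kill = killQueue("S_XXXXXXXXXXXX");
        System.out.println(list.method() + " " + list.uri());
        System.out.println(kill.method() + " " + kill.uri());
    }
}
```

Keeping request construction separate from `send` makes the calls easy to inspect or log before they touch a live cluster.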
    • Huahin EManager: killing a job
      • Hadoop has a command to kill a running job:

        hadoop job -kill job_XXXXXXXXXX

      • However, EMR has no such function. If you start a job by mistake, there is no choice but to terminate the Job Flow.
      • You can kill the job by connecting to the EMR master node over SSH and typing the command above.
      • Troublesome...
    • Huahin EManager: killing a job
      • EManager (Manager) makes this possible through a kill API!
      • Example of Step kill:

        $ curl -X DELETE http://localhost:9020/jobflow/kill/step/S_XXXXXXXXXXXX
    • Conclusion
      • Huahin Core
        • Unlike Hive and Pig
        • For when you want to use MapReduce in a reasonably natural way
      • Huahin Tools
        • Still...
      • Huahin Manager
        • All operations through the REST API
        • Integration with other systems
      • Huahin EManager
        • Integration with other systems
        • Cost and performance management
        • Kill a Step of a Job Flow!
    • The current version
      • Huahin Core 0.1.4
      • Huahin Unit 0.1.4
      • Huahin Tools 0.1.0
      • Huahin Manager
        • 0.1.4 for Apache Hadoop 1.0.4
        • 0.1.4 for CDH3
        • 0.2.1 for Apache Hadoop 2.0.2-alpha
        • 0.2.1 for CDH4
      • Huahin EManager 0.1.1
    • Thanks!!!