Hadoop gets Groovy

Presentation on using Hadoop with the Groovy Language from Berlin Buzzwords 2012

Published in: Technology
  • What is the knowledge/skill level of the audience? I'm talking about Groovy; there is no time to cover Hadoop as well. Disclaimer: I am a Groovy user, not an expert.
  • How to describe Groovy? It depends on what you are doing with it: you can look at it and come to different conclusions based on your use. It's like Java with the datatypes they forgot. It's got Ruby concepts (closures), keywords from Python, and dynamic 'duck' typing. It lets you do things in the JVM that the original Java authors didn't expect.
  • It's a map and reduce, in the reduce. Turtles all the way down. Or at least elephants.

    1. Hadoop gets Groovy
       Steve Loughran, Hortonworks
       stevel at hortonworks.com, @steveloughran
       Berlin, June 2012
       © Hortonworks Inc. 2012
    2. Where are you in this diagram?
       [Diagram: axes "Hadoop Skills" and "Groovy Skills"; people plotted: Doug, Owen, Arun, Jakob, @steveloughran, James Strachan, Guillaume Laforge]
    3. Grumpy: Groovy Hadoop Library
       • Something lightweight for testing
       • Wanted to play in the M/R layer
       • Already using Groovy
       • Liked: JVM integration, tooling, libraries, IntelliJ IDEA, books…
       git@github.com:steveloughran/grumpy.git
    4. What is Groovy?
       A dynamic language within the JVM
       • Java++
         – Maps, lists, tuples, closures
       • Flavours of Ruby and Python
         – Duck typing, Grails, (scripting)
       A way to do things in the JVM that Sun didn't imagine
    5. Can use & subclass Java classes:

         class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
           static final def emitKey = new Text("lines")
           static final def one = new IntWritable(1)

           // Emits ("lines", 1) for every input line
           void map(LongWritable key, Text value, Mapper.Context context) {
             context.write(emitKey, one)
           }
         }
    6. Closures & lists

         class CountReducer2 extends Reducer {
           def reduce(Text k, Iterable values, Reducer.Context ctx) {
             // Unwrap each IntWritable, then add up the resulting list
             def sum = values.collect() { it.get() }.sum()
             ctx.write(k, new IntWritable(sum))
           }
         }
    7. Closures & lists

         values.collect() { it.get() }.sum()

       List<values> -> List<int> -> int
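The collect-then-sum pipeline on slides 6 and 7 can be tried without a cluster. The sketch below is only an illustration: `IntBox` is a made-up stand-in for Hadoop's `IntWritable` so that the script runs with plain Groovy, and the input values are invented.

```groovy
// Hypothetical stand-in for org.apache.hadoop.io.IntWritable,
// so the sketch runs without Hadoop on the classpath.
class IntBox {
    int value
    int get() { value }
}

// Values as a reducer might receive them for one key (invented data).
Iterable<IntBox> values = [new IntBox(value: 1), new IntBox(value: 1), new IntBox(value: 3)]

// collect maps each wrapper to its int; sum folds the resulting list.
def unwrapped = values.collect { it.get() }
def total = unwrapped.sum()

// inject performs the same fold in one pass, without the intermediate list.
def totalOnePass = values.inject(0) { acc, v -> acc + v.get() }

assert unwrapped == [1, 1, 3]
assert total == 5
assert totalOnePass == 5
```

The `inject` form may be worth considering when a key has many values, since it avoids building the intermediate list.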
    8. Result: MR jobs in Groovy
       In:
         gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball
         gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop
         gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky
         gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23,
         gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas
         gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball
         gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop
         gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky
         gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas
         gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43,
         gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley
       Out: 163,198,223 device sightings!
    9. Why no Pig? Sliding Window Debounce

         void map(LongWritable key, BlueEvent event, Mapper.Context context) {
           BlueEvent ev2 = window.insert(event)
           List<BlueEvent> expired = window.purgeExpired(event)
           expired.each { evt -> emit(context, evt) }
         }

         // Flush whatever is still in the window when the mapper finishes
         void cleanup(Mapper.Context context) {
           window.each { evt -> emit(context, evt) }
         }
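Slide 9 does not show the window class itself, so the following is only a guess at what a debounce window could look like, not the Grumpy implementation. `Sighting`, `DebounceWindow`, `drain()` and all field names are hypothetical; the sketch also purges before inserting, a choice made here so that a stale entry is released rather than refreshed.

```groovy
// Hypothetical event type; the slide's BlueEvent is not shown, so these fields are assumptions.
class Sighting {
    String device
    long time   // milliseconds
}

// Hypothetical window: keeps one pending sighting per device and releases it
// once no repeat has arrived for longer than windowMillis.
class DebounceWindow {
    long windowMillis
    // device -> [sighting: first sighting, lastSeen: time of the latest repeat]
    private final Map<String, Map> pending = [:]

    // Records the event; returns the sighting that now represents the device.
    Sighting insert(Sighting event) {
        def entry = pending[event.device]
        if (entry == null) {
            entry = [sighting: event, lastSeen: event.time]
            pending[event.device] = entry
        } else {
            entry.lastSeen = event.time   // repeat inside the window: absorb it
        }
        entry.sighting
    }

    // Removes and returns sightings whose device has been quiet for longer than the window.
    List<Sighting> purgeExpired(Sighting now) {
        def expiredKeys = pending.findAll { k, v -> now.time - v.lastSeen > windowMillis }.keySet().toList()
        List<Sighting> released = []
        for (key in expiredKeys) {
            released << pending.remove(key).sighting
        }
        released
    }

    // Returns whatever is still pending, for a final flush.
    List<Sighting> drain() {
        List<Sighting> rest = pending.values().collect { it.sighting }
        pending.clear()
        rest
    }
}

def window = new DebounceWindow(windowMillis: 60000)
def emitted = []
def events = [
    new Sighting(device: 'a', time: 0),
    new Sighting(device: 'a', time: 10000),    // repeat, absorbed
    new Sighting(device: 'b', time: 20000),
    new Sighting(device: 'a', time: 100000)    // 'a' and 'b' have both been quiet for over 60 s
]
events.each { ev ->
    emitted.addAll(window.purgeExpired(ev))
    window.insert(ev)
}
emitted.addAll(window.drain())   // plays the role of cleanup()

assert emitted*.device == ['a', 'b', 'a']
assert emitted*.time == [0L, 20000L, 100000L]
```

State that spans records, as this window does, is the kind of logic the slide title suggests was awkward to express in Pig.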
    10. Device sightings by day for 2007
        [Chart: y-axis 0 to 1,600,000 sightings; x-axis day of year, 1 to 366]
    11. Improving Hadoop APIs

          // conf[key] = val (Groovy dispatches subscript assignment to putAt)
          Configuration.metaClass.putAt = { key, val ->
            set(key.toString(), val.toString())
          }

          // conf[key]
          Configuration.metaClass.getAt = { key ->
            get(key)
          }

          // conf.add([k1: v1, k2: v2])
          Configuration.metaClass.add = { map ->
            map.each { elt ->
              set((elt.key).toString(), (elt.value).toString())
            }
          }
    12. & Configuration gets better

          conf[mapscript] = new File(src).text
          String scriptText = conf[mapscript]
          conf.add([ window:60000, redscript:reduceScript ])

        Extending to Job class trickier: subclassing better
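The subscript syntax on slide 12 rests on Groovy's operator methods: `conf[k]` calls `getAt(k)` and `conf[k] = v` calls `putAt(k, v)`. The sketch below shows that mapping with the methods declared directly on a small class instead of injected through `metaClass`. `Settings` is a made-up stand-in for Hadoop's `Configuration`, and the keys and values are invented.

```groovy
// Hypothetical stand-in for Hadoop's Configuration (string keys and values); not the real class.
class Settings {
    private final Map<String, String> props = [:]

    void set(String key, String value) { props[key] = value }
    String get(String key) { props[key] }

    // conf[key] is dispatched to getAt(key)
    String getAt(String key) { get(key) }

    // conf[key] = value is dispatched to putAt(key, value)
    void putAt(String key, Object value) { set(key, value.toString()) }

    // Bulk insert from a map literal, converting keys and values to strings
    void add(Map<?, ?> entries) {
        entries.each { k, v -> set(k.toString(), v.toString()) }
    }
}

def conf = new Settings()
conf['window'] = 60000
conf.add([redscript: 'sum(values)', reducers: 2])

assert conf['window'] == '60000'
assert conf['redscript'] == 'sum(values)'
assert conf['reducers'] == '2'
assert conf['missing'] == null
```

Declaring the operators on a class or subclass trades the convenience of patching an existing type for behaviour that is visible in one place.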
    13. New today! Script-driven MR jobs!

          protected void setup(Mapper.Context ctx) {
            this.ctx = ctx
            this.conf = ctx.configuration
            ScriptCompiler comp = new ScriptCompiler(conf)
            String scriptText = conf[mapscript]
            // Compile the script once per task
            map = comp.parse(scriptText, this, ctx)
          }

          protected void map(Writable key, Writable value, Mapper.Context ctx) {
            // Rebind the script variables for each record, then run the script body
            map.setProperty("key", key)
            map.setProperty("value", value)
            map.run()
          }
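The compile-once, rebind-and-run pattern on slide 13 can be sketched with the standard `GroovyShell` and `Script` classes. This is an illustration only: the slide's `ScriptCompiler` is not shown, so `GroovyShell` stands in for it here, and the script text, the `emitted` list and the sample records are all invented.

```groovy
// Script source as it might be stored in a job configuration entry (invented content).
String scriptText = 'emitted << [key, value.toUpperCase()]'

def emitted = []
def shell = new GroovyShell()

// Compile once, as setup() does on the slide.
Script mapScript = shell.parse(scriptText)
mapScript.setProperty('emitted', emitted)

// Per-record work, as in map(): rebind the variables, then run the script body.
[[1L, 'alpha'], [2L, 'beta']].each { pair ->
    mapScript.setProperty('key', pair[0])
    mapScript.setProperty('value', pair[1])
    mapScript.run()
}

assert emitted == [[1L, 'ALPHA'], [2L, 'BETA']]
```

`Script.setProperty` stores the value in the script's binding, which is why the script body can refer to `key`, `value` and `emitted` as plain names.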
    14. Things to consider
        • Performance: Groovy 2 on Java 7
        • False friends: types, if(), exceptions
        • If you can use Pig, use it.
        • Use Groovy for testing, extending Hadoop classes (output formatter, etc.)
        • Play with YARN and Giraph with it
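The slide does not spell out the "false friends", so the cases below are one reading of that bullet, not the speaker's list. They show places where Groovy code that looks like Java behaves differently.

```groovy
// if(): Groovy truth coerces more than booleans.
def falsy = [0, '', [], [:], null]
assert falsy.every { !it }
assert ['x', [1], 1].every { it }

// Types: == means equals(); is() tests reference identity.
def a = new String('hadoop')
def b = new String('hadoop')
assert a == b
assert !a.is(b)

// Types: a def variable can be rebound to a value of another type at run time.
def n = 1
n = 'one'
assert n instanceof String

// Exceptions: checked exceptions need no throws clause or enclosing try/catch.
def thrower = { throw new IOException('not declared anywhere') }
def caught = null
try {
    thrower()
} catch (IOException e) {
    caught = e.message
}
assert caught == 'not declared anywhere'
```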
    15. Questions?
        hortonworks.com
    16. hortonworks.com
    17. Performance?
        • Groovy 1 over-introspects
        • HLL hides a lot of overhead
        • If your work is I/O bound, less important
        • Speed of development vs execution
        • Need to benchmark on Java 7
