Hadoop gets Groovy

Presentation on using Hadoop with the Groovy Language from Berlin Buzzwords 2012

Published in: Technology

  • What is the knowledge/skill level of the audience? I'm talking about Groovy; there is no time to cover Hadoop as well. Disclaimer: I am a Groovy user, not an expert.
  • How to describe Groovy? It depends on what you are doing with it. You can look at it and come to different conclusions based on your use. It's like Java with the datatypes they forgot. It has Ruby concepts (closures), keywords from Python, and dynamic 'duck' typing. It lets you do things in the JVM that the original Java authors didn't expect.
  • It's a map and a reduce, inside the reduce. Turtles all the way down. Or at least elephants.
  • Transcript

    • 1. Hadoop gets Groovy
       Steve Loughran, Hortonworks
       stevel at hortonworks.com
       @steveloughran
       Berlin, June 2012
       © Hortonworks Inc. 2012
    • 2. Where are you in this diagram?
       [Diagram with axes "Hadoop Skills" and "Groovy Skills". Names plotted: Doug, Owen, Arun, Jakob; @steveloughran; James Strachan, Guillaume Laforge]
    • 3. Grumpy: Groovy Hadoop Library
       • Something lightweight for testing
       • Wanted to play in the M/R layer
       • Already using Groovy
       • Liked: JVM integration, tooling, libraries, IntelliJ IDEA, Books…
       git@github.com:steveloughran/grumpy.git
    • 4. What is Groovy?
       A dynamic language within the JVM
       • Java++
         – Maps, lists, tuples, closures
       • Flavours of Ruby and Python
         – Duck typing, Grails, (Scripting)
       A way to do things in the JVM that Sun didn't imagine
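    • 5. Can use & subclass Java classes:
         import org.apache.hadoop.io.IntWritable
         import org.apache.hadoop.io.LongWritable
         import org.apache.hadoop.io.Text
         import org.apache.hadoop.mapreduce.Mapper

         class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
           // Every input line is emitted under the same key with a count of one.
           static final def emitKey = new Text("lines")
           static final def one = new IntWritable(1)

           void map(LongWritable key, Text value, Mapper.Context context) {
             context.write(emitKey, one)
           }
         }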
    • 6. Closures & lists
         class CountReducer2 extends Reducer {
           def reduce(Text k, Iterable values, Reducer.Context ctx) {
             // Unwrap each value, then total the resulting list.
             def sum = values.collect() { it.get() }.sum()
             ctx.write(k, new IntWritable(sum))
           }
         }
    • 7. Closures & lists
         values.collect() { it.get() }.sum()
       List<values> -> List<int> -> int
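The two-step pipeline on this slide can be tried without a Hadoop cluster. The sketch below is only an illustration: `IntBox` is a hypothetical stand-in for Hadoop's `IntWritable`, and the sample values are invented.

```groovy
// Hypothetical stand-in for org.apache.hadoop.io.IntWritable,
// so the sketch runs without Hadoop on the classpath.
class IntBox {
    private final int value
    IntBox(int value) { this.value = value }
    int get() { value }
}

// What a reducer might receive for one key: an Iterable of boxed counts.
Iterable<IntBox> values = [new IntBox(1), new IntBox(1), new IntBox(3)]

// Step 1: collect applies the closure to each element and returns a new list.
def unboxed = values.collect { it.get() }

// Step 2: sum folds that list into a single number.
def total = unboxed.sum()

assert unboxed == [1, 1, 3]
assert total == 5
```

Splitting the chain into two variables makes the intermediate list visible; the one-line form on the slide does the same work.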
    • 8. Result: MR jobs in Groovy
       In:
         gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball
         gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop
         gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky
         gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23,
         gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas
         gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball
         gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop
         gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky
         gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas
         gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43,
         gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley
       Out: 163,198,223 device sightings!
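To see how the line-count mapper and summing reducer from the earlier slides combine into a single total, here is a minimal in-memory sketch. It does not use Hadoop at all; the map, shuffle and reduce stages are imitated with plain Groovy collections, and only three of the sample records are used.

```groovy
// Three of the sample records from the slide above.
def lines = [
    'gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball',
    'gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop',
    'gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky',
]

// Map: like LineCountMapper, ignore the content and emit ['lines', 1] per record.
def mapped = lines.collect { line -> ['lines', 1] }

// Shuffle: group the emitted pairs by key.
def grouped = mapped.groupBy { pair -> pair[0] }

// Reduce: like CountReducer2, sum the values seen for each key.
def counts = grouped.collectEntries { key, pairs ->
    [(key): pairs.collect { it[1] }.sum()]
}

assert counts == [lines: 3]
```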
    • 9. Why no Pig? Sliding Window Debounce
         void map(LongWritable key, BlueEvent event, Mapper.Context context) {
           BlueEvent ev2 = window.insert(event)
           List<BlueEvent> expired = window.purgeExpired(event)
           expired.each { evt -> emit(context, evt) }
         }

         void cleanup(Mapper.Context context) {
           window.each { evt -> emit(context, evt) }
         }
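The slide does not show the window class itself, so the following is only a guess at one way a debounce window could work, not the Grumpy implementation. `Sighting`, `DebounceWindow` and `drain` are hypothetical names, and this sketch purges before inserting, which differs from the call order on the slide. Repeat sightings of a device are merged until the device has been quiet for longer than the window.

```groovy
// Hypothetical event type; the talk's BlueEvent class is not shown in the slides.
class Sighting {
    String device
    long time       // first time the device was seen, in milliseconds
    long lastSeen   // refreshed while the device keeps reappearing
}

// Hypothetical window: one open sighting per device, closed after a quiet period.
class DebounceWindow {
    long windowMillis
    private final Map<String, Sighting> open = [:]

    // Merge a repeat sighting into the open one, or start a new one.
    Sighting insert(Sighting s) {
        Sighting current = open[s.device]
        if (current == null) {
            s.lastSeen = s.time
            open[s.device] = s
            return s
        }
        current.lastSeen = s.time
        return current
    }

    // Remove and return sightings not refreshed within the window.
    List<Sighting> purgeExpired(Sighting latest) {
        def expired = open.values().findAll { latest.time - it.lastSeen > windowMillis }.toList()
        expired.each { open.remove(it.device) }
        return expired
    }

    // Everything still open, for the cleanup step.
    List<Sighting> drain() {
        def rest = open.values().toList()
        open.clear()
        return rest
    }
}

def window = new DebounceWindow(windowMillis: 60000)
def emitted = []
def events = [
    new Sighting(device: 'a', time: 0),
    new Sighting(device: 'a', time: 10000),    // repeat inside the window: merged
    new Sighting(device: 'b', time: 20000),
    new Sighting(device: 'a', time: 200000),   // long gap: earlier 'a' and 'b' expire
]
events.each { ev ->
    emitted.addAll(window.purgeExpired(ev))    // the map step
    window.insert(ev)
}
emitted.addAll(window.drain())                 // the cleanup step

assert emitted*.device == ['a', 'b', 'a']
```

Four raw events collapse into three debounced sightings. State like this, carried between records and flushed in `cleanup`, is the kind of logic the slide title suggests was awkward to express in Pig.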
    • 10. Device sightings by day for 2007
       [Chart: sightings per day; y-axis 0 to 1,600,000, x-axis day 1 to 366]
    • 11. Improving Hadoop APIs
         // conf[key] = val: Groovy dispatches subscript assignment to putAt
         Configuration.metaClass.putAt = { key, val ->
           set(key.toString(), val.toString())
         }
         // val = conf[key]
         Configuration.metaClass.getAt = { key ->
           get(key)
         }
         // conf.add(map): copy every map entry into the configuration
         Configuration.metaClass.add = { map ->
           map.each { elt ->
             set((elt.key).toString(), (elt.value).toString())
           }
         }
    • 12. & Configuration gets better
         conf[mapscript] = new File(src).text
         String scriptText = conf[mapscript]
         conf.add([ window:60000, redscript:reduceScript ])
       Extending to Job class trickier – subclassing better
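The metaclass technique from these two slides can be tried on any class. The sketch below uses a hypothetical `Settings` class in place of Hadoop's `Configuration`, with invented keys and values. The closure parameters are typed to match the signatures of Groovy's built-in `getAt(String)` and `putAt(String, Object)` so the additions take precedence for string keys; that typing is a choice made for this sketch, not something shown in the talk.

```groovy
// Hypothetical stand-in for org.apache.hadoop.conf.Configuration:
// string keys and string values, with plain get/set methods.
class Settings {
    private final Map<String, String> props = [:]
    void set(String key, String value) { props[key] = value }
    String get(String key) { props[key] }
}

// conf[key] = val is dispatched to putAt(key, val).
Settings.metaClass.putAt = { String key, Object val ->
    delegate.set(key, val.toString())
}
// conf[key] is dispatched to getAt(key).
Settings.metaClass.getAt = { String key ->
    delegate.get(key)
}
// conf.add(map) copies every entry of the map, converting to strings.
Settings.metaClass.add = { Map map ->
    map.each { k, v -> delegate.set(k.toString(), v.toString()) }
}

// Instances created after the metaclass changes pick up the new methods.
def conf = new Settings()
conf['mapscript'] = 'emit(key, 1)'
conf.add([window: 60000, redscript: 'sum(values)'])

assert conf['mapscript'] == 'emit(key, 1)'
assert conf['window'] == '60000'
```

Values are stored as strings, mirroring the `toString()` calls on the slide, so the number 60000 comes back as the text '60000'.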
    • 13. New today! Script-driven MR jobs!
         protected void setup(Mapper.Context ctx) {
           this.ctx = ctx
           this.conf = ctx.configuration
           ScriptCompiler comp = new ScriptCompiler(conf)
           String scriptText = conf[mapscript]
           map = comp.parse(scriptText, this, ctx)
         }

         protected void map(Writable key, Writable value, Mapper.Context ctx) {
           // Expose the current record to the script, then run it.
           map.setProperty('key', key)
           map.setProperty('value', value)
           map.run()
         }
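The `ScriptCompiler` class on this slide belongs to the talk's own library and is not shown. The general idea, compiling script text once and running it per record with fresh variables, can be sketched with Groovy's standard `GroovyShell` and `Binding` classes. The script text, the variable names `key`, `value` and `emitted`, and the sample records are all assumptions made for this illustration.

```groovy
// In a real job the script text would arrive through the job configuration;
// here it is an inline string.
String scriptText = 'emitted << [value.toUpperCase(), 1]'

// Setup step: compile the text once into a reusable Script object.
Script mapScript = new GroovyShell().parse(scriptText)
def emitted = []

// Map step: bind the current record to script variables, then run the script.
['a line', 'another line'].eachWithIndex { line, offset ->
    mapScript.binding = new Binding([key: offset, value: line, emitted: emitted])
    mapScript.run()
}

assert emitted == [['A LINE', 1], ['ANOTHER LINE', 1]]
```

Parsing once and rebinding per record avoids recompiling the script for every input, which is presumably why the slide does its parsing in `setup`.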
    • 14. Things to consider
       • Performance: Groovy 2 on Java 7
       • False friends: types, if(), exceptions
       • If you can use Pig, use it.
       • Use Groovy for testing, extending Hadoop classes (output formatter, etc.)
       • Play with YARN and Giraph with it
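The slide does not spell out which differences it means by "false friends". The sketch below shows a few well-known places where Groovy code that looks like Java behaves differently, grouped under the slide's three headings; the choice of examples is an assumption.

```groovy
// Types: a 'def' variable is checked at run time, not at compile time.
def count = 1
count = 'one'                      // legal: the variable now holds a String
assert count instanceof String

// if(): "Groovy truth" treats null, 0, '' and empty collections as false.
assert !([] as boolean)
assert !('' as boolean)
assert !(0 as boolean)
assert ([0] as boolean)            // a non-empty list is true, whatever it holds

// Exceptions: checked exceptions need no 'throws' clause or try/catch,
// so an IOException can surface where Java would have forced handling.
def readMissing = { new File('/no/such/file/for/this/sketch').text }
def failed = false
try {
    readMissing()
} catch (IOException e) {
    failed = true
}
assert failed
```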
    • 15. Questions?
       hortonworks.com
    • 16. hortonworks.com
    • 17. Performance?
       • Groovy 1 over-introspects
       • HLL hides a lot of overhead
       • If your work is I/O bound, less important
       • Speed of development vs execution
       • Need to benchmark on Java 7
