Hadoop + Clojure
  Hadoop World NYC
Friday, October 2, 2009

Stuart Sierra, AltLaw.org
JVM Languages
                              Object
             Functional      Oriented

Native to                       ...
Clojure

●   a new Lisp,
    neither Common Lisp nor Scheme
●   Dynamic, Functional
●   Immutability and concurrency
●   H...
Clojure Primitive Types
String       "Hello, World!n"
Integer      42
Double       2.0e64
BigInteger   9223372036854775808...
Clojure Collections
List   (print :hello "NYC")

Vector [:eat "Pie" 3.14159]

Map    {:lisp 1   "The Rest" 0}

Set    #{2 ...
public void greet(String name) {
  System.out.println("Hi, " + name);
}

greet("New York");
Hi, New York


(defn greet [na...
public double average(double[] nums) {
  double total = 0;
  for (int i = 0; i < nums.length; i++) {
    total += nums[i];...
Data Structures as Functions
(def m {:f "foo"     (def s #{1 5 3})
        :b "bar"})
                     (s 3)
(m :f)   ...
(import '(com.example.package
            MyClass YourClass))

(. object method arguments)

(new MyClass arguments)


(.me...
...open a stream...
try {
    ...do stuff with the stream...
} finally {
    stream.close();
}

(defmacro with-open [args ...
synchronous   asynchronous


coordinated      ref

independent     atom         agent

unshared         var
(map function values)
         list of values
(reduce function values)
         single value


mapper(key, value)
        ...
public static class MapClass extends MapReduceBase
  implements Mapper<LongWritable, Text, Text, IntWritable> {

    priva...
(mapper key value)
        list of key-value pairs

(reducer key values)
        list of key-value pairs
Clojure-Hadoop 1
(defn mapper-map [this key val out reporter]
  (doseq [word (enumeration-seq
                (StringToken...
Clojure-Hadoop 2
(defn my-map [key value]
   (map (fn [token] [token 1])
        (enumeration-seq (StringTokenizer. value)...
Clojure print/read
        read



                STRING
DATA




        print
Clojure-Hadoop 3
(defn my-map [key val]
 (map (fn [token] [token 1])
      (enumeration-seq (StringTokenizer. val))))

(de...
public static class MapClass extends MapReduceBase
  implements Mapper<LongWritable, Text, Text, IntWritable> {

    priva...
Clojure-Hadoop 3
(defn my-map [key val]
 (map (fn [token] [token 1])
      (enumeration-seq (StringTokenizer. val))))

(de...
More
●   http://clojure.org/
●   Google Groups: Clojure
●   #clojure on irc.freenode.net
●   #clojure on Twitter
●   http:...
Upcoming SlideShare
Loading in...5
×

Hadoop + Clojure

4,593

Published on

Published in: Technology, Education
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
4,593
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
54
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Hadoop + Clojure

  1. 1. Hadoop + Clojure Hadoop World NYC Friday, October 2, 2009 Stuart Sierra, AltLaw.org
  2. 2. JVM Languages Object Functional Oriented Native to Groovy the JVM Clojure Scala Ported to JRuby Armed Bear CL the JVM Jython Kawa Rhino Java is dead, long live the JVM
  3. 3. Clojure ● a new Lisp, neither Common Lisp nor Scheme ● Dynamic, Functional ● Immutability and concurrency ● Hosted on the JVM ● Open Source (Eclipse Public License)
  4. 4. Clojure Primitive Types String "Hello, World!n" Integer 42 Double 2.0e64 BigInteger 9223372036854775808 BigDecimal 1.0M Ratio 3/4 Boolean true, false Symbol foo Keyword :foo null nil
  5. 5. Clojure Collections List (print :hello "NYC") Vector [:eat "Pie" 3.14159] Map {:lisp 1 "The Rest" 0} Set #{2 1 3 5 "Eureka"} Homoiconicity
  6. 6. public void greet(String name) { System.out.println("Hi, " + name); } greet("New York"); Hi, New York (defn greet [name] (println "Hello," name)) (greet "New York") Hello, New York
  7. 7. public double average(double[] nums) { double total = 0; for (int i = 0; i < nums.length; i++) { total += nums[i]; } return total / nums.length; } (defn average [& nums] (/ (reduce + nums) (count nums))) (average 1 2 3 4) 5/2
  8. 8. Data Structures as Functions (def m {:f "foo" (def s #{1 5 3}) :b "bar"}) (s 3) (m :f) true "foo" (s 7) (:b m) false "bar"
  9. 9. (import '(com.example.package MyClass YourClass)) (. object method arguments) (new MyClass arguments) (.method object arguments) Syntactic (MyClass. arguments) Sugar (MyClass/staticMethod)
  10. 10. ...open a stream... try { ...do stuff with the stream... } finally { stream.close(); } (defmacro with-open [args & body] `(let ~args (try ~@body (finally (.close ~(first args)))))) (with-open [stream (...open a stream...)] ...do stuff with stream...)
  11. 11. synchronous asynchronous coordinated ref independent atom agent unshared var
  12. 12. (map function values) list of values (reduce function values) single value mapper(key, value) list of key-value pairs reducer(key, values) list of key-value pairs
  13. 13. public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { String line = value.toString(); StringTokenizer itr = new StringTokenizer(line); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); output.collect(word, one); } } } public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { int sum = 0; while (values.hasNext()) { sum += values.next().get(); } output.collect(key, new IntWritable(sum)); } }
  14. 14. (mapper key value) list of key-value pairs (reducer key values) list of key-value pairs
  15. 15. Clojure-Hadoop 1 (defn mapper-map [this key val out reporter] (doseq [word (enumeration-seq (StringTokenizer. (str val)))] (.collect out (Text. word) (IntWritable. 1)))) (defn reducer-reduce [this key vals out reprter] (let [sum (reduce + (map (fn [w] (.get w)) (iterator-seq values)))] (.collect output key (IntWritable. sum)))) (gen-job-classes)
  16. 16. Clojure-Hadoop 2 (defn my-map [key value] (map (fn [token] [token 1]) (enumeration-seq (StringTokenizer. value)))) (def mapper-map (wrap-map my-map int-string-map-reader)) (defn my-reduce [key values] [[key (reduce + values)]]) (def reducer-reduce (wrap-reduce my-reduce)) (gen-job-classes)
  17. 17. Clojure print/read read STRING DATA print
  18. 18. Clojure-Hadoop 3 (defn my-map [key val] (map (fn [token] [token 1]) (enumeration-seq (StringTokenizer. val)))) (defn my-reduce [key values] [[key (reduce + values)]]) (defjob job :map my-map :map-reader int-string-map-reader :reduce my-reduce :inputformat :text)
  19. 19. public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { String line = value.toString(); StringTokenizer itr = new StringTokenizer(line); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); output.collect(word, one); } } } public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { int sum = 0; while (values.hasNext()) { sum += values.next().get(); } output.collect(key, new IntWritable(sum)); } }
  20. 20. Clojure-Hadoop 3 (defn my-map [key val] (map (fn [token] [token 1]) (enumeration-seq (StringTokenizer. val)))) (defn my-reduce [key values] [[key (reduce + values)]]) (defjob job :map my-map :map-reader int-string-map-reader :reduce my-reduce :inputformat :text)
  21. 21. More ● http://clojure.org/ ● Google Groups: Clojure ● #clojure on irc.freenode.net ● #clojure on Twitter ● http://richhickey.github.com/clojure-contrib ● http://stuartsierra.com/ ● http://github.com/stuartsierra ● http://www.altlaw.org/
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×