OSDC.fr 2012 :: Cascalog: logic programming for Hadoop

Hadoop has become a reference point in the Big Data world, and MapReduce a new paradigm for exploiting data. Implementing data processing directly with MapReduce certainly gives the most flexibility, but it is akin to programming in assembly language. Cascalog is arguably the most concise alternative. Built on Clojure, it keeps you in a familiar environment (the JVM) while giving you a very useful abstraction through logic programming.


  1. Cascalog: logic programming for Hadoop. Bertrand Dechoux, Saturday, October 13, 2012
  2. MapReduce: what about you?
      Python
        ▶ map(function, iterable, ...)
        ▶ reduce(function, iterable[, initializer])
      Perl
        ▶ map BLOCK LIST
        ▶ reduce BLOCK LIST
      Ruby
        ▶ map {|item| block} -> new_ary / collect {|item| block} -> new_ary
        ▶ reduce(initial, sym) -> obj / inject(initial, sym) -> obj
      Smalltalk
        ▶ collect: aBlock=TheArray
        ▶ inject: thisValue into: binaryBlock
      PHP
        ▶ array array_map(callable $callback, array $arr1 [, array $...])
        ▶ mixed array_reduce(array $input, callable $function [, mixed $initial = NULL])
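Java, the language Hadoop itself is written in, has offered the same pair since Java 8 through `java.util.stream`. A minimal sketch that is not part of the original slides; the class and method names are hypothetical:

```java
import java.util.List;

// Java's counterpart of the map/reduce pairs listed on slide 2 (java.util.stream; List.of needs Java 9+).
public class StreamMapReduce {

    // map squares every element; reduce folds the squares into a sum, starting from 0.
    public static int sumOfSquares(List<Integer> numbers) {
        return numbers.stream()
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(List.of(1, 2, 3))); // prints 14
    }
}
```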
  3. Hadoop MapReduce: the theory
      Map
        ▶ Map(k1, v1) -> list(k2, v2)
      Reduce
        ▶ Reduce(k2, list(v2)) -> list(k3, v3)
  4. Hadoop MapReduce: the theory
      Map
        ▶ Map(k1, v1) -> list(k2, v2)
        ▶ SortByKey(list(k2, v2)) -> list(k2, v2)
      Reduce
        ▶ MergeByKey(list, list, ...) -> list(k2, list(v2))
        ▶ Reduce(k2, list(v2)) -> list(k3, v3)
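The four phases of slide 4 can be simulated in a single process. The sketch below is a hypothetical, in-memory illustration applied to word counting: it ignores partitioning, spilling to disk and distribution, which Hadoop handles, and it uses the line index as k1, whereas Hadoop's TextInputFormat uses the byte offset.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process simulation of slide 4's phases, applied to word counting.
// Not Hadoop code: no partitioning, no spilling to disk, no distribution.
public class LocalMapReduce {

    // Map(k1, v1) -> list(k2, v2): k1 = line index, v1 = line, k2 = word, v2 = 1.
    static List<Map.Entry<String, Integer>> map(long lineIndex, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce(k2, list(v2)) -> list(k3, v3): sum the ones emitted for one word.
    static List<Map.Entry<String, Integer>> reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(new SimpleEntry<>(word, sum));
        return out;
    }

    public static Map<String, Integer> run(List<String> lines) {
        // Map phase: one call per input record.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            pairs.addAll(map(i, lines.get(i)));
        }
        // SortByKey: order the intermediate pairs by k2.
        pairs.sort(Map.Entry.comparingByKey());
        // MergeByKey: pairs sharing a key are gathered into (k2, list(v2)), keeping sorted order.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // Reduce phase: one call per distinct key, in sorted key order.
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            for (Map.Entry<String, Integer> kv : reduce(group.getKey(), group.getValue())) {
                result.put(kv.getKey(), kv.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(run(List.of("to be or", "not to be")));
    }
}
```

Sorting before merging is what lets the real framework stream each key's values to a single reduce call without holding everything in memory; here a map stands in for that step.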
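  5. Hadoop MapReduce: in practice

        import java.io.IOException;
        import java.util.StringTokenizer;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

        public class WordCount {

            // Map: (byte offset, line) -> (word, 1) for every token of the line.
            public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
                private final static IntWritable one = new IntWritable(1);
                private Text word = new Text();

                @Override
                public void map(LongWritable key, Text value, Context context)
                        throws IOException, InterruptedException {
                    String line = value.toString();
                    StringTokenizer tokenizer = new StringTokenizer(line);
                    while (tokenizer.hasMoreTokens()) {
                        word.set(tokenizer.nextToken());
                        context.write(word, one);
                    }
                }
            }

            // Reduce: (word, [1, 1, ...]) -> (word, total).
            public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
                @Override
                public void reduce(Text key, Iterable<IntWritable> values, Context context)
                        throws IOException, InterruptedException {
                    int sum = 0;
                    for (IntWritable val : values) {
                        sum += val.get();
                    }
                    context.write(key, new IntWritable(sum));
                }
            }

            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
                Job job = Job.getInstance(conf, "wordcount");
                // Lets Hadoop find the jar containing these classes when running on a cluster.
                job.setJarByClass(WordCount.class);

                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);

                job.setMapperClass(Map.class);
                job.setReducerClass(Reduce.class);

                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);

                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));

                // The process exit code reflects whether the job succeeded.
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }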
  6. Cascading: necessary abstractions
  7. Cascading: necessary abstractions
  8. Cascading: ‘field algebra’?!
  9. Cascalog: logic programming for Hadoop

        (my-predicate ?var1 42 ?var3 :> ?var4 ?var5)
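  10. Cascalog: select ... from ...

        ;; Assumed setup (Cascalog 1.x tutorial playground, which provides the
        ;; person, age, gender and follows datasets used on these slides):
        ;; (use 'cascalog.api 'cascalog.playground)
        ;; (require '[cascalog.ops :as c])

        (?<- (stdout) [?person] (person ?person))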
  11. Cascalog: select ... from ...

        (?<- (stdout) [?person] (person ?person))
        (?<- (stdout) [?person ?age] (age ?person ?age))
  12. Cascalog: select ... from ...

        (?<- (stdout) [?person] (person ?person))
        (?<- (stdout) [?person ?age] (age ?person ?age))
        (?<- (stdout) [?age] (age _ ?age))
  13. Cascalog: select ... from ...

        (?<- (stdout) [?person] (person ?person))
        (?<- (stdout) [?person ?age] (age ?person ?age))
        (?<- (stdout) [?age] (age _ ?age))
        (?<- (stdout) [?person] (age ?person 42))
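For readers more at home in SQL or Java, here is a rough in-memory equivalent of the four queries on slides 10 to 13. The data and names are hypothetical, and de-duplicating the ages is an assumption about Cascalog's default behaviour for queries without an aggregator:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of the queries on slides 10 to 13.
public class ProjectionQueries {

    // Made-up "person" and "age" relations (not the data used in the talk).
    static final List<String> PERSON = List.of("alice", "bob", "chris");
    static final Map<String, Integer> AGE = Map.of("alice", 28, "bob", 42, "chris", 28);

    // (person ?person): every person.
    public static List<String> persons() {
        return PERSON;
    }

    // (age ?person ?age): both fields of every tuple, sorted for stable output.
    public static List<String> personAges() {
        return AGE.entrySet().stream()
                .map(e -> e.getKey() + " " + e.getValue())
                .sorted()
                .collect(Collectors.toList());
    }

    // (age _ ?age): "_" ignores the first field; duplicates removed (assumed default).
    public static List<Integer> ages() {
        return AGE.values().stream().distinct().sorted().collect(Collectors.toList());
    }

    // (age ?person 42): a constant in a predicate acts as an equality filter on that field.
    public static List<String> agedFortyTwo() {
        return AGE.entrySet().stream()
                .filter(e -> e.getValue() == 42)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```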
  14. Cascalog: select ... from ... where

        (?<- (stdout) [?person ?age]
             (age ?person ?age)
             (< ?age 30))
  15. Cascalog: select ... as ... from ...

        (?<- (stdout) [?person ?junior]
             (age ?person ?age)
             (< ?age 30 :> ?junior))
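Slides 14 and 15 use the same predicate in two roles: without `:>` it filters tuples, with `:>` its result is bound to a new variable. A hypothetical Java sketch of the two behaviours over made-up data:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two roles of (< ?age 30) on slides 14 and 15.
public class FilterAndDerive {

    // Made-up person -> age relation, sorted by name for stable output.
    static final Map<String, Integer> AGE = new TreeMap<>(Map.of("alice", 28, "bob", 42, "chris", 25));

    // Slide 14: without :>, the predicate filters; only tuples where it holds remain.
    public static Map<String, Integer> under30() {
        Map<String, Integer> out = new TreeMap<>();
        AGE.forEach((person, age) -> {
            if (age < 30) {
                out.put(person, age);
            }
        });
        return out;
    }

    // Slide 15: with :> ?junior, the predicate's result becomes a new field instead.
    public static Map<String, Boolean> junior() {
        Map<String, Boolean> out = new TreeMap<>();
        AGE.forEach((person, age) -> out.put(person, age < 30));
        return out;
    }
}
```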
  16. Cascalog: select count(*) from ... group by ...

        (?<- (stdout) [?count]
             (age _ _)
             (c/count ?count))
  17. Cascalog: select count(*) from ... group by ...

        (?<- (stdout) [?junior ?count]
             (age _ ?age)
             (< ?age 30 :> ?junior)
             (c/count ?count))
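Slide 17 groups implicitly: an output variable that the aggregator does not produce (`?junior`) acts as the grouping key, much like `GROUP BY junior` in SQL. A hypothetical in-memory equivalent using Java collectors, with made-up data:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalents of slides 16 and 17.
public class CountQueries {

    // Made-up person -> age relation.
    static final Map<String, Integer> AGE = Map.of("alice", 28, "bob", 42, "chris", 25);

    // Slide 16: (age _ _) (c/count ?count) counts every tuple, like SELECT COUNT(*).
    public static long countAll() {
        return AGE.size();
    }

    // Slide 17: ?junior is not produced by the aggregator, so it becomes the grouping key.
    public static Map<Boolean, Long> countByJunior() {
        return AGE.values().stream()
                .collect(Collectors.groupingBy(age -> age < 30, TreeMap::new, Collectors.counting()));
    }
}
```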
  18. Cascalog: select ... from ... join ...

        (?<- (stdout) [?person ?age ?gender]
             (age ?person ?age)
             (gender ?person ?gender))
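Reusing the same variable (`?person`) in two predicates joins them on that field, which behaves like an inner join: people missing from either relation drop out. A hypothetical in-memory sketch over made-up data:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of slide 18: the shared ?person variable acts as an inner join key.
public class JoinQuery {

    // Made-up relations: chris has no gender and dave has no age.
    static final Map<String, Integer> AGE = new TreeMap<>(Map.of("alice", 28, "bob", 42, "chris", 25));
    static final Map<String, String> GENDER = Map.of("alice", "f", "bob", "m", "dave", "m");

    // Emits "person age gender" only for people present in both relations.
    public static List<String> ageAndGender() {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : AGE.entrySet()) {
            String gender = GENDER.get(entry.getKey());
            if (gender != null) {
                rows.add(entry.getKey() + " " + entry.getValue() + " " + gender);
            }
        }
        return rows;
    }
}
```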
  19. Cascalog: select ... from ... (select ...)

        (let [many-follows (<- [?person]
                               (follows ?person _)
                               (c/count ?count)
                               (> ?count 2))]
          (?<- (stdout) [?personA ?personB]
               (many-follows ?personA)
               (many-follows ?personB)
               (follows ?personA ?personB)))
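The sub-query pattern of slide 19 can be sketched in Java in two steps: first compute the inner relation (people who follow more than two others), then use it to filter the follows edges. The data and names below are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of slide 19: an inner query reused twice by an outer query.
public class SubqueryExample {

    // Made-up "follows" relation: each entry is (follower, followed).
    static final List<List<String>> FOLLOWS = List.of(
            List.of("alice", "bob"), List.of("alice", "chris"), List.of("alice", "dave"),
            List.of("bob", "alice"), List.of("bob", "chris"), List.of("bob", "dave"),
            List.of("chris", "alice"));

    // Inner query (many-follows): count follows per person, keep counts above 2.
    static Set<String> manyFollows() {
        Map<String, Long> counts = FOLLOWS.stream()
                .collect(Collectors.groupingBy(edge -> edge.get(0), Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 2)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Outer query: follows edges whose two ends both belong to many-follows.
    public static List<String> pairs() {
        Set<String> many = manyFollows();
        return FOLLOWS.stream()
                .filter(edge -> many.contains(edge.get(0)) && many.contains(edge.get(1)))
                .map(edge -> edge.get(0) + " " + edge.get(1))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Here alice and bob each follow three people, so only the edges between them survive.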
  20. Cascalog: defining your own functions

        (defn toUpperCase [person] (.toUpperCase person))

        (?<- (stdout) [?PERSON]
             (person ?person)
             (toUpperCase ?person :> ?PERSON))
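In plain Java, the same projection through a user-defined function is a `map` over the stream. A hypothetical sketch for comparison; the names are illustrative only:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical Java counterpart of slide 20's user-defined function.
public class CustomFunction {

    // Same role as (defn toUpperCase [person] (.toUpperCase person)).
    static final Function<String, String> TO_UPPER_CASE = String::toUpperCase;

    // (person ?person) (toUpperCase ?person :> ?PERSON): apply the function to each tuple.
    public static List<String> upperCasedPersons(List<String> persons) {
        return persons.stream().map(TO_UPPER_CASE).collect(Collectors.toList());
    }
}
```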
  21. A conclusion?
      ‘New’ datastores, ‘new’ kinds of querying
        ▶ Cascalog, RDF, Datomic, Neo4j ...
      Affinity between the functional paradigm...
        ▶ ...and data processing?
        ▶ And you?
      Cascalog, but also... ... Pig
  22. http://blog.xebia.fr/author/bdechoux/
      @BertrandDechoux
      ?
