Cloud Computing and Hadoop. X JPL, Barcelona, 01/07/2011. Marc de Palol (@lant)
Cloud jpl

The slides I used for my talk about Cloud Computing and Hadoop at the Xth Jornades de Programari Lliure de Barcelona.

  1. Cloud Computing and Hadoop. X JPL, Barcelona, 01/07/2011. Marc de Palol, @lant
  2. Who am I?
  3. Who am I?
  4. Who am I?
  5. Who am I?
  6. Who am I?
  7. Who am I?
  8. Grid Computing vs Cloud
  9. Grid Computing vs Cloud
  10. Both are distributed systems. “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” Leslie Lamport
  11. Both are distributed systems. “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” Leslie Lamport. “A distributed system consists of multiple autonomous computers that communicate through a computer network.” Wikipedia
  12. Cloud
  13. Cloud
  14. Hadoop
  15. Hadoop. MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI '04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
  16. Hadoop
  17. Hadoop
  18. Hadoop ● Nutch ● Lucene ● Hadoop ● Avro
  19. Hadoop. “Flexible infrastructure for large scale computational and data processing on a network of commodity hardware.” Parand Tony Darugar
  20. Hadoop. “Flexible infrastructure for large scale computational and data processing on a network of commodity hardware.” Parand Tony Darugar
  21. Hadoop. “Flexible infrastructure for large scale computational and data processing on a network of commodity hardware.” Parand Tony Darugar
  22. Map & Reduce
      Map:
        V = [ 1, 2, 3, 4, 5 ]
        def quadrat(x) = x * x;        // quadrat = square
        Map(V, quadrat) =
          for (var v : V) {
            output quadrat(v);
          }
        Result: [ 1, 4, 9, 16, 25 ]
  23. Map & Reduce
      Map:
        V = [ 1, 2, 3, 4, 5 ]
        def quadrat(x) = x * x;        // quadrat = square
        Map(V, quadrat) =
          for (var v : V) {
            output quadrat(v);
          }
        Result: [ 1, 4, 9, 16, 25 ]
      Reduce:
        V = [ 1, 4, 9, 16, 25 ]
        Reduce(V) =
          var acum = 0;                // running total
          for (var v : V) {
            acum = acum + v;
          }
          output acum;
        Result: 55
  24. Hadoop DFS. The Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October 2003.
      ● Designed for Big Data
      ● Write Once, Read Many
      ● One DataNode per machine
      ● One NameNode per cluster (SPOAD)
      ● Tolerant of hardware failures
      ● Rack-aware replication
      ● Append supported only recently
      ● Cannot be mounted by the operating system
      ● Sequential reads
      ● Stable and robust
  25. DFS example
  26. DFS example. Mapper.
      Input: [ “word1”, “word2”, “word3”, “word1” ]
      Output: [ “word1” : 2, “word2” : 1, “word3” : 1 ]
  27. DFS example
      “word1” : [ 2, x, y ] (2 from mapper 1, x from mapper 2, y from mapper 3)
      “word2” : [ x, z, w ] (x from mapper 1, z from mapper 2, w from mapper 3)
      “word3” : [ ... ]
  28. DFS example. One sum (∑) per key: “word1” ∑, “word2” ∑, “word3” ∑. Pairs: “word1” : x, “word2” : y, “word3” : z, ...
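  29. Code example
      import java.io.IOException;
      import java.util.StringTokenizer;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
          private final static IntWritable one = new IntWritable(1);
          private Text word = new Text();

          // Emits (word, 1) for every token of the input line.
          public void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              String line = value.toString();
              StringTokenizer tokenizer = new StringTokenizer(line);
              while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  context.write(word, one);
              }
          }
      }
  30. Code example
      import org.apache.hadoop.mapreduce.Reducer;

      public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
          // Sums all the counts received for one word and emits (word, total).
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable val : values) {
                  sum += val.get();
              }
              context.write(key, new IntWritable(sum));
          }
      }
  31. Code example
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

      // args[0] is the input path, args[1] the output path.
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = new Job(conf, "wordcount");
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          job.setMapperClass(Map.class);
          job.setReducerClass(Reduce.class);
          job.setInputFormatClass(TextInputFormat.class);
          job.setOutputFormatClass(TextOutputFormat.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          job.waitForCompletion(true);
      }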
  32. Workflow (diagram): DB, LOGS, HDFS, DB, NoSQL
  33. Who uses it?
  34. Who uses it?
  35. Hadoop ecosystem
  36. Hadoop ecosystem
  37. Hadoop community. Support:
  38. Interested? To try Hadoop: http://www.cloudera.com ► Downloads, http://hadoop.apache.org. Nationwide Hadoop and scalability user group: https://groups.google.com/group/spain-scalability-users. LinkedIn groups: Hadoop España, Hive España
  39. Questions? Marc de Palol, marc.de.palol@gmail.com, @lant
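
The Map and Reduce pseudocode on slides 22 and 23 can be tried out in plain Java, without Hadoop. The sketch below is only an illustration: the class name `MapReduceSketch` and the method names `map` and `reduce` are assumptions chosen for this example, and only `quadrat` comes from the slides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceSketch {
    // Same role as quadrat(x) on slide 22: square one element.
    public static int quadrat(int x) {
        return x * x;
    }

    // Map step: apply quadrat to every element independently.
    public static List<Integer> map(List<Integer> v) {
        return v.stream().map(MapReduceSketch::quadrat).collect(Collectors.toList());
    }

    // Reduce step: fold all elements into a single value by summing them.
    public static int reduce(List<Integer> v) {
        return v.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> v = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> squared = map(v);
        System.out.println(squared);         // prints [1, 4, 9, 16, 25]
        System.out.println(reduce(squared)); // prints 55
    }
}
```

Because each call to `quadrat` depends only on its own element, the map step could be split across machines, which is the property MapReduce relies on.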
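
The word count walk-through on slides 26 to 28 (per-mapper counts, grouping by word, one sum per word) can be simulated in memory. This is a minimal sketch in plain Java, not Hadoop code: the class `WordCountPhases`, its method names, and the sample input are assumptions made for illustration. It follows slide 26, where each mapper already emits a count per word; the Hadoop mapper on slide 29 instead emits one pair per token, and in Hadoop this kind of local pre-aggregation is typically delegated to a combiner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountPhases {
    // Map phase: one mapper counts the words of its own input split.
    public static Map<String, Integer> mapper(List<String> split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : split) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Shuffle phase: group the partial counts of all mappers by word.
    public static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> output : mapperOutputs) {
            for (Map.Entry<String, Integer> e : output.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: sum the grouped partial counts of each word.
    public static Map<String, Integer> reducer(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int partial : e.getValue()) {
                sum += partial;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two hypothetical input splits, one per mapper.
        List<Map<String, Integer>> outputs = Arrays.asList(
            mapper(Arrays.asList("word1", "word2", "word3", "word1")),
            mapper(Arrays.asList("word1", "word2")));
        Map<String, List<Integer>> grouped = shuffle(outputs);
        System.out.println(grouped);          // prints {word1=[2, 1], word2=[1, 1], word3=[1]}
        System.out.println(reducer(grouped)); // prints {word1=3, word2=2, word3=1}
    }
}
```

`TreeMap` is used only to keep the printed output in a stable, sorted order; a real job makes no such promise across reducers.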
