Hadoop 101: Writing a Java MapReduce Program
Ian Wrigley
Sr. Curriculum Manager, Cloudera
ian@cloudera.com | @iwrigley

© Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
  
  
Why the World Needs Hadoop
And, by the way, what is Hadoop?
  
  
Volume
§ Every day…
  – More than 1.5 billion shares are traded on the NYSE
  – Facebook stores 2.7 billion comments and Likes
§ Every minute…
  – Foursquare handles more than 2,000 check-ins
  – TransUnion makes nearly 70,000 updates to credit files
§ And every second…
  – Banks process more than 10,000 credit card transactions
  
  
Velocity
§ We are generating data faster than ever
  – Processes are increasingly automated
  – People are increasingly interacting online
  – Systems are increasingly interconnected
  
  
Variety
§ We’re producing a variety of data, including
  – Audio
  – Video
  – Images
  – Log files
  – Web pages
  – Product rating comments
  – Social network connections
§ Not all of this maps cleanly to the relational model
  
  
Big Data Can Mean Big Opportunity
§ One tweet is an anecdote
  – But a million tweets may signal important trends
§ One person’s product review is an opinion
  – But a million reviews might uncover a design flaw
§ One person’s diagnosis is an isolated case
  – But a million medical records could lead to a cure
  
  
MapReduce
A Scalable Data Processing Framework
  
  
What is MapReduce?
§ MapReduce is a programming model
  – It’s a way of processing data
§ In Hadoop, you supply two functions to process data: Map and Reduce
  – Map: typically used to transform, parse, or filter data
  – Reduce: typically used to summarize results
§ The Map function always runs first
  – The Reduce function runs afterwards
  – The Hadoop framework performs a shuffle and sort to transfer data from the Map function to the Reduce function
§ Each piece is simple, but can be powerful when combined
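
The Map, shuffle and sort, and Reduce steps described on this slide can be imitated in a single Java process. What follows is only a sketch under stated assumptions: the class MiniMapReduce, its Pair type, and the run and wordCount methods are hypothetical names invented for illustration and are not part of the Hadoop API. Everything runs in memory on one machine, so it shows the shape of the model rather than how Hadoop distributes the work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical single-process illustration; none of these names come from Hadoop.
class MiniMapReduce {

    /** A (key, value) pair emitted by the map function. */
    static final class Pair<K, V> {
        final K key;
        final V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    static <I, K extends Comparable<K>, V, R> SortedMap<K, R> run(
            List<I> input,
            Function<I, List<Pair<K, V>>> mapFn,
            BiFunction<K, List<V>, R> reduceFn) {

        // Map: each input record yields zero or more (key, value) pairs.
        List<Pair<K, V>> mapped = new ArrayList<>();
        for (I record : input) {
            mapped.addAll(mapFn.apply(record));
        }

        // Shuffle and sort: group the values by key; a TreeMap keeps keys sorted.
        SortedMap<K, List<V>> grouped = new TreeMap<>();
        for (Pair<K, V> pair : mapped) {
            grouped.computeIfAbsent(pair.key, k -> new ArrayList<>()).add(pair.value);
        }

        // Reduce: called once per key with all of that key's values.
        SortedMap<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduceFn.apply(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    /** Word count: map emits (word, 1) and reduce adds up the ones. */
    static SortedMap<String, Integer> wordCount(List<String> lines) {
        return MiniMapReduce.<String, String, Integer, Integer>run(
            lines,
            line -> {
                List<Pair<String, Integer>> out = new ArrayList<>();
                for (String word : line.toLowerCase().split("\\s+")) {
                    if (!word.isEmpty()) {
                        out.add(new Pair<>(word, 1));
                    }
                }
                return out;
            },
            (word, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }

    public static void main(String[] args) {
        // prints {cat=2, ran=1, sat=1, the=2}
        System.out.println(wordCount(Arrays.asList("the cat sat", "the cat ran")));
    }
}
```

The map and reduce functions are each a few lines; the grouping in the middle is the part the framework takes care of.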
  
  
MapReduce: An Example
§ … in which Ian waves his hands around and attempts to explain the MapReduce flow
  
  
MapReduce Code for Hadoop
§ MapReduce processing in Hadoop is batch-oriented
§ Usually written in Java
  – This uses Hadoop’s API directly
  – You can do basic MapReduce in other languages
    – Using the Hadoop Streaming wrapper program
  – Some advanced features require Java code
  
  
Basic Java API Concepts
§ Some (very) basic concepts:
  – Input and output data is typed
  – The framework passes each input record to the Mapper in turn
  – A record is a (key, value) pair
  – For text files:
    – The key is the byte offset of the start of the line
    – The value is the line itself
  – Output data from the Mapper is transferred to the Reducer via a process known as the shuffle and sort
  – Reducers receive (key, Iterable of values) sets, in sorted key order
  – Job is configured and executed using a driver class
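
To make the text-file record format concrete, here is a small sketch that lists the (byte offset, line) pairs a text input would produce. The class LineRecords and its records method are hypothetical helpers, not Hadoop classes, and the sketch assumes UTF-8 text with "\n" line endings held entirely in memory. Hadoop's real input handling also deals with file splits and other line endings, which this ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
class LineRecords {

    /** Returns (byte offset of line start, line text) in file order. */
    static Map<Long, String> records(String contents) {
        Map<Long, String> result = new LinkedHashMap<>();
        if (contents.isEmpty()) {
            return result;
        }
        long offset = 0;
        for (String line : contents.split("\n")) {
            result.put(offset, line);
            // Advance by the line's length in bytes plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {0=ab, 3=cde}
        System.out.println(records("ab\ncde\n"));
    }
}
```

The key counts bytes, not characters or line numbers, so a line that follows a multi-byte character starts at a larger offset than its character position suggests.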
  
  
	
  	
  
Data Flow

Map input:
Nashville J. Jones 12.95 2013-07-21
Memphis S. Smith 66.57 2013-07-21
Nashville T. Harding 55.35 2013-07-22
Knoxville S. Warne 10.99 2013-07-22
Kingsport M. Thompson 99.95 2013-07-22

Map output:
Nashville 12.95
Memphis 66.57
Nashville 55.35
Knoxville 10.99
Kingsport 99.95

Shuffle and sort

Reduce input:
Kingsport [99.95]
Knoxville [10.99]
Memphis [66.57]
Nashville [12.95, 55.35]

Reduce output:
Kingsport 99.95
Knoxville 10.99
Memphis 66.57
Nashville 68.30
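
The same data flow can be traced in plain Java, without Hadoop. This is a sketch only: the class StoreSalesFlow and its methods are hypothetical names, and the five sample records are assumed to be tab-separated, which the slide itself does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical walk-through of the Data Flow slide; no Hadoop involved.
class StoreSalesFlow {

    // Map input: the five sample records, assumed here to be tab-separated.
    static final List<String> INPUT = Arrays.asList(
        "Nashville\tJ. Jones\t12.95\t2013-07-21",
        "Memphis\tS. Smith\t66.57\t2013-07-21",
        "Nashville\tT. Harding\t55.35\t2013-07-22",
        "Knoxville\tS. Warne\t10.99\t2013-07-22",
        "Kingsport\tM. Thompson\t99.95\t2013-07-22");

    /** Map, then shuffle and sort: emit (store, sale) and group by store in key order. */
    static TreeMap<String, List<Double>> mapAndShuffle(List<String> lines) {
        TreeMap<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            String store = fields[0];
            double sale = Double.parseDouble(fields[2]);
            grouped.computeIfAbsent(store, k -> new ArrayList<>()).add(sale);
        }
        return grouped;
    }

    /** Reduce: sum each store's values. */
    static TreeMap<String, Double> reduce(Map<String, List<Double>> grouped) {
        TreeMap<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            double sum = 0;
            for (double sale : entry.getValue()) {
                sum += sale;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Double>> reduceInput = mapAndShuffle(INPUT);
        System.out.println("Reduce input: " + reduceInput);
        // Format to two decimal places, since double sums are not exact.
        for (Map.Entry<String, Double> entry : reduce(reduceInput).entrySet()) {
            System.out.printf("%s %.2f%n", entry.getKey(), entry.getValue());
        }
    }
}
```

Nashville is the only store with two sales, so it is the only key whose Reduce input holds more than one value.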
  
Java MR Job Example: Mapper

package com.cloudera.example;

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Type parameters: LongWritable and Text are the input key and value types;
// Text and DoubleWritable are the output key and value types.
public class StoreSalesMapper extends Mapper<LongWritable, Text,
        Text, DoubleWritable> {
  
  
Java MR Job Example: Mapper

    /*
     * The map method is invoked once for each line of text in the
     * input data. The method receives a key of type LongWritable
     * (which corresponds to the byte offset in the current input
     * file), a value of type Text (representing the line of input
     * data), and a Context object (which allows us to print status
     * messages, among other things).
     */
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
  
Java MR Job Example: Mapper

        // Convert value to a Java String
        String line = value.toString();

        // ignore empty lines (defensive programming!)
        if (line.trim().isEmpty()) {
            return;
        }

        // Split record into fields
        String[] fields = line.split("\t");

        // ensure this line is not malformed (even more defensive programming!)
        if (fields.length != 4) {
            return;
        }
  
  
Java MR Job Example: Mapper

        // Extract based on position
        String storeName = fields[0];
        double saleValue = Double.parseDouble(fields[2]);

        // Output key and value
        context.write(new Text(storeName), new DoubleWritable(saleValue));
    }
}
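
The Mapper's parsing steps can be tried out without a cluster by isolating them in plain Java. The sketch below is an illustration, not the code from the slides: SaleRecordParser and parse are hypothetical names, the input is assumed to be tab-separated with four fields, and the NumberFormatException handling is an extra defensive step that the Mapper shown above does not include.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-alone version of the Mapper's parsing logic; no Hadoop types.
class SaleRecordParser {

    /** Returns (store name, sale value), or empty if the line should be skipped. */
    static Optional<Map.Entry<String, Double>> parse(String line) {
        // Ignore missing or empty lines.
        if (line == null || line.trim().isEmpty()) {
            return Optional.empty();
        }

        // Expect exactly four tab-separated fields: store, customer, amount, date.
        String[] fields = line.split("\t");
        if (fields.length != 4) {
            return Optional.empty();
        }

        try {
            double saleValue = Double.parseDouble(fields[2]);
            Map.Entry<String, Double> entry =
                new SimpleEntry<String, Double>(fields[0], saleValue);
            return Optional.of(entry);
        } catch (NumberFormatException e) {
            // Added assumption: skip records whose amount is not a number.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Nashville\tJ. Jones\t12.95\t2013-07-21").isPresent());
        System.out.println(parse("not a valid record").isPresent());
    }
}
```

Skipping bad records silently keeps the job running, at the cost of hiding how many records were dropped.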
  
  
Java MR Job Example: Reducer

package com.cloudera.example;

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Type parameters: the first Text and DoubleWritable are the input key and
// value types; the second pair are the output key and value types.
public class SumReducer extends Reducer<Text, DoubleWritable,
        Text, DoubleWritable> {
  
  
Java MR Job Example: Reducer

    /*
     * The reduce method is invoked once for each key received from
     * the shuffle and sort phase of the MapReduce framework.
     * The method receives a key of type Text (representing the key),
     * a set of values of type DoubleWritable, and a Context object.
     */
    @Override
    public void reduce(Text key, Iterable<DoubleWritable> values,
            Context context) throws IOException, InterruptedException {
  
Java MR Job Example: Reducer

        // used to sum up the store sales
        double sum = 0;

        // add to it for each new value received
        for (DoubleWritable value : values) {
            sum += value.get();
        }

        // Output key and value: the store name (key) and the sum (value)
        context.write(key, new DoubleWritable(sum));
    }
}
  
  
Java MR Job Example: Driver

package com.cloudera.example;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

// The driver is just a regular Java class with a "main" method
public class StoreSales {

    public static void main(String[] args) throws Exception {
  
Java MR Job Example: Driver

        // validate command line arguments (we require the user
        // to specify the HDFS paths to use for the job; see below)
        if (args.length != 2) {
            System.out.printf("Usage: Driver <input dir> <output dir>\n");
            System.exit(-1);
        }

        // Instantiate a Job object for our job's configuration.
        Job job = new Job();

        // configure input and output paths based on supplied arguments
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
  
Java MR Job Example: Driver

        // tells Hadoop to copy the JAR containing this class
        // to cluster nodes, as required to run this job
        job.setJarByClass(StoreSales.class);

        // give the job a descriptive name. This is optional, but
        // helps us identify this job on a busy cluster
        job.setJobName("Store Sale Aggregator");

        // Specify which classes to use for the Mapper and Reducer
        job.setMapperClass(StoreSalesMapper.class);
        job.setReducerClass(SumReducer.class);
  
Java MR Job Example: Driver

        // specify the Mapper's output key and value classes
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);

        // specify the job's output key and value classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        // start the MapReduce job and wait for it to finish.
        // if it finishes successfully, return 0; otherwise 1.
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
  
Demo
§ And now… the program actually running on a pseudo-distributed cluster
  
  
Conclusion
§ Obviously there’s much more to the Hadoop API than this
  – Partitioners
  – Combiners
  – Custom Writables, custom WritableComparables
  – DistributedCache
  – Counters
  – Etc., etc., etc.
§ …but even with just this amount of knowledge, you could write real-world Hadoop applications
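
As a taste of the first item on that list, a Partitioner decides which Reducer receives each key. The sketch below is a stand-alone illustration with hypothetical names (HashPartitionSketch, partitionFor). Its hash-and-modulo formula follows the approach of Hadoop's default hash partitioner, but it hashes a Java String rather than a Hadoop Text key, so the partition numbers it produces will not necessarily match a real job's.

```java
// Hypothetical stand-alone illustration; this is not the Hadoop Partitioner class.
class HashPartitionSketch {

    /** Picks a reducer index in the range [0, numReducers) for the given key. */
    static int partitionFor(String key, int numReducers) {
        // Mask off the sign bit so a negative hash code cannot give a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        String[] stores = {"Kingsport", "Knoxville", "Memphis", "Nashville"};
        for (String store : stores) {
            System.out.println(store + " -> reducer " + partitionFor(store, 3));
        }
    }
}
```

Because the result depends only on the key, every record for a given store lands on the same Reducer, which is what lets that Reducer compute the store's complete total.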
  
  
About Cloudera
§ Helps companies profit from all their data
  – Founded by experts from Facebook, Google, Oracle, and Yahoo
§ We offer products and services for large-scale data analysis
  – Software (CDH distribution and Cloudera Manager)
  – Consulting and support services
  – Training and certification
§ Want to attend a training course? Use the code Nashville_15 for 15% off any Cloudera-delivered class
  
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 

Njug presentation

  • 1. Hadoop 101: Writing a Java MapReduce Program
       Ian Wrigley, Sr. Curriculum Manager, Cloudera
       ian@cloudera.com | @iwrigley
       © Copyright 2010-2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
  • 2. Why the World Needs Hadoop
       And, by the way, what is Hadoop?
  • 3. Volume
       – Every day…
           – More than 1.5 billion shares are traded on the NYSE
           – Facebook stores 2.7 billion comments and Likes
       – Every minute…
           – Foursquare handles more than 2,000 check-ins
           – TransUnion makes nearly 70,000 updates to credit files
       – And every second…
           – Banks process more than 10,000 credit card transactions
  • 4. Velocity
       – We are generating data faster than ever
           – Processes are increasingly automated
           – People are increasingly interacting online
           – Systems are increasingly interconnected
  • 5. Variety
       – We’re producing a variety of data, including
           – Audio
           – Video
           – Images
           – Log files
           – Web pages
           – Product rating comments
           – Social network connections
       – Not all of this maps cleanly to the relational model
  • 6. Big Data Can Mean Big Opportunity
       – One tweet is an anecdote
           – But a million tweets may signal important trends
       – One person’s product review is an opinion
           – But a million reviews might uncover a design flaw
       – One person’s diagnosis is an isolated case
           – But a million medical records could lead to a cure
  • 7. MapReduce
       A Scalable Data Processing Framework
  • 8. What is MapReduce?
       – MapReduce is a programming model
           – It’s a way of processing data
       – In Hadoop, you supply two functions to process data: Map and Reduce
           – Map: typically used to transform, parse, or filter data
           – Reduce: typically used to summarize results
       – The Map function always runs first
           – The Reduce function runs afterwards
           – The Hadoop framework performs a shuffle and sort to transfer data from the Map function to the Reduce function
       – Each piece is simple, but can be powerful when combined
  • 9. MapReduce: An Example
       – … in which Ian waves his hands around and attempts to explain the MapReduce flow
  • 10. MapReduce Code for Hadoop
       – MapReduce processing in Hadoop is batch-oriented
       – Usually written in Java
           – This uses Hadoop’s API directly
           – You can do basic MapReduce in other languages
               – Using the Hadoop Streaming wrapper program
               – Some advanced features require Java code
  • 11. Basic Java API Concepts
       – Some (very) basic concepts:
           – Input and output data is typed
           – The framework passes each input record to the Mapper in turn
           – A record is a (key, value) pair
           – For text files:
               – The key is the byte offset of the start of the line
               – The value is the line itself
           – Output data from the Mapper is transferred to the Reducer via a process known as the shuffle and sort
           – Reducers receive (key, Iterable of values) sets, in sorted key order
           – Job is configured and executed using a driver class
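To make the "byte offset, line" record shape for text files concrete, here is a minimal plain-Java sketch that does not use Hadoop at all. The class name `LineRecords` and the method `toRecords` are hypothetical names chosen for illustration, and the sketch assumes UTF-8 text with `\n` line endings; it only imitates what the framework hands to a Mapper and is not how Hadoop itself reads files.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the Hadoop API.
class LineRecords {

    // Splits text into (byte offset of line start -> line) pairs.
    // Assumes UTF-8 encoding and "\n" line endings.
    static Map<Long, String> toRecords(String contents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;   // byte offset of the current line
        int start = 0;     // character index of the current line
        while (start < contents.length()) {
            int end = contents.indexOf('\n', start);
            if (end < 0) {
                end = contents.length();   // last line without a trailing newline
            }
            String line = contents.substring(start, end);
            records.put(offset, line);
            // Advance by the encoded length of the line plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "first line\nsecond line\n";
        toRecords(sample).forEach((key, value) ->
                System.out.println(key + " -> " + value));
    }
}
```

The offsets are counted in bytes rather than characters, so a line containing multi-byte characters pushes the next key further along than its character count would suggest.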
  • 12. Data Flow
       Map input:
           Nashville   J. Jones     12.95  2013-07-21
           Memphis     S. Smith     66.57  2013-07-21
           Nashville   T. Harding   55.35  2013-07-22
           Knoxville   S. Warne     10.99  2013-07-22
           Kingsport   M. Thompson  99.95  2013-07-22
       Map output:
           Nashville   12.95
           Memphis     66.57
           Nashville   55.35
           Knoxville   10.99
           Kingsport   99.95
       Shuffle and sort
       Reduce input:
           Kingsport   [99.95]
           Knoxville   [10.99]
           Memphis     [66.57]
           Nashville   [12.95, 55.35]
       Reduce output:
           Kingsport   99.95
           Knoxville   10.99
           Memphis     66.57
           Nashville   68.30
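The data flow above can be imitated in memory with ordinary Java collections. The sketch below is only an illustration under stated assumptions: the class `DataFlowSketch` and its methods are hypothetical, the input records are assumed to be tab-separated, and a `TreeMap` stands in for the shuffle and sort that Hadoop performs across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the map -> shuffle/sort -> reduce flow.
class DataFlowSketch {

    // Map phase: emit one (store, sale amount) pair per input record.
    static List<Map.Entry<String, Double>> map(List<String> lines) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            pairs.add(new SimpleEntry<>(fields[0], Double.valueOf(fields[2])));
        }
        return pairs;
    }

    // Shuffle and sort: group values by key; TreeMap keeps keys in sorted order.
    static TreeMap<String, List<Double>> shuffleAndSort(
            List<Map.Entry<String, Double>> pairs) {
        TreeMap<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each key.
    static TreeMap<String, Double> reduce(TreeMap<String, List<Double>> grouped) {
        TreeMap<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            double sum = 0;
            for (double value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    // The five records from the Data Flow slide, assumed tab-separated.
    static List<String> sampleInput() {
        return Arrays.asList(
                "Nashville\tJ. Jones\t12.95\t2013-07-21",
                "Memphis\tS. Smith\t66.57\t2013-07-21",
                "Nashville\tT. Harding\t55.35\t2013-07-22",
                "Knoxville\tS. Warne\t10.99\t2013-07-22",
                "Kingsport\tM. Thompson\t99.95\t2013-07-22");
    }

    public static void main(String[] args) {
        TreeMap<String, Double> totals = reduce(shuffleAndSort(map(sampleInput())));
        totals.forEach((store, total) ->
                System.out.printf("%s\t%.2f%n", store, total));
    }
}
```

Each phase is a separate method so the intermediate shapes are visible: a flat list of pairs after the map step, a sorted map of key to value list after the shuffle, and one total per key after the reduce step.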
  • 13. Java MR Job Example: Mapper

        package com.cloudera.example;

        import java.io.IOException;

        import org.apache.hadoop.io.DoubleWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        // Type parameters: input key and value types (LongWritable, Text),
        // then output key and value types (Text, DoubleWritable)
        public class StoreSalesMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
  • 14. Java MR Job Example: Mapper

          /*
           * The map method is invoked once for each line of text in the
           * input data. The method receives a key of type LongWritable
           * (which corresponds to the byte offset in the current input
           * file), a value of type Text (representing the line of input
           * data), and a Context object (which allows us to print status
           * messages, among other things).
           */
          @Override
          public void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
  • 15. Java MR Job Example: Mapper

            // convert value to a Java String
            String line = value.toString();

            // ignore empty lines (defensive programming!)
            if (line.trim().isEmpty()) {
              return;
            }

            // split record into fields on the tab character
            String[] fields = line.split("\t");

            // ensure this line is not malformed (even more defensive programming!)
            if (fields.length != 4) {
              return;
            }
  • 16. Java MR Job Example: Mapper

            // extract fields based on position
            String storeName = fields[0];
            double saleValue = Double.parseDouble(fields[2]);

            // emit the output key and value
            context.write(new Text(storeName), new DoubleWritable(saleValue));
          }
        }
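The defensive checks in the Mapper can be exercised without a cluster by pulling the parsing into a plain function. The following is a sketch under assumptions, not the presenter's code: `SaleLineParser` and `parse` are hypothetical names, and the extra guard against a non-numeric sale value goes beyond what the slides show (the Mapper as presented would let `Double.parseDouble` throw on such a line).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper mirroring the Mapper's parsing steps in plain Java.
class SaleLineParser {

    // Returns (store name, sale value) for a well-formed line, or empty otherwise.
    static Optional<Map.Entry<String, Double>> parse(String line) {
        // ignore null or empty lines
        if (line == null || line.trim().isEmpty()) {
            return Optional.empty();
        }

        // a well-formed record has exactly four tab-separated fields
        String[] fields = line.split("\t");
        if (fields.length != 4) {
            return Optional.empty();
        }

        // Assumption beyond the slides: also skip lines whose third
        // field is not a valid number instead of failing the task.
        try {
            Map.Entry<String, Double> pair =
                    new SimpleEntry<>(fields[0], Double.valueOf(fields[2]));
            return Optional.of(pair);
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Nashville\tJ. Jones\t12.95\t2013-07-21").isPresent());
        System.out.println(parse("not a valid record").isPresent());
    }
}
```

Keeping the parsing in a function like this is one possible design choice: the `map` method then only converts between Writable types and calls `context.write`, and the record handling can be unit tested on its own.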
  • 17. Java MR Job Example: Reducer

        package com.cloudera.example;

        import java.io.IOException;

        import org.apache.hadoop.io.DoubleWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        // Type parameters: input key and value types (Text, DoubleWritable),
        // then output key and value types (Text, DoubleWritable)
        public class SumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
  • 18. Java MR Job Example: Reducer

          /*
           * The reduce method is invoked once for each key received from
           * the shuffle and sort phase of the MapReduce framework.
           * The method receives a key of type Text (representing the key),
           * a set of values of type DoubleWritable, and a Context object.
           */
          @Override
          public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
              throws IOException, InterruptedException {
  • 19. Java MR Job Example: Reducer

            // used to sum up the store sales
            double sum = 0;

            // add to it for each new value received
            for (DoubleWritable value : values) {
              sum += value.get();
            }

            // Our output is the store name (key) and the sum (value)
            context.write(key, new DoubleWritable(sum));
          }
        }
  • 20. Java MR Job Example: Driver

        package com.cloudera.example;

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.DoubleWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.mapreduce.Job;

        // The driver is just a regular Java class with a "main" method
        public class StoreSales {

          public static void main(String[] args) throws Exception {
  • 21. Java MR Job Example: Driver

            // validate command line arguments (we require the user
            // to specify the HDFS paths to use for the job; see below)
            if (args.length != 2) {
              System.out.printf("Usage: StoreSales <input dir> <output dir>\n");
              System.exit(-1);
            }

            // Instantiate a Job object for our job's configuration.
            Job job = new Job();

            // configure input and output paths based on supplied arguments
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
  • 22. Java MR Job Example: Driver

            // tells Hadoop to copy the JAR containing this class
            // to cluster nodes, as required to run this job
            job.setJarByClass(StoreSales.class);

            // give the job a descriptive name. This is optional, but
            // helps us identify this job on a busy cluster
            job.setJobName("Store Sale Aggregator");

            // Specify which classes to use for the Mapper and Reducer
            job.setMapperClass(StoreSalesMapper.class);
            job.setReducerClass(SumReducer.class);
  • 23. Java MR Job Example: Driver

            // specify the Mapper's output key and value classes
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(DoubleWritable.class);

            // specify the job's output key and value classes
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);

            // start the MapReduce job and wait for it to finish.
            // if it finishes successfully, return 0; otherwise 1.
            boolean success = job.waitForCompletion(true);
            System.exit(success ? 0 : 1);
          }
        }
  • 24. Demo
       – And now… the program actually running on a pseudo-distributed cluster
  • 25. Conclusion
       – Obviously there’s much more to the Hadoop API than this
           – Partitioners
           – Combiners
           – Custom Writables, custom WritableComparables
           – DistributedCache
           – Counters
           – Etc., etc., etc.
       – …but even with just this amount of knowledge, you could write real-world Hadoop applications
  • 26. About Cloudera
       – Helps companies profit from all their data
           – Founded by experts from Facebook, Google, Oracle, and Yahoo
       – We offer products and services for large-scale data analysis
           – Software (CDH distribution and Cloudera Manager)
           – Consulting and support services
           – Training and certification
       – Want to attend a training course? Use the code Nashville_15 for 15% off any Cloudera-delivered class