BIG DATA TESTING
By QA InfoTech
Scenario
OMG!! Did he just ask me to catch rats in a
place full of snakes?
Agenda
1. What is Big Data
2. Characteristics of Big Data
3. Meaning of BIG DATA to “US”
4. Hadoop
6. Submitting a Map Reduce Job
What is BIG DATA?
• ‘Big Data’ is similar to ‘small data’, but bigger in size
• Big Data generates value from the storage and processing of very large
quantities of digital information that cannot be analyzed with traditional
computing techniques.
• Walmart handles more than 1 million customer transactions every hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years to process; now it can
be achieved in one week.
Three Characteristics of Big Data: The 3 Vs
Volume
• Data quantity
Velocity
• Data Speed
Variety
• Data Types
What Does BIG DATA TESTING Mean to Testers?
Take into consideration these 3 perspectives:
• Data
• Infrastructure
• Validation Tools
Now the question arises: what technology is
needed for handling BIG DATA?
1. HADOOP
Hadoop & Its Components
• Hadoop is an open-source software framework for storing and processing big data
in a distributed fashion on large clusters of commodity hardware. Essentially, it
accomplishes two tasks: massive data storage and faster processing.
Source: http://www.trieuvan.com/apache/hadoop/common/
How is Hadoop Helping?
• HDFS: A Java-based distributed file system that can store all kinds of data
across the nodes of a cluster
• Map Reduce: A software programming model for processing large sets of
data in parallel
• YARN: A resource management framework for scheduling and handling
resource requests from distributed applications
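To make the Map Reduce model concrete, here is a minimal single-process sketch in plain Java with no Hadoop dependency. The class name MapReduceSketch, its method names, and the "year,temperature" input format are assumptions made purely for illustration; a real job runs these phases in parallel across the cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of the map -> shuffle -> reduce flow.
// Input lines are assumed to look like "1950,22" (year,temperature).
final class MapReduceSketch {

    // Map phase: turn one input line into a (year, temperature) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] parts = line.split(",");
        return new AbstractMap.SimpleEntry<>(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }

    // Shuffle phase: group all values that share the same key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single result (the maximum).
    static int reduce(List<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Runs the three phases in order and returns year -> maximum temperature.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }
}
```

Calling MapReduceSketch.run on a small list of lines gives a quick way to reason about what a job should produce before it is submitted to a cluster.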
This is our input file: Input Sampleset.txt
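Map Reduce Program for Max Temperature:
Driver Class
// Uses Job (org.apache.hadoop.mapreduce), Path (org.apache.hadoop.fs),
// Text and IntWritable (org.apache.hadoop.io), and the file format classes
// from org.apache.hadoop.mapreduce.lib.input and lib.output.
Job job = Job.getInstance(); // new Job() is deprecated in Hadoop 2.x
job.setJarByClass(MaxTemperatureDriver.class);
job.setJobName("Max Temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));   // input file or directory
FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
// Declare the output types so they match what the mapper and reducer write
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// Submit the job and wait for it to finish
System.exit(job.waitForCompletion(true) ? 0 : 1);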
Mapper Class
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    // Integer.parseInt rejects a leading plus sign before Java 7, so skip it
    if (line.charAt(87) == '+') {
        airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
        airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    // Emit (year, temperature) so the reducer can pick the maximum per year
    context.write(new Text(year), new IntWritable(airTemperature));
}
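From a testing point of view, the mapper's fixed-width parsing can be exercised without a cluster by pulling it into a plain helper. The sketch below is one possible way to do that; the class TemperatureRecordParser and the synthetic sampleLine builder are hypothetical, and only the character offsets (15 to 19 for the year, 87 to 92 for the signed temperature) come from the mapper on this slide.

```java
// Hypothetical helper that isolates the mapper's fixed-width parsing so it
// can be unit tested locally. Offsets are the ones used by the mapper.
final class TemperatureRecordParser {

    // Characters 15..18 hold the four-digit year.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Characters 87..91 hold a sign followed by four digits.
    static int airTemperature(String line) {
        // Skip an explicit '+'; a '-' is kept so parseInt returns a negative value.
        int start = (line.charAt(87) == '+') ? 88 : 87;
        return Integer.parseInt(line.substring(start, 92));
    }

    // Builds a synthetic 93-character record for tests: zero padding
    // everywhere except the two fields read above.
    static String sampleLine(String year, String signedTemperature) {
        if (year.length() != 4 || signedTemperature.length() != 5) {
            throw new IllegalArgumentException(
                "expected a 4-character year and a 5-character signed temperature");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 93; i++) {
            sb.append('0');
        }
        sb.replace(15, 19, year);
        sb.replace(87, 92, signedTemperature);
        return sb.toString();
    }
}
```

Records built with sampleLine make it easy to cover boundary cases such as negative readings or a zero temperature before running against the real input file.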
Reducer Class
@Override
public void reduce(Text key, Iterable<IntWritable> values,
        Context context)
        throws IOException, InterruptedException {
    // Start below any possible reading so the first value always wins
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
        maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
}
}
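Once the job has run, a tester can compare the reducer output with expectations computed independently. The sketch below assumes the output lines are in "key<TAB>value" form, which is what Hadoop's default text output format writes; the class OutputValidator and the way the lines are obtained (for example, read from a part file copied out of HDFS) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator that compares job output lines with expected results.
final class OutputValidator {

    // Parses "year<TAB>maxTemperature" lines into a map, skipping blank lines.
    static Map<String, Integer> parse(List<String> outputLines) {
        Map<String, Integer> actual = new HashMap<>();
        for (String line : outputLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t");
            actual.put(parts[0], Integer.parseInt(parts[1].trim()));
        }
        return actual;
    }

    // Returns one message per mismatch; an empty list means the output matches.
    static List<String> diff(Map<String, Integer> expected, Map<String, Integer> actual) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> e : expected.entrySet()) {
            Integer got = actual.get(e.getKey());
            if (got == null) {
                problems.add("missing key " + e.getKey());
            } else if (!got.equals(e.getValue())) {
                problems.add("key " + e.getKey() + ": expected " + e.getValue() + ", got " + got);
            }
        }
        for (String key : actual.keySet()) {
            if (!expected.containsKey(key)) {
                problems.add("unexpected key " + key);
            }
        }
        return problems;
    }
}
```

Reporting every mismatch, rather than stopping at the first one, tends to be more useful when the output holds many keys.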
Thank You
For more information, please:
• Contact us at info@qainfotech.com
• Visit us at www.qainfotech.com
• Read our blog at www.qainfotech.com/blog
• Follow us on Twitter at www.twitter.com/qainfotech
International Headquarters: Noida, Uttar Pradesh, India
USA Office: Farmington Hills, Michigan, U.S.A.


Editor's Notes

  • #2 Image source: blogs.forrester.com
  • #7 According to IBM