SlideShare a Scribd company logo
1 of 15
BIG DATA TESTING
By QA InfoTech
Scenario
OMG!! Did he just asked me to catch rats in a
place full of snakes
3
Agenda
1. What is Big Data
2. Characteristic of Big Data
3. Meaning of BIG DATA to “US”
4. Hadoop
6. Submitting a Map Reduce Job
What is BIG DATA?
• ‘Big Data’ is similar to ‘small data’, but bigger in size
• Big Data generates value from the storage and processing of very large
quantities of digital information that cannot be analyzed with traditional
computing techniques.
• Walmart handles more than 1 million customer transactions every hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10years to process; now it can
be achieved in one week.
Three Characteristics of Big Data V3s
Volume
• Data quantity
Velocity
• Data Speed
Variety
• Data Types
What BIG DATA TESTING mean to Testers?
 Take into consideration these 3 perspectives:
• Data
• Infrastructure
• Validation Tools
Now the questions comes what technology is
needed for handling BIG DATA ?
1.HADOOP
Hadoop & Its Components
• Hadoop is an open-source software framework for storing and processing big data
in a distributed fashion on large clusters of commodity hardware. Essentially, it
accomplishes two tasks: massive data storage and faster processing.
Source: http://www.trieuvan.com/apache/hadoop/common/
How is Hadoop Helping?
• HDFS: Java based distributed FS that can run and store all kinds of data
• Map Reduce: A software programming model for processing large set of
data in parallel
• YARN: A resource management framework for scheduling and handling
resource requests from distributed applications
This is our Input File : Input Sampleset.txt
11
Map Reduce Program For Max Temperature :
Driver Class
Job job = new Job();
job.setJarByClass(MaxTemperatureDriver.class);
job.setJobName("Max Temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
12
Mapper Class
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString();
String year = line.substring(15, 19);
int airTemperature;
if (line.charAt(87) == '+') { // parseInt doesn't like leading plus
// signs
airTemperature = Integer.parseInt(line.substring(88, 92));
} else {
airTemperature = Integer.parseInt(line.substring(87, 92));
}
13
Reducer Class
@Override
public void reduce(Text key, Iterable<IntWritable> values,
Context context)
throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
14
Thank You
For more information, please:
• Contact us at info@qainfotech.com
• Visit us at www.qainfotech.com
• Read our blog at www.qainfotech.com/blog
• Follow us on Twitter at www.twitter.com/qainfotech
USA
Office
International
Headquarters
Noida
Uttar Pradesh, India
Farmington Hills
Michigan, U.S.A.

More Related Content

What's hot

the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World DistilledRTTS
 
Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Ophir Cohen
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and DatabasesRTTS
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...RTTS
 
Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya GargBig Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya GargQA or the Highway
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinarRTTS
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessRTTS
 
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTestistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTurkish Testing Board
 
Big data testing (1)
Big data testing (1)Big data testing (1)
Big data testing (1)vodqancr
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing ArchitectureGang Tao
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSpark Summit
 
Testing the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big ProblemsTesting the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big ProblemsTechWell
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & RedshiftDataKitchen
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku
 
Building Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceBuilding Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceDatabricks
 

What's hot (20)

the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013
 
TESTING IN BIG DATA WORLD
TESTING IN BIG DATA  WORLDTESTING IN BIG DATA  WORLD
TESTING IN BIG DATA WORLD
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and Databases
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya GargBig Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinar
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
 
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTestistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
 
Big data testing (1)
Big data testing (1)Big data testing (1)
Big data testing (1)
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Testing the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big ProblemsTesting the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big Problems
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
 
Building Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field ExperienceBuilding Data Science into Organizations: Field Experience
Building Data Science into Organizations: Field Experience
 

Viewers also liked

How to perform Analytics testing on your website and tools
How to perform Analytics testing on your website and toolsHow to perform Analytics testing on your website and tools
How to perform Analytics testing on your website and toolsMayank Solanki
 
Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...
Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...
Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...Luke Stevens
 
Divide and-conquer approach towards data analytics testing
Divide and-conquer approach towards data analytics testingDivide and-conquer approach towards data analytics testing
Divide and-conquer approach towards data analytics testingKokilaRudresh
 
Data Analytics-testing spectrum
Data Analytics-testing spectrumData Analytics-testing spectrum
Data Analytics-testing spectrumVodqaBLR
 
The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)
The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)
The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)Anand Bagmar
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 

Viewers also liked (6)

How to perform Analytics testing on your website and tools
How to perform Analytics testing on your website and toolsHow to perform Analytics testing on your website and tools
How to perform Analytics testing on your website and tools
 
Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...
Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...
Data Driven Design - Web Analytics & Testing for Designers (Web Directions So...
 
Divide and-conquer approach towards data analytics testing
Divide and-conquer approach towards data analytics testingDivide and-conquer approach towards data analytics testing
Divide and-conquer approach towards data analytics testing
 
Data Analytics-testing spectrum
Data Analytics-testing spectrumData Analytics-testing spectrum
Data Analytics-testing spectrum
 
The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)
The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)
The What, Why and How of (Web) Analytics Testing (Web, IoT, Big Data)
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 

Similar to Big Data Testing

Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceCsaba Toth
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reducePaladion Networks
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingBig data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingRahul Singh
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xNPN Training
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSpraveen bhat
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introductionsaisreealekhya
 
HADOOP_2_0_YARN_Arch - Copy.pptx
HADOOP_2_0_YARN_Arch - Copy.pptxHADOOP_2_0_YARN_Arch - Copy.pptx
HADOOP_2_0_YARN_Arch - Copy.pptxBirajPaul6
 

Similar to Big Data Testing (20)

Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big data
Big dataBig data
Big data
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingBig data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Big data applications
Big data applicationsBig data applications
Big data applications
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFS
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
TSE_Pres12.pptx
TSE_Pres12.pptxTSE_Pres12.pptx
TSE_Pres12.pptx
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
HADOOP_2_0_YARN_Arch - Copy.pptx
HADOOP_2_0_YARN_Arch - Copy.pptxHADOOP_2_0_YARN_Arch - Copy.pptx
HADOOP_2_0_YARN_Arch - Copy.pptx
 

Recently uploaded

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Big Data Testing

  • 1. BIG DATA TESTING By QA InfoTech
  • 3. OMG!! Did he just asked me to catch rats in a place full of snakes 3
  • 4. Agenda 1. What is Big Data 2. Characteristic of Big Data 3. Meaning of BIG DATA to “US” 4. Hadoop 6. Submitting a Map Reduce Job
  • 5. What is BIG DATA? • ‘Big Data’ is similar to ‘small data’, but bigger in size • Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques. • Walmart handles more than 1 million customer transactions every hour. • Facebook handles 40 billion photos from its user base. • Decoding the human genome originally took 10years to process; now it can be achieved in one week.
  • 6. Three Characteristics of Big Data V3s Volume • Data quantity Velocity • Data Speed Variety • Data Types
  • 7. What BIG DATA TESTING mean to Testers?  Take into consideration these 3 perspectives: • Data • Infrastructure • Validation Tools
  • 8. Now the questions comes what technology is needed for handling BIG DATA ? 1.HADOOP
  • 9. Hadoop & Its Components • Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing. Source: http://www.trieuvan.com/apache/hadoop/common/
  • 10. How is Hadoop Helping? • HDFS: Java based distributed FS that can run and store all kinds of data • Map Reduce: A software programming model for processing large set of data in parallel • YARN: A resource management framework for scheduling and handling resource requests from distributed applications
  • 11. This is our Input File : Input Sampleset.txt 11
  • 12. Map Reduce Program For Max Temperature : Driver Class Job job = new Job(); job.setJarByClass(MaxTemperatureDriver.class); job.setJobName("Max Temperature"); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setMapperClass(MaxTemperatureMapper.class); job.setReducerClass(MaxTemperatureReducer.class); 12
  • 13. Mapper Class @Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); String year = line.substring(15, 19); int airTemperature; if (line.charAt(87) == '+') { // parseInt doesn't like leading plus // signs airTemperature = Integer.parseInt(line.substring(88, 92)); } else { airTemperature = Integer.parseInt(line.substring(87, 92)); } 13
  • 14. Reducer Class @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int maxValue = Integer.MIN_VALUE; for (IntWritable value : values) { maxValue = Math.max(maxValue, value.get()); } context.write(key, new IntWritable(maxValue)); } } 14
  • 15. Thank You For more information, please: • Contact us at info@qainfotech.com • Visit us at www.qainfotech.com • Read our blog at www.qainfotech.com/blog • Follow us on Twitter at www.twitter.com/qainfotech USA Office International Headquarters Noida Uttar Pradesh, India Farmington Hills Michigan, U.S.A.

Editor's Notes

  1. Image Source -blogs.forrester.com
  2. Acco.to IBM