Big Data - Hadoop and MapReduce - new age tools for aid to testing and QA
by Aditya Garg
Confidential | Copyright © QAAgility Technologies
Aditya Garg @Adigindia
• Co-Founder and Director, QAAgility.com
• Co-founder & Steering Committee Member of Agile Testing Alliance – run meetup groups across multiple cities
• Co-creator and licensed trainer of Agile Testing Alliance’s certifications CP-BAT, CP-MAT, CP-AAT, CP-SAT
• Co-Author of a book on Selenium
• Love Cooking Indian Dishes – From Rajasthan
• Tasting (Testing) World food
• Travelling and meeting testers (get inspired and maybe inspire a few)
@adigindia
https://www.linkedin.com/in/adigarg
Topic for the presentation
Big Data - Hadoop and MapReduce - new age tools for aid to testing and QA
What is this
What are we going to discuss?
1. How to test Big Data applications?
2. How can QA and Testing teams use Big Data tools for their testing needs?
What is Big Data?
Is it just too much hype, or reality?
Let us start with what exactly Big Data is
Which search engine do you use?
How much data does Google store?
https://www.cirrusinsight.com/blog/how-much-data-does-google-store
http://searchstorage.techtarget.com/definition/Kilo-mega-giga-tera-peta-and-all-that
On Search Engines – Anyone using DuckDuckGo?
Data Explosion
Key Points in Big Data
1. Volume – Data Explosion
2. Velocity
3. Variety
4. Veracity
Ref: IBM.com
Definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
Ref: goo.gl/iWZhjJ
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#379879e621a9
Big Data Application
1. Finance
2. Insurance
3. Health Care
4. Agriculture
5. Defense
6. Manufacturing
7. Aerospace
8. Oil and Gas
9. Advertisement and Marketing
10. Election Campaigns
11. The list goes on: applicability across industries
Big Data Application
http://www.forbes.com/sites/bernardmarr/2016/02/03/how-the-super-bowl-uses-big-data-to-change-the-game/?
Big Data Application
http://andrewshamlet.com/2015/12/03/who-will-win-the-2016-us-presidential-nominations/
Big Data Application
Ref: http://www.forbes.com/sites/bernardmarr/2016/02/02/this-is-why-dictators-love-big-data/2/#4d413e005844
Let's go back to the definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
Tools solving the Big Data Challenge
Hadoop – Key components HDFS and MR
*Source Udacity
Hadoop Ecosystem
1. Sqoop takes data from a regular RDBMS and puts it into HDFS
2. Flume ingests data into HDFS as it is generated by external systems
3. HBase is a real-time database on top of HDFS
4. Hue is a graphical front end to the cluster
5. Oozie is a workflow management tool
6. Mahout is a machine learning library
*Source Udacity
HDFS
• HDFS stands for Hadoop Distributed File System, which is the storage system used by Hadoop. The following is a high-level architecture that explains how HDFS works.
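As a rough illustration of the storage model: HDFS cuts each file into large blocks and keeps several copies of every block on different DataNodes, while the NameNode tracks where the copies live. The Python sketch below is a toy model only. The 128 MB block size and replication factor of 3 are the usual Hadoop 2.x defaults (both are configurable), and the round-robin placement, the function names, and the DataNode names are assumptions made for illustration. Real HDFS placement is rack-aware and more involved.

```python
# Toy model of HDFS block splitting and replica placement (not the real algorithm).
BLOCK_SIZE = 128 * 1024 * 1024  # usual default block size in Hadoop 2.x and later
REPLICATION = 3                 # usual default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file would be cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Hypothetical placement for illustration; HDFS itself uses a
    rack-aware policy decided by the NameNode.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print(len(blocks), "blocks")
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Losing any single DataNode in this model still leaves two copies of every block, which is the property that lets Hadoop run on ordinary hardware.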
Map Reduce
Ref: Emanuele Della Valle
@manudellavalle
Understanding MapReduce
Demo – Word Count
Given an input file, count the occurrences of each unique word
WordCount – Map Reduce
Reference : http://wearecloud.cz/media/files/prezentace-biz/Big%20Data%20v%20Cloudu.ppt
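The word count demo can be mimicked in a few lines of plain Python to show the three MapReduce stages: map, shuffle (group by key), and reduce. This is a single-process sketch for understanding the data flow, not Hadoop code. The function names and the sample input are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(sample))
```

On a real cluster the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves each word's pairs to the reducer responsible for that word.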
How can QA and Testing teams use Big Data tools for their testing needs?
Problem Statement and Solution using Hadoop and MapReduce
MTBT – Multicast Tick by Tick Adaptor
Input was exchange feed – Output given to HFT Engine
Exchange TAP – Co-location servers listen to it at high speed
MTBT - Adaptor
Legacy Adaptor (3rd Party) connects to the TAP and converts the feed to a format which can be used by HFT Platforms (Algorithmic Trading Platforms)
New Adaptor – being made in-house – to increase the speed by 10 times
HFT Engine
[Diagram: MTBT - Adaptor, showing Input, Output, and Output over time]
MTBT – Testing Objective
1. Test the fast and dynamic nature of multicast TBT: ticks arrive at microsecond intervals, averaging around 20,000 data points/sec and going up to 40,000 data points/sec on expiry/volatility days.
2. Check whether there is any packet drop.
3. Test that the generated order book is fresh and accurate up to level 20 (configurable).
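One common way to check for packet drops is to look for gaps in a per-message sequence number. The slides do not say how the MTBT feed marks its messages, so the sketch below assumes, purely for illustration, that every tick carries an increasing integer sequence number. The function name is hypothetical too.

```python
def find_sequence_gaps(sequence_numbers):
    """Return (first_missing, last_missing) ranges found in a stream of ticks.

    Assumes each tick carries an increasing integer sequence number
    (an assumption, not something stated about the real feed).
    Late or duplicate numbers are skipped, so a reordered packet is
    still reported as a gap.
    """
    gaps = []
    previous = None
    for seq in sequence_numbers:
        if previous is not None and seq > previous + 1:
            gaps.append((previous + 1, seq - 1))
        if previous is None or seq > previous:
            previous = seq
    return gaps


if __name__ == "__main__":
    received = [1, 2, 3, 6, 7, 10]
    for first, last in find_sequence_gaps(received):
        print(f"missing ticks {first}..{last}")
```

Because the check is a single pass that only remembers the previous number, it scales to rates like those quoted above without holding the feed in memory.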
MTBT – Testing Strategy - Sampling
Samples taken from the output over time
Do a reverse comparison
MTBT – Challenges
1. Manually next to impossible
2. Even a few seconds of samples ran into large megabyte (MB) files
3. Manually impossible to compare the legacy records with the records processed by the new code
4. Daily processed data ran into files of 150 gigabytes (GB) and more
MTBT – It was a BIG DATA Testing problem
1. LARGE 150 GB files (legacy and new applications) – VOLUME
2. Testing to compare the output and measure the functional effectiveness in a real-time data environment – VELOCITY
3. Packet drops may happen – VERACITY
4. VARIETY was not really there, except that the two output files were generated in different formats, even though the same content/information was present in both
MTBT – SOLUTION
1. Reduce the LEGACY MTBT output file into a standard format
2. Reduce the NEW INHOUSE MTBT output file into a standard format
3. Compare the two files
4. Generate Report
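The four steps above can be sketched in plain Python. The actual MTBT file layouts are not shown in the slides, so both input formats here are hypothetical: the legacy adaptor is assumed to write `seq|symbol|price|qty` and the new adaptor `symbol,seq,qty,price`. All function and field names are likewise illustrative, and this is not the presenter's implementation.

```python
from decimal import Decimal


def normalize_legacy(line):
    """Step 1: hypothetical legacy layout 'seq|symbol|price|qty' -> (key, value)."""
    seq, symbol, price, qty = line.strip().split("|")
    return (int(seq), symbol), (Decimal(price), int(qty))


def normalize_new(line):
    """Step 2: hypothetical new layout 'symbol,seq,qty,price' -> (key, value)."""
    symbol, seq, qty, price = line.strip().split(",")
    return (int(seq), symbol), (Decimal(price), int(qty))


def compare(legacy_lines, new_lines):
    """Steps 3 and 4: compare the normalized records and build a report."""
    legacy = dict(normalize_legacy(line) for line in legacy_lines)
    new = dict(normalize_new(line) for line in new_lines)
    report = {"matched": 0, "mismatched": [], "missing_in_new": [], "extra_in_new": []}
    for key in sorted(legacy.keys() | new.keys()):
        if key not in new:
            report["missing_in_new"].append(key)
        elif key not in legacy:
            report["extra_in_new"].append(key)
        elif legacy[key] == new[key]:
            report["matched"] += 1
        else:
            report["mismatched"].append(key)
    return report


if __name__ == "__main__":
    legacy_out = ["1|ABC|101.50|10", "2|ABC|101.55|5", "3|XYZ|20.00|7"]
    new_out = ["ABC,1,10,101.5", "ABC,2,6,101.55", "XYZ,4,1,20.1"]
    print(compare(legacy_out, new_out))
```

In a MapReduce job the two normalize functions would naturally sit in the mappers, emitting the shared key, and the per-key comparison would sit in the reducer. That layout is an assumption about how such a job might be organized. Prices are parsed as `Decimal` so that `101.50` and `101.5` compare as equal.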
DEMO
Outcome
1. Record-by-record comparison done on a normal 8 GB Linux server in less than 2 hours
2. Automated report generation
3. Automated results shared with stakeholders
4. Used again for regression testing and for NFT testing
5. Huge benefits to the client (both time and money)
Other scenarios – Big Data Tool implementation
QA teams can use the tools in multiple scenarios
1. Beta testing
2. Repeated execution effectiveness – applying analytics (R)
3. Capturing customer feedback and channeling it for smarter test execution
4. Extracting relevant information from repeated regression cycles from QC
5. Adding intelligence to the data generated by the testing team
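As one small example of mining repeated regression cycles, the sketch below ranks tests by how often they failed across cycles, so the most failure-prone tests can be run first. The record layout `(cycle, test_id, status)` and the sample data are assumptions for illustration. Real exports from QC or another test management tool will look different.

```python
from collections import Counter


def failure_rates(results):
    """Rank tests by failure rate across regression cycles.

    `results` is an iterable of (cycle, test_id, status) tuples, a
    hypothetical layout. Returns (test_id, rate) pairs, highest rate first.
    """
    runs = Counter()
    failures = Counter()
    for _cycle, test_id, status in results:
        runs[test_id] += 1
        if status == "FAIL":
            failures[test_id] += 1
    rates = ((test_id, failures[test_id] / runs[test_id]) for test_id in runs)
    return sorted(rates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    history = [
        ("c1", "login", "PASS"), ("c1", "checkout", "FAIL"),
        ("c2", "login", "PASS"), ("c2", "checkout", "FAIL"),
        ("c3", "login", "FAIL"), ("c3", "checkout", "PASS"),
    ]
    for test_id, rate in failure_rates(history):
        print(f"{test_id}: {rate:.0%} of runs failed")
```

The same counting is a natural fit for MapReduce once the history grows: emit `(test_id, status)` in the map step and compute the rate per test in the reduce step.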
Other Way to use Big Data (BETA TESTING)
Challenge – Tweet on @qaagility @adigindia
Other Way to use Big Data
- Effective Regression Testing
- Effective Sanity Testing
Thank you and Jai Hind
Questions ?
@adigIndia
@AgileTA
#GTR2016
If interested, please attend a one-day workshop on Big Data (Saturday 27 Feb: 9 AM to 6 PM)
• Hadoop and MapReduce
• VM setup
• JDK, Eclipse and Hadoop installation
• MapReduce examples
Contact
Please contact us at info@QAAgility.com
MUMBAI
711, Rupa Solitaire
MBP, Mahape
Navi Mumbai-400701
DENMARK
1 Lindebo 7 Lej - 42,
2630 Tasstrup,
Copenhagen
+45.7164.0278
denmark@qaagility.com
USA
200 E Campus View Blvd.
Suite 200, Columbus, OH
