©ThoughtWorks 2019 Commercial in Confidence
Insights Into Big Data Testing
VodQA 2019
AGENDA
• Introduction to Big Data Application
• Testing Aspects
• Various Types of Tests
• Automation Tools/Framework
• Testing Challenges
What is Big Data?
A mammoth amount of data?
Is any large data set Big Data?
V’s of Big Data
• Volume
• Velocity
• Variety
• Veracity
Big Data Applications
Hadoop is one of the solutions to the Big Data problem.
The testing approach depends on the kinds of tools used in the Big Data application.
Big Data Ecosystem
Big Data Application Workflow
Typically has the following stages:
1. Load source data files into HDFS
2. Perform MapReduce/Spark operations
3. Extract/query the output from HDFS
4. Reporting & analysis
Big Data Testing Aspects
Few things to consider during testing
1. Validation of data
2. Structured or unstructured data (in this case, the data is structured)
3. Optimal test environment
4. Availability of Hadoop-centric testing tools
5. Performing non-functional testing
6. Efficient test data set
7. Hive internal and external tables
Must-Have Tests for a Big Data Application
• Unit Test
• Hive Query Validator
• Hive Test
• Integration Test
• Oozie Test
• Functional Test
Automation Tools & Framework
Unit Testing Framework
• Mockito: Java-based mocking framework
• Worker Bee: framework for performing tasks with Apache Hive
• JUnit: unit testing framework
MOCKITO
MOCK FRAMEWORK
● Mocks external dependencies
● Inserts the mocks into the code under test
● Executes the code
● Validates that the code executed as expected
● Stubbing follows the when(...).thenReturn(...) rule
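The four steps above can be sketched without any library. This is a minimal, hand-rolled illustration and not Mockito itself: `ScoreSource`, `ScoreReport`, and `FakeScoreSource` are hypothetical names invented for the example. With Mockito, the hand-written fake would typically be replaced by `mock(ScoreSource.class)` stubbed through `when(...).thenReturn(...)`, and the interaction check by `verify(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical external dependency (for example, something that reads from Hive).
interface ScoreSource {
    List<Integer> scoresFor(int year);
}

// Code under test: the dependency is injected through the constructor.
class ScoreReport {
    private final ScoreSource source;

    ScoreReport(ScoreSource source) {
        this.source = source;
    }

    int highestScore(int year) {
        int best = 0;
        for (int score : source.scoresFor(year)) {
            best = Math.max(best, score);
        }
        return best;
    }
}

// Hand-written stand-in for a mock: returns canned data and records the calls it receives.
class FakeScoreSource implements ScoreSource {
    final List<Integer> requestedYears = new ArrayList<>();

    @Override
    public List<Integer> scoresFor(int year) {
        requestedYears.add(year);
        return Arrays.asList(12, 87, 45);
    }
}

class MockSketch {
    public static void main(String[] args) {
        FakeScoreSource fake = new FakeScoreSource();  // 1. mock the dependency
        ScoreReport report = new ScoreReport(fake);     // 2. insert it into the code under test
        int result = report.highestScore(1990);         // 3. execute the code
        // 4. validate both the result and the interaction with the dependency
        if (result != 87) {
            throw new AssertionError("unexpected result: " + result);
        }
        if (!fake.requestedYears.equals(Arrays.asList(1990))) {
            throw new AssertionError("unexpected calls: " + fake.requestedYears);
        }
        System.out.println("highest score = " + result);
    }
}
```

Constructor injection is what makes step 2 possible: the code under test never creates its own dependency, so a test can hand it a fake.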
WORKER BEE
HIVE TEST FRAMEWORK
WORKFLOW
1. Create database & tables: the database using the new operator, and tables using havingTable(Class)
2. Generate migration files: using the Migration Generator
3. Set up test data
4. Execute queries & verify result: execute the function and assert on the result
● Define schema of database & table
● Query builder at your disposal
● Go with TDD
● Run migrations against test table
Let's Understand
1. Create the database and table
public static final BaseBall db = new BaseBall();
db.havingTable(Batting.tb);
2. Define columns and types as needed
public static final Column playerId = HavingColumn(tb, "player_id", Column.Type.STRING);
3. Create rows (the data set)
private static Row<Batting> lowestRun
    = Batting.tb.getNewRow()
        .set(Batting.playerId, PLAYER_1_ID)
        .set(Batting.year, 1990);
4. Run the query logic using execute
List<Row<Table>> years = repo.execute(BaseBall.highestScoreForEachYear());
5. Verify the data using assert
assertThat(years.size(), is(1));
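The same set-up, execute, and assert rhythm can be tried without a Hive installation. The sketch below is a plain-Java stand-in, not Worker Bee code: `BattingRow` and `BattingQueries.highestScoreForEachYear` are hypothetical, and the "query" is an in-memory aggregation rather than HiveQL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one row of the Batting table.
class BattingRow {
    final String playerId;
    final int year;
    final int runs;

    BattingRow(String playerId, int year, int runs) {
        this.playerId = playerId;
        this.year = year;
        this.runs = runs;
    }
}

class BattingQueries {
    // Stand-in for the query under test: the highest number of runs per year.
    static Map<Integer, Integer> highestScoreForEachYear(List<BattingRow> rows) {
        Map<Integer, Integer> best = new TreeMap<>();
        for (BattingRow row : rows) {
            best.merge(row.year, row.runs, Integer::max);
        }
        return best;
    }
}

class BattingQueriesTest {
    public static void main(String[] args) {
        // Set up test data: two rows for 1990 and one for 1991.
        List<BattingRow> rows = Arrays.asList(
                new BattingRow("player_1", 1990, 10),
                new BattingRow("player_2", 1990, 250),
                new BattingRow("player_1", 1991, 90));

        // Execute the logic under test.
        Map<Integer, Integer> years = BattingQueries.highestScoreForEachYear(rows);

        // Verify the result.
        if (years.size() != 2) {
            throw new AssertionError("expected two years, got " + years.size());
        }
        if (years.get(1990) != 250) {
            throw new AssertionError("wrong maximum for 1990: " + years.get(1990));
        }
        System.out.println(years);
    }
}
```

Keeping the data set this small is deliberate: each row exists to exercise one branch of the logic, which is the TDD style the slide recommends.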
Functional Test
• Verification of end-to-end workflows
• Verification of data setup
• Verification of reports
FRAMEWORK INFORMATION
• Smoke & regression pipelines
• Selenium
• Cucumber
• JUnit
• Dedicated cluster for automation
Let's Understand
How an end-to-end workflow is exercised:
1. Data setup: Excel contains the tabular data; the data is entered as text tables
2. Table file conversion: depending on the table, the text table is converted to a Parquet table or an Avro table
3. Data verification: using Cucumber's DataTable
4. Front-end validation: Selenium
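The data verification step compares rows written as a pipe-delimited table against the rows the pipeline produced. The sketch below illustrates that comparison in plain Java. It is an assumption-laden stand-in rather than Cucumber's API: `TextTable.parse` is a hypothetical helper, and in a real Cucumber step the expected rows would arrive as a `DataTable` argument instead of a string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TextTable {
    // Parses lines such as "| player_id | year |" into one map per data row,
    // keyed by the header cells taken from the first non-empty line.
    static List<Map<String, String>> parse(String table) {
        List<Map<String, String>> rows = new ArrayList<>();
        String[] header = null;
        for (String line : table.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] cells = splitCells(trimmed);
            if (header == null) {
                header = cells;
                continue;
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < cells.length; i++) {
                row.put(header[i], cells[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    private static String[] splitCells(String line) {
        // Drop the leading and trailing pipe if present, then split on the inner pipes.
        String inner = line;
        if (inner.startsWith("|")) {
            inner = inner.substring(1);
        }
        if (inner.endsWith("|")) {
            inner = inner.substring(0, inner.length() - 1);
        }
        String[] cells = inner.split("\\|", -1);
        for (int i = 0; i < cells.length; i++) {
            cells[i] = cells[i].trim();
        }
        return cells;
    }
}

class DataVerificationSketch {
    public static void main(String[] args) {
        String expected =
                "| player_id | year | runs |\n"
              + "| player_1  | 1990 | 10   |\n"
              + "| player_2  | 1990 | 250  |\n";
        // Hypothetical pipeline output: same values, different cell padding.
        String actual =
                "|player_id|year|runs|\n"
              + "|player_1|1990|10|\n"
              + "|player_2|1990|250|\n";
        if (!TextTable.parse(expected).equals(TextTable.parse(actual))) {
            throw new AssertionError("pipeline output does not match the expected table");
        }
        System.out.println("tables match");
    }
}
```

Comparing parsed rows rather than raw text keeps the check insensitive to column padding, which matters when the expected table is maintained by hand.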
Oozie Test
Oozie Framework
Tools and framework
● Worker Bee
● Oozie Client
● JUnit
RUNS ON CLUSTER
Does the following
❏ Works directly on any workflow
❏ Feedback cycle is quick
❏ Submits jobs and tracks their completion
❏ Easier to set up test data
❏ Helpful in debugging production issues
Let’s Understand, At Workflow Level
Data Set Up (steps 1 to 3)
1. Identify test table: columns & partitions
2. Create schema: use the create method, create(Table Name)
3. Insert records: use the table class's row method
4. Run workflow: call the Oozie workflow & set its properties
5. Verify result: verify the table headers and records
Once the test is completed, drop the table using Query Generator.
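The "run workflow" and "verify result" steps hinge on submitting a job and waiting for it to finish. The sketch below shows only that polling loop, against a hypothetical `WorkflowClient` interface and a fake implementation, so it runs without a cluster. It is not the Oozie API. In a real test the interface would presumably be backed by the Oozie Java client, whose `OozieClient` class offers `run(Properties)` to submit a job and `getJobInfo(String)` to read its status.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical minimal view of a workflow client; not the Oozie API.
interface WorkflowClient {
    String submit(String workflowPath);

    String status(String jobId);  // for example "RUNNING", "SUCCEEDED", "KILLED"
}

class WorkflowRunner {
    // Submits the workflow, then polls until it leaves RUNNING or the polls run out.
    static String runAndWait(WorkflowClient client, String workflowPath,
                             int maxPolls, long pollMillis) {
        String jobId = client.submit(workflowPath);
        String status = client.status(jobId);
        for (int poll = 1; poll < maxPolls && "RUNNING".equals(status); poll++) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            status = client.status(jobId);
        }
        return status;
    }
}

// Fake client used to exercise the polling loop without a cluster.
class FakeWorkflowClient implements WorkflowClient {
    private final Deque<String> statuses;

    FakeWorkflowClient(String... statuses) {
        this.statuses = new ArrayDeque<>(Arrays.asList(statuses));
    }

    @Override
    public String submit(String workflowPath) {
        return "job-1";
    }

    @Override
    public String status(String jobId) {
        // Report each scripted status once, then keep repeating the last one.
        return statuses.size() > 1 ? statuses.poll() : statuses.peek();
    }
}

class WorkflowRunnerSketch {
    public static void main(String[] args) {
        WorkflowClient client = new FakeWorkflowClient("RUNNING", "RUNNING", "SUCCEEDED");
        String finalStatus = WorkflowRunner.runAndWait(client, "/apps/test-workflow", 10, 1);
        if (!"SUCCEEDED".equals(finalStatus)) {
            throw new AssertionError("workflow ended as " + finalStatus);
        }
        System.out.println("workflow finished: " + finalStatus);
    }
}
```

The `maxPolls` bound matters in practice: a test that waits forever on a stuck job blocks the whole automation pipeline.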
Testing Challenges
Challenges
• Big Data pipelines can take a huge amount of time to run
• A cluster environment is important for testing
• Supporting different file formats (such as Parquet, Avro, and text)
• Testing only a few workflows
• Maintaining the test data set
Challenges
• Testing migration scenarios, in case another file format is preferred
• Cluster performance issues
• Hive-Impala service issues
• Incorporating schema changes into the automation test set and table set
• Managing the partition format as well
Issues Caught In Testing
• Spark 1.6 issue: Spark has a known issue where, if a partition exists but its directory does not, it throws an exception
• Report columns: sorting issues on columns; importance of secondary sort
• Query performance decreases: small number of large files in HDFS
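A pre-flight check for the partition issue can be sketched as follows. This is an illustration under stated assumptions, not the fix the team used: it checks paths on the local file system with `java.nio.file` as a stand-in for HDFS, and the list of registered partition locations (which in practice would come from the Hive metastore) is passed in by hand. `PartitionCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionCheck {
    // Returns the partition locations that are registered but have no directory behind them.
    static List<Path> missingDirectories(List<Path> registeredPartitions) {
        List<Path> missing = new ArrayList<>();
        for (Path location : registeredPartitions) {
            if (!Files.isDirectory(location)) {
                missing.add(location);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("batting");
        Path present = Files.createDirectory(table.resolve("year=1990"));
        Path absent = table.resolve("year=1991");  // registered, but never created
        List<Path> missing = missingDirectories(Arrays.asList(present, absent));
        System.out.println("missing partition directories: " + missing);
    }
}
```

Running a check like this before the job starts turns a mid-run exception into an explicit, early test failure.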
PRIYANKA RAWAT
QUALITY ANALYST
prawat@thoughtworks.com | thoughtworks.com
THANK YOU

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

vodQA Pune (2019) - Insights into big data testing

  • 1. Insights Into Big Data Testing. VodQA 2019. ©ThoughtWorks 2019 Commercial in Confidence
  • 2. Agenda: • Introduction to Big Data Applications • Testing Aspects • Various Types of Tests • Automation Tools/Frameworks • Testing Challenges
  • 3. What is Big Data?
  • 4. Mammoth of Data? Is Any Data Big?
  • 5. The V's of Big Data: Volume, Velocity, Variety and Veracity
  • 6. Big Data Applications
  • 8. Big Data Ecosystem: Hadoop is one of the solutions to the Big Data problem. Testing depends on the kinds of tools used in the Big Data application.
  • 9. Big Data Application Workflow
  • 10. Typically has the following stages: 1. Loading source data files into HDFS 2. Performing MapReduce/Spark operations 3. Extracting/querying the output from HDFS 4. Reporting and analysis
  • 11. Big Data Testing Aspects
  • 12. A few things to consider during testing: 1. Validation of data 2. Structured or unstructured data 3. Optimal test environment 4. Availability of Hadoop-centric testing tools 5. Performing non-functional testing 6. Efficient test data set 7. Hive internal and external tables. In this case the data is structured.
  • 13. Must-have tests for a Big Data application: Unit Test, Hive Query Validator, Hive Test, Integration Test, Oozie Test, Functional Test
  • 15. Automation Tools & Frameworks
  • 16. Unit testing frameworks: Mockito, a Java-based mocking framework; Worker Bee, a framework to perform tasks with Apache Hive; JUnit, a unit testing framework
  • 17. Mockito (mock framework): ● Mocks external dependencies ● Inserts mocks into the code under test ● Executes the code ● Validates that the code executed as expected ● Uses the when/thenReturn rule
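
The mock, insert, execute, validate cycle from slide 17 can be sketched in plain Java. This is a minimal illustration, not the presenter's code: it uses a hand-written stub instead of Mockito so that it runs without extra libraries, and `RunStore`, `ScoreService`, and `StubRunStore` are hypothetical names. With Mockito, the `returning(...)` calls would correspond to `when(store.runsFor("p1", 1990)).thenReturn(40)`.

```java
import java.util.HashMap;
import java.util.Map;

class MockCycleSketch {

    // Hypothetical external dependency (for example, a lookup backed by Hive).
    interface RunStore {
        int runsFor(String playerId, int year);
    }

    // Code under test: it sees the dependency only through the interface.
    static class ScoreService {
        private final RunStore store;

        ScoreService(RunStore store) {
            this.store = store;
        }

        // Sums a player's runs over an inclusive range of years.
        int totalRuns(String playerId, int fromYear, int toYear) {
            int total = 0;
            for (int year = fromYear; year <= toYear; year++) {
                total += store.runsFor(playerId, year);
            }
            return total;
        }
    }

    // Hand-written stub standing in for a Mockito mock.
    static class StubRunStore implements RunStore {
        private final Map<String, Integer> canned = new HashMap<>();
        int calls = 0;

        // Registers a canned answer, in the spirit of when(...).thenReturn(...).
        StubRunStore returning(String playerId, int year, int runs) {
            canned.put(playerId + "#" + year, runs);
            return this;
        }

        @Override
        public int runsFor(String playerId, int year) {
            calls++;
            return canned.getOrDefault(playerId + "#" + year, 0);
        }
    }

    public static void main(String[] args) {
        // 1. Mock the external dependency.
        StubRunStore stub = new StubRunStore()
                .returning("p1", 1990, 40)
                .returning("p1", 1991, 60);
        // 2. Insert the mock into the code under test.
        ScoreService service = new ScoreService(stub);
        // 3. Execute the code.
        int total = service.totalRuns("p1", 1990, 1991);
        // 4. Validate the result and the interaction with the dependency.
        System.out.println("total=" + total + ", calls=" + stub.calls); // total=100, calls=2
    }
}
```

Passing the dependency through the constructor is what makes step 2 possible: the same `ScoreService` can be wired to a real store in production and to a stub or mock in tests.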
  • 18. Worker Bee (Hive test framework): ● Define the schema of the database and tables ● Query builder at your disposal ● Go with TDD ● Run migrations against the test table. Workflow: create the database using the new operator and the tables using havingTable(Class); generate migration files using the Migration Generator; set up test data; execute queries and verify the result (execute the function and assert the result).
  • 19. Let's Understand:
      1. Create the database and table:
         public static final BaseBall db = new BaseBall();
         db.havingTable(Batting.tb);
      2. Define columns and types as needed:
         public static final Column playerId = HavingColumn(tb, "player_id", Column.Type.STRING);
      3. Create rows (dataset):
         private static Row<Batting> lowestRun = Batting.tb.getNewRow()
             .set(Batting.playerId, PLAYER_1_ID)
             .set(Batting.year, 1990)
      4. Call the script logic using execute:
         List<Row<Table>> years = repo.execute(BaseBall.highestScoreForEachYear());
      5. Verify the data using an assert:
         assertThat(years.size(), is(1));
  • 20. Functional Test
  • 21. Functional test: verification of end-to-end workflows; verification of data setup; verification of reports. Framework information: functional test pipeline; smoke and regression pipeline; Selenium, Cucumber and JUnit; dedicated cluster for automation.
  • 22. Let's Understand: The end-to-end workflow is called. Data setup: Excel contains the tabular data; data is entered as text tables. Table file conversions: depending on the table, the text table is converted to a Parquet table or an Avro table. Data verification: using Cucumber's Data Table. Front-end validation: Selenium.
  • 23. Oozie Test
  • 24. Oozie framework (runs on the cluster). Tools and frameworks: ● Worker Bee ● Oozie Client ● JUnit. Does the following: ❏ Works directly on any workflow ❏ Gives a quick feedback cycle ❏ Submits jobs and tracks their completion ❏ Makes test data easier to set up ❏ Helps in debugging production issues
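
"Submits jobs and tracks their completion" usually comes down to a polling loop in the test. The sketch below shows only that loop, under stated assumptions: the `Status` enum is a simplified, hypothetical stand-in for a workflow job status, and the `Supplier` stands in for whatever call asks the cluster for the current status. It does not use the Oozie client API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

class WorkflowWait {

    // Simplified, hypothetical job states for illustration.
    enum Status { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Polls until the job leaves RUNNING or maxPolls is reached.
    // Returns the last status seen (RUNNING means the wait timed out).
    static Status waitForCompletion(Supplier<Status> poll, int maxPolls, long sleepMillis) {
        Status last = Status.RUNNING;
        for (int i = 0; i < maxPolls; i++) {
            last = poll.get();
            if (last != Status.RUNNING) {
                return last;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return last;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Canned status sequence standing in for a real cluster.
        Iterator<Status> statuses =
                Arrays.asList(Status.RUNNING, Status.RUNNING, Status.SUCCEEDED).iterator();
        Status result = waitForCompletion(statuses::next, 5, 10L);
        System.out.println(result); // SUCCEEDED
    }
}
```

A test would assert that the returned status is `SUCCEEDED` before verifying the output table, and treat `RUNNING` as a timeout.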
  • 25. Let's Understand, at Workflow Level. Data setup and test steps: identify the test table; create the schema; insert records; run the workflow; verify the result. Details shown on the slide: create the table class; columns and partitions; use the create method, create(Table Name); row method; call the Oozie workflow and set properties; verify table headers and records. Once the test is completed, drop the table using the Query Generator.
  • 26. Testing Challenges
  • 27. Challenges: • Big Data pipelines can take a huge amount of time • Importance of a cluster environment for testing • Support for different file formats (such as Parquet, Avro, text) • Testing only a few workflows • Maintaining the test data set
  • 28. Challenges: • Testing migration scenarios, in case a particular file format is preferred • Cluster performance issues • Hive-Impala service issues • Incorporating schema changes in the automation test set and table set • Managing the partition format as well
  • 29. Issues caught in testing: Spark 1.6 issue: Spark has a known issue where it throws an exception if a partition exists but its directory does not. Report columns: sorting issues on columns; importance of secondary sort. Small number of large files in HDFS: query performance decreases.
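
The "partition exists but its directory does not" issue can be checked before a job runs. The following is a minimal sketch under stated assumptions: it uses the local filesystem through `java.nio.file` as a stand-in for HDFS, and the list of partition locations is assumed to come from the metastore; `PartitionDirCheck` and `missingDirectories` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionDirCheck {

    // Returns the partition locations that have no directory behind them.
    static List<Path> missingDirectories(List<Path> partitionLocations) {
        List<Path> missing = new ArrayList<>();
        for (Path location : partitionLocations) {
            if (!Files.isDirectory(location)) {
                missing.add(location);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Temporary directory standing in for a table location.
        Path table = Files.createTempDirectory("batting");
        Path present = Files.createDirectory(table.resolve("year=1990"));
        Path absent = table.resolve("year=1991"); // registered, but never created

        // Prints a list containing only the year=1991 location.
        System.out.println(missingDirectories(Arrays.asList(present, absent)));
    }
}
```

A pre-check like this lets a test fail with a clear message instead of surfacing as an exception inside the job.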
  • 30. Thank you. Priyanka Rawat, Quality Analyst. prawat@thoughtworks.com | thoughtworks.com