TESTING BIG DATA
SOLUTIONS FAST AND
FURIOUSLY
ABOUT ME
Dmitriy Sobko
Lead Software Test
Automation Engineer
EPAM
dmitriy.sobko@gmail.com
AGENDA
• Big Data
• BI / ETL
• Cloud
• Pipeline example
• Testing concepts
• Framework example
WHAT IS BIG DATA
First, we had data. Now
we have big data.
The more data there is,
the more you know about
things and the sharper
your decisions become.
BUSINESS INTELLIGENCE (BI)
• Know your data to make better
decisions
• Set of practices, architectures
and technologies for
gathering, processing and
analyzing the data
BI. CLOSER VIEW
• Daily transactions and correspondences are
recorded
• Records are collected in databases
• Data are processed and transformed into
usable information
• Information is analyzed to generate insight
ETL
• Extracts data from multiple,
disparate source systems,
such as records databases
• Transforms this data into usable
information for decision makers
• Loads the data into data
warehouses, from which end
users can readily extract usable
data for query and analysis
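The three ETL steps above can be sketched in a few lines of Kotlin. This is only an in-memory illustration: the row types, the column layout (name, views, reportDate) and the function names are assumptions made for the example, not part of the pipeline described in the talk.

```kotlin
// Hypothetical row types; names and fields are illustrative assumptions.
data class StagingRow(val name: String, val views: String, val reportDate: String)
data class TargetRow(val name: String, val views: Long, val reportDate: String)

// Extract: parse raw CSV lines (first line is a header) into untyped staging rows.
// Lines that do not have exactly three columns are skipped.
fun extract(csvLines: List<String>): List<StagingRow> =
    csvLines.drop(1)
        .map { line -> line.split(",").map { it.trim() } }
        .filter { cols -> cols.size == 3 }
        .map { cols -> StagingRow(cols[0], cols[1], cols[2]) }

// Transform: convert staging rows to typed target rows,
// dropping rows whose views value is not a number.
fun transform(staging: List<StagingRow>): List<TargetRow> =
    staging.mapNotNull { row ->
        row.views.toLongOrNull()?.let { TargetRow(row.name, it, row.reportDate) }
    }

// Load: append rows to an in-memory stand-in for a warehouse table
// and return how many rows were written.
fun load(table: MutableList<TargetRow>, rows: List<TargetRow>): Int {
    table.addAll(rows)
    return rows.size
}
```

Calling `load(table, transform(extract(lines)))` runs the whole chain. Keeping each step a separate function makes it possible to test the steps in isolation.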
MOVING TO
CLOUD
https://www.alooma.com/blog/best-practices-for-migrating-data-from-on-prem-to-cloud
Worldwide Cloud IT Infrastructure Market Forecast
Pipeline example: INPUT CSV -> STAGING TABLE -> TARGET TABLE -> REPORT
Amount of Spotify’s Delivered Events over time
https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/
TEST TYPES
Accuracy Testing
Completeness Testing
Data Validation Testing
Metadata Testing
Performance Testing
ACCURACY TESTING
It checks whether the data is accurately transformed
and loaded from the source to the data warehouse
COMPLETENESS TESTING
This verifies whether all the data from the source are
loaded into the data warehouse
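A completeness check can be sketched as a comparison of business keys between source and warehouse. The sketch below assumes the keys have already been read into collections. `CompletenessReport` and `checkCompleteness` are hypothetical names introduced for illustration.

```kotlin
// Result of comparing source keys with warehouse keys; names are illustrative.
data class CompletenessReport(val missing: Set<String>, val unexpected: Set<String>) {
    val isComplete: Boolean get() = missing.isEmpty() && unexpected.isEmpty()
}

// Compares the keys present in the source with those loaded into the warehouse.
// Duplicates are ignored because both sides are converted to sets.
fun checkCompleteness(
    sourceKeys: Collection<String>,
    targetKeys: Collection<String>
): CompletenessReport {
    val source = sourceKeys.toSet()
    val target = targetKeys.toSet()
    return CompletenessReport(missing = source - target, unexpected = target - source)
}
```

Because the comparison is set-based, it does not detect duplicated rows. A plain row-count comparison is a cheap complement.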
DATA VALIDATION TESTING
This assesses whether the values of the data post-
transformation are the same as their expected values
with respect to the source values
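One possible shape for such a check: derive the expected value from each source value with the transformation rule, then compare it with the loaded value key by key. `Mismatch`, `validateValues` and the `expectedRule` parameter are assumptions made for this sketch.

```kotlin
// One value that differs from what the source data implies.
data class Mismatch(val key: String, val expected: Long?, val actual: Long?)

// Applies the expected transformation rule to every source value and
// compares the result with the target value stored under the same key.
// Keys present on only one side are reported with a null counterpart.
fun validateValues(
    source: Map<String, Long>,
    target: Map<String, Long>,
    expectedRule: (Long) -> Long
): List<Mismatch> =
    (source.keys + target.keys).sorted().mapNotNull { key ->
        val expected = source[key]?.let(expectedRule)
        val actual = target[key]
        if (expected == actual) null else Mismatch(key, expected, actual)
    }
```

Returning the list of mismatches, rather than failing on the first one, lets a test report every wrong value in a single run.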
METADATA TESTING
This checks whether data retains its integrity up to the
metadata level — that is, its length, indexes,
constraints, and type
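A metadata check can be sketched as a comparison of an expected schema with the actual one. In the sketch below `ColumnSpec` is a hypothetical, simplified column description. A real test would probably fill it from the database catalog, which is not shown here.

```kotlin
// Hypothetical, simplified description of a table column.
data class ColumnSpec(val name: String, val type: String, val length: Int?, val nullable: Boolean)

// Returns human-readable differences between the expected and the actual schema.
fun diffSchema(expected: List<ColumnSpec>, actual: List<ColumnSpec>): List<String> {
    val actualByName = actual.associateBy { it.name }
    val expectedNames = expected.map { it.name }.toSet()
    val problems = mutableListOf<String>()
    for (col in expected) {
        val found = actualByName[col.name]
        when {
            found == null -> problems += "missing column ${col.name}"
            found != col -> problems += "column ${col.name}: expected $col but found $found"
        }
    }
    // Columns that exist in the table but are not part of the expected schema.
    actual.filter { it.name !in expectedNames }
        .forEach { problems += "unexpected column ${it.name}" }
    return problems
}
```

An empty result means the table matches the expected metadata. Indexes and constraints could be compared the same way with additional spec types.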
PERFORMANCE TESTING
• How long it takes to process streaming data and batch
data
• How long it takes to calculate reports, data marts and data feeds
• Whether these times meet the SLA
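A minimal sketch of an SLA check: time a job and compare the elapsed time with an agreed limit. `SlaResult` and `measureAgainstSla` are hypothetical names, and the job is passed in as a plain function rather than a real pipeline run.

```kotlin
// Outcome of one timed run compared with an agreed limit; names are illustrative.
data class SlaResult(val elapsedMillis: Long, val limitMillis: Long) {
    val withinSla: Boolean get() = elapsedMillis <= limitMillis
}

// Runs the job once, measures wall-clock time and compares it with the limit.
fun measureAgainstSla(limitMillis: Long, job: () -> Unit): SlaResult {
    val start = System.nanoTime()
    job()
    val elapsedMillis = (System.nanoTime() - start) / 1_000_000
    return SlaResult(elapsedMillis, limitMillis)
}
```

A single measurement is noisy, so a real performance test would likely repeat the run and look at a percentile rather than one value.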
TEST APPROACHES
• Test on real data
• Test code with mocks/stubs
TEST ON REAL DATA
TEST ON MOCKS/STUBS
MIXTURE OF
BOTH
APPROACHES
UNIT TESTS
"WordCount" should "work" in {
  JobTest[com.spotify.scio.examples.WordCount.type]
    .args("--input=in.txt", "--output=out.txt")
    .input(TextIO("in.txt"), inData)
    .output(TextIO("out.txt")) { coll =>
      coll should containInAnyOrder(expected)
      ()
    }
    .run()
}
Checks that the method correctly processes the input data file.
INTEGRATION TESTS
val stream = testStreamOf[GameActionInfo]
  .advanceWatermarkTo(bTime)
  // add some elements ahead of the watermark
  .addElements(
    event(blue1, 3, Duration.standardSeconds(3)),
    event(blue2, 2, Duration.standardMinutes(1)),
    event(red1, 3, Duration.standardSeconds(22))
  )
  // The watermark advances slightly, but not past the end of the window
  .advanceWatermarkTo(bTime.plus(Duration.standardMinutes(3)))
Checks that the method correctly reads data from a streaming pipeline.
ACCEPTANCE TESTS
• Make each test self-sufficient and
independent
• Rely on data contract, not
implementation
• Assert data as fully as possible
TESTS SHOULD BE
• Stable
• Resistant to constant
code changes
• Fast
• Extensible
• Easily supported
TECHNOLOGY
STACK
KOTLIN
Kotlin is a general purpose, open
source, statically typed “pragmatic”
programming language for the JVM
that combines object-oriented and
functional programming features.
It is focused on interoperability, safety,
clarity, and tooling support.
SPRING
Spring Boot makes it easy to create
stand-alone, production-grade Spring
based applications that you can “just
run”.
The same holds for testing frameworks:
you can get started with minimum
fuss and very little
pre-configuration.
CUCUMBER
Cucumber is a software tool to run
automated tests written in a behavior-
driven development (BDD) style.
Central to the Cucumber BDD
approach is its plain language parser
called Gherkin. It allows expected
software behaviors to be specified in
a logical language that customers can
understand.
GRADLE
Gradle is an open-source build
automation tool focused on flexibility
and performance.
Gradle build scripts are written using
a Groovy or Kotlin DSL.
COURGETTE TEST RUNNER
Courgette Test Runner is an
extension of Cucumber-JVM with
added capabilities to run Cucumber
tests in parallel on a feature level or
on a scenario level.
CODE
WHAT AN AUTOTEST LOOKS LIKE
Feature: River project test feature

  Scenario: Check Alpha feed
    Given I check Alpha name field is correct
    And I check Alpha views field is correct
    And I check Alpha xViews field is correct
    And I check Alpha yViews field is correct
    And I check Alpha otherViews field is correct
    And I check Alpha reportDate field is correct

  Scenario: Check Beta feed
    Given I check Beta passName field is correct
    And I check Beta views field is correct
    And I check Beta channelName field is correct
    And I check Beta reportDate field is correct
WHAT THE CODE LOOKS LIKE
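// Step definition: binds the Gherkin step to a service call
@Given("^I check Alpha views field is correct$")
fun assertAlphaViewsField() {
    service.checkAlphaViewsField()
}

// Service method: runs the count query for the Alpha views field
fun checkAlphaViewsField() =
    execCheckCountQuery(ALPHA_VIEWS_FIELD)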
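The slides do not show what `execCheckCountQuery` does internally. One plausible reading, given purely as an assumption, is that each field check is a SQL query counting rows that violate the data contract, and the check passes when that count is zero. In the sketch below `FieldCheck`, `FeedCheckService`, the table and column names, and the injected `runCountQuery` function are all hypothetical.

```kotlin
// Hypothetical model of a field check: a description plus a SQL query
// that counts rows violating the expected contract for that field.
data class FieldCheck(val description: String, val violationCountSql: String)

// Example constant; the table and column names are made up for illustration.
val ALPHA_VIEWS_FIELD = FieldCheck(
    description = "Alpha views field",
    violationCountSql = "SELECT COUNT(*) FROM alpha_feed WHERE views IS NULL OR views < 0"
)

// The query executor is injected as a function so the service can be
// exercised without a real database connection.
class FeedCheckService(private val runCountQuery: (String) -> Long) {

    // Fails with a descriptive message when any violating rows are found.
    fun execCheckCountQuery(fieldCheck: FieldCheck) {
        val violations = runCountQuery(fieldCheck.violationCountSql)
        if (violations != 0L) {
            throw AssertionError("${fieldCheck.description}: $violations violating rows")
        }
    }

    fun checkAlphaViewsField() = execCheckCountQuery(ALPHA_VIEWS_FIELD)
}
```

Passing the executor in as a function keeps the step logic independent of the warehouse client, so the same check can run against a stub in a unit test.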
WHAT THE RUNNER LOOKS LIKE
@RunWith(Courgette::class)
@CourgetteOptions(
    threads = 4,
    runLevel = CourgetteRunLevel.FEATURE,
    rerunFailedScenarios = false,
    cucumberOptions = CucumberOptions(
        features = arrayOf("resources/features"),
        glue = arrayOf("com.dsobko.test"),
        tags = arrayOf("@Ready", "~@Bug"),
        plugin = arrayOf("pretty", "html:build/cucumber-report")
    )
)
object CucumberFeaturesRunner
TEST REPORT
ALTERNATIVE SOLUTIONS
LINKS
https://labs.spotify.com/2016/03/10/spotifys-event-delivery-the-road-to-the-cloud-part-iii/
https://kotlinlang.org/
https://spring.io/projects/spring-boot
https://cucumber.io/
THANKS

Testing Big Data solutions fast and furiously

  • 1. TESTING BIG DATA SOLUTIONS FAST AND FURIOUSLY
  • 3. ABOUT ME Dmitriy Sobko Lead Software Test Automation Engineer EPAM dmitriy.sobko@gmail.com
  • 4. AGENDA • Big Data • BI / ETL • Cloud • Pipeline example • Testing concepts • Framework example
  • 5. WHAT IS BIG DATA First, we had data. Now we have big data. The more data there is, the more you know about things and the sharper your decisions become.
  • 7. BUSINESS INTELLIGENCE (BI) • Know your data to make better decisions • Set of practices, architectures and technologies for gathering, processing and analyzing the data
  • 8. BI. CLOSER VIEW • Daily transactions and correspondences are recorded • Records are collected in databases • Data are processed and transformed into usable information • Information is analyzed to generate insight
  • 9. ETL • Extracts data from multiple, disparate source systems such as records databases • Transforms this data into usable information for decision makers • Loads the data into data warehouses, from which end users can readily extract usable data for query and analysis
  • 24. Amount of Spotify’s Delivered Events over time https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/
  • 26. TEST TYPES Accuracy Testing Completeness Testing Data Validation Testing Metadata Testing Performance Testing
  • 27. DWH ACCURACY TESTING This checks whether the data is accurately transformed and loaded from the source to the data warehouse
  • 28. DWH COMPLETENESS TESTING This verifies whether all the data from the source is loaded into the data warehouse
  • 29. DATA VALIDATION TESTING This assesses whether the values of the data post-transformation are the same as their expected values with respect to the source values
  • 30. METADATA TESTING This checks whether data retains its integrity up to the metadata level — that is, its length, indexes, constraints, and type
  • 31. PERFORMANCE TESTING • How long it takes to process streaming data and batch data • How long reports, data marts, and data feeds take to calculate • SLA
  • 34. TEST APPROACHES • Test on real data • Test code with mocks/stubs
  • 35. TEST ON REAL DATA
  • 38. UNIT TESTS
      "WordCount" should "work" in {
        JobTest[com.spotify.scio.examples.WordCount.type]
          .args("--input=in.txt", "--output=out.txt")
          .input(TextIO("in.txt"), inData)
          .output(TextIO("out.txt")) { coll =>
            coll should containInAnyOrder(expected)
            ()
          }
          .run()
      }
    Checks that the method correctly processes the input data file
  • 39. INTEGRATION TESTS
      val stream = testStreamOf[GameActionInfo]
        .advanceWatermarkTo(bTime)
        // add some elements ahead of the watermark
        .addElements(
          event(blue1, 3, Duration.standardSeconds(3)),
          event(blue2, 2, Duration.standardMinutes(1)),
          event(red1, 3, Duration.standardSeconds(22))
        )
        // The watermark advances slightly, but not past the end of the window
        .advanceWatermarkTo(bTime.plus(Duration.standardMinutes(3)))
    Checks that the method correctly reads data from a streaming pipeline
  • 40. ACCEPTANCE TESTS • Make each test self-sufficient and independent • Rely on data contract, not implementation • Assert data as fully as possible
  • 41. TESTS SHOULD BE • Stable • Resistant to constant code changes • Fast • Extensible • Easily supported
  • 43. KOTLIN Kotlin is a general purpose, open source, statically typed “pragmatic” programming language for the JVM that combines object-oriented and functional programming features. It is focused on interoperability, safety, clarity, and tooling support.
  • 44. SPRING Spring Boot makes it easy to create stand-alone, production-grade Spring based applications that you can “just run”. The same goes for testing frameworks: you can get started with minimum fuss and with very little pre-configuration.
  • 45. CUCUMBER Cucumber is a software tool to run automated tests written in a behavior-driven development (BDD) style. Central to the Cucumber BDD approach is its plain language parser called Gherkin. It allows expected software behaviors to be specified in a logical language that customers can understand.
  • 46. GRADLE Gradle is an open-source build automation tool focused on flexibility and performance. Gradle build scripts are written using a Groovy or Kotlin DSL.
  • 47. COURGETTE TEST RUNNER Courgette Test Runner is an extension of Cucumber-JVM with added capabilities to run Cucumber tests in parallel on a feature level or on a scenario level.
  • 48. CODE
  • 49. WHAT AN AUTOTEST LOOKS LIKE
      Feature: River project test feature

        Scenario: Check Alpha feed
          Given I check Alpha name field is correct
          And I check Alpha views field is correct
          And I check Alpha xViews field is correct
          And I check Alpha yViews field is correct
          And I check Alpha otherViews field is correct
          And I check Alpha reportDate field is correct

        Scenario: Check Beta feed
          Given I check Beta passName field is correct
          And I check Beta views field is correct
          And I check Beta channelName field is correct
          And I check Beta reportDate field is correct
  • 50. WHAT THE CODE LOOKS LIKE
      @Given("^I check Alpha views field is correct$")
      fun assertAlphaViewsField() {
          service.checkAlphaViewsField()
      }

      fun checkAlphaViewsField() = execCheckCountQuery(ALPHA_VIEWS_FIELD)
  • 51. WHAT THE RUNNER LOOKS LIKE
      @RunWith(Courgette::class)
      @CourgetteOptions(
          threads = 4,
          runLevel = CourgetteRunLevel.FEATURE,
          rerunFailedScenarios = false,
          cucumberOptions = CucumberOptions(
              features = arrayOf("resources/features"),
              glue = arrayOf("com.dsobko.test"),
              tags = arrayOf("@Ready", "~@Bug"),
              plugin = arrayOf("pretty", "html:build/cucumber-report")))
      object CucumberFeaturesRunner
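
The DWH COMPLETENESS TESTING slide describes verifying that every source record reaches the warehouse. One way this could be sketched in Kotlin is a key-set comparison between source and target. The `Row` type, the `CompletenessReport` type, and the `checkCompleteness` function are hypothetical names introduced only for illustration; they are not part of the framework shown in the slides, and a real check would read both sides from the actual source and warehouse tables.

```kotlin
// Hypothetical row type for illustration; not taken from the slides.
data class Row(val id: Int, val name: String, val views: Long)

// Keys that are present on one side only.
data class CompletenessReport(
    val missingInTarget: Set<Int>,
    val unexpectedInTarget: Set<Int>
) {
    val isComplete: Boolean
        get() = missingInTarget.isEmpty() && unexpectedInTarget.isEmpty()
}

// Compares the key sets of source and target (assumed already loaded in memory).
fun checkCompleteness(source: List<Row>, target: List<Row>): CompletenessReport {
    val sourceIds = source.map { it.id }.toSet()
    val targetIds = target.map { it.id }.toSet()
    return CompletenessReport(
        missingInTarget = sourceIds - targetIds,      // dropped during the load
        unexpectedInTarget = targetIds - sourceIds    // present only in the target
    )
}
```

Reporting the missing keys, rather than only comparing row counts, makes a failure easier to trace to a specific record.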
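
The ACCEPTANCE TESTS slide recommends asserting data as fully as possible, and the DATA VALIDATION TESTING slide compares post-transformation values with expected ones. A minimal Kotlin sketch of that idea follows: it compares every field of every expected record and collects all mismatches instead of stopping at the first. The `FeedRecord` type and the `diffRecords` function are assumptions made for this example (only the field names `name`, `views`, and `reportDate` echo the Alpha feed scenario); the slides do not show how the real framework performs this comparison.

```kotlin
// Hypothetical record shape; the field names echo the Alpha feed scenario.
data class FeedRecord(val name: String, val views: Long, val reportDate: String)

// Returns one human-readable message per mismatch; an empty list means the data matches.
fun diffRecords(
    expected: Map<String, FeedRecord>,
    actual: Map<String, FeedRecord>
): List<String> {
    val problems = mutableListOf<String>()
    for ((key, exp) in expected) {
        val act = actual[key]
        if (act == null) {
            problems += "$key: missing in target"
            continue
        }
        if (exp.name != act.name) {
            problems += "$key: name expected ${exp.name} but was ${act.name}"
        }
        if (exp.views != act.views) {
            problems += "$key: views expected ${exp.views} but was ${act.views}"
        }
        if (exp.reportDate != act.reportDate) {
            problems += "$key: reportDate expected ${exp.reportDate} but was ${act.reportDate}"
        }
    }
    return problems
}
```

Because the function depends only on the expected and actual records, a test built on it relies on the data contract rather than on how the pipeline is implemented.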