Phil Watt Oral
Thesis Presentation
Why is Test-Driven Development so Hard in
Analytics or Data Focussed Projects?
University of Melbourne, ISYS90111_2019_TM4
Outline
INTRODUCTION TO THE PROBLEM SPACE
REVIEW OF THE LITERATURE
METHODOLOGY
RESULTS
DISCUSSION AND FURTHER WORK
Why is Test-Driven Development (TDD) so
hard to adopt within Data and Analytics
projects?
TDD is an established best practice in
software development, promising
benefits such as:
Reduced Cycle Time
Improved Developer Productivity
Reduced Production Defects
Observation that analytics and data
projects mostly do not use TDD, based
on:
Analytics/data management consulting and
delivery experience in 19 countries and 5
continents;
Working across hundreds of projects in this
domain
Concept validated with eight informal
interviews.
Purpose: to shape the research direction before formal
data gathering began
Interviews with analytics leaders across 5 industry
segments:
• 2 Chief Data Officers
• 2 Enterprise Architects managing large analytics
programmes
• 2 Heads of data engineering
• 2 Analytics programme leaders in large
enterprises
• 1 Advanced Analytics practice leader in a large
professional services organisation
Recognised Challenges from the Literature Review
• Software testing is focussed on program code: analytics is
focussed on data and information
• Analytics data volumes drive a testing context several
orders of magnitude greater than most software tests
• The valid combination of scenarios in general software
testing is limited, but for analytics can be virtually
unlimited
• Data warehousing testing continues after production
deployment (notably regression testing even when code is
not changed for some given data), unlike general software
testing
• Analytics outputs can be non-deterministic, especially for
predictions and Machine Learning use cases
• The combination of these reasons drives up the cost of
TDD and test automation in analytics
• Because of these constraints, developer or project
discipline may slip, providing lower test coverage and
increased defects/cycle time
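The non-deterministic challenge above can still be brought under test. The sketch below is an illustration only and is not taken from the thesis: `bootstrap_mean` is a hypothetical analytics function, and the seed and tolerance values are assumptions chosen for the example. It shows two common tactics: fix the random seed so a run is reproducible, and assert that the output falls within a tolerance rather than matching an exact value.

```python
import random
import statistics


def bootstrap_mean(values, n_resamples=200, seed=None):
    """Hypothetical analytics function: bootstrap estimate of the mean."""
    rng = random.Random(seed)  # a private generator keeps the seed local
    resample_means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the input
        sample = [rng.choice(values) for _ in values]
        resample_means.append(statistics.fmean(sample))
    return statistics.fmean(resample_means)


def test_reproducible_with_fixed_seed():
    data = [10.0, 12.0, 11.0, 13.0, 9.0]
    # The same seed must give the same answer on every run
    assert bootstrap_mean(data, seed=42) == bootstrap_mean(data, seed=42)


def test_estimate_within_tolerance():
    data = [10.0, 12.0, 11.0, 13.0, 9.0]  # true mean is 11.0
    # Assert on a tolerance band (assumed width 0.5), not an exact value
    assert abs(bootstrap_mean(data, seed=7) - 11.0) < 0.5
```

Written test-first, both tests would exist before `bootstrap_mean` does; the tolerance is deliberately wide relative to the sampling noise so the test does not fail by chance.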
Methodology
Mixed methods
Formal Interviews
A 6-page briefing pack supplied to
interviewees two weeks before the
interview
Audio or video recorded then
transcribed
Short online survey
Invitation only
Two questions
•Which of the challenges in the
previous slide do you recognise?
•How difficult were these
challenges to overcome?
Synthesis and Analysis
SURVEY RESULTS
Who Responded?
Recognised Challenges
[Bar chart, axis 0 to 16; bar values are not recoverable from the transcript. Categories:]
• Testing focused on data, not software
• Analytics data volumes drive much larger testing context
• Limited valid testing scenarios for software testing, but unlimited for data
• Data Warehouse Testing continues in production
• Analytics tests can be non-deterministic
• Combination of these reasons drives up TDD costs for analytics
• Combination of reasons can drive poor habits in developers or project managers
• Other challenges
Other challenges
• DWH can have complex logic related to delta processing,
historical delta etc which makes it even more difficult to
automate [testing]. Multiple source systems which can
inject a different type of data due to their own changes
make it even more complex.
• Capability to handle end-to-end complexity of
development task is rare
• 1. People with a software background may not
understand analytics. 2. DW bugs not fixed post
deployment. 3. DW not tested for other purposes. eg.
Marketing analytics.
• Dev Teams / Leaders don't think of testing in this way
• Analysts and Data Scientists rarely have the personality
or training to do TDD effectively.
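The first comment above, on delta processing, can be made concrete with a small test-first sketch. Everything here is hypothetical and not from the survey or the thesis: `apply_delta`, the `("upsert" | "delete", key, row)` record shape, and the in-memory dictionary standing in for a warehouse table are all assumptions made for illustration.

```python
def apply_delta(target, delta):
    """Hypothetical delta load: apply upserts and deletes to a keyed table.

    target: dict mapping business key -> row
    delta:  list of (operation, key, row) tuples
    """
    result = dict(target)  # copy, so the input table is left unchanged
    for operation, key, row in delta:
        if operation == "upsert":
            result[key] = row
        elif operation == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {operation}")
    return result


def test_apply_delta():
    target = {1: {"name": "a"}, 2: {"name": "b"}}
    delta = [
        ("upsert", 2, {"name": "B"}),  # update an existing row
        ("upsert", 3, {"name": "c"}),  # insert a new row
        ("delete", 1, None),           # remove a row
    ]
    loaded = apply_delta(target, delta)
    assert loaded == {2: {"name": "B"}, 3: {"name": "c"}}
    # The source table must not be mutated by the load
    assert target == {1: {"name": "a"}, 2: {"name": "b"}}
    # Re-running the same delta must not change the result
    assert apply_delta(loaded, delta) == loaded
```

The re-run assertion is the kind of rule that is cheap to state up front and expensive to discover in production, which is one argument for writing the test first even when the surrounding pipeline is hard to automate.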
Difficulty With Each Challenge
[Chart of reported difficulty per challenge; values are not recoverable from the transcript. Categories:]
• Testing focused on data, not software
• Analytics data volumes drive much larger testing context
• Limited valid testing scenarios for software testing, but unlimited for data
• Data Warehouse Testing continues in production
• Analytics tests can be non-deterministic
• Combination of these reasons drives up TDD costs for analytics
• Combination of reasons can drive poor habits in developers or project managers
INTERVIEW RESULTS
About the interviewees
14 individuals
12 with strong analytics
domain experience
• 4 Data Scientists
• 2 Data Engineers
• 4 Enterprise Analytics
Architects
• 2 Programme Managers
2 control interviews with
software engineering
backgrounds
5 Industry sectors
1 Public Sector
7 Professional Services (each
with experience across
multiple sectors)
2 Financial Services
1 Telco
1 Media
Interview Highlights
TDD advocates (n=4) stressed
the importance of ‘habit
forming’ to drive adoption
and benefits realisation
Everyone (n=14) recognised
the theoretical benefits of
TDD in Analytics
8 said the benefits depended on the
expected duration of a project, e.g.
one-off pieces of work would not
benefit
Some disagreement between
Data Scientists (n=4)
1 agnostic
2 relied on manual testing, arguing
that their work was mainly one-off
jobs
1 strongly advocated forming good
habits early, adding that test scope
could be limited for one-off jobs, but
was still needed
Interviewee commentary
about the Recognised
Challenges (slide 4) was
broadly in line with the survey
results
All interviewees were invited to
complete the survey; 10 responded
8 survey respondents were not interviewed;
they were invited to respond through
my LinkedIn network
DISCUSSION & FURTHER WORK
Discussion
There is strong agreement between
survey respondents and
interviewees that TDD for analytics
is different from, and more complex than,
TDD for traditional software engineering
Although opinions vary on
why, there are some core
reasons identified
Some support for the idea that TDD
is best applied to longer-term
projects, and should be avoided
for projects of short duration
This resembles the heuristic model from
Sambinelli et al. (2018) for
general software projects
A minority of interviewees stress
that TDD is always the right thing
for analytics, but success depends
upon:
Early, strong habit forming
around TDD practices
Careful design of the scope of
TDD
I find this minority view compelling
But this may be confirmation
bias on my part
Further work
With more time I would improve the
accuracy of the transcriptions, to enable
better text analytics and concept
matching across interviews
A range of Test Automation case
studies over a matrix of scenarios
Where TDD is used extensively
Where other test automation is used instead of
TDD
Where manual testing is used
For project durations that are short, medium or
long
For systems that are simple through to complex
Analysis of the impact of other factors
that could drive productivity, cycle time
and quality:
Frameworks
Low-code development tools
Open Source vs proprietary tools
References
Professional, viewed 8 September 2019, <https://learning-oreilly-com.ezp.lib.unimelb.edu.au/library/view/agile-analytics-a/9780321669575/ch07.html>.
• Dzakovic, M 2016, ‘Industrial Application of Automated Regression Testing in Test-Driven ETL Development - IEEE Conference Publication’, in 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), Institute of Electrical and Electronics Engineers, viewed 8 September 2019, <https://ieeexplore-ieee-org.ezp.lib.unimelb.edu.au/document/7816512?arnumber=7816512&SID=EBSCO:edseee>.
• Golfarelli, M & Rizzi, S 2009, ‘A comprehensive approach to data warehouse testing’, Proceeding of the ACM twelfth international workshop on Data warehousing and OLAP - DOLAP ’09, viewed 7 September 2019, <https://dl-acm-org.ezp.lib.unimelb.edu.au/citation.cfm?id=1651295>.
• Ivo, AAS, Guerra, EM, Porto, SM, Choma, J & Quiles, MG 2018, ‘An approach for applying Test-Driven Development (TDD) in the development of randomized algorithms’, Journal of Software Engineering Research and Development, vol. 6, no. 1, viewed 13 September 2019, <https://doaj.org/article/8be2f4e3709747e68c04537838b3b314?>.
• Krawatzeck, R, Tetzner, A & Dinter, B 2015, An Evaluation of Open Source Unit Testing Tools Suitable for Data Warehouse Testing, p. 22.
• Rencberoglu, E 2019, ‘Fundamental Techniques of Feature Engineering for Machine Learning’, Towards Data Science, April, Towards Data Science, viewed 28 September 2019, <https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114>.
• Sambinelli, F, Ursini, EL, Borges, MAF & Martins, PS 2018, ‘Modeling and Performance Analysis of Scrumban with Test-Driven Development Using Discrete Event and Fuzzy Logic - IEEE Conference Publication’, in 2018 6th International Conference in Software Engineering Research and Innovation (CONISOFT), IEEE, viewed 14 September 2019, <https://ieeexplore-ieee-org.ezp.lib.unimelb.edu.au/document/8645924?arnumber=8645924&SID=EBSCO:edseee>.
• Schutte, S, Ariyachandra, T & Frolick, M 2011, ‘Test-Driven Development of Data Warehouses’, International Journal of Business Intelligence Research, vol. 2, no. 1, pp. 64–73, viewed 8 September 2019,

Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 

Why is Test Driven Development for Analytics or Data Projects so Hard?

  • 1. Phil Watt, Oral Thesis Presentation: Why is Test-Driven Development so Hard in Analytics or Data Focussed Projects? University of Melbourne, ISYS90111_2019_TM4
  • 2. Outline: Introduction to the problem space; Review of the literature; Methodology; Results; Discussion and further work
  • 3. Why is Test-Driven Development (TDD) so hard to adopt within Data and Analytics projects?
      TDD is an established best practice in software development, promising benefits such as:
        • Reduced cycle time
        • Improved developer productivity
        • Reduced production defects
      Observation that analytics and data projects mostly do not use TDD, based on:
        • Analytics/data management consulting and delivery experience in 19 countries and 5 continents
        • Working across hundreds of projects in this domain
      Concept validated with eight informal interviews, whose purpose was to shape the research direction before formal data gathering began.
      Interviews with analytics leaders across 5 industry segments:
        • 2 Chief Data Officers
        • 2 Enterprise Architects managing large analytics programmes
        • 2 Heads of data engineering
        • 2 Analytics programme leaders in large enterprises
        • 1 Advanced Analytics practice leader in a large professional services organisation
  • 4. Recognised Challenges from the Literature Review
        • Software testing is focussed on program code; analytics is focussed on data and information
        • Analytics data volumes drive a testing context several orders of magnitude greater than most software tests
        • The valid combination of scenarios in general software testing is limited, but for analytics can be virtually unlimited
        • Data warehousing testing continues after production deployment (notably regression testing even when code is not changed for some given data), unlike general software testing
        • Analytics outputs can be non-deterministic, especially for predictions and Machine Learning use cases
        • The combination of these reasons drives up the cost of TDD and test automation in analytics
        • Because of these constraints, developer or project discipline may slip, providing lower test coverage and increased defects/cycle time
  • 5. Methodology: mixed methods
      Formal interviews
        • A 6-page briefing pack supplied to interviewees two weeks before the interview
        • Audio or video recorded, then transcribed
      Short online survey
        • Invitation only
        • Two questions:
            • Which of the challenges in the previous slide do you recognise?
            • How difficult were these challenges to overcome?
      Synthesis and analysis
  • 8. Recognised Challenges [chart; scale 0 to 16; one bar per category]
        • Testing focused on data, not software
        • Analytics data volumes drive much larger testing context
        • Limited valid testing scenarios for software testing, but unlimited for data
        • Data Warehouse testing continues in production
        • Analytics tests can be non-deterministic
        • Combination of these reasons drives up TDD costs for analytics
        • Combination of reasons can drive poor habits in developers or project managers
        • Other challenges
  • 9. Other challenges
        • DWH can have complex logic related to delta processing, historical delta etc., which makes it even more difficult to automate [testing]. Multiple source systems which can inject a different type of data due to their own changes make it even more complex.
        • Capability to handle end-to-end complexity of development task is rare
        • 1. People with a software background may not understand analytics. 2. DW bugs not fixed post deployment. 3. DW not tested for other purposes, e.g. marketing analytics.
        • Dev Teams / Leaders don't think of testing in this way
        • Analysts and Data Scientists rarely have the personality or training to do TDD effectively.
  • 10. Difficulty With Each Challenge [chart; one entry per category]
        • Testing focused on data, not software
        • Analytics data volumes drive much larger testing context
        • Limited valid testing scenarios for software testing, but unlimited for data
        • Data Warehouse testing continues in production
        • Analytics tests can be non-deterministic
        • Combination of these reasons drives up TDD costs for analytics
        • Combination of reasons can drive poor habits in developers or project managers
  • 12. About the interviewees
      14 individuals
        • 12 with strong analytics domain experience:
            • 4 Data Scientists
            • 2 Data Engineers
            • 4 Enterprise Analytics Architects
            • 2 Programme Managers
        • 2 control interviews with software engineering backgrounds
      5 industry sectors
        • 1 Public Sector
        • 7 Professional Services (each with experience across multiple sectors)
        • 2 Financial Services
        • 1 Telco
        • 1 Media
  • 13. Interview Highlights
        • TDD advocates (n=4) stressed the importance of ‘habit forming’ to drive adoption and benefits realisation
        • Everyone (n=14) recognised the theoretical benefits of TDD in analytics
        • 8 said the benefits were subject to the expected duration of a project, e.g. one-off pieces of work would not benefit
        • Some disagreement between Data Scientists (n=4):
            • 1 was agnostic
            • 2 relied on manual testing, arguing that their work was mainly one-off jobs
            • 1 strongly advocated forming good habits early, adding that test scope could be limited for one-off jobs but was still needed
        • Interviewee commentary about the Recognised Challenges (slide 4) was broadly in line with the survey results
        • All interviewees were invited to complete the survey; 10 responded
        • 8 survey respondents were not interviewed but were invited to respond through my LinkedIn network
  • 15. Discussion
        • There is strong agreement between survey respondents and interviewees that TDD for analytics is different from, and more complex than, TDD for traditional software engineering
            • Although opinions vary on why, some core reasons were identified
        • Some support for the idea that TDD is best applied to longer-term projects but should be avoided when projects are of short duration
            • This is similar to the heuristic model from Sambinelli et al. (2018) for general software projects
        • A minority of interviewees stress that TDD is always the right thing for analytics, but that success depends upon:
            • Early, strong habit forming around TDD practices
            • Careful design of the scope of TDD
        • I find this minority view compelling, but this may be confirmation bias on my part
  • 16. Further work
        • With more time I would improve the accuracy of the transcriptions, to enable better text analytics and concept matching across interviews
        • A range of Test Automation case studies over a matrix of scenarios:
            • Where TDD is used extensively
            • Where other test automation is used instead of TDD
            • Where manual testing is used
            • For project durations that are short, medium or long
            • For systems that are simple through to complex
        • Analysis of the impact of other factors that could drive productivity, cycle time and quality:
            • Frameworks
            • Low-code development tools
            • Open Source vs proprietary tools
  • 17. References
        Professional, viewed 8 September 2019, <https://learning-oreilly-com.ezp.lib.unimelb.edu.au/library/view/agile-analytics-a/9780321669575/ch07.html>.
        • Dzakovic, M 2016, ‘Industrial Application of Automated Regression Testing in Test-Driven ETL Development - IEEE Conference Publication’, in 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), Institute of Electrical and Electronics Engineers, viewed 8 September 2019, <https://ieeexplore-ieee-org.ezp.lib.unimelb.edu.au/document/7816512?arnumber=7816512&SID=EBSCO:edseee>.
        • Golfarelli, M & Rizzi, S 2009, ‘A comprehensive approach to data warehouse testing’, Proceeding of the ACM twelfth international workshop on Data warehousing and OLAP - DOLAP ’09, viewed 7 September 2019, <https://dl-acm-org.ezp.lib.unimelb.edu.au/citation.cfm?id=1651295>.
        • Ivo, AAS, Guerra, EM, Porto, SM, Choma, J & Quiles, MG 2018, ‘An approach for applying Test-Driven Development (TDD) in the development of randomized algorithms’, Journal of Software Engineering Research and Development, vol. 6, no. 1, viewed 13 September 2019, <https://doaj.org/article/8be2f4e3709747e68c04537838b3b314?>.
        • Krawatzeck, R, Tetzner, A & Dinter, B 2015, An Evaluation of Open Source Unit Testing Tools Suitable for Data Warehouse Testing, p. 22.
        • Rencberoglu, E 2019, ‘Fundamental Techniques of Feature Engineering for Machine Learning’, Towards Data Science, April, Towards Data Science, viewed 28 September 2019, <https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114>.
        • Sambinelli, F, Ursini, EL, Borges, MAF & Martins, PS 2018, ‘Modeling and Performance Analysis of Scrumban with Test-Driven Development Using Discrete Event and Fuzzy Logic - IEEE Conference Publication’, in 2018 6th International Conference in Software Engineering Research and Innovation (CONISOFT), IEEE, viewed 14 September 2019, <https://ieeexplore-ieee-org.ezp.lib.unimelb.edu.au/document/8645924?arnumber=8645924&SID=EBSCO:edseee>.
        • Schutte, S, Ariyachandra, T & Frolick, M 2011, ‘Test-Driven Development of Data Warehouses’, International Journal of Business Intelligence Research, vol. 2, no. 1, pp. 64–73, viewed 8 September 2019,