TESTING FOOTGUNS
10 WAYS TO SHOOT YOURSELF IN THE FOOT WITH TESTS
Shai Geva | Principal dev @CodiumAI | @shai_ge
Shai Geva
~20 years making software
Love testing
Principal dev, codium.ai
GENERATING MEANINGFUL
TESTS
FOR BUSY DEVS
Return On Investment
Tests PROPERTIES
PERFORMANCE
STRENGTH
MAINTAINABILITY
1 THERE ARE NO TESTS
2 IF IT DOESN’T FAIL, IT DOESN’T PASS
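One way to read this slide: a test only earns trust once it has been seen failing. The sketch below is an illustration and not taken from the talk; `apply_discount`, `broken_apply_discount` and the two test functions are hypothetical names. It pairs a test that cannot fail with one that does fail when the behavior is broken.

```python
def apply_discount(price, percent):
    """Hypothetical function under test: reduce price by percent."""
    return price * (100 - percent) / 100


def broken_apply_discount(price, percent):
    """A deliberately broken variant, used to check that a test can fail."""
    return price


def weak_test(discount_fn):
    # Never fails: it only checks that something was returned.
    assert discount_fn(200, 10) is not None


def strong_test(discount_fn):
    # Fails as soon as the discount is computed incorrectly.
    assert discount_fn(200, 10) == 180


def fails(test, discount_fn):
    """Return True if the test raises AssertionError for discount_fn."""
    try:
        test(discount_fn)
    except AssertionError:
        return True
    return False
```

Running both tests against the broken variant shows the difference: `weak_test` still passes, `strong_test` fails. Temporarily breaking the code under test is a cheap way to see which of your tests are actually checking something.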
3 TESTING
SINGLE FACT
BEHAVIOR
BookStore
test_user_can_edit_their_own_book test_edit_book
UNDERSTAND
SINGLE TEST
DEBUG
4 UNCLEAR LANGUAGE
test_ (): ...
GUIDELINES
SINGLE FACT
BEHAVIOR
DECISIVE LANGUAGE
SPECIFIC
EXPLICIT
test_edit_book():
test_edit_book_works_correctly():
test_user_should_be_able_to_edit_their_own_book():
test_user_can_edit_their_own_book():
5 THE DEVIL IS IN THE DETAILS
from pathlib import Path

def test_my_parser():
    data = Path(PATH_TO_DATA_FILE).read_text()
    parsed_data = parser_under_test(data)
    assert parsed_data.total_books == 3
???
def test_my_parser():
    data = """
    {
        < JSON with the data >
    }
    """
    parsed_data = parser_under_test(data)
    assert parsed_data.total_books == 3
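A runnable sketch of the same idea, under stated assumptions: `parse_books`, `ParsedData` and the JSON layout are hypothetical stand-ins for `parser_under_test` and its data file, which the slides do not show. With the input inline, the reader can see why the expected count is 3 without opening another file.

```python
import json
from dataclasses import dataclass


@dataclass
class ParsedData:
    total_books: int


def parse_books(raw: str) -> ParsedData:
    """Hypothetical stand-in for parser_under_test: counts books in a JSON document."""
    document = json.loads(raw)
    return ParsedData(total_books=len(document["books"]))


def test_parser_counts_every_book_in_the_document():
    # The input sits next to the assertion, so the expected 3 explains itself.
    data = """
    {
        "books": [
            {"title": "Book A"},
            {"title": "Book B"},
            {"title": "Book C"}
        ]
    }
    """
    parsed_data = parse_books(data)
    assert parsed_data.total_books == 3
```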
import json

def test_foobar():
    setup = some_thing(with_something_else)
    more_data = SomeObj.read(a_path)
    combined = ",".join([setup, more_data])
    prep_1 = MoreThings.do(combined, 3)
    the_actual_action = foobar(prep_1)
    sub_res = the_actual_action[3]
    thing_to_assert = json.loads(sub_res)["key"]
    assert thing_to_assert == 3
def test_foobar():
    prep_1 = setup_prep_1()
    the_actual_action = foobar(prep_1)
    # Extract the important key
    sub_res = the_actual_action[3]
    thing_to_assert = json.loads(sub_res)["key"]
    assert thing_to_assert == 3
6 THE TESTS ARE NOT ISOLATED
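A minimal sketch of what isolation can look like, assuming a hypothetical in-memory `BookStore` class (not from the talk). The first test leans on a shared module-level object and so depends on what ran before it; the other two build their own state.

```python
class BookStore:
    """Hypothetical in-memory store, used only for this illustration."""

    def __init__(self):
        self.books = {}

    def add_book(self, title, description):
        self.books[title] = description


# Not isolated: every test mutates the same module-level object,
# so the outcome depends on which tests ran earlier.
shared_store = BookStore()


def test_add_book_not_isolated():
    shared_store.add_book("Dune", "A desert planet")
    # Breaks if another test added a different book to shared_store first.
    assert len(shared_store.books) == 1


# Isolated: each test builds the state it needs and shares nothing.
def make_store():
    return BookStore()


def test_new_store_is_empty():
    store = make_store()
    assert store.books == {}


def test_added_book_is_stored():
    store = make_store()
    store.add_book("Dune", "A desert planet")
    assert store.books == {"Dune": "A desert planet"}
```

The isolated tests pass in any order and could run in parallel. In a pytest suite, a fixture would typically play the role of `make_store`.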
7 IMPROPER TEST SCOPE
Book Store MySQL
BEHAVIOR TEST | IMPLEMENTATION TEST
def test_editing_description_sets_correct_value():
    # Create book
    # Edit book
    # Get updated description
    # Assert correctness

BEHAVIOR TEST
def test_editing_description_sets_correct_value():
    # Create book - API
    requests.post(...)
    # Edit book - API
    requests.post(...)
    # Get updated description - API
    new_desc = requests.get(...)
    # Assert correctness
    assert new_desc == ...

IMPLEMENTATION TEST
def test_editing_description_sets_correct_value():
    # Create book - DB
    DbBook.create_new(...)
    # Edit book - API
    requests.post(...)
    # Get updated description - DB
    new_desc = DbBook.query_one(...)
    # Assert correctness
    assert new_desc == ...
WHAT
COHESIVE
HOW
INCOHESIVE
SO?
[Diagram: BOOK TABLE (DESCRIPTION, ID, …) next to BOOK TABLE (DESCRIPTION, DESC_ID, ID) with a separate DESCRIPTION TABLE (VALUE, ID). Book Store exposes /edit-book, /new-book and /get-book. BEHAVIOR TEST and IMPLEMENTATION TEST are highlighted in turn.]
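A minimal sketch of the contrast in these slides, assuming the diagrams show the description moving out of the book table into its own table. Everything here is hypothetical: `BookStoreApp` and `RefactoredBookStoreApp` are in-memory stand-ins for the service, and their private dictionaries play the role of the database tables. The behavior test uses only the public operations; the implementation test reaches into storage.

```python
class BookStoreApp:
    """Hypothetical service; _rows stands in for a single book table."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def new_book(self, description):
        book_id = self._next_id
        self._next_id += 1
        self._rows[book_id] = {"description": description}
        return book_id

    def edit_book(self, book_id, description):
        self._rows[book_id]["description"] = description

    def get_book(self, book_id):
        return {"id": book_id, "description": self._rows[book_id]["description"]}


class RefactoredBookStoreApp:
    """Same behavior, but descriptions live in their own table."""

    def __init__(self):
        self._books = {}         # book id -> description id
        self._descriptions = {}  # description id -> value
        self._next_id = 1

    def new_book(self, description):
        book_id = self._next_id
        self._next_id += 1
        self._books[book_id] = book_id  # reuse the number as description id
        self._descriptions[book_id] = description
        return book_id

    def edit_book(self, book_id, description):
        self._descriptions[self._books[book_id]] = description

    def get_book(self, book_id):
        return {"id": book_id, "description": self._descriptions[self._books[book_id]]}


def behavior_test(app):
    # Create, edit and read back through the public operations only.
    book_id = app.new_book("old description")
    app.edit_book(book_id, "new description")
    assert app.get_book(book_id)["description"] == "new description"


def implementation_test(app):
    # Create and read back by touching storage directly; only the edit is public.
    app._rows[1] = {"description": "old description"}
    app.edit_book(1, "new description")
    assert app._rows[1]["description"] == "new description"
```

Both tests pass against `BookStoreApp`. After the storage change, `behavior_test` still passes, while `implementation_test` raises an `AttributeError` even though nothing a user can observe has changed.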
COHESIVE BEHAVIOR
INCOHESIVE IMPLEMENTATION
PASSING == WORKS?
CATCHES BUGS AS EXPECTED?
ONLY NECESSARY WORK?
HIGH CONFIDENCE?
SCARY CHANGES
DRAMATIC DIFFERENCE
BEHAVIOR COHESIVE
8 TEST DOUBLES
TEST DOUBLES == IMPLEMENTATION
REAL
TEST DOUBLE
USE WITH CAUTION
BUT HOW?
DESIGN
db_fake = []
TEST
THE FAKE
WHY NOT BOTH?
db_fake
USE, AND VERIFY
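One possible reading of "use, and verify" is to run the same expectations against both the fake and the real thing. The sketch below assumes that reading. `FakeBookDb`, `SqliteBookDb` and `check_db_contract` are hypothetical names, and an in-memory SQLite database stands in for the real database only so that the example runs anywhere.

```python
import sqlite3


class FakeBookDb:
    """In-memory fake, in the spirit of the slide's `db_fake = []`."""

    def __init__(self):
        self._books = []

    def add(self, title):
        self._books.append(title)

    def titles(self):
        return sorted(self._books)


class SqliteBookDb:
    """Stand-in for the real database (an assumption made for this sketch)."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE book (title TEXT)")

    def add(self, title):
        self._conn.execute("INSERT INTO book (title) VALUES (?)", (title,))

    def titles(self):
        rows = self._conn.execute("SELECT title FROM book ORDER BY title")
        return [row[0] for row in rows]


def check_db_contract(make_db):
    """The same expectations, run against any implementation."""
    db = make_db()
    assert db.titles() == []
    db.add("Emma")
    db.add("Dune")
    assert db.titles() == ["Dune", "Emma"]


def test_fake_honors_the_contract():
    check_db_contract(FakeBookDb)


def test_real_db_honors_the_contract():
    check_db_contract(SqliteBookDb)
```

Fast tests can then use the fake, while the shared contract check guards against the fake drifting away from the real implementation.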
CodiumAI Main server
LLM SERVICE
REAL SERVICE
9 SLOW TESTS
SLOW TESTS: THE BOTTLENECK AND THE TIME BOMB
>>> workday_hours = 10
>>> test_run_minutes = 5
>>> test_run_per_hour = 60 // test_run_minutes
>>> test_run_per_hour
12
>>> test_run_per_day = test_run_per_hour * workday_hours
>>> test_run_per_day
120
>>> workday_hours = 10
>>> test_run_hours = 2
>>> test_run_per_day = workday_hours // test_run_hours
>>> test_run_per_day
5
>>> test_run_hours = 0.5
>>> workday_hours = 10
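The same arithmetic as a small function, assuming test runs happen back to back for the whole workday. `runs_per_day` is a hypothetical helper, not something from the talk; it works in whole minutes to avoid floating-point rounding.

```python
def runs_per_day(workday_hours, test_run_minutes):
    """How many complete, back-to-back test runs fit into one workday."""
    return (workday_hours * 60) // test_run_minutes


workday_hours = 10
for label, minutes in [("5 minutes", 5), ("30 minutes", 30), ("2 hours", 120)]:
    print(f"{label:>10}: {runs_per_day(workday_hours, minutes)} runs per day")
```

For a 10-hour day this gives 120, 20 and 5 runs respectively, which is the gap between running the tests whenever you like and rationing them.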
THE SAME, BUT NOT AS COMMON
THAT TIME WHEN THE BOMB EXPLODED
TESTS ARE SLOW + TESTS ARE FLAKY
SITUATION EVEN WORSE
HOW TO DEFUSE THE BOMB
BE PREPARED TO OPTIMIZE
TESTS CAN RUN IN PARALLEL
WRITE ISOLATED TESTS
SLOW TESTS: THE FEEDBACK LOOP AND THE BUG FUNNEL
FAST = 3 seconds, watch
SLOW = 10 minutes, CI
LET’S BE FAST
HOW LONG FOR THE TESTS TO RUN?
HOW LONG TO CATCH A BUG?
[Funnel diagram: ALL BUGS pass through FASTEST TESTS, LESS FAST TESTS, EVEN SLOWER, REALLY SLOW]
[Funnel diagram: ALL BUGS (10), with 10 reaching CI INTEGRATION TESTS]
[Funnel diagram: ALL BUGS (10) across 2 SECOND WATCH-MODE / UTs FOR THIS MODULE, ALL LOCAL UTs, CI UTs, CI INTEGRATION TESTS; the numbers 4, 2, 2 and 2 appear on the stages]
FEEDBACK LOOP: TEST DOUBLES
TOOLS etc.!
>>> pytest-watch
Re-run tests on file changes
>>> pytest-testmon
Only run tests that can be impacted by the code that changed
>>> unittest subTest
Organizing test output when you have large tests. For pytest: pytest-subtests
>>> hypothesis 😍
Property-based testing - for getting strong tests.
>>> coverage.py
Python test-coverage
>>> vcrpy
Record HTTP requests
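As an illustration of one tool from this list, here is a small sketch of `unittest`'s `subTest`, the only entry that ships with the standard library. `is_valid_isbn13_length` and the sample inputs are hypothetical and exist only to give the test something to check.

```python
import unittest


def is_valid_isbn13_length(isbn):
    """Hypothetical function under test: True for strings of exactly 13 digits."""
    return isbn.isdigit() and len(isbn) == 13


class IsbnLengthTest(unittest.TestCase):
    def test_rejects_malformed_isbns(self):
        cases = ["", "123", "978030640615X", "97803064061577"]
        for isbn in cases:
            # Each case is reported separately; one failure does not hide the rest.
            with self.subTest(isbn=isbn):
                self.assertFalse(is_valid_isbn13_length(isbn))


def run_suite():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsbnLengthTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Without `subTest`, the loop would stop at the first failing input; with it, every failing input is listed along with its `isbn` value.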
10 WRONG PRIORITIES
Tests PROPERTIES
PERFORMANCE
STRENGTH
MAINTAINABILITY
WHY?
SLOW, TOO LONG, FEWER TESTS, WEAK
HARD TO MAINTAIN, EXPENSIVE, FEWER TESTS, WEAK
HARD TO MAINTAIN, SLOW
THANK YOU AND SAFE CODING!
Shai Geva
@shai_ge
Slides:
GENERATING
MEANINGFUL TESTS
FOR BUSY DEVS

10 ways to shoot yourself in the foot with tests - Shai Geva, PyConUS 2023

Editor's Notes

  1. Hi everyone, thank you for coming. Today I’m going to talk about testing. Testing is great, but if we do it wrong, sometimes it’s not so great.
  2. Quick intro: My name is Shai Geva. I’ve been in the industry for a while now, mostly in hands-on engineering, but also in other roles like management and product.
  3. I’m a principal developer at codium.ai. So, my day job is creating tools that generate tests, which is pretty nice for someone who loves testing. Lots of cool stuff happening in this field, if you want to talk about it, come catch me after the talk.
  4. My purpose with this talk is to help you get a better ROI on testing work. I’ll talk about concrete things I’ve seen that can hurt that - either make us spend more time on testing work or make that work less effective. Naturally - not everything is going to be a good match for every team, and some things are going to need adjustments. So take the basic ideas, and see what applies to you.
  5. We’ll talk about different practices, different ways that we can work. These practices will affect us by changing properties of our tests. The main properties we’ll see:
  6. Strength - how good the tests are at catching bugs.
  7. Maintainability - How easy it is for us to deal with the tests as things change. And, dealing with changes is very important. As developers, dealing with change is one of the main things that we do - changes to requirements, changes to scale, and even things like changes to the team as new developers join.
  8. And Performance - How long does it take for the tests to run. This might sound like a lesser consideration because computers are fast, but as we’ll see, it matters.
  9. So, 10 ways to shoot yourself in the foot with tests
  10. Footgun 1: There are no tests. It’s better to have some tests than to not have tests at all. Even if the tests are not well-written, and even if they seem like a drop in the sea. They still catch bugs and they are still an improvement. So if you don’t have tests yet - just start with something small, and slowly keep improving.
  11. Footgun 2: If it doesn’t fail, it doesn’t pass. Sometimes our tests lie to us. We have a test that is supposed to protect us from something - but it still happens. Obviously, this happens because the test didn’t actually check what we thought it did. Maybe we copy-pasted and forgot to change something. My suggestion here - when you write a test, always make it fail. For every assertion, make a tiny change either to the code or the test, and make sure it fails the way you expect. And only after you’ve seen it fail - consider it a passing test.
  12. Just like with product code, if we put too many things in the same place we get a mess. My rule of thumb is to try hard to test a single fact about the behavior of the code. And it helps if I use these specific words mentally.
  13. SINGLE. FACT. About the BEHAVIOR.
  14. Let’s say we have a book store and we’re testing the edit book functionality.
  15. For example, that’s a single fact about the behavior of the code: user_can_edit_their_own_book.
  16. And this is not a single fact: test_edit_book. It’s general. How do they compare?
  17. Single fact test: It’s clear what the test checks. It’s clear that it only checks that.
  18. But, with the general test: we’ll need to read and understand all the test code to know.
  19. If the single-fact test fails, it’s clear what functionality stopped working. And because it’s small, it’ll be easy to debug it.
  20. If the general test fails, anything related to edit book might have failed. We’ll need to dig in. And it does a lot of things, so debugging might be a lot of work.
  21. Footgun 4: unclear language. The words we use make a big difference in how we think about the tests, and how easy it is to understand them.
  22. The guidelines I use for myself:
  23. First, like we said earlier - prefer to test a single fact about the behavior of code. This is not the language itself but it really sets the tone.
  24. We want to use decisive language
  25. And we want the language to be specific and explicit. A few examples of this:
  26. Using the same example: test_edit_book. Like we saw, this is hard to understand, so it’s not a very good choice.
  27. Adding things like “works” or “is correct” - most of the time, it’s just bloat. Doesn’t really help.
  28. test_user_should_be_able_to_edit_their_own_book. That’s better. Much more specific.
  29. The only problem here is this indecisive language. It’s kind-of confusing, right? Why “should”? Are we not sure about this? Is this ever going to be NOT TRUE? So also, not optimal.
  30. And this one does sound like a fact. “User can edit their own book”. Decisive, specific, explicit. And I recommend going with language like this.
  31. Footgun 5: The devil’s in the details. Tests that highlight too much or too little detail are more difficult to maintain.
  32. One problem is what I like to think of as non-locality. Here we’re testing some parser, and the data is in a file -
  33. so we read the data from the file before we move on with the test.
  34. The problem is that no matter what the test checks, it’s impossible to know if this is correct without going somewhere else and looking. Maybe a different file, but even if it’s a constant at the top of this file. Sometimes we can’t avoid this, but a lot of times we can.
  35. Maybe try something like this. It’s exactly the same test, but the data is local. If you can find some data sample that’s small enough to fit locally, the tests become much more maintainable.
  36. The other side of this is too-much-detail. There’s so much stuff. The important things
  37. It’s not easy to spot them
  38. And just organizing a little makes a big difference
  39. It’s not a lot of work, And it makes it much easier to understand what’s going on.
  40. Footgun 6: the tests are not isolated. If your tests are not isolated, it means you sometimes get different behavior if you run only some of them or run them in a different order. And what I have to say about tests that are not isolated,
  41. Is DON’T. Just don’t do it. If you have 30 tests, and test number 24 fails because of some things that happened during test 8 and test 15, you’re not going to have a good time debugging. This gets really bad, and I cannot stress enough how much I’m against this.
  42. Footgun 7: Improper test scope. This is the root cause of many testing problems. My approach here is that we want to test a cohesive whole. Some complete, make-sense story. It’s very close to the notion of “testing implementation instead of behavior”, just a phrasing that’s a little more general.
  43. Let’s say our Book Store is a web service, and it uses a DB.
  44. We’ll think about two alternative test suites - “behavior tests” and “implementation tests” - and try to imagine what our life would look like if we had chosen one test suite or the other.
  45. We’ll look at an almost identical test in both test suites. The test verifies that if we edit the description of a book, then it has really been updated. Pretty simple.
  46. Both tests have the same flow: create a book, edit the book, get the updated description, make the assertion.
  47. The behavior test does everything through the external http API, IN THE SAME WAY things would be done in the actual system.
  48. The implementation test does some of the things at a lower level: It creates the book by directly creating a record in the database, and it also checks the updated description through the DB.
  49. So the behavior test only looks at the WHAT - It looks at things as they appear from outside. The implementation test also knows about HOW. It knows how the code will change the DB. Now, checking the implementation like this will USUALLY be equivalent to the behavior - but not always.
  50. But why does this matter to us? Let’s look at a possible scenario: We’ve had this test suite for a while, maybe even years. We’ve invested a lot in it, and we rely on it.
  51. Now, we’re making a change to optimize the database. We’re moving the description out of the Book table, and into a separate table.
  52. However, we’re not deleting the old field yet - we’ll do that later after all the data has moved to the new table. Now, we’re finished with everything else, and, it’s time to update the edit-book endpoint.
  53. Now, what if we just forgot to update the edit-book endpoint? Completely forgot. It now changes the wrong field in the database so behavior-wise, it doesn’t do anything. If this gets to production, then we created a major bug.
  54. If we chose behavior tests -
  55. The test only uses the external API. The behavior test…
  56. Does not care about implementation details. So if the behavior is wrong, the test will fail, just like it should. The regression bug was prevented. Everything’s ok.
  57. If we chose the implementation test -
  58. The test looks directly at the old description field in the DB. When we run the test, the old description field will change, just like before, so the test will not fail. The regression bug was not prevented. And a major bug made it to production.
  59. It’s not ok.
  60. On the other side of this, what if we made the change correctly? Edit book now changes the new table instead of the old field. No bugs, everything’s fine.
  61. If we chose the behavior test - Everything behaves correctly, so the test will pass. We don’t need to do anything.
  62. If we chose the implementation test - The old field is not updated any more, so even though the code is correct, this test will fail. The distinction here is that the failure reason is not that the code is not correct. The test fails because it has become technically invalid. So, we have extra work - we need to figure out whether the failure is real or technical. And then we’ll need to update the test. Also - because we changed the test, we now have less confidence in it. We need to learn to trust it again.
  63. On large code bases, this can become a real pain. You have to update the tests, even if the code change has no bugs, and sometimes even if the test has nothing to do with the feature you worked on. You end up wasting hours and you hate the test suite.
  64. Summing up
  65. Cohesive, behavior tests - are closer to reality
  66. They are better at protecting us.
  67. They create less redundant work
  68. And we have higher confidence in them in the long run.
  69. One more thing worth mentioning: we looked at an example of a small, incremental change. But sometimes, we need to make BIG changes. SCARY changes. It happens less often but when it happens it’s a big deal. For example, in a lot of companies, at some point, the DB doesn’t deal with the scale well. We have stability issues, and we need to make a big change - maybe move the data to a different type of database. And that’s when tests are MOST important. And if we went with behavior level tests - everything will be fine. Those same tests that we’ve been running with for 3 years now - we don’t change them. When they pass, we’re done. But if we went with Implementation level tests - they all become technically invalid and they all fail. We will need to port all of them to use the new database, but more importantly: because we’re changing them - we’re not going to trust them enough. This might make the difference between a project that takes a few weeks, and a company-level event that drags out for months while the product has stability issues.
  70. So I cannot recommend enough. Test behavior. A cohesive whole.
  71. Footgun 8: test doubles everywhere. Sometimes, in a test, we switch a part of the system, a dependency, with an alternative implementation. These are called test doubles. Things like stubs, mocks and fakes. The main reason we use them is performance - if the real thing is too slow to run a lot of tests, we switch it with a fast test double. Test doubles can be useful, but…
  72. Test doubles are a re-implementation. They know the implementation details of the thing they’re replacing. Different types of test doubles do it differently, but this is what they do.
  73. The main problem this causes is correctness. The test double might not behave exactly like the real thing, and that makes the tests less accurate, less correct.
  74. And as time goes by,
  75. the real thing might be slowly changed,
  76. but the test double would stay the same, so it would drift further and further from reality.
  77. And, of course, this can hurt your foot. This is actually a flavor of the implementation vs. behavior problem. There are some differences, but essentially, it’s the same category of issues - tests that use test doubles are not as good at catching bugs, and sometimes they fail even though the code is correct, causing all that extra work.
  78. So, test doubles - use, with caution.
  79. The question is - how do we avoid the pitfalls? And I’ll suggest a couple of ideas
  80. First - code design. So important. Try to design so you can test a lot of functionality effectively, with fast unit tests, that don’t need test doubles. Not ALWAYS possible, but a lot of times it is.
  81. Another thing is to choose which test double you’re working with. And I suggest to mostly use fakes. A fake behaves like your dependency, but fast. For example, a fake database table can be an in-memory list of tuples, where each tuple is a row. In tests - it behaves the same way.
  82. We can make a fake more reliable by writing some tests, not for the code - but for the fake itself. For example, we can run the same operations against the fake and the real thing and verify we get the same results. It’ll never be 100% the same - we make tradeoffs in how much we are willing to invest in testing the fake.
  83. Sometimes, a reliable fake already exists. For example, if you’re using SQLite - Python’s built-in sqlite3 module can run the database entirely in memory. So google it, maybe you’ll get lucky.
  84. An interesting thing you can do with fakes, is to run exactly the same test - once with a fake, and once with the real thing.
  85. For example, maybe we have 10 tests, and that’s too much to run against the real thing.
  86. So we run all 10 with the fake.
  87. And then, we choose the 2 most important ones, and we run them ALSO with the real thing. And this gives us some real world certainty.
  88. The essence of the idea is to use test doubles, but selectively verify their correctness until we get an acceptable tradeoff.
  89. Another way to “use test doubles and verify” is by caching recordings. We can record HTTP requests, DB actions, or anything else.
  90. For example, at CodiumAI, We have an HTTP service that calls another HTTP service - an AI layer that does code analysis and generation.
  91. So, in our tests, we record and save the HTTP interactions between the main server and the AI server.
  92. Locally, we run the tests against the recordings, so they are very fast.
  93. But - we also verify. In the cloud, we also run the tests against the real thing to make sure they are still valid.
  94. Footgun 9: Slow tests. Yeah, slow tests are not fun. I’ll talk about two ways in which they are not fun.
  95. The first way is what I like to think of as the bottleneck and the time bomb.
  96. The bottleneck here is where the tests take so long to run, that we have a long queue of tasks waiting to be merged to the main branch.
  97. What kind of numbers are we talking about here? Assume we have, say, 10 work-hours each day.
  98. with tests that take 5 minutes.
  99. So, that’s 12 merges per hour,
  100. Which is 120 merges a day before the tests slow us down. For most teams, that virtually never happens, so a 5-minute test suite -
  101. not a bottleneck.
  102. On the other extreme, and it usually won’t get to that but just so it’s easy to imagine -
  103. if the test suite takes 2 HOURS,
  104. Then we can only merge 5 tasks to main each day.
  105. Whenever we want to wrap up a bunch of tasks quickly, maybe before a major version, the merge queue length becomes days. If the tests sometimes fail, then this can happen on any random day. It just doesn’t work. The team will probably just stop waiting for the tests to pass before merging, and spend a lot of time with the tests being broken. Now, we can SURVIVE this way. But it’s a lot of extra work and it’s really not what we want.
  106. And really, even with less extreme numbers, from what I see: with a 30-minute test suite
  107. The same things happen. They happen less, but they happen.
  108. And, a few years ago I was actually part of a team where this happened. When the tests took 20 minutes, I understood it’s a time bomb and eventually things were going to get bad. But I didn’t have this clear phrasing of exactly how the slowness would be a problem. The bottleneck.
  109. Another problem there was that the tests were also flaky and we always needed to fix them, so it wasn’t clear to most people that slowness was the more urgent problem.
  110. After a while, we were getting all these problems every few weeks. Multi-day merge queues, everything was stuck. Real crisis mode. It only became ok after we did an expensive project and made the tests run in parallel. Tests would still break sometimes, but the queue got back to zero fast enough so it was not a crisis.
  111. The question is - what do we do about this?
  112. We don’t want premature optimization, so what we need on day 1 is to make sure that WHEN we want to optimize, it’s not going to be a very expensive project.
  113. And specifically, it should be possible to run the tests in parallel because that’s going to be the go-to solution.
  114. The only thing we need for that, is to remember the footgun about isolated tests. If the tests don’t affect each other, they can run in parallel. My advice is to consider this as a must-have.
  115. Another way that slow tests can hurt us is by making our feedback loop longer. The feedback loop is how fast we learn about bugs and understand what happened. And I’m talking about any type of bug here - anything from a typo to complex concurrency issues. The feedback loop is very important, And anything that makes it shorter is very good. Even a squiggly red line in the IDE.
  116. I usually aim for a setup where most of the time, I’m working in watch-mode, so the tests re-run every time a file changes, and I run a sub-set of the tests that finishes within 2 or 3 seconds. Being on the fast side is great. For example, if a test fails just a few seconds after I wrote the code - I instantly understand what’s going on.
  117. With a 10-minute test suite in CI - the commit with a failing test contains a lot of code. Plus my brain will do a context switch and go catch up on Slack. So when I try to understand what’s going on with the failing test - it’s a lot more work.
  118. Now, some tests HAVE to be slow. But we can still have a pretty fast feedback loop.
  119. What helps me here is that instead of asking “How long does it take for the tests to run” I’m asking “How long does it take to catch a bug”
  120. And I’m visualizing this using the “bug funnel”. All possible theoretical bugs come in, and some of them get filtered out on every stage. And the key observation here is that what matters to the feedback loop is that we catch MOST bugs quickly. We will have a good experience if the feedback loop is USUALLY fast.
  121. Let’s say we start out with a bug funnel that looks like this. We only have long-running integration tests, and we only run them in CI.
  122. So if we create 10 bugs - we need to wait and debug the CI 10 times.
  123. If we start adding fast unit tests, then pretty quickly the bug funnel will look more like this. We don’t wait 10 times for the long-running CI anymore. Only for, say, 2 of the bugs. For the rest of the bugs, so most of the time - we’ll have a much faster feedback loop.
  124. And try to run at least some of the tests in watch-mode! You will have a 2-second feedback loop, even if it’s not for everything.
  125. And, as we discussed - you can also use test doubles, that’s why they exist. By the way, local recordings have a very good tradeoff here - you should try it.
  126. So, before the last footgun - I haven’t directly mentioned specific tools because it kind of broke the flow, so here is a bunch of stuff worth exploring. No need to take a picture - I uploaded the slides, and there’s a link at the end.
  127. Footgun 10: wrong priorities
  128. We saw a bunch of different practices, and how they will affect us by changing the properties of our tests. The bug funnel is all about performance. “Testing implementation instead of behavior” is about maintainability and strength. But how do we prioritize?
  129. Now, the objective of tests is their strength. We have tests so that they catch bugs.
  130. The unintuitive thing is that this is not what we should prioritize when we work. Start with making them maintainable, then make sure they are fast enough, and then make them strong.
  131. Here’s the thing
  132. Slow tests are weak, or at least they are EVENTUALLY weak. Let’s say that, as a team, we decided that we are not willing to have tests that run for more than 30 minutes.
  133. If, at some point the tests reach 30 minutes…
  134. It becomes very difficult to add more tests.
  135. So after enough time, there will be a lot of code that’s not tested well.
  136. And the same thing happens with maintenance. It’s more subtle, but if tests are not maintainable, it costs more to have them, and we end up creating fewer tests. So again, they will be eventually weak.
  137. And, maintainability issues can also make it difficult to handle performance. An example we saw is test isolation and parallelization.
  138. In other words, maintainability is a necessary condition for performance, and both are necessary conditions for strength. So make maintainability the priority: testing a single fact, code design and all the others. When you have a choice to make - I suggest going with the most maintainable option almost always. Even at the cost of other things. Because in the long run, that’s how we get tests that let us move fast, and have confidence in our code.
  139. Thank you!
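
The isolation problem from Footgun 6 (notes 40 and 41) can be sketched in a few lines. This is only an illustration: `SHARED_BOOKS`, `make_store`, `run_in_order` and the test names are hypothetical and do not come from the talk. The "leaky" tests share module-level state, so their result depends on run order; the isolated tests build fresh state each time.

```python
# Hypothetical example: shared module-level state makes tests order-dependent.
SHARED_BOOKS = []


def leaky_test_add_book():
    SHARED_BOOKS.append("Dune")
    assert len(SHARED_BOOKS) == 1  # only true if nothing else touched the list


def leaky_test_store_starts_empty():
    assert SHARED_BOOKS == []  # fails if leaky_test_add_book ran earlier


def make_store():
    """Fresh state per test; with pytest this would typically be a fixture."""
    return []


def test_add_book():
    books = make_store()
    books.append("Dune")
    assert len(books) == 1


def test_store_starts_empty():
    assert make_store() == []


def run_in_order(tests):
    """Run tests in the given order and return the names of those that fail."""
    failed = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failed.append(test.__name__)
    return failed
```

Running the leaky pair as `[leaky_test_add_book, leaky_test_store_starts_empty]` reports a failure, while the reverse order (from a clean list) does not. The isolated pair passes in either order, which is also what makes it safe to run in parallel.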
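
The Footgun 7 scenario (notes 43 to 62) can be reproduced with a toy model. Everything here is an assumption made for illustration: the `BookStore` class stands in for the web service and its DB, two dicts stand in for the old Book table and the new description table, and the `edit_writes_new_table` flag switches between the "forgot to update the endpoint" case and the correctly migrated case.

```python
class BookStore:
    """Hypothetical stand-in for the service plus its DB (two dict 'tables')."""

    def __init__(self, edit_writes_new_table):
        self.books = {}         # old Book table, still has a description field
        self.descriptions = {}  # new table the description is moving to
        self._edit_writes_new_table = edit_writes_new_table
        self._next_id = 1

    # --- external API ---
    def create_book(self, title, description):
        book_id = self._next_id
        self._next_id += 1
        self.books[book_id] = {"title": title, "description": description}
        self.descriptions[book_id] = description
        return book_id

    def edit_description(self, book_id, description):
        if self._edit_writes_new_table:
            self.descriptions[book_id] = description          # migrated correctly
        else:
            self.books[book_id]["description"] = description  # forgot to migrate

    def get_description(self, book_id):
        return self.descriptions[book_id]  # reads already use the new table


def behavior_test(store):
    # Only the external API is used, the same way a client would use it.
    book_id = store.create_book("Dune", "old text")
    store.edit_description(book_id, "new text")
    assert store.get_description(book_id) == "new text"


def implementation_test(store):
    # Creates the record directly in the "DB" and checks the old field directly.
    store.books[7] = {"title": "Dune", "description": "old text"}
    store.descriptions[7] = "old text"
    store.edit_description(7, "new text")
    assert store.books[7]["description"] == "new text"


def passes(test, store):
    try:
        test(store)
        return True
    except AssertionError:
        return False
```

With `BookStore(edit_writes_new_table=False)` (the bug), `behavior_test` fails and `implementation_test` passes, so only the behavior test catches the regression. With `edit_writes_new_table=True` (correct code), `behavior_test` passes and `implementation_test` fails for a purely technical reason.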
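
The idea from Footgun 8 of running exactly the same test once with a fake and once with the real thing (notes 81 to 87) might look roughly like this. The class and function names are hypothetical. The fake is the list-of-tuples table described in note 81; as an assumption for this sketch, a SQLite database from Python's standard `sqlite3` module plays the role of "the real thing".

```python
import sqlite3


class FakeBookTable:
    """Fake: an in-memory list of (id, title) tuples, one tuple per row."""

    def __init__(self):
        self._rows = []

    def insert(self, book_id, title):
        self._rows.append((book_id, title))

    def titles(self):
        return [title for _, title in sorted(self._rows)]


class SqliteBookTable:
    """Stand-in for 'the real thing' in this sketch: a SQLite-backed table."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)"
        )

    def insert(self, book_id, title):
        self._conn.execute(
            "INSERT INTO book (id, title) VALUES (?, ?)", (book_id, title)
        )

    def titles(self):
        rows = self._conn.execute("SELECT title FROM book ORDER BY id").fetchall()
        return [title for (title,) in rows]


def check_titles_come_back_in_id_order(table):
    """The same test body, run against whichever implementation is passed in."""
    table.insert(2, "Emma")
    table.insert(1, "Dune")
    assert table.titles() == ["Dune", "Emma"]


# Run the identical check against the fake and against the real implementation.
for make_table in (FakeBookTable, SqliteBookTable):
    check_titles_come_back_in_id_order(make_table())
```

In a pytest suite, the loop at the end would usually become a parametrized test or fixture, with the slower real-backend variant enabled only for the few most important tests.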
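
The bottleneck arithmetic from Footgun 9 (notes 97 to 106) is easy to check. The function name is hypothetical, and the sketch assumes merges are strictly serialized, with each merge waiting for one full test run.

```python
def max_merges_per_day(work_hours_per_day, test_suite_minutes):
    """Upper bound on sequential merges when each merge waits for one test run."""
    return (work_hours_per_day * 60) // test_suite_minutes


print(max_merges_per_day(10, 5))    # 120
print(max_merges_per_day(10, 120))  # 5
print(max_merges_per_day(10, 30))   # 20
```

The 30-minute case gives a ceiling of 20 merges a day, which is where a busy day or a few flaky reruns can start to build a queue.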