Copilot to Cover: Why AI can't replace developers with robots, but can make life better
1. Copilot to Cover: Why AI can’t replace developers with robots, but can make life better
Dr Andy Piper
VP Engineering, Diffblue
3. What You Will Learn
• What is AI-Augmented coding?
• Different approaches to AI-Augmented software development
• How AI can transform mundane-but-vital coding tasks
4. Dr Andy Piper
• PhD @Cambridge – Map-Reduce
• Senior Staff Engineer @BEA – WebLogic Server
• Manager @Oracle – WebLogic Event Server
• CTO @Push Technology – Real-time streaming
• Global head valuations tech @CBRE
• VP Engineering @Diffblue
5. AI-Augmented Coding:
Use of AI – principally machine learning – to help developers write code
Especially boring, repetitive code that is tedious and error-prone to write
6. AI-Augmented Coding State of the Art
• Applications: auto-completion, unit test-writing, coding competitions
• Approaches: pre-trained transformer-based (GPT-2, GPT-3, others), reinforcement learning
• Tools: CodeWhisperer
8. Transformers: ML That Iteratively Predicts Output Tokens from Input Tokens
[Diagram: the input tokens "the quick brown fox" pass through the transformer three times. Each pass adds one output token: "le", then "le renard", then "le renard brun".]
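The token-by-token prediction on this slide can be sketched as a simple loop. This is a minimal illustration only: `TOY_MODEL`, `predict_next` and `translate` are hypothetical names, and the lookup table stands in for a trained transformer, which would instead compute a probability distribution over its vocabulary at each step.

```python
# Hypothetical stand-in for a trained transformer: maps the output produced
# so far to the next output token. A real model scores every vocabulary token.
END = "<end>"

TOY_MODEL = {
    (): "le",
    ("le",): "renard",
    ("le", "renard"): "brun",
    ("le", "renard", "brun"): END,
}


def predict_next(source_tokens, output_tokens):
    """Return the next output token given the input and the output so far."""
    # The toy table ignores source_tokens; a real transformer attends to them.
    return TOY_MODEL[tuple(output_tokens)]


def translate(source_tokens, max_steps=10):
    """Build the output one token at a time, feeding each prediction back in."""
    output = []
    for _ in range(max_steps):
        token = predict_next(source_tokens, output)
        if token == END:
            break
        output.append(token)
    return output


result = translate(["the", "quick", "brown", "fox"])
print(" ".join(result))  # le renard brun
```

Each pass through the loop corresponds to one panel of the diagram: the same input, plus everything generated so far, yields one more token.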
9. Generative Pre-Trained Transformers (GPT) from OpenAI
GPT-2 (Feb 2019)
• Pre-trained model
• Open source
• 1.5b parameters
• Translates text
• Summarises text
• Answers questions about text
GPT-3 (July 2020)
• Pre-trained model
• Closed source
• 175b parameters
• Does everything GPT-2 does, and also writes new text
Codex (July 2021)
• Pre-trained model
• Closed source
• 12b parameters
• Writes boilerplate, repetitive code and foreign API calls
11. Codex Runs In the Cloud Due to Model Size
[Diagram: your IDE sends code fragments from your code to the Codex model hosted on Azure; the model returns potential completions, and your IDE then shows your code plus the completion.]
12. What Is Copilot Good For?
• Quickly completing:
  • Boilerplate code
  • Repetitive code patterns
  • “Foreign lands” – patterns for calling external APIs
16. AWS CodeWhisperer
• Same concept as Copilot
• Designed for apps using AWS services & APIs
• Also transformer-based (supervised learning)
• Training data is unknown
• Supports Python, Java, JavaScript
• Currently in open ‘preview’
18. Test Writing Is Harder Than Completion
• Needs more context
• The bar for value is much higher
• Best when 100% autonomous
• It has to work and be correct – no approximations
• Determinism is important
• Complex interdependencies & practical difficulties
19. Searching for the Right Kind of Code
[Diagram: nested sets of candidate code, from broadest to narrowest]
• Set of all code that looks like it might be a good test: supervised learning (Copilot). What GPT will give you, but not what you need
• Programs that are valid and run: tests that will work
• High coverage tests: tests that are effective
• Tests that satisfy developer taste
• The tests you actually need: what you actually need
21. Software Change Process with Diffblue Cover
[Flowchart]
• Starting from working code, Diffblue writes a test baseline, which forms the regression unit test suite
• Engineer writes a code change
• Run all unit tests
• Is the change correct?
  • No: engineer updates the code and the unit tests run again
  • Yes: pull request
• PR approved
• Update Diffblue baseline
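The baseline idea in this process can be illustrated with a small sketch. This is not how Diffblue Cover is implemented: it only assumes that a baseline records what the working code does today, so that a later change shows up as a difference for the engineer to judge. All names here (`capture_baseline`, `check_against_baseline`, `price_v1`, `price_v2`) are hypothetical.

```python
def capture_baseline(func, inputs):
    """Record what the working code returns today for each input."""
    return {args: func(*args) for args in inputs}


def check_against_baseline(func, baseline):
    """Return the inputs whose behaviour differs from the recorded baseline."""
    return [args for args, expected in baseline.items() if func(*args) != expected]


# Working code, and a hypothetical change an engineer makes to it.
def price_v1(quantity, unit_price):
    return quantity * unit_price


def price_v2(quantity, unit_price):
    # Intended change: 10% discount for orders of 10 or more items.
    total = quantity * unit_price
    return total * 0.9 if quantity >= 10 else total


inputs = [(1, 5), (3, 20), (10, 2)]
baseline = capture_baseline(price_v1, inputs)

# "Run all unit tests": which recorded behaviours did the change alter?
failures = check_against_baseline(price_v2, baseline)
print(failures)  # [(10, 2)]

# If the engineer confirms the difference is intended and the PR is approved,
# the baseline is regenerated from the new code.
new_baseline = capture_baseline(price_v2, inputs)
```

The failing input is not necessarily a bug. It is the point at which the engineer answers "Is the change correct?", either fixing the code or accepting the new behaviour into the baseline.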
22. What Is Cover Good For?
• 100% autonomous Java unit tests (other languages in future)
• 100% automated Java unit test maintenance
• Skeleton tests for untestable code
• Dashboard and reporting on coverage, testability, risk
27. How AlphaCode Writes Code
• Inputs: a clear, unambiguous description of what the code must do, plus unit tests to validate the solution
• The AlphaCode transformer generates hundreds of potential solutions
• Filter semantic duplicates via cluster analysis, leaving tens of potential solutions
• Run unit tests to find the winning solution
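The generate, de-duplicate, test pipeline on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not DeepMind's implementation: the candidate functions are hand-written stand-ins for model output, and "cluster analysis" is approximated by grouping candidates that return identical results on a set of probe inputs. All names are hypothetical.

```python
from collections import defaultdict


# Hypothetical candidates for the task "return the largest of three numbers".
def cand_a(x, y, z):
    return max(x, y, z)


def cand_b(x, y, z):
    return sorted([x, y, z])[-1]  # behaves the same as cand_a


def cand_c(x, y, z):
    return x if x > y else y  # ignores z, so it is wrong


def cand_d(x, y, z):
    return max(x, y)  # behaves the same as cand_c


candidates = [cand_a, cand_b, cand_c, cand_d]


def cluster_by_behaviour(cands, probe_inputs):
    """Group candidates with identical outputs; keep one per group."""
    clusters = defaultdict(list)
    for cand in cands:
        signature = tuple(cand(*args) for args in probe_inputs)
        clusters[signature].append(cand)
    return [group[0] for group in clusters.values()]


def passes(cand, unit_tests):
    """True if the candidate returns the expected value for every test."""
    return all(cand(*args) == expected for args, expected in unit_tests)


probe_inputs = [(1, 2, 3), (3, 2, 1), (2, 3, 1)]
unit_tests = [((1, 2, 3), 3), ((5, 4, 0), 5)]

representatives = cluster_by_behaviour(candidates, probe_inputs)
winners = [c for c in representatives if passes(c, unit_tests)]
print(len(candidates), len(representatives), [w.__name__ for w in winners])
# 4 2 ['cand_a']
```

De-duplicating before running the tests means each distinct behaviour is validated once, which matters when the starting set is hundreds of candidates rather than four.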
28. Some Similarities To Cover
• Generates many potential code solutions to the problem
• Picks the best one
29. What Is AlphaCode Good For?
• Beating 46% of programmers in coding competitions
• A demonstration of future potential vs. a practical solution
• Not available outside DeepMind (today)
30. Summary
• AI-augmented tools are real today and help eliminate tedious, error-prone coding tasks
• All the leading tools have free editions you can try today
• The players in this space are just getting started: buckle up
31. Learn More About Diffblue Cover
• Talk to us at stand 16
• Visit www.diffblue.com
• Try Cover plug-in & CLI
• www.diffblue.com/free-trial
Completion tools like Copilot only use text from the current file as input, so they don’t know about imports.
GitHub states that Copilot is exactly right about 43% of the time for Python. The problem is that your computer can’t tell which completions are correct (if it could, it would be right 100% of the time), so human review is required for autocompletion, which is a massive task for large-scale test-writing.
Language predictors don’t understand anything other than the input text, so they don’t apply logic, math, etc., which means they can’t write good assertions on results.