SlideShare a Scribd company logo
1 of 20
Download to read offline
Annotate Types in Large Codebase with Automated
Refactoring
Jimmy Lai, Software Engineer at Carta
Feb. 9, 2022
Tech Stack
…
A Large Python Codebase
Python code
1.8 million lines
27,000 files
120,000 functions
~200 active developers
Lots of TypeError,
AttributeError, ValueError
Type Annotation and Mypy
Mypy: Argument 1 to "add" has incompatible type "str"; expected "int"
Automated Refactoring
Automated code changes for fixing large scale tech
debt (Code Formatting, Type Annotation, Dead Code
Cleanup)
LibCST Features:
● Concrete Syntax Tree
● Transformer and Matcher API
● Metadata with static analysis
Recommended tool: LibCST
A library for modifying Python code easily.
Code Review with Pull Requests
Pull
Request
Pull
Request
Pull
Request
Pull
Request
Add missing types based on static analysis
MonkeyType: add missing types based on runtime data
1. Collect types by running Python program.
2. Aggregate collected types and apply to the code using LibCST.
Run test cases and apply types:
Make it more fun!
Automated weekly updates and leaderboards!
Fully Typed Function Coverage
2018 2021
automated refactoring
Production Type Error Improvement
20

Carta
We are hiring! https://tinyurl.com/carta-jobs
Carta Engineering Blog https://medium.com/building-carta
Contact: jimmy.lai@carta.com

More Related Content

Similar to Annotate types in large codebase with automated refactoring

FDP-faculty deveopmemt program on python
FDP-faculty deveopmemt program on pythonFDP-faculty deveopmemt program on python
FDP-faculty deveopmemt program on python
kannikadg
 
1-Phases of compiler-26-04-2023.pptx
1-Phases of compiler-26-04-2023.pptx1-Phases of compiler-26-04-2023.pptx
1-Phases of compiler-26-04-2023.pptx
venkatapranaykumarGa
 
C# 4.0 and .NET 4.0
C# 4.0 and .NET 4.0C# 4.0 and .NET 4.0
C# 4.0 and .NET 4.0
Buu Nguyen
 
Никита Манько “Code review”
Никита Манько “Code review”Никита Манько “Code review”
Никита Манько “Code review”
EPAM Systems
 

Similar to Annotate types in large codebase with automated refactoring (20)

python programming.pptx
python programming.pptxpython programming.pptx
python programming.pptx
 
Static code analysis for verification of the 64-bit applications
Static code analysis for verification of the 64-bit applicationsStatic code analysis for verification of the 64-bit applications
Static code analysis for verification of the 64-bit applications
 
Pa1 json requests
Pa1 json requestsPa1 json requests
Pa1 json requests
 
Python Linters at Scale.pdf
Python Linters at Scale.pdfPython Linters at Scale.pdf
Python Linters at Scale.pdf
 
FDP-faculty deveopmemt program on python
FDP-faculty deveopmemt program on pythonFDP-faculty deveopmemt program on python
FDP-faculty deveopmemt program on python
 
The Onward Journey: Porting Twisted to Python 3
The Onward Journey: Porting Twisted to Python 3The Onward Journey: Porting Twisted to Python 3
The Onward Journey: Porting Twisted to Python 3
 
API Athens Meetup - API standards 22.03.2016
API Athens Meetup - API standards 22.03.2016API Athens Meetup - API standards 22.03.2016
API Athens Meetup - API standards 22.03.2016
 
1-Phases of compiler-26-04-2023.pptx
1-Phases of compiler-26-04-2023.pptx1-Phases of compiler-26-04-2023.pptx
1-Phases of compiler-26-04-2023.pptx
 
Python Course In Chandigarh
Python Course In ChandigarhPython Course In Chandigarh
Python Course In Chandigarh
 
Pa2 session 4
Pa2 session 4Pa2 session 4
Pa2 session 4
 
C# 4.0 and .NET 4.0
C# 4.0 and .NET 4.0C# 4.0 and .NET 4.0
C# 4.0 and .NET 4.0
 
Pa1 json requests
Pa1 json requestsPa1 json requests
Pa1 json requests
 
Overview of python 2019
Overview of python 2019Overview of python 2019
Overview of python 2019
 
Никита Манько “Code review”
Никита Манько “Code review”Никита Манько “Code review”
Никита Манько “Code review”
 
Python intro
Python introPython intro
Python intro
 
QTP Automation Testing Tutorial 2
QTP Automation Testing Tutorial 2QTP Automation Testing Tutorial 2
QTP Automation Testing Tutorial 2
 
PySide
PySidePySide
PySide
 
Module 3: Introduction to LINQ (PowerPoint Slides)
Module 3: Introduction to LINQ (PowerPoint Slides)Module 3: Introduction to LINQ (PowerPoint Slides)
Module 3: Introduction to LINQ (PowerPoint Slides)
 
Enjoy Type Hints and its benefits
Enjoy Type Hints and its benefitsEnjoy Type Hints and its benefits
Enjoy Type Hints and its benefits
 
Top python interview question and answer
Top python interview question and answerTop python interview question and answer
Top python interview question and answer
 

More from Jimmy Lai

Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst Nanodegree
Jimmy Lai
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHug
Jimmy Lai
 

More from Jimmy Lai (19)

EuroPython 2022 - Automated Refactoring Large Python Codebases
EuroPython 2022 - Automated Refactoring Large Python CodebasesEuroPython 2022 - Automated Refactoring Large Python Codebases
EuroPython 2022 - Automated Refactoring Large Python Codebases
 
The journey of asyncio adoption in instagram
The journey of asyncio adoption in instagramThe journey of asyncio adoption in instagram
The journey of asyncio adoption in instagram
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst Nanodegree
 
Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
 
Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge Base
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
 
[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in python
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython Notebook
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHug
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHug
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHug
 

Recently uploaded

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 

Recently uploaded (20)

Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Ground Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth ReinforcementGround Improvement Technique: Earth Reinforcement
Ground Improvement Technique: Earth Reinforcement
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 

Annotate types in large codebase with automated refactoring

  • 1. Annotate Types in Large Codebase with Automated Refactoring Jimmy Lai, Software Engineer at Carta Feb. 9, 2022
  • 2.
  • 3.
  • 5. A Large Python Codebase Python code 1.8 million lines 27,000 files 120,000 functions ~200 active developers Lots of TypeError, AttributeError, ValueError
  • 6. Type Annotation and Mypy Mypy: Argument 1 to "add" has incompatible type "str"; expected "int"
  • 7. Automated Refactoring Automated code changes for fixing large scale tech debt (Code Formatting, Type Annotation, Dead Code Cleanup) LibCST Features: ● Concrete Syntax Tree ● Transformer and Matcher API ● Metadata with static analysis Recommended tool: LibCST A library for modifying Python code easily.
  • 8. Code Review with Pull Requests Pull Request Pull Request Pull Request Pull Request
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Add missing types based on static analysis
  • 14.
  • 15.
  • 16. MonkeyType: add missing types based on runtime data 1. Collect types by running Python program. 2. Aggregate collected types and apply to the code using LibCST. Run test cases and apply types:
  • 17. Make it more fun! Automated weekly updates and leaderboards!
  • 18. Fully Typed Function Coverage 2018 2021 automated refactoring
  • 19. Production Type Error Improvement
  • 20. 20  Carta We are hiring! https://tinyurl.com/carta-jobs Carta Engineering Blog https://medium.com/building-carta Contact: jimmy.lai@carta.com