Deep Learning Type Inference for Dynamic
Programming Languages
Amir M. Mir
PhD Student in Software Engineering Research Group
s.a.m.mir@tudelft.nl
SERG Lunch
April 22, 2020
Content
● Introduction
● Type annotations
● Existing Deep Learning-based approaches
● Major Research Problem
● Our current approach
Introduction
Dynamic programming languages such as Python and JavaScript are extremely
popular nowadays.
Introduction
Dynamic languages enable fast prototyping.
Issues of Dynamic Languages
● Type errors
● Suboptimal IDE support
● Unexpected runtime behavior
● Difficult-to-understand APIs
Type Annotations
● Type hints for Python 3 (PEP 484, Sep. 2014)
● TypeScript with optional static types (Oct. 2012)
Type Annotations
TypeScript example:
Type Annotations
Python example:
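The example from the slide image is not preserved in this transcript; a minimal PEP 484-style annotation (my own illustration) looks like this:

```python
def greet(name: str, excited: bool = False) -> str:
    """Annotations are hints: not enforced at runtime, but usable by tools."""
    suffix = "!" if excited else "."
    return f"Hello, {name}{suffix}"

greeting: str = greet("SERG", excited=True)
print(greeting)  # Hello, SERG!
```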
Type Annotations
Issues
● Relies on developers
● Cumbersome and error-prone process
● Two main approaches for inferring types:
○ Static analysis tools
○ ML-based approaches
Static Type Checkers
● Mypy (mypy-lang.org/)
● Pyre (pyre-check.org/)
● Flow (flow.org/)
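These tools verify annotations statically, without running the code. A small illustration (my own example) of the kind of mismatch a checker such as mypy flags:

```python
def halve(x: int) -> int:
    return x // 2

result = halve(10)   # fine: int argument, int result
# halve("ten")       # rejected statically by a checker (incompatible argument
#                    # type), even though Python itself would only fail at runtime
print(result)  # 5
```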
Existing Deep Learning-based Approaches
● DeepTyper (Hellendoorn et al., 2018)
● NL2Type (Malik et al., 2019)
● TypeWriter (Pradel et al., 2020)
● LAMBDANET (Wei et al., 2020)
DeepTyper
● Inspired by part-of-speech (POS) tagging in NLP research
● Models type inference as a sequence annotation task.
● Employs a bi-directional recurrent neural network (bi-RNN).
● Adds a consistency layer to the bi-RNN to account for multiple usages of the
same variable.
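Stripped of the neural network, the consistency idea boils down to averaging the per-occurrence type distributions of each identifier, so that all usages receive one shared prediction. A toy sketch (identifiers and probabilities are made up):

```python
from collections import defaultdict

def consistency_layer(tokens, per_token_probs):
    """Average the type distributions over all occurrences of each identifier."""
    groups = defaultdict(list)
    for tok, dist in zip(tokens, per_token_probs):
        groups[tok].append(dist)
    averaged = {
        tok: [sum(col) / len(dists) for col in zip(*dists)]
        for tok, dists in groups.items()
    }
    return [averaged[tok] for tok in tokens]

# Two usages of "x" disagree over (int, str); the layer reconciles them.
tokens = ["x", "y", "x"]
probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
smoothed = consistency_layer(tokens, probs)
print(smoothed[0])  # ~[0.7, 0.3], shared by both occurrences of "x"
```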
DeepTyper
NL2Type
● Considers natural language information embedded in code:
○ Name of the function
○ Names of the formal parameters
○ Comment associated with the function
○ Comments associated with the parameters
○ Comment associated with the function's return type
● Learns separate word embeddings for comments and identifier names
● Uses an RNN with long short-term memory (LSTM) units
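Before embedding, identifier names must first be split into natural-language word tokens. A common preprocessing step (a sketch of the idea, not NL2Type's exact code) handles camelCase and snake_case:

```python
import re

def split_identifier(name):
    """Split an identifier into lowercase word tokens (snake_case + camelCase)."""
    words = []
    for part in name.split("_"):
        # Match Capitalized/lowercase words, acronym runs, or digit runs.
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [w.lower() for w in words]

print(split_identifier("getHTTPResponseCode"))  # ['get', 'http', 'response', 'code']
print(split_identifier("max_retry_count"))      # ['max', 'retry', 'count']
```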
NL2Type
NL2Type
TypeWriter
● Considers four kinds of context information:
○ Identifier names
○ Code occurrences
○ Function-level comments
○ Available type hints
● Similar to NL2Type, it trains two word embeddings
● Has three RNN submodels:
○ Learning from identifiers
○ Learning from token sequences
○ Learning from comments
● Feedback-guided search for consistent types
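The search component can be sketched independently of the neural model: take the ranked candidate types per slot and try combinations until an external type checker accepts one. A toy version with a stubbed checker (slot names and candidates are illustrative, and TypeWriter's actual search is feedback-guided rather than an exhaustive product):

```python
from itertools import product

def search_consistent_types(candidates, type_checks):
    """candidates: {slot: [types, ranked by model score]};
    type_checks(assignment) -> True if the annotated program type-checks."""
    slots = list(candidates)
    for combo in product(*(candidates[s] for s in slots)):
        assignment = dict(zip(slots, combo))
        if type_checks(assignment):      # feedback from an external type checker
            return assignment
    return None                          # give up: leave the slots unannotated

# Stub checker: pretend the checker rejects "str" as the return type.
checks = lambda a: a["return"] != "str"
cands = {"param:x": ["int", "str"], "return": ["str", "int"]}
print(search_consistent_types(cands, checks))  # {'param:x': 'int', 'return': 'int'}
```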
TypeWriter
[Figure: TypeWriter's four information sources: available type hints, identifiers, comments, and code occurrences]
TypeWriter
LAMBDANET
● Imposes hard constraints on types
● Contextual hints
● Type dependency graph, i.e. a set of predicates
● Uses a graph neural network (GNN) and proposes a pointer-like network for
handling user-defined types
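The predicate view can be made concrete: each hyperedge relates the type variables of program elements. A hypothetical encoding for a snippet like `y = x + 1; z = len(s)` (predicate names are my own, and LambdaNet itself targets TypeScript rather than Python):

```python
# One (hypothetical) predicate per hyperedge; "t_v" is the type variable of v.
predicates = {
    ("subtype", "t_x", "int"),       # x is used in integer addition
    ("assign", "t_y", "t_x"),        # y = x + 1 ties t_y to t_x
    ("call_return", "len", "t_z"),   # z = len(s): t_z is len's return type
    ("call_arg", "len", 0, "t_s"),   # s flows into len's first parameter
}

# The GNN passes messages along these hyperedges between type variables:
variables = {v for p in predicates for v in p
             if isinstance(v, str) and v.startswith("t_")}
print(sorted(variables))  # ['t_s', 't_x', 't_y', 't_z']
```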
LAMBDANET
Type Dependency Graph
LAMBDANET
Hyperedges in type dependency graph
Major Research Problem
Closed type vocabulary: predictions are limited to a fixed set of common types (e.g. the 1,000 most frequent).
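A minimal illustration of why a closed vocabulary causes trouble: any type outside the fixed set collapses to a single catch-all index that the model can never resolve (vocabulary entries here are illustrative):

```python
# A closed vocabulary of the most frequent types; anything else becomes UNK.
vocab = {"int": 0, "str": 1, "List[int]": 2, "Dict[str, int]": 3}
UNK = len(vocab)  # index 4: the single catch-all "unknown" class

def encode(type_name):
    return vocab.get(type_name, UNK)

print(encode("int"))                    # 0
print(encode("ClientConnectionError"))  # 4 -- a rare, project-specific type
                                        # the model can never predict correctly
```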
Out-of-Vocabulary Problem
[Figure: a DNN model outputs "Unknown" for parameter and return types that fall outside the closed vocabulary]
Our Current Approach
New dataset
Re-implementation of TypeWriter with new dataset
[Chart: on the new dataset, our re-implementation scores ~27% and ~7% higher than the original results]
Our Current Approach
Improved available type extractor
[Figure: a Python dataset feeds a visible type hints extractor, which emits a binary type mask vector (1 for types visible in the module, e.g. AbstractResolver, ClientConnectionError, ClientHttpProxyError, …; 0 otherwise)]
● Lightweight static analysis with importlab and LibCST
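The extractor's output can be sketched as a binary mask over the type vocabulary, restricting predictions to types actually visible in the module (the function and data below are illustrative, not our implementation):

```python
def type_mask(vocabulary, visible_types):
    """1 where the type is visible (imported or defined) in the module, else 0."""
    visible = set(visible_types)
    return [1 if t in visible else 0 for t in vocabulary]

vocab = ["int", "str", "AbstractResolver", "ClientConnectionError"]
visible = ["int", "str", "ClientConnectionError"]  # e.g. extracted from imports
print(type_mask(vocab, visible))  # [1, 1, 0, 1]
```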
Our Current Approach
Future
● Refinements to the search part and/or the neural model
● Performing extensive experiments to show the effectiveness of the
approach
● Writing a paper draft by the end of June.
Thank You!
References
1. Hellendoorn, V. J., Bird, C., Barr, E. T., & Allamanis, M. (2018). Deep learning type inference. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) (pp. 152-162).
2. Malik, R. S., Patra, J., & Pradel, M. (2019). NL2Type: Inferring JavaScript function types from natural language information. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (pp. 304-315). IEEE.
3. Pradel, M., Gousios, G., Liu, J., & Chandra, S. (2019). TypeWriter: Neural type prediction with search-based validation. arXiv preprint arXiv:1912.03768.
4. Wei, J., Goyal, M., Durrett, G., & Dillig, I. (2020). LambdaNet: Probabilistic type inference using graph neural networks. ICLR 2020.
5. Gage, P. (1994). A new algorithm for data compression. C Users Journal, 12(2), 23-38.
6. Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.