Deep Learning Type Inference for Dynamic
Programming Languages
Amir M. Mir
PhD Student in Software Engineering Research Group
s.a.m.mir@tudelft.nl
SERG Lunch
April 22, 2020
Content
● Introduction
● Type annotations
● Existing Deep Learning-based approaches
● Major Research Problem
● Our current approach
Introduction
Dynamic programming languages such as Python and JavaScript are extremely
popular nowadays.
Introduction
Dynamic languages enable fast prototyping.
Issues of Dynamic Languages
● Type errors
● Suboptimal IDE support
● Unexpected runtime behavior
● Difficult-to-understand APIs
Type Annotations
● Type hints for Python 3 (PEP 484, Sep. 2014)
● TypeScript with optional static types (Oct. 2012)
Type Annotations
TypeScript example:
Type Annotations
Python example:
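The example from the slide image is not preserved in this transcript; a minimal PEP 484-style annotation (my own illustration) looks like this:

```python
def greet(name: str, excited: bool = False) -> str:
    """Annotations are hints: not enforced at runtime, but usable by tools."""
    suffix = "!" if excited else "."
    return f"Hello, {name}{suffix}"

greeting: str = greet("SERG", excited=True)
print(greeting)  # Hello, SERG!
```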
Type Annotations
Issues
● Relies on developers
● Cumbersome and error-prone process
● Two main approaches for inferring types:
○ Static analysis tools
○ ML-based approaches
Static Type Checkers
● Mypy (mypy-lang.org/)
● Pyre (pyre-check.org/)
● Flow (flow.org/)
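These tools verify annotations statically, without running the code. A small illustration (my own example) of the kind of mismatch a checker such as mypy flags:

```python
def halve(x: int) -> int:
    return x // 2

result = halve(10)   # fine: int argument, int result
# halve("ten")       # rejected statically by a checker (incompatible argument
#                    # type), even though Python itself would only fail at runtime
print(result)  # 5
```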
Existing Deep Learning-based Approaches
● DeepTyper (Hellendoorn et al., 2018)
● NL2Type (Malik et al., 2019)
● TypeWriter (Pradel et al., 2020)
● LAMBDANET (Wei et al., 2020)
DeepTyper
● Inspired by part-of-speech (POS) tagging in NLP research
● Models type inference as a sequence annotation task.
● Employs a bi-directional recurrent neural network (bi-RNN).
● Adds a consistency layer to the bi-RNN to account for multiple usages of the
same variable.
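Stripped of the neural network, the consistency idea boils down to averaging the per-occurrence type distributions of each identifier, so that all usages receive one shared prediction. A toy sketch (identifiers and probabilities are made up):

```python
from collections import defaultdict

def consistency_layer(tokens, per_token_probs):
    """Average the type distributions over all occurrences of each identifier."""
    groups = defaultdict(list)
    for tok, dist in zip(tokens, per_token_probs):
        groups[tok].append(dist)
    averaged = {
        tok: [sum(col) / len(dists) for col in zip(*dists)]
        for tok, dists in groups.items()
    }
    return [averaged[tok] for tok in tokens]

# Two usages of "x" disagree over (int, str); the layer reconciles them.
tokens = ["x", "y", "x"]
probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
smoothed = consistency_layer(tokens, probs)
print(smoothed[0])  # ~[0.7, 0.3], shared by both occurrences of "x"
```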
DeepTyper
NL2Type
● Considers natural language information embedded in code:
○ Name of the function
○ Names of the formal parameters
○ Comment associated with the function
○ Comments associated with the parameters
○ Comment associated with the function's return type
● Learns separate word embeddings for comments and identifier names
● Uses an RNN with long short-term memory (LSTM) units
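Before embedding, identifier names must first be split into natural-language word tokens. A common preprocessing step (a sketch of the idea, not NL2Type's exact code) handles camelCase and snake_case:

```python
import re

def split_identifier(name):
    """Split an identifier into lowercase word tokens (snake_case + camelCase)."""
    words = []
    for part in name.split("_"):
        # Match Capitalized/lowercase words, acronym runs, or digit runs.
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [w.lower() for w in words]

print(split_identifier("getHTTPResponseCode"))  # ['get', 'http', 'response', 'code']
print(split_identifier("max_retry_count"))      # ['max', 'retry', 'count']
```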
NL2Type
NL2Type
TypeWriter
● Considers four kinds of context information:
○ Identifier names
○ Code occurrences
○ Function-level comments
○ Available type hints
● Similar to NL2Type, it trains two word embeddings
● Has three RNN submodels:
○ Learning from identifiers
○ Learning from token sequences
○ Learning from comments
● Feedback-guided search for consistent types
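The search component can be sketched independently of the neural model: take the ranked candidate types per slot and try combinations until an external type checker accepts one. A toy version with a stubbed checker (slot names and candidates are illustrative, and TypeWriter's actual search is feedback-guided rather than an exhaustive product):

```python
from itertools import product

def search_consistent_types(candidates, type_checks):
    """candidates: {slot: [types, ranked by model score]};
    type_checks(assignment) -> True if the annotated program type-checks."""
    slots = list(candidates)
    for combo in product(*(candidates[s] for s in slots)):
        assignment = dict(zip(slots, combo))
        if type_checks(assignment):      # feedback from an external type checker
            return assignment
    return None                          # give up: leave the slots unannotated

# Stub checker: pretend the checker rejects "str" as the return type.
checks = lambda a: a["return"] != "str"
cands = {"param:x": ["int", "str"], "return": ["str", "int"]}
print(search_consistent_types(cands, checks))  # {'param:x': 'int', 'return': 'int'}
```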
TypeWriter
[Figure: TypeWriter's four information sources: available type hints, identifiers, comments, and code occurrences]
TypeWriter
LAMBDANET
● Imposes hard constraints on types
● Contextual hints
● Type dependency graph, i.e. a set of predicates
● Uses a graph neural network (GNN) and proposes a pointer-like network for
handling user-defined types
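The predicate view can be made concrete: each hyperedge relates the type variables of program elements. A hypothetical encoding for a snippet like `y = x + 1; z = len(s)` (predicate names are my own, and LambdaNet itself targets TypeScript rather than Python):

```python
# One (hypothetical) predicate per hyperedge; "t_v" is the type variable of v.
predicates = {
    ("subtype", "t_x", "int"),       # x is used in integer addition
    ("assign", "t_y", "t_x"),        # y = x + 1 ties t_y to t_x
    ("call_return", "len", "t_z"),   # z = len(s): t_z is len's return type
    ("call_arg", "len", 0, "t_s"),   # s flows into len's first parameter
}

# The GNN passes messages along these hyperedges between type variables:
variables = {v for p in predicates for v in p
             if isinstance(v, str) and v.startswith("t_")}
print(sorted(variables))  # ['t_s', 't_x', 't_y', 't_z']
```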
LAMBDANET
Type Dependency Graph
LAMBDANET
Hyperedges in type dependency graph
Major Research Problem
Closed type vocabulary: predictions are limited to a fixed set of common types (e.g. the 1,000 most frequent).
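A minimal illustration of why a closed vocabulary causes trouble: any type outside the fixed set collapses to a single catch-all index that the model can never resolve (vocabulary entries here are illustrative):

```python
# A closed vocabulary of the most frequent types; anything else becomes UNK.
vocab = {"int": 0, "str": 1, "List[int]": 2, "Dict[str, int]": 3}
UNK = len(vocab)  # index 4: the single catch-all "unknown" class

def encode(type_name):
    return vocab.get(type_name, UNK)

print(encode("int"))                    # 0
print(encode("ClientConnectionError"))  # 4 -- a rare, project-specific type
                                        # the model can never predict correctly
```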
Out-of-Vocabulary Problem
[Figure: a DNN model outputs "Unknown" for parameter and return types that fall outside the closed vocabulary]
Our Current Approach
New dataset
Re-implementation of TypeWriter with new dataset
[Chart: on the new dataset, our re-implementation scores ~27% and ~7% higher than the original results]
Our Current Approach
Improved available type extractor
[Figure: a Python dataset feeds a visible type hints extractor, which emits a binary type mask vector (1 for types visible in the module, e.g. AbstractResolver, ClientConnectionError, ClientHttpProxyError, …; 0 otherwise)]
● Lightweight static analysis with importlab and LibCST
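The extractor's output can be sketched as a binary mask over the type vocabulary, restricting predictions to types actually visible in the module (the function and data below are illustrative, not our implementation):

```python
def type_mask(vocabulary, visible_types):
    """1 where the type is visible (imported or defined) in the module, else 0."""
    visible = set(visible_types)
    return [1 if t in visible else 0 for t in vocabulary]

vocab = ["int", "str", "AbstractResolver", "ClientConnectionError"]
visible = ["int", "str", "ClientConnectionError"]  # e.g. extracted from imports
print(type_mask(vocab, visible))  # [1, 1, 0, 1]
```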
Our Current Approach
Future
● Refinements to the search part and/or the neural model
● Performing extensive experiments to show the effectiveness of the
approach
● Writing a paper draft by the end of June.
Thank You!
References
1. Hellendoorn, V. J., Bird, C., Barr, E. T., & Allamanis, M. (2018). Deep learning type inference. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) (pp. 152-162).
2. Malik, R. S., Patra, J., & Pradel, M. (2019). NL2Type: Inferring JavaScript function types from natural language information. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (pp. 304-315). IEEE.
3. Pradel, M., Gousios, G., Liu, J., & Chandra, S. (2019). TypeWriter: Neural type prediction with search-based validation. arXiv preprint arXiv:1912.03768.
4. Wei, J., Goyal, M., Durrett, G., & Dillig, I. (2020). LambdaNet: Probabilistic type inference using graph neural networks. ICLR 2020.
5. Gage, P. (1994). A new algorithm for data compression. C Users Journal, 12(2), 23-38.
6. Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.