Ontology Type Inference
Charles Chen, joint work with Werner Dietl, Mier Ta, and Jason Li
University of Waterloo
Annotated result for corpus (.jaif file)
class CylinderShape:
    method getLocalBounds()
        Method.parameter 1: @ontology.qual.Ontology(values={ontology.qual.OntologyValue.POSITION_3D})
Our part in the big picture
• Ontology propagation in corpus
[Diagram: Corpus + Mapping file (Ontology -> annotation) -> Type Inference -> Refined corpus]
Outline
• How ontology type inference works so far
  • Workflow overview
  • Constraint graph generation
  • Constraint encoding and solving
• Other progress
  • Infrastructure improvement
  • Backward dataflow analysis
• Next steps
Type Inference workflow
[Figure: Source code -> Compiler -> AST -> Constraint variable introduction -> Constraint generation -> Constraints -> Generic solver interface -> Encoding -> Real solver (MAX SAT, Lingeling, LogicBlox) -> Solutions -> JAIF file generation -> Annotations / Annotated AST; ground truths are also an input]
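To make the "generic solver interface" step concrete, here is a minimal Java 16+ sketch under stated assumptions: `SolverBackend`, `PropagationBackend`, `SubtypeConstraint`, and the `OntologyValue` constants are hypothetical names, not the Checker Framework Inference API. The toy backend only propagates lower bounds in the power-set lattice (where `sub <: sup` means sub's set contains sup's set). A real backend such as MAX SAT, Lingeling, or LogicBlox would sit behind the same interface after an encoding step.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical ontology values, named after the slides' examples. */
enum OntologyValue { SEQUENCE, FORCE, VELOCITY, POSITION_3D }

/** Hypothetical constraint: slot sub must be a subtype of slot sup. */
record SubtypeConstraint(String sub, String sup) {}

/** Hypothetical generic solver interface; each real solver would be one implementation. */
interface SolverBackend {
    /** Returns a value for every slot, or empty if the constraints cannot be satisfied. */
    Optional<Map<String, EnumSet<OntologyValue>>> solve(
            Set<String> variables,
            Map<String, EnumSet<OntologyValue>> groundTruths,
            List<SubtypeConstraint> constraints);
}

/**
 * Toy backend: starts every variable at TOP (the empty set) and grows it only as far as the
 * constraints force, so it returns the most conservative valid answer. Assumes variables and
 * ground-truth slots are disjoint and every constrained slot is one or the other.
 */
final class PropagationBackend implements SolverBackend {
    @Override
    public Optional<Map<String, EnumSet<OntologyValue>>> solve(
            Set<String> variables,
            Map<String, EnumSet<OntologyValue>> groundTruths,
            List<SubtypeConstraint> constraints) {
        Map<String, EnumSet<OntologyValue>> value = new HashMap<>();
        for (String v : variables) {
            value.put(v, EnumSet.noneOf(OntologyValue.class)); // TOP
        }
        groundTruths.forEach((slot, set) -> value.put(slot, EnumSet.copyOf(set)));

        // sub <: sup means sub's set contains sup's set, so push each supertype's
        // values down into a variable subtype until nothing changes.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (SubtypeConstraint c : constraints) {
                if (variables.contains(c.sub())
                        && value.get(c.sub()).addAll(value.get(c.sup()))) {
                    changed = true;
                }
            }
        }

        // Ground truths are fixed, so a constraint on them may still be violated.
        for (SubtypeConstraint c : constraints) {
            if (!value.get(c.sub()).containsAll(value.get(c.sup()))) {
                return Optional.empty();
            }
        }
        return Optional.of(value);
    }
}
```

For example, with ground truth `SEQ = {SEQUENCE}` and constraints `x <: SEQ` and `y <: x`, both `x` and `y` become `{SEQUENCE}`, while an unconstrained `z` stays TOP.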
Meaning of Ontology types
• How should we interpret @Ontology({SEQUENCE, FORCE})?
• 1. Possible information: the value is SEQ or FORCE, but nothing else
• 2. Precise information: the value is both SEQ and FORCE, and possibly others too
• We want precise information!
[Figure: Venn diagram of SEQUENCE, FORCE, and VELOCITY illustrating reading 1 (possible information)]
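The two readings can be stated as set checks. In this hedged Java sketch, `annotated` is the set written in the annotation and `actual` is a hypothetical set of ontologies the value really has; the enum and method names are illustrative only.

```java
import java.util.Set;

/** Hypothetical ontology values from the slide. */
enum OntologyValue { SEQUENCE, FORCE, VELOCITY }

final class Interpretations {
    /** Reading 1 (possible info): the value only has ontologies drawn from the annotation. */
    static boolean possible(Set<OntologyValue> annotated, Set<OntologyValue> actual) {
        return annotated.containsAll(actual);
    }

    /** Reading 2 (precise info): the value has every annotated ontology, possibly more. */
    static boolean precise(Set<OntologyValue> annotated, Set<OntologyValue> actual) {
        return actual.containsAll(annotated);
    }
}
```

With `annotated = {SEQUENCE, FORCE}` and `actual = {SEQUENCE, FORCE, VELOCITY}`, the possible reading fails while the precise reading holds; the power-set lattice on the next slide is built on the precise reading.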
Ontology type lattice
• Power-set lattice: 2^n values for n ontologies
• TOP is the empty set
• A non-top value carries precise ontology(ies)
  • (e.g. must be both SEQ and VEL)
• BOTTOM is the conjunction of all values
  • (must be SEQ and POS and VEL ...)
Lattice, from top to bottom:
TOP
POS   SEQ   VEL
SEQ,POS   SEQ,VEL   POS,VEL
SEQ,POS,VEL
...
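A minimal Java sketch of this power-set lattice, assuming a hypothetical `OntologyValue` enum with three values (SEQ, POS, VEL on the slide). A type with more ontologies sits lower, so subtyping is "contains all of", LUB is set intersection, and GLB is set union.

```java
import java.util.EnumSet;

/** Hypothetical ontology values; the slide abbreviates them as SEQ, POS, VEL. */
enum OntologyValue { SEQUENCE, POSITION_3D, VELOCITY }

/** Power-set lattice over OntologyValue: more values means a more precise (lower) type. */
final class OntologyLattice {
    static EnumSet<OntologyValue> top() {
        return EnumSet.noneOf(OntologyValue.class); // TOP: no ontology information
    }

    static EnumSet<OntologyValue> bottom() {
        return EnumSet.allOf(OntologyValue.class); // BOTTOM: must be every ontology at once
    }

    /** a <: b exactly when a carries every ontology that b carries. */
    static boolean isSubtype(EnumSet<OntologyValue> a, EnumSet<OntologyValue> b) {
        return a.containsAll(b);
    }

    /** Least upper bound: only the ontologies that both types guarantee. */
    static EnumSet<OntologyValue> lub(EnumSet<OntologyValue> a, EnumSet<OntologyValue> b) {
        EnumSet<OntologyValue> result = EnumSet.copyOf(a);
        result.retainAll(b);
        return result;
    }

    /** Greatest lower bound: everything either type guarantees. */
    static EnumSet<OntologyValue> glb(EnumSet<OntologyValue> a, EnumSet<OntologyValue> b) {
        EnumSet<OntologyValue> result = EnumSet.copyOf(a);
        result.addAll(b);
        return result;
    }

    /** Number of lattice elements: 2^n for n ontology values. */
    static int size() {
        return 1 << OntologyValue.values().length;
    }
}
```

For example, the LUB of {SEQ,POS} and {SEQ,VEL} is {SEQ}, their GLB is {SEQ,POS,VEL}, and with three values the lattice has 2^3 = 8 elements.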
Two challenges
• Looking up declarations alone is not enough
  • How to generate context-sensitive constraint variables?
• TOP is always a valid answer, but not a useful one
  • How to give a less conservative answer while keeping type safety?
How to generate context-sensitive constraint variables?
Requires context-sensitive analysis of method calls!
[Figure: two constraint graphs over constants F and T and variables 1 to 4 (legend: subtype, supertype, constant, variable); the second is labeled Ground Truth]
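To see why call sites need their own constraint variables, consider a hypothetical identity method `T id(T p) { return p; }` called once with a SEQ argument and once with a VEL argument. This Java 16+ sketch (all names are illustrative assumptions, not the tool's API) builds the constraints both ways and computes the most precise solution by shrinking variables down from BOTTOM. With shared slots the parameter must fit both SEQ and VEL, so both results collapse to TOP; with per-call-site copies, each result keeps its own ontology.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical ontology values for this example. */
enum OntologyValue { SEQUENCE, VELOCITY }

/** Hypothetical constraint sub <: sup; in the power-set lattice, sub's set contains sup's set. */
record Sub(String sub, String sup) {}

final class ContextSensitivity {
    /** Constraints for two calls to id: call 1 passes constant SEQ, call 2 passes VEL. */
    static List<Sub> constraints(boolean contextSensitive) {
        String[] args = {"SEQ", "VEL"};
        List<Sub> cs = new ArrayList<>();
        for (int site = 1; site <= args.length; site++) {
            // Context-insensitive: all call sites share the declaration's slots p and ret.
            // Context-sensitive: each call site gets fresh copies p@i and ret@i.
            String suffix = contextSensitive ? "@" + site : "";
            String p = "p" + suffix;
            String ret = "ret" + suffix;
            cs.add(new Sub(args[site - 1], p)); // argument <: parameter
            cs.add(new Sub(p, ret));            // returned value <: return type
            cs.add(new Sub(ret, "lhs" + site)); // call result <: assignment target
        }
        return cs;
    }

    /**
     * Most precise solution: start variables at BOTTOM and shrink them to a fixpoint.
     * Simplification: assumes constants only appear on the subtype side, as in this example.
     */
    static Map<String, EnumSet<OntologyValue>> solveMostPrecise(
            List<Sub> cs, Map<String, EnumSet<OntologyValue>> constants) {
        Map<String, EnumSet<OntologyValue>> val = new HashMap<>();
        for (Sub c : cs) {
            for (String slot : List.of(c.sub(), c.sup())) {
                val.computeIfAbsent(slot, k -> constants.containsKey(k)
                        ? EnumSet.copyOf(constants.get(k))
                        : EnumSet.allOf(OntologyValue.class));
            }
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Sub c : cs) {
                // A variable sup may keep only what sub also has, so sub's set contains sup's.
                if (!constants.containsKey(c.sup())
                        && val.get(c.sup()).retainAll(val.get(c.sub()))) {
                    changed = true;
                }
            }
        }
        return val;
    }
}
```

Here `lhs1` and `lhs2` are both TOP (empty) in the context-insensitive run, whereas the context-sensitive run gives `lhs1 = {SEQUENCE}` and `lhs2 = {VELOCITY}`.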
How to give a less conservative answer while keeping type safety?
Preference constraints generation
[Figure: the Type Inference workflow diagram, shown again for preference constraint generation]
2 type lattices & preference toward bottom
• Current lattice: TOP above SEQ
• For each supertype in subtype constraints, add a preference for it to be SEQ (preference constraints)
• Work on the SEQ-reachable constraints graph (VAR 1 to VAR 6 in the example)
• Encode constraints and preferences for a weighted MAX SAT solver
• Merge solutions (e.g. for SEQ, VEL, POS) by LUB to get a less conservative answer
[Figure: two type lattices (the full ontology lattice and the current lattice TOP, SEQ) and the corresponding SEQ-reachable constraints graph over VAR 1 to VAR 6]
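The preference encoding can be sketched for the two-element lattice TOP > SEQ. Each slot gets one boolean meaning "is SEQ"; a subtype constraint sub <: sup becomes the hard clause (not sup) or sub; ground truths become hard unit clauses; and each supertype in a subtype constraint gets a soft clause preferring SEQ. The brute-force search below is a hypothetical stand-in for a real weighted MAX SAT solver and only works for a handful of slots; `SeqMaxSat` and its parameters are illustrative names (Java 16+ for the nested record).

```java
import java.util.List;
import java.util.Map;

final class SeqMaxSat {
    /** Hypothetical constraint: slot sub <: slot sup. */
    record Subtype(int sub, int sup) {}

    /**
     * Brute-force weighted MAX SAT over n slots. Bit i is true iff slot i is SEQ
     * (the bottom of TOP > SEQ). Every preference uses the same weight in this sketch.
     * Returns null if the hard clauses are unsatisfiable.
     */
    static boolean[] solve(int n, List<Subtype> subtypes,
                           Map<Integer, Boolean> groundTruths, int preferenceWeight) {
        boolean[] best = null;
        int bestScore = -1;
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] isSeq = new boolean[n];
            for (int i = 0; i < n; i++) {
                isSeq[i] = ((mask >> i) & 1) == 1;
            }
            if (!satisfiesHard(isSeq, subtypes, groundTruths)) {
                continue;
            }
            // Soft clauses: for each supertype in a subtype constraint, prefer it to be SEQ.
            int score = 0;
            for (Subtype c : subtypes) {
                if (isSeq[c.sup()]) {
                    score += preferenceWeight;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = isSeq;
            }
        }
        return best;
    }

    /** Hard clauses: ground truths are fixed, and sub <: sup encodes as (!sup || sub). */
    static boolean satisfiesHard(boolean[] isSeq, List<Subtype> subtypes,
                                 Map<Integer, Boolean> groundTruths) {
        for (Map.Entry<Integer, Boolean> gt : groundTruths.entrySet()) {
            if (isSeq[gt.getKey()] != gt.getValue()) {
                return false;
            }
        }
        for (Subtype c : subtypes) {
            if (isSeq[c.sup()] && !isSeq[c.sub()]) {
                return false;
            }
        }
        return true;
    }
}
```

Take slot 0 as a SEQ ground truth, slot 3 as a TOP ground truth, and constraints 0 <: 1, 1 <: 2, 3 <: 4. With weight 0 the search returns the conservative answer (slots 1 and 2 stay TOP). With weight 1 the preferences push slots 1 and 2 down to SEQ, while slot 4 stays TOP because the ground truth on slot 3 forbids SEQ there.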
Other progress
• Infrastructure improvement
  • Support more language features
  • Make CF and CFI work on bigger and bigger projects!
Project name        Inferred annotations  Lines of code
java-callgraph      45                    726
logback-extensions  548                   1308
cal10n              921                   5781
jReactPhysics3D     5347                  15502
ode4j               25286                 91683
Statistics of the dataflow type system on real projects
[Figure: control-flow graph with <entry>, then and else branches that each assign str = …, the statement File file = new File(str);, and <exit>]
Other progress
• Backward dataflow analysis
  • Extends the dataflow framework in CF
  • Can now do both forward and backward analysis
  • Supports more kinds of analyses, e.g. live variable analysis
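Live variable analysis is the textbook example of a backward analysis: facts flow from a node's successors back to the node. Below is a hedged, self-contained Java 16+ sketch on a CFG shaped like the figure (entry, then/else branches that assign `str`, then `File file = new File(str);`). The `Node` record and `LiveVariables` class are hypothetical and do not use the Checker Framework's dataflow API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical CFG node: the variables it uses and defines, and its successor labels. */
record Node(String label, Set<String> use, Set<String> def, List<String> succ) {}

final class LiveVariables {
    /**
     * Backward fixpoint: liveOut(n) is the union of liveIn(s) over successors s, and
     * liveIn(n) = use(n) plus (liveOut(n) minus def(n)). Assumes each map key equals
     * its node's label. Returns liveIn per node label.
     */
    static Map<String, Set<String>> liveIn(Map<String, Node> cfg) {
        Map<String, Set<String>> in = new HashMap<>();
        for (String label : cfg.keySet()) {
            in.put(label, new HashSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Node n : cfg.values()) {
                Set<String> out = new HashSet<>();
                for (String s : n.succ()) {
                    out.addAll(in.get(s)); // information flows backward from successors
                }
                Set<String> newIn = new HashSet<>(out);
                newIn.removeAll(n.def());
                newIn.addAll(n.use());
                if (!newIn.equals(in.get(n.label()))) {
                    in.put(n.label(), newIn);
                    changed = true;
                }
            }
        }
        return in;
    }
}
```

On the figure's shape, `str` is live on entry to `new File(str)` but not at `<entry>`, because both branches overwrite it first.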
What happened since May
• New features
  • Context-sensitive type inference
  • Customizable preferences for certain solutions
  • Backward dataflow analysis
• Backend improvement
  • New solver backend: Lingeling
  • Constraint graph improvement: parallel solving
• Infrastructure improvement
  • Checker Framework
  • Checker Framework Inference
Next steps
• Design an inter-dependent ontology lattice
  • e.g. SEQUENCE and SEQ_LENGTH
• Viewpoint adaptation
  • Formalize and infer context-sensitive methods
• Improve the LogicBlox solver
  • Support for preference constraints
• Scale up supported language features and program sizes
Thank you!
