Ontology Type Inference
Charles Chen, joint work with Werner Dietl, Mier Ta, and Jason Li
University of Waterloo
Our part in the big picture
• Ontology propagation in corpus
[Diagram labels: Corpus, Mapping file (Ontology -> annotation), Type Inference, Refined corpus]
Annotated result for corpus (.jaif file):
class CylinderShape:
method getLocalBounds()
Method.parameter 1: @ontology.qual.Ontology(values={ontology.qual.OntologyValue.POSITION_3D})
Outline
• How ontology type inference works so far
• Workflow overview
• Constraint graph generation
• Constraint encoding & solving
• Other progress
• Infrastructure improvement
• Backward dataflow analysis
• Next steps
Type Inference workflow
[Diagram. Stages: Compiler, Constraint variable introduction, Constraint generation, Generic solver interface, Real solver (MAX SAT, Lingeling, LogicBlox), Jaif files generation. Artifacts: Source code, AST, Ground truths, Constraints, Encoding, Solutions, Annotations, Annotated AST.]
Meaning of Ontology types
• How to interpret @Ontology({SEQUENCE, FORCE})?
• 1. Possible info: the value is SEQUENCE or FORCE, but nothing else
• 2. Precise info: the value is SEQUENCE and FORCE, and possibly others as well
• We want precise information!
[Figure labels: SEQUENCE, FORCE, VELOCITY]
Ontology type lattice
• Power set lattice: 2^n values for n ontologies
• TOP is the empty set
• A non-top value is a precise set of ontologies
• (e.g. must be SEQ and VEL)
• BOTTOM is the conjunction of all values
• (must be SEQ and POS and VEL ...)
TOP
SEQ POS VEL
SEQ,POS SEQ,VEL POS,VEL
SEQ,POS,VEL
...
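A minimal sketch of the power set lattice described on this slide, assuming each type is modelled as a set of ontology values. The class name `OntologyLattice`, the enum `Ont`, and the method names are hypothetical and are not taken from the actual implementation. Because a larger set carries more precise information, a type is a subtype of another when it contains all of the other's values, so the least upper bound is set intersection and the greatest lower bound is set union.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the ontology power set lattice (not the real checker code). */
class OntologyLattice {
    /** Illustrative ontology values; the names follow those used in the slides. */
    enum Ont { SEQUENCE, POSITION_3D, VELOCITY, FORCE }

    /** TOP carries no information: the empty set. */
    static Set<Ont> top() {
        return EnumSet.noneOf(Ont.class);
    }

    /** BOTTOM is the conjunction of every ontology value. */
    static Set<Ont> bottom() {
        return EnumSet.allOf(Ont.class);
    }

    /** sub <: sup holds when sub states at least everything that sup states. */
    static boolean isSubtype(Set<Ont> sub, Set<Ont> sup) {
        return sub.containsAll(sup);
    }

    /** Least upper bound: keep only the values both types agree on. */
    static Set<Ont> lub(Set<Ont> a, Set<Ont> b) {
        EnumSet<Ont> result = EnumSet.noneOf(Ont.class);
        result.addAll(a);
        result.retainAll(b);
        return result;
    }

    /** Greatest lower bound: combine the values of both types. */
    static Set<Ont> glb(Set<Ont> a, Set<Ont> b) {
        EnumSet<Ont> result = EnumSet.noneOf(Ont.class);
        result.addAll(a);
        result.addAll(b);
        return result;
    }
}
```

Under this model every type is a subtype of `top()`, and `bottom()` is a subtype of every type, which matches the slide's description of TOP as empty and BOTTOM as the conjunction of all values.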
Two challenges
• Looking up only the declaration is not enough
• How to generate context-sensitive constraint variables?
• TOP is always a valid answer, but not a useful one
• How to give a less conservative answer while keeping type safety?
How to generate context-sensitive constraint variables?
Requires context-sensitive analysis of method calls!
[Constraint graph figure. Node labels: F, T, 1, 2, 3, 4. Legend: subtype, supertype, constant, variable. Caption: Ground Truth]
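The slides do not spell out how context-sensitive constraint variables are created, so the following is only an illustrative sketch of why they matter. It assumes the power set lattice with TOP as the empty set, and all names (`ContextSensitivity`, `sharedParameterType`, `perCallSiteTypes`) are hypothetical. With a single constraint variable for a method parameter, that variable must be a supertype of every argument passed at any call site; with a fresh variable per call site, each call keeps its own argument type.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of context-insensitive vs. context-sensitive variables. */
class ContextSensitivity {
    enum Ont { SEQUENCE, VELOCITY, FORCE }

    /** LUB in the power set lattice where TOP is the empty set: set intersection. */
    static Set<Ont> lub(Set<Ont> a, Set<Ont> b) {
        EnumSet<Ont> result = EnumSet.noneOf(Ont.class);
        result.addAll(a);
        result.retainAll(b);
        return result;
    }

    /**
     * Context-insensitive: one variable for the parameter. The most precise
     * sound choice is the LUB of all argument types seen at all call sites.
     */
    static Set<Ont> sharedParameterType(List<Set<Ont>> argumentTypes) {
        Set<Ont> result = EnumSet.allOf(Ont.class); // start at BOTTOM
        for (Set<Ont> argument : argumentTypes) {
            result = lub(result, argument);
        }
        return result;
    }

    /**
     * Context-sensitive: a fresh variable per call site, so each call can be
     * solved to the type of its own argument.
     */
    static List<Set<Ont>> perCallSiteTypes(List<Set<Ont>> argumentTypes) {
        return new ArrayList<>(argumentTypes);
    }
}
```

For a method called once with a VELOCITY argument and once with a FORCE argument, the shared variable collapses to TOP (the empty set), while the per-call-site variables stay at VELOCITY and FORCE.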
How to give a less conservative answer while keeping type safety?
Preference constraints generation
Two-type lattice & preference to bottom
• Two type lattices and the corresponding constraints graph
• Current lattice: TOP, SEQ
• SEQ reachable constraints graph (VAR 1 to VAR 6)
• Preference constraints: for each supertype in subtype constraints, add preference to be SEQ
• Encoding for the weighted MAX SAT solver
• Merge solutions (SEQ, POS, VEL) by LUB
• Less-conservative answer
[Figure: full lattice with TOP; SEQ, POS, VEL; SEQ,POS, SEQ,VEL, POS,VEL; SEQ,POS,VEL; ...]
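As a rough illustration of preference constraints over a two-type lattice {TOP, SEQ}, here is a toy sketch that stands in for a weighted MAX SAT solver with brute-force enumeration. It is not the project's encoding: the class `TwoTypeMaxSat`, the unit weight per preference, and the representation of ground truths as variables fixed to TOP are all assumptions made for this example. Each variable is one boolean ("is SEQ"), a subtype constraint sub <: sup becomes the hard clause "sup is SEQ implies sub is SEQ", and every supertype position contributes a soft preference to be SEQ.

```java
/** Hypothetical brute-force stand-in for a weighted MAX SAT solver. */
class TwoTypeMaxSat {
    /**
     * @param numVars      number of constraint variables, indexed from 0
     * @param subtypeEdges pairs {sub, sup} meaning sub <: sup (hard constraints)
     * @param fixedTop     variables forced to TOP, e.g. by ground truths
     * @return assignment where true means SEQ and false means TOP
     */
    static boolean[] solve(int numVars, int[][] subtypeEdges, int[] fixedTop) {
        boolean[] best = null;
        int bestScore = -1;
        // Enumerate every assignment; only feasible for tiny examples.
        for (int mask = 0; mask < (1 << numVars); mask++) {
            boolean[] isSeq = new boolean[numVars];
            for (int v = 0; v < numVars; v++) {
                isSeq[v] = (mask & (1 << v)) != 0;
            }
            if (!satisfiesHard(isSeq, subtypeEdges, fixedTop)) {
                continue;
            }
            // Soft constraints: weight 1 for each supertype that is SEQ.
            int score = 0;
            for (int[] edge : subtypeEdges) {
                if (isSeq[edge[1]]) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = isSeq;
            }
        }
        return best;
    }

    /** Checks ground truths and the subtype rule that TOP is not a subtype of SEQ. */
    static boolean satisfiesHard(boolean[] isSeq, int[][] subtypeEdges, int[] fixedTop) {
        for (int v : fixedTop) {
            if (isSeq[v]) {
                return false;
            }
        }
        for (int[] edge : subtypeEdges) {
            if (isSeq[edge[1]] && !isSeq[edge[0]]) {
                return false;
            }
        }
        return true;
    }
}
```

The all-TOP assignment always satisfies the hard constraints, which mirrors the observation that TOP is always a valid answer; the soft preferences are what push the result toward SEQ wherever type safety allows it.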
Other progress
• Infrastructure improvement
• Support more language features
• Make the Checker Framework (CF) and Checker Framework Inference (CFI) work on bigger and bigger projects!
project name       | inferred annotations | lines of code
java-callgraph     | 45                   | 726
logback-extensions | 548                  | 1308
cal10n             | 921                  | 5781
jReactPhysics3D    | 5347                 | 15502
ode4j              | 25286                | 91683
Statistics for the dataflow type system on real projects
Other progress
• Backward dataflow analysis
• Extends the dataflow framework in CF
• Can now do both forward and backward analysis
• Supports more kinds of analyses, e.g. live variable analysis
[Control-flow graph figure: <entry>; then and else branches, each with str = …; File file = new File(str); <exit>]
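To make the idea of a backward analysis concrete, here is a small self-contained sketch of live variable analysis on a control-flow graph shaped like the one in the slide's figure. It does not use the Checker Framework dataflow API; the classes `Liveness` and `Node` and the method names are hypothetical. The analysis iterates to a fixpoint with the usual equation liveIn(n) = use(n) ∪ (liveOut(n) − def(n)), where liveOut(n) is the union of liveIn over the successors of n.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy backward dataflow analysis: live variables. */
class Liveness {
    /** One CFG node: variables it defines and uses, plus successor indices. */
    static final class Node {
        final String label;
        final Set<String> def;
        final Set<String> use;
        final int[] succ;

        Node(String label, Set<String> def, Set<String> use, int... succ) {
            this.label = label;
            this.def = def;
            this.use = use;
            this.succ = succ;
        }
    }

    /** Returns, for each node index, the variables live on entry to that node. */
    static List<Set<String>> liveIn(List<Node> cfg) {
        List<Set<String>> in = new ArrayList<>();
        for (int i = 0; i < cfg.size(); i++) {
            in.add(new HashSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            // Visit nodes in reverse order, since information flows backward.
            for (int i = cfg.size() - 1; i >= 0; i--) {
                Node node = cfg.get(i);
                Set<String> newIn = new HashSet<>();
                for (int s : node.succ) {
                    newIn.addAll(in.get(s)); // liveOut = union of successors' liveIn
                }
                newIn.removeAll(node.def);
                newIn.addAll(node.use);
                if (!newIn.equals(in.get(i))) {
                    in.set(i, newIn);
                    changed = true;
                }
            }
        }
        return in;
    }

    /** CFG modelled on the slide: both branches assign str, then new File(str). */
    static List<Node> exampleCfg() {
        return List.of(
            new Node("<entry>", Set.of(), Set.of(), 1, 2),
            new Node("then: str = ...", Set.of("str"), Set.of(), 3),
            new Node("else: str = ...", Set.of("str"), Set.of(), 3),
            new Node("File file = new File(str);", Set.of("file"), Set.of("str"), 4),
            new Node("<exit>", Set.of(), Set.of()));
    }
}
```

In this example `str` is live only on entry to the `new File(str)` node: both branches define it before use, so it is not live at `<entry>`.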
What happened since May
• New features
• Context-sensitive type inference
• Customize preference on certain solutions
• Backward dataflow analysis
• Backend improvement
• New solver backend – Lingeling
• Constraint graph improvement – parallel solving
• Infrastructure improvement
• Checker Framework
• Checker Framework Inference
Next steps
• Design inter-dependent ontology lattice
• e.g. SEQUENCE and SEQ_LENGTH
• Viewpoint adaptation
• Formalize and infer context-sensitive methods
• Improve LogicBlox solver
• Support of preference constraints
• Scale-up of supported language features and program sizes
Thank you!
