Towards Structural Version Control
Yin Wang
You know, it’s always safe to put “Towards” in the title when you haven’t done much ;-)
Q: What’s the best way to solve HARD problems?
A: Don’t solve them. Make them DISAPPEAR.
This often just requires a slight change of DESIGN.
Introducing “Structural Programming”
Disambiguation: Structural Programming, not Structured Programming
The idea is decades old
Lambda calculus is even older
“What goes around comes around”
Outline
Structural Editing (other people’s work)
Structural Comparison (my work)
Structural Version Control (vaporware)
Programs are data structures
• Usually called a “parse tree” or “AST” (abstract syntax tree)
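As a small illustration of this point, the sketch below uses Python's standard `ast` module to turn a line of program text into a tree of typed nodes. Python and the sample expression are assumptions chosen for illustration; the talk itself does not prescribe a language here.

```python
import ast

source = "n * factorial(n - 1)"

# Decode the text into a data structure (an abstract syntax tree).
tree = ast.parse(source, mode="eval")
expr = tree.body  # the top-level expression node

# The tree is ordinary data: typed nodes with named fields.
print(type(expr).__name__)      # node type of the whole expression
print(type(expr.op).__name__)   # the operator is a node too
print(expr.left.id)             # left operand: a Name node
print(expr.right.func.id)       # right operand: a Call node
```

Once the program is held in this form, a tool can address "the left operand of the multiplication" directly instead of a column range on a line.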
Data structures are usually encoded as text

    function factorial(n) {
        if (n == 0) {
            return 1;
        }
        return n * factorial(n - 1);
    }

The encoding scheme is called syntax (keywords, delimiters)
Parsers
• A parser is a decoder from text to data structures
• Parsers are tricky to write and hard to debug
We need parsers because we encode programs into text!
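To show what "a decoder from text to data structures" means in the smallest possible case, here is a sketch of a parser for S-expressions. The function names `tokenize` and `parse` are hypothetical, and the choice of S-expressions is an assumption made to keep the example short; it is not taken from ydiff.

```python
def tokenize(text):
    """Split S-expression text into parenthesis and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Decode a token list into nested Python lists (consumes the tokens)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    # Atoms: integers become int, everything else stays a symbol string.
    return int(token) if token.lstrip("-").isdigit() else token

tree = parse(tokenize("(* n (factorial (- n 1)))"))
print(tree)
```

Even this toy needs error cases for unbalanced parentheses, which hints at why parsers for real languages are tricky to write and debug.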
Why text?
• Write programs that do one thing and do it well
• Write programs to work together
• Write programs to handle text streams, because that is a universal interface
A universal interface ≠ THE universal interface
Text is an inconvenient universal interface
• Data has different types: String, Int, records, functions, …
• Text is just one type: String
• Why should we encode all other types into strings?
Programming without syntax
(demo: Kirill Osenkov’s editor prototype)
See also:
• MPS (JetBrains)
• Intentional Software
• Software Factories (Microsoft)
• paredit-mode (Emacs)
Potentials of Structural Editing
• Easily extensible to ALL programming languages
• Semantics-aware context help (limits the number of choices)
• Impossible to write ill-formed / ill-typed programs
• Efficient transformations and refactorings
• Pictures and math formulas together with programs
• Incremental compilation at fine granularity
• Version control at fine granularity
New problems
• How do we display code in emails?
  • Need to standardize a data format for parse trees
  • Easy. We have been making standards all the time: ASCII, Unicode, JPEG …
• How do we do version control?
  • No more text means no more “lines”
  • … means most VC tools will stop working!
Outline
Structural Editing (other people’s work)
Structural Comparison (my work)
Structural Version Control (vaporware)
ydiff: Structural Diff
• Language-aware
• Refactor-aware
• Format-insensitive
• Comprehensible output
• Open-source
Demo: http://github.com/yinwang0/ydiff
Ingredients
• Structural comparison algorithms
• Generalized parse tree format
• Home-made parser combinator library
• Experimental parsers for JavaScript, C++, Scheme, …
Parsec.ss: Parser Combinator Library in Scheme
• Modeled after Parsec.hs
• Macros make parsers look like BNF grammars (“DSL”)
• Left-recursion detection (direct / indirect)
Left-recursion Detection
[Screenshot; labels: trace, problem, token]
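One way to picture direct and indirect left recursion is a static check over a grammar table, sketched below. This is an assumption for illustration only: the slides do not say how Parsec.ss detects left recursion, the grammar and the names `leftmost_symbols` and `left_recursive` are hypothetical, and the sketch assumes no alternative can match the empty string.

```python
def leftmost_symbols(grammar):
    """Map each nonterminal to the nonterminals that can start one of its alternatives."""
    return {
        name: {alt[0] for alt in alternatives if alt and alt[0] in grammar}
        for name, alternatives in grammar.items()
    }

def left_recursive(grammar):
    """Return the nonterminals that can reach themselves in leftmost position."""
    first = leftmost_symbols(grammar)
    found = set()
    for start in grammar:
        seen, stack = set(), list(first[start])
        while stack:
            sym = stack.pop()
            if sym == start:
                found.add(start)
                break
            if sym not in seen:
                seen.add(sym)
                stack.extend(first[sym])
    return found

# Hypothetical grammar: each nonterminal maps to a list of alternatives.
grammar = {
    "expr": [["expr", "+", "term"], ["term"]],  # direct: expr -> expr ...
    "term": [["factor"]],
    "factor": [["call"], ["NUM"]],
    "call": [["factor", "(", ")"]],             # indirect: factor -> call -> factor
}

print(sorted(left_recursive(grammar)))
```

A combinator library that reports the offending rule this way saves the user from debugging an infinite loop at parse time.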
Generalized Parse Tree Format
Parsers Built
• C++ (596 lines, incomplete, covers most of C++)
• JavaScript (464 lines, complete, may still contain bugs)
• (Scheme)
Key Algorithms
• Tree Editing Distance (TED)
• Move Detection
• Substructure Extraction
Tree Editing Distance
[Figure: two small example trees that differ in the leaves “4” and “5”; candidate edit scripts are annotated cost = 3, cost = 2, cost = 1]
Minimize the number of changes that make the two trees equal
delete “4”? insert “5”? modify “4” into “5”?
All three cases are equally possible
Node -> Node -> [Change]
Types of Changes
• Deletion
• Insertion
• Modification
• Move
• Reparent (aka “refactoring”)
TED can handle deletion, insertion, and modification
Observation: allowing modification generates incomprehensible results
Tree Editing Distance with Recursion
• diff-node: compare two nodes
• diff-list: compare components of the two nodes
• diff-node and diff-list are mutually recursive
diff-node :: Node -> Node -> [Change]
[Code screenshot with these annotations:]
• base cases
• memoization
• dispatch on node types
• only compare nodes of the same type
• compare subnodes
• substructure extraction from the changes
diff-list :: [Node] -> [Node] -> [Change]
[Code screenshot with these annotations:]
• compare head nodes
• shortcut: same definition or unchanged
• Otherwise, two choices: delete head1 or insert head2
• pick the branch with lower cost
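The mutual recursion between diff-node and diff-list can be sketched as follows. This is a simplified illustration, not ydiff's code: the `Node` and `Change` types, the cost measure (subtree size), and the choice to emit only deletions and insertions (following the observation that modification makes output hard to read) are all assumptions.

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Node:
    kind: str              # node type, e.g. "list" or "num"
    value: str = ""        # token text (used by leaves)
    children: tuple = ()   # child Nodes, in order

@dataclass(frozen=True)
class Change:
    op: str                # "del" or "ins"
    node: Node

def size(node):
    return 1 + sum(size(child) for child in node.children)

def cost(changes):
    return sum(size(change.node) for change in changes)

@lru_cache(maxsize=None)   # memoization
def diff_node(a, b):
    if a == b:                                   # base case: unchanged
        return ()
    if a.kind != b.kind or a.value != b.value:   # only descend into same-type nodes
        return (Change("del", a), Change("ins", b))
    return diff_list(a.children, b.children)     # compare subnodes

@lru_cache(maxsize=None)
def diff_list(xs, ys):
    if not xs:
        return tuple(Change("ins", y) for y in ys)
    if not ys:
        return tuple(Change("del", x) for x in xs)
    heads = diff_node(xs[0], ys[0])              # compare head nodes
    if not heads:                                # shortcut: heads unchanged
        return diff_list(xs[1:], ys[1:])
    candidates = (
        heads + diff_list(xs[1:], ys[1:]),                  # pair the heads
        (Change("del", xs[0]),) + diff_list(xs[1:], ys),    # delete head1
        (Change("ins", ys[0]),) + diff_list(xs, ys[1:]),    # insert head2
    )
    return min(candidates, key=cost)             # pick the branch with lower cost

old = Node("list", children=(Node("num", "1"), Node("num", "2")))
new = Node("list", children=(Node("num", "1"), Node("num", "3"), Node("num", "2")))
changes = diff_node(old, new)
print(changes)
```

Frozen dataclasses are hashable, which is what lets `lru_cache` memoize both functions on whole subtrees.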
Move Detection
• Some moved nodes can be detected by simple pairwise comparison between the DELETED and INSERTED change sets.
[Figure: the same change shown by a normal diff (Git) and by ydiff]
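A minimal sketch of that pairwise comparison, assuming a move is recognized only when a deleted subtree is exactly equal to an inserted one. The function name `detect_moves` and the tuple encoding of nodes are hypothetical; ydiff's own matching may be more tolerant.

```python
def detect_moves(deleted, inserted):
    """Pair each deleted subtree with an identical inserted subtree."""
    moves, still_deleted = [], []
    still_inserted = list(inserted)
    for node in deleted:
        if node in still_inserted:
            still_inserted.remove(node)  # each inserted node is matched at most once
            moves.append(node)
        else:
            still_deleted.append(node)
    return moves, still_deleted, still_inserted

# Hypothetical change sets, with nodes encoded as (kind, name) tuples.
deleted = [("def", "helper"), ("call", "log")]
inserted = [("call", "trace"), ("def", "helper")]

moves, still_deleted, still_inserted = detect_moves(deleted, inserted)
print(moves, still_deleted, still_inserted)
```

Reporting `("def", "helper")` as one move instead of a deletion plus an insertion is what keeps the output readable after code is reordered.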
Substructure Extraction
[Code screenshot; annotation: “frame: keep as a new change for further extractions”]
Outline
Structural Editing (other people’s work)
Structural Comparison (my work)
Structural Version Control (vaporware)
Merging in text-based version control

Base            Branch A        Branch B        Merged
1 apples        1 apples        1 apples        1 apples
2 bananas       2 bananas       2 bananas       2 bananas
3 cookies       3 beer          3 cookies       3 beer
4 rice          4 cookies       4 pasta         4 cookies
                5 rice          5 rice          5 pasta
                                                6 rice

Branch A: Insert “beer” on line 3
Branch B: Insert “pasta” on line 4
Merged: Insert “beer” on line 3, then insert “pasta” on line 5

• Modifying a line of text changes the line numbers of subsequent lines
• A patch that says “insert pasta at line 4” must relocate to line 5
  → Patch Theory (Darcs)
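The relocation in the grocery-list example can be replayed in a few lines. This is a deliberately simplified sketch: `insert_line` and `relocate` are hypothetical helpers, and the single shifting rule below is only a toy stand-in for what a real patch calculus such as Darcs' has to handle.

```python
def insert_line(lines, lineno, text):
    """Return a copy of lines with text inserted so it becomes line `lineno` (1-based)."""
    return lines[:lineno - 1] + [text] + lines[lineno - 1:]

def relocate(lineno, earlier_insert_lineno):
    """Shift an insert position past an insert already applied at or above it."""
    return lineno + 1 if earlier_insert_lineno <= lineno else lineno

base = ["apples", "bananas", "cookies", "rice"]
with_beer = insert_line(base, 3, "beer")

# Replaying "insert pasta on line 4" unchanged puts it in the wrong place.
naive = insert_line(with_beer, 4, "pasta")

# Relocating the patch to line 5 gives the intended merge.
merged = insert_line(with_beer, relocate(4, 3), "pasta")

print(naive)
print(merged)
```

The line number is the only handle the patch has on its target, so every earlier edit forces it to be recomputed.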
Prediction 1: merging will no longer be a problem in Structural Version Control
Modifying Different Nodes
Each node has a GUID
[Figure: a base tree (nodes 1, 2, 3, 4 with GUIDs #2EA4, #5B40, #3224, #2048), two edited copies, and the merged tree]
• Edit A: Insert node #323F containing “5” in node #2EA4
• Edit B: Insert node #3248 containing “6” in node #5B40
• Merged: both patches apply as written, and the tree contains both #323F and #3248
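A sketch of why GUID-addressed inserts into different parents merge cleanly: the two patches commute. The dictionary layout and `apply_insert` are hypothetical, and although the ids come from the slide, which id belongs to which node is an assumption.

```python
import copy

def apply_insert(tree, patch):
    """Apply (new_id, label, parent_id) to a tree stored as {id: {"label", "children"}}."""
    new_id, label, parent_id = patch
    result = copy.deepcopy(tree)
    result[new_id] = {"label": label, "children": []}
    result[parent_id]["children"].append(new_id)
    return result

# Assumed assignment of the slide's GUIDs to the nodes 1, 2, 3, 4.
base = {
    "#2048": {"label": "1", "children": ["#2EA4", "#5B40"]},
    "#2EA4": {"label": "2", "children": []},
    "#5B40": {"label": "3", "children": ["#3224"]},
    "#3224": {"label": "4", "children": []},
}

patch_a = ("#323F", "5", "#2EA4")
patch_b = ("#3248", "6", "#5B40")

a_then_b = apply_insert(apply_insert(base, patch_a), patch_b)
b_then_a = apply_insert(apply_insert(base, patch_b), patch_a)
print(a_then_b == b_then_a)
```

Neither patch needs rewriting after the other is applied, because a GUID keeps pointing at the same node no matter what else changes.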
Modifying The Same Node
[Figure: node #5B40 has children “4” at position 0.2 and “5” at position 0.8]
• Edit A: Insert node #2048 containing “6” in node #5B40, at position 0.5
• Edit B: Insert node #2056 containing “7” in node #5B40, at position 0.1
• Merged: positions 0.1, 0.2, 0.5, 0.8 give the children “7”, “4”, “6”, “5”
Because the real line can be infinitely divided, we can always sort the numbers into relative positions!
100% conflict-free merging!!
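The position-based merge amounts to a set union followed by a sort, as in this sketch. `merge_by_position` is a hypothetical name, and using `fractions.Fraction` rather than floats is an assumption made here so that positions can keep being subdivided without running out of precision.

```python
from fractions import Fraction

def merge_by_position(*branches):
    """Union the (position, label) entries of all branches and sort by position."""
    merged = set()
    for branch in branches:
        merged.update(branch)
    return [label for _, label in sorted(merged)]

base = {(Fraction(2, 10), "4"), (Fraction(8, 10), "5")}
branch_a = base | {(Fraction(5, 10), "6")}  # insert "6" at position 0.5
branch_b = base | {(Fraction(1, 10), "7")}  # insert "7" at position 0.1

order = merge_by_position(branch_a, branch_b)
print(order)
```

The merge never fails, which is exactly the property that turns out to be a problem: it cannot tell independent edits from edits that interfere.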
What’s wrong?

Base            Branch A        Branch B        Merged
0.2 x = 1       0.2 x = 1       0.2 x = 1       0.2 x = 1
0.8 print y     0.5 y = 2       0.5 blah        0.5 y = 2
                0.8 print y     0.65 y = 3      0.65 y = 3
                                0.8 print y     0.8 print y

The merge succeeds, but bugs are introduced!
All line-based VC tools have this behavior. Try it!
Modifying The Same Node (a more sensible way)
[Figure: node #5B40 has children “4” (#31FE) and “5” (#3208)]
• Edit A: Insert node #2048 containing “6” in node #5B40, between #31FE and #3208
• Edit B: Insert node #2056 containing “7” in node #5B40, before #31FE
• Merged: both edits apply, giving the children “7”, “4”, “6”, “5”
Modifying The Same Node (again)
• Edit A: Insert node #2048 containing “6” in node #5B40, between #31FE and #3208
• Edit B: Insert node #2056 containing “7” in node #5B40, between #31FE and #3208
• Merged: Conflict! Both edits target the same gap.
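The two neighbor-anchored cases above can be sketched with one merge routine that reports a conflict only when two inserts name the same gap. `merge_inserts` and `MergeConflict` are hypothetical names, and treating #31FE as “4” and #3208 as “5” is an assumption read off the slides.

```python
class MergeConflict(Exception):
    pass

def merge_inserts(children, patches):
    """Apply (new_id, left, right) inserts; left/right are neighbor ids, None means an end."""
    gaps = {}
    for new_id, left, right in patches:
        if (left, right) in gaps:
            raise MergeConflict(f"{new_id} and {gaps[(left, right)]} target the same gap")
        gaps[(left, right)] = new_id
    result, used, prev = [], set(), None
    for child in children + [None]:
        if (prev, child) in gaps:
            result.append(gaps[(prev, child)])
            used.add((prev, child))
        if child is not None:
            result.append(child)
        prev = child
    if set(gaps) - used:
        raise MergeConflict("patch context not found among the children")
    return result

children = ["#31FE", "#3208"]  # assumed to hold "4" and "5"

# Different gaps: merges cleanly.
merged = merge_inserts(children, [("#2048", "#31FE", "#3208"), ("#2056", None, "#31FE")])
print(merged)

# Same gap: reported as a conflict instead of being ordered arbitrarily.
try:
    merge_inserts(children, [("#2048", "#31FE", "#3208"), ("#2056", "#31FE", "#3208")])
    conflicted = False
except MergeConflict:
    conflicted = True
print(conflicted)
```

Unlike the fractional positions, the neighbor ids record what each author actually saw, so interfering edits surface as conflicts rather than silent reorderings.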
Some observations on text-based VC tools
• Grounds are what programs sit on.
• Merging is hard because simultaneous edits change the grounds in different ways, but text-based VC tools don’t have a handle on them.
• This is why Darcs uses Patch Theory, which gives us limited power for reasoning about the grounds.
• Git uses hash values to locate the grounds, but at a coarser granularity. Also, hash values depend on the contents.
• Once we have true handles on the grounds, the problem disappears.
What’s next?
• Other scenarios
• HOW MUCH and WHAT context to include in the patches?
• A descriptive language for patches, and a constraint solver for merging them?
• A database-like transaction system for parse tree structures?
• Let the structural editor construct the change sets?
• Generalize structural programming to natural languages?
Discussions
