SlideShare a Scribd company logo
Relaxed Parsing of Regular Approximations of
String-Embedded Languages
Author: Ekaterina Verbitskaia
Saint Petersburg State University
JetBrains Programming Languages and Tools Lab
26/08/2015
Ekaterina Verbitskaia (SPbSU) 26/08/2015 1 / 27
String-embedded code
Embedded SQL
SqlCommand myCommand = new SqlCommand(
"SELECT * FROM table WHERE Column = @Param2",
myConnection);
myCommand.Parameters.Add(myParam2);
Dynamic SQL
IF @X = @Y
SET @TBL = ’ #table1 ’
ELSE
SET @TBL = ’ table2 ’
SET @S = ’SELECT x FROM’ + @TBL + ’WHERE ISNULL(n,0) > 1’
EXECUTE (@S)
Ekaterina Verbitskaia (SPbSU) 26/08/2015 2 / 27
Problems
String-embedded code are expressions in some programming language
It may be necessary to support them in IDE: code highlighting,
autocomplete, refactorings
It may be necessary to transform them: migration of legacy software to
new platforms
It may be necessary to detect vulnerabilities in such code
Any other problems of programming languages can occur
Ekaterina Verbitskaia (SPbSU) 26/08/2015 3 / 27
Static analysis of string-embedded code
Performed without programm execution
Checks that the set of properties holds for each possible expression
value
Undecidable for string-embedded code in the general case
The set of possible expression values is over approximated and then
the approximation is analysed
Ekaterina Verbitskaia (SPbSU) 26/08/2015 4 / 27
Existing tools
PHP String Analyzer, Java String Analyzer, Alvor
Static analyzers for PHP, Java, and SQL embedded into Java
respectively
Kyung-Goo Doh et al.
Checks syntactical correctness of embedded code
PHPStorm
IDE for PHP with support of HTML, CSS, JavaScript
IntelliLang
PHPStorm and IDEA plugin, supports various languages
STRANGER
Vulnerability detection of PHP
Flaws
Limited functionality
Hard to extend them with new features or support new languages
Do not create structural representation of code
Ekaterina Verbitskaia (SPbSU) 26/08/2015 5 / 27
Static analysis of string-embedded code: the scheme
Identification of hotspots: points of interest, where the analysis is
desirable
Approximation construction
Lexical analysis
Syntactic analysis
Semantic analysis
Ekaterina Verbitskaia (SPbSU) 26/08/2015 6 / 27
Static analysis of string-embedded code: the scheme
Code: hotspot is marked string res = "";
for(i = 0; i < l; i++)
res = "()" + res;
use(res);
Possible values {"", "()", "()()", ..., "()"^l}
Regular approximation ("()")*
Approximation
Ekaterina Verbitskaia (SPbSU) 26/08/2015 7 / 27
Static analysis of string-embedded code: the scheme
Approximation
After lexing
Grammar
start ::= s
s ::= LBR s RBR s
s ::= 𝜖
Parse forest
start_rule
prod 2
s !
prod 0 prod 0
LBR s RBR
eps
LBR s RBR
eps
Ekaterina Verbitskaia (SPbSU) 26/08/2015 8 / 27
Problem statement
The aim is to develop the algorithm suitable for syntactic analysis of
string-embedded code
Tasks:
Develop an algorithm for parsing of regular approximation of
embedded code which produce a finite parse forest
Parse forest should contain a parse tree for every correct (w.r.t.
reference grammar) string accepted by the input automaton
Incorrect strings should be omitted: no error detection
An algorithm should not depend on the language of the host program
and the language of embedded code
Ekaterina Verbitskaia (SPbSU) 26/08/2015 9 / 27
Algorithm
Input: reference DCF grammar G and DFA graph with no
𝜖-transitions over the alphabeth of terminals of G
Output: finite representation of the trees corresponding to all correct
string accepted by input automaton
Ekaterina Verbitskaia (SPbSU) 26/08/2015 10 / 27
Right-Nulled Generalized LR algorithm
RNGLR processes context free grammars
In case when LR conflicts occur, parses in each possible way
Shift/Reduce conflict
Reduce/Reduce conflict
Uses data structures which reduce memory consumption and
guarantee appropriate time of analysis
Ekaterina Verbitskaia (SPbSU) 26/08/2015 11 / 27
RNGLR data structures: Graph-Structured Stack
0
2
a
3
b
9
B
4
A
6
a
8a
10
A
5
a
8
a
a
11a
1
S
Ekaterina Verbitskaia (SPbSU) 26/08/2015 12 / 27
RNGLR operations: shift
0 2
a
3b
9
B
4
A
6
a
8
a
10
A
Ekaterina Verbitskaia (SPbSU) 26/08/2015 13 / 27
RNGLR operations: shift
0 2
a
3b
9
B
4
A
6
a
8
a
10
A
0 2
a
3b
9
B
4
A
6
a
8
a
10
A
5
a
8
a
a
11
a
Ekaterina Verbitskaia (SPbSU) 26/08/2015 13 / 27
RNGLR operations: reduce
0 2
a
3b
9
B
4
A
6
a
8
a
10
A
5
a
8
a
a
11
a
Ekaterina Verbitskaia (SPbSU) 26/08/2015 14 / 27
RNGLR operations: reduce
0 2
a
3b
9
B
4
A
6
a
8
a
10
A
5
a
8
a
a
11
a
0
2
a
3
b
9
B
4
A
6
a
8a
10
A
5
a
8
a
a
11a
1
S
Ekaterina Verbitskaia (SPbSU) 26/08/2015 14 / 27
RNGLR
Read the input sequentially
Process all reductions
Shift the next token
Each time new vertex is added to the GSS, shift is calculated
Each time new edge is created in the GSS, reductions are calculated
Ekaterina Verbitskaia (SPbSU) 26/08/2015 15 / 27
Algorithm
Traverse the automaton graph and sequentially construct GSS,
similarly as in RNGLR
New type of “conflict”: Shift/Shift
The set of LR-states is associated with each vertex of input graph
The order in which the vertices of input graph are traversed is
controled with a queue. Whenever new edge is added to GSS, its tail
vertex is enqueued
The algorithm implements relaxed parsing: errors are not detected,
erroneous strings are ignored
Ekaterina Verbitskaia (SPbSU) 26/08/2015 16 / 27
Cycles processing: initial stack
Ekaterina Verbitskaia (SPbSU) 26/08/2015 17 / 27
Cycles processing: some edges were added
5
1
0
2
4
3
...
Ekaterina Verbitskaia (SPbSU) 26/08/2015 18 / 27
Cycles processing: cycle closed
5
1
0
2
4
3
...
Ekaterina Verbitskaia (SPbSU) 26/08/2015 19 / 27
Cycles processing: new reduction
5
1
0
2
4
3
...
Ekaterina Verbitskaia (SPbSU) 26/08/2015 20 / 27
Cycles processing: new reduction added
5
1
0
2
4
3
...
Ekaterina Verbitskaia (SPbSU) 26/08/2015 21 / 27
Parse forest construction
Shared Packed Parse Forest — graph, in which all derivation trees are
merged
Constructed in the same manner as in the RNGLR-algorithm
Fragments of derivation trees are associated with GSS edges
Each time shift is processed, new tree of one terminal is created
Each time reduction is processed, new tree with children accumulated
along the paths is created
Children are not copied, they are reused
The root of resulting tree is associated with GSS edge, corresponding
to reduction to the starting nonterminal
Unreachable vertices are deleted from resulting graph
Ekaterina Verbitskaia (SPbSU) 26/08/2015 22 / 27
Algorithm: correctness
Correct tree — derivation tree of
some string accumulated along
the path in the input graph
n start
n s
t LBR n s t RBR n s
eps t LBR n s t RBR
eps
Ekaterina Verbitskaia (SPbSU) 26/08/2015 23 / 27
Algorithm: correctness
Theorem (Termination)
Algorithm terminates for any input
Theorem (Correctness)
Every tree, generated from SPPF, is correct
Theorem (Correctness)
For every path p in the inner graph, recognized w.r.t. reference grammar, a
correct tree corresponding to p can be generated from SPPF
Ekaterina Verbitskaia (SPbSU) 26/08/2015 24 / 27
Implementation
The algorithm is implemented as a part of YaccConstructor project
using F# programming language
The generator of RNGLR parse tables and data structures for GSS and
SPPF are reused
Ekaterina Verbitskaia (SPbSU) 26/08/2015 25 / 27
Evaluation
The data is taken from the project of migration from MS-SQL to
Oracle Server
2,7 lines of code, 2430 queries, 2188 successfully processed
The number of queries which previously could not be processed
because of timeout is decreased from 45 to 1
Ekaterina Verbitskaia (SPbSU) 26/08/2015 26 / 27
Conclusion
The algorithm for parsing of regular approximation of dynamically
generated string which constructs the finite representation of parse
forest is developped
Its termination and correctness are proved
The algorithm is implemented as a part of YaccConstructor project
The evaluation demonstrated it could be used for complex tasks
Ekaterina Verbitskaia (SPbSU) 26/08/2015 27 / 27

More Related Content

What's hot

Intro to p-spice
Intro to p-spiceIntro to p-spice
Intro to p-spice
sathiyavathisasikumar
 
Verified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsVerified Subtyping with Traits and Mixins
Verified Subtyping with Traits and Mixins
Asankhaya Sharma
 
Scilab Modelica conference 20150921
Scilab Modelica conference 20150921Scilab Modelica conference 20150921
Scilab Modelica conference 20150921
Scilab
 
Lecture6 syntax analysis_2
Lecture6 syntax analysis_2Lecture6 syntax analysis_2
Lecture6 syntax analysis_2
Mahesh Kumar Chelimilla
 
Convolution using Scilab
Convolution using ScilabConvolution using Scilab
Convolution using Scilab
sachin achari
 
Modelica Tutorial with PowerSystems: A tutorial for Modelica simulation
Modelica Tutorial with PowerSystems: A tutorial for Modelica simulationModelica Tutorial with PowerSystems: A tutorial for Modelica simulation
Modelica Tutorial with PowerSystems: A tutorial for Modelica simulation
智哉 今西
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17
Makoto Yui
 
stacks in algorithems and data structure
stacks in algorithems and data structurestacks in algorithems and data structure
stacks in algorithems and data structure
faran nawaz
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Spark Summit
 
Aligning OCL and UML
Aligning OCL and UMLAligning OCL and UML
Aligning OCL and UML
Edward Willink
 

What's hot (10)

Intro to p-spice
Intro to p-spiceIntro to p-spice
Intro to p-spice
 
Verified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsVerified Subtyping with Traits and Mixins
Verified Subtyping with Traits and Mixins
 
Scilab Modelica conference 20150921
Scilab Modelica conference 20150921Scilab Modelica conference 20150921
Scilab Modelica conference 20150921
 
Lecture6 syntax analysis_2
Lecture6 syntax analysis_2Lecture6 syntax analysis_2
Lecture6 syntax analysis_2
 
Convolution using Scilab
Convolution using ScilabConvolution using Scilab
Convolution using Scilab
 
Modelica Tutorial with PowerSystems: A tutorial for Modelica simulation
Modelica Tutorial with PowerSystems: A tutorial for Modelica simulationModelica Tutorial with PowerSystems: A tutorial for Modelica simulation
Modelica Tutorial with PowerSystems: A tutorial for Modelica simulation
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17
 
stacks in algorithems and data structure
stacks in algorithems and data structurestacks in algorithems and data structure
stacks in algorithems and data structure
 
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)
 
Aligning OCL and UML
Aligning OCL and UMLAligning OCL and UML
Aligning OCL and UML
 

Similar to Relaxed Parsing of Regular Approximations of String-Embedded Languages

An Efficient and Parallel Abstract Interpreter in Scala — First Algorithm
An Efficient and Parallel Abstract Interpreter in Scala — First AlgorithmAn Efficient and Parallel Abstract Interpreter in Scala — First Algorithm
An Efficient and Parallel Abstract Interpreter in Scala — First Algorithm
🌳 Olivier Pirson — OPi 🇧🇪🇫🇷🇬🇧 🐧 👨‍💻 👨‍🔬
 
COMPILER DESIGN
COMPILER DESIGNCOMPILER DESIGN
COMPILER DESIGN
Vetukurivenkatashiva
 
An Efficient and Parallel Abstract Interpreter in Scala — Second Presentation
An Efficient and Parallel Abstract Interpreter in Scala — Second PresentationAn Efficient and Parallel Abstract Interpreter in Scala — Second Presentation
An Efficient and Parallel Abstract Interpreter in Scala — Second Presentation
🌳 Olivier Pirson — OPi 🇧🇪🇫🇷🇬🇧 🐧 👨‍💻 👨‍🔬
 
Spark algorithms
Spark algorithmsSpark algorithms
Spark algorithms
Ashutosh Trivedi
 
An Efficient and Parallel Abstract Interpreter in Scala — Presentation
An Efficient and Parallel Abstract Interpreter in Scala — PresentationAn Efficient and Parallel Abstract Interpreter in Scala — Presentation
An Efficient and Parallel Abstract Interpreter in Scala — Presentation
🌳 Olivier Pirson — OPi 🇧🇪🇫🇷🇬🇧 🐧 👨‍💻 👨‍🔬
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
Majid Abdollahi
 
kalaivaanar_NSK_V3.ppt
kalaivaanar_NSK_V3.pptkalaivaanar_NSK_V3.ppt
kalaivaanar_NSK_V3.ppt
VINOTHRAJR1
 
Control system Lab record
Control system Lab record Control system Lab record
Control system Lab record
Yuvraj Singh
 
Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...
TELKOMNIKA JOURNAL
 
EdSketch: Execution-Driven Sketching for Java
EdSketch: Execution-Driven Sketching for JavaEdSketch: Execution-Driven Sketching for Java
EdSketch: Execution-Driven Sketching for Java
Lisa Hua
 
2014.06.24.what is ubix
2014.06.24.what is ubix2014.06.24.what is ubix
2014.06.24.what is ubix
Jim Cooley
 
FractalTreeIndex
FractalTreeIndexFractalTreeIndex
FractalTreeIndex
Akhil M Sreenath
 
Ssbse12b.ppt
Ssbse12b.pptSsbse12b.ppt
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
rantav
 
IRJET - Multi-Key Privacy in Cloud Computing
IRJET -  	  Multi-Key Privacy in Cloud ComputingIRJET -  	  Multi-Key Privacy in Cloud Computing
IRJET - Multi-Key Privacy in Cloud Computing
IRJET Journal
 
Psat toolbox-8631349
Psat toolbox-8631349Psat toolbox-8631349
Psat toolbox-8631349
Kannan Kathiravan
 
How the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksHow the Postgres Query Optimizer Works
How the Postgres Query Optimizer Works
EDB
 
Reducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code AnalysisReducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code Analysis
Sebastiano Panichella
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
佳蓉 倪
 
SQL Pattern Matching – should I start using it?
SQL Pattern Matching – should I start using it?SQL Pattern Matching – should I start using it?
SQL Pattern Matching – should I start using it?
Andrej Pashchenko
 

Similar to Relaxed Parsing of Regular Approximations of String-Embedded Languages (20)

An Efficient and Parallel Abstract Interpreter in Scala — First Algorithm
An Efficient and Parallel Abstract Interpreter in Scala — First AlgorithmAn Efficient and Parallel Abstract Interpreter in Scala — First Algorithm
An Efficient and Parallel Abstract Interpreter in Scala — First Algorithm
 
COMPILER DESIGN
COMPILER DESIGNCOMPILER DESIGN
COMPILER DESIGN
 
An Efficient and Parallel Abstract Interpreter in Scala — Second Presentation
An Efficient and Parallel Abstract Interpreter in Scala — Second PresentationAn Efficient and Parallel Abstract Interpreter in Scala — Second Presentation
An Efficient and Parallel Abstract Interpreter in Scala — Second Presentation
 
Spark algorithms
Spark algorithmsSpark algorithms
Spark algorithms
 
An Efficient and Parallel Abstract Interpreter in Scala — Presentation
An Efficient and Parallel Abstract Interpreter in Scala — PresentationAn Efficient and Parallel Abstract Interpreter in Scala — Presentation
An Efficient and Parallel Abstract Interpreter in Scala — Presentation
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
 
kalaivaanar_NSK_V3.ppt
kalaivaanar_NSK_V3.pptkalaivaanar_NSK_V3.ppt
kalaivaanar_NSK_V3.ppt
 
Control system Lab record
Control system Lab record Control system Lab record
Control system Lab record
 
Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...Kernal based speaker specific feature extraction and its applications in iTau...
Kernal based speaker specific feature extraction and its applications in iTau...
 
EdSketch: Execution-Driven Sketching for Java
EdSketch: Execution-Driven Sketching for JavaEdSketch: Execution-Driven Sketching for Java
EdSketch: Execution-Driven Sketching for Java
 
2014.06.24.what is ubix
2014.06.24.what is ubix2014.06.24.what is ubix
2014.06.24.what is ubix
 
FractalTreeIndex
FractalTreeIndexFractalTreeIndex
FractalTreeIndex
 
Ssbse12b.ppt
Ssbse12b.pptSsbse12b.ppt
Ssbse12b.ppt
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
IRJET - Multi-Key Privacy in Cloud Computing
IRJET -  	  Multi-Key Privacy in Cloud ComputingIRJET -  	  Multi-Key Privacy in Cloud Computing
IRJET - Multi-Key Privacy in Cloud Computing
 
Psat toolbox-8631349
Psat toolbox-8631349Psat toolbox-8631349
Psat toolbox-8631349
 
How the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksHow the Postgres Query Optimizer Works
How the Postgres Query Optimizer Works
 
Reducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code AnalysisReducing Redundancies in Multi-Revision Code Analysis
Reducing Redundancies in Multi-Revision Code Analysis
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
SQL Pattern Matching – should I start using it?
SQL Pattern Matching – should I start using it?SQL Pattern Matching – should I start using it?
SQL Pattern Matching – should I start using it?
 

Recently uploaded

Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
Yara Milbes
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
kalichargn70th171
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
What’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete RoadmapWhat’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete Roadmap
Envertis Software Solutions
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Quarter 3 SLRP grade 9.. gshajsbhhaheabh
Quarter 3 SLRP grade 9.. gshajsbhhaheabhQuarter 3 SLRP grade 9.. gshajsbhhaheabh
Quarter 3 SLRP grade 9.. gshajsbhhaheabh
aisafed42
 
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom KittEnhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Peter Caitens
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
Karya Keeper
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
The Third Creative Media
 
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
kgyxske
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.
Maitrey Patel
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
Alina Yurenko
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 

Recently uploaded (20)

Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
What’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete RoadmapWhat’s New in Odoo 17 – A Complete Roadmap
What’s New in Odoo 17 – A Complete Roadmap
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Quarter 3 SLRP grade 9.. gshajsbhhaheabh
Quarter 3 SLRP grade 9.. gshajsbhhaheabhQuarter 3 SLRP grade 9.. gshajsbhhaheabh
Quarter 3 SLRP grade 9.. gshajsbhhaheabh
 
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom KittEnhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...
 
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 

Relaxed Parsing of Regular Approximations of String-Embedded Languages

  • 1. Relaxed Parsing of Regular Approximations of String-Embedded Languages Author: Ekaterina Verbitskaia Saint Petersburg State University JetBrains Programming Languages and Tools Lab 26/08/2015 Ekaterina Verbitskaia (SPbSU) 26/08/2015 1 / 27
  • 2. String-embedded code Embedded SQL SqlCommand myCommand = new SqlCommand( "SELECT * FROM table WHERE Column = @Param2", myConnection); myCommand.Parameters.Add(myParam2); Dynamic SQL IF @X = @Y SET @TBL = ’ #table1 ’ ELSE SET @TBL = ’ table2 ’ SET @S = ’SELECT x FROM’ + @TBL + ’WHERE ISNULL(n,0) > 1’ EXECUTE (@S) Ekaterina Verbitskaia (SPbSU) 26/08/2015 2 / 27
  • 3. Problems String-embedded code are expressions in some programming language It may be necessary to support them in IDE: code highlighting, autocomplete, refactorings It may be necessary to transform them: migration of legacy software to new platforms It may be necessary to detect vulnerabilities in such code Any other problems of programming languages can occur Ekaterina Verbitskaia (SPbSU) 26/08/2015 3 / 27
  • 4. Static analysis of string-embedded code Performed without programm execution Checks that the set of properties holds for each possible expression value Undecidable for string-embedded code in the general case The set of possible expression values is over approximated and then the approximation is analysed Ekaterina Verbitskaia (SPbSU) 26/08/2015 4 / 27
  • 5. Existing tools PHP String Analyzer, Java String Analyzer, Alvor Static analyzers for PHP, Java, and SQL embedded into Java respectively Kyung-Goo Doh et al. Checks syntactical correctness of embedded code PHPStorm IDE for PHP with support of HTML, CSS, JavaScript IntelliLang PHPStorm and IDEA plugin, supports various languages STRANGER Vulnerability detection of PHP Flaws Limited functionality Hard to extend them with new features or support new languages Do not create structural representation of code Ekaterina Verbitskaia (SPbSU) 26/08/2015 5 / 27
  • 6. Static analysis of string-embedded code: the scheme Identification of hotspots: points of interest, where the analysis is desirable Approximation construction Lexical analysis Syntactic analysis Semantic analysis Ekaterina Verbitskaia (SPbSU) 26/08/2015 6 / 27
  • 7. Static analysis of string-embedded code: the scheme Code: hotspot is marked string res = ""; for(i = 0; i < l; i++) res = "()" + res; use(res); Possible values {"", "()", "()()", ..., "()"^l} Regular approximation ("()")* Approximation Ekaterina Verbitskaia (SPbSU) 26/08/2015 7 / 27
  • 8. Static analysis of string-embedded code: the scheme Approximation After lexing Grammar start ::= s s ::= LBR s RBR s s ::= 𝜖 Parse forest start_rule prod 2 s ! prod 0 prod 0 LBR s RBR eps LBR s RBR eps Ekaterina Verbitskaia (SPbSU) 26/08/2015 8 / 27
  • 9. Problem statement The aim is to develop the algorithm suitable for syntactic analysis of string-embedded code Tasks: Develop an algorithm for parsing of regular approximation of embedded code which produce a finite parse forest Parse forest should contain a parse tree for every correct (w.r.t. reference grammar) string accepted by the input automaton Incorrect strings should be omitted: no error detection An algorithm should not depend on the language of the host program and the language of embedded code Ekaterina Verbitskaia (SPbSU) 26/08/2015 9 / 27
  • 10. Algorithm Input: reference DCF grammar G and DFA graph with no 𝜖-transitions over the alphabeth of terminals of G Output: finite representation of the trees corresponding to all correct string accepted by input automaton Ekaterina Verbitskaia (SPbSU) 26/08/2015 10 / 27
  • 11. Right-Nulled Generalized LR algorithm RNGLR processes context free grammars In case when LR conflicts occur, parses in each possible way Shift/Reduce conflict Reduce/Reduce conflict Uses data structures which reduce memory consumption and guarantee appropriate time of analysis Ekaterina Verbitskaia (SPbSU) 26/08/2015 11 / 27
  • 12. RNGLR data structures: Graph-Structured Stack 0 2 a 3 b 9 B 4 A 6 a 8a 10 A 5 a 8 a a 11a 1 S Ekaterina Verbitskaia (SPbSU) 26/08/2015 12 / 27
  • 13. RNGLR operations: shift 0 2 a 3b 9 B 4 A 6 a 8 a 10 A Ekaterina Verbitskaia (SPbSU) 26/08/2015 13 / 27
  • 14. RNGLR operations: shift 0 2 a 3b 9 B 4 A 6 a 8 a 10 A 0 2 a 3b 9 B 4 A 6 a 8 a 10 A 5 a 8 a a 11 a Ekaterina Verbitskaia (SPbSU) 26/08/2015 13 / 27
  • 15. RNGLR operations: reduce 0 2 a 3b 9 B 4 A 6 a 8 a 10 A 5 a 8 a a 11 a Ekaterina Verbitskaia (SPbSU) 26/08/2015 14 / 27
  • 16. RNGLR operations: reduce 0 2 a 3b 9 B 4 A 6 a 8 a 10 A 5 a 8 a a 11 a 0 2 a 3 b 9 B 4 A 6 a 8a 10 A 5 a 8 a a 11a 1 S Ekaterina Verbitskaia (SPbSU) 26/08/2015 14 / 27
  • 17. RNGLR Read the input sequentially Process all reductions Shift the next token Each time new vertex is added to the GSS, shift is calculated Each time new edge is created in the GSS, reductions are calculated Ekaterina Verbitskaia (SPbSU) 26/08/2015 15 / 27
  • 18. Algorithm Traverse the automaton graph and sequentially construct GSS, similarly as in RNGLR New type of “conflict”: Shift/Shift The set of LR-states is associated with each vertex of input graph The order in which the vertices of input graph are traversed is controled with a queue. Whenever new edge is added to GSS, its tail vertex is enqueued The algorithm implements relaxed parsing: errors are not detected, erroneous strings are ignored Ekaterina Verbitskaia (SPbSU) 26/08/2015 16 / 27
  • 19. Cycles processing: initial stack Ekaterina Verbitskaia (SPbSU) 26/08/2015 17 / 27
  • 20. Cycles processing: some edges were added 5 1 0 2 4 3 ... Ekaterina Verbitskaia (SPbSU) 26/08/2015 18 / 27
  • 21. Cycles processing: cycle closed 5 1 0 2 4 3 ... Ekaterina Verbitskaia (SPbSU) 26/08/2015 19 / 27
  • 22. Cycles processing: new reduction 5 1 0 2 4 3 ... Ekaterina Verbitskaia (SPbSU) 26/08/2015 20 / 27
  • 23. Cycles processing: new reduction added 5 1 0 2 4 3 ... Ekaterina Verbitskaia (SPbSU) 26/08/2015 21 / 27
  • 24. Parse forest construction Shared Packed Parse Forest — graph, in which all derivation trees are merged Constructed in the same manner as in the RNGLR-algorithm Fragments of derivation trees are associated with GSS edges Each time shift is processed, new tree of one terminal is created Each time reduction is processed, new tree with children accumulated along the paths is created Children are not copied, they are reused The root of resulting tree is associated with GSS edge, corresponding to reduction to the starting nonterminal Unreachable vertices are deleted from resulting graph Ekaterina Verbitskaia (SPbSU) 26/08/2015 22 / 27
  • 25. Algorithm: correctness Correct tree — derivation tree of some string accumulated along the path in the input graph n start n s t LBR n s t RBR n s eps t LBR n s t RBR eps Ekaterina Verbitskaia (SPbSU) 26/08/2015 23 / 27
  • 26. Algorithm: correctness Theorem (Termination) Algorithm terminates for any input Theorem (Correctness) Every tree, generated from SPPF, is correct Theorem (Correctness) For every path p in the inner graph, recognized w.r.t. reference grammar, a correct tree corresponding to p can be generated from SPPF Ekaterina Verbitskaia (SPbSU) 26/08/2015 24 / 27
  • 27. Implementation The algorithm is implemented as a part of YaccConstructor project using F# programming language The generator of RNGLR parse tables and data structures for GSS and SPPF are reused Ekaterina Verbitskaia (SPbSU) 26/08/2015 25 / 27
  • 28. Evaluation The data is taken from the project of migration from MS-SQL to Oracle Server 2,7 lines of code, 2430 queries, 2188 successfully processed The number of queries which previously could not be processed because of timeout is decreased from 45 to 1 Ekaterina Verbitskaia (SPbSU) 26/08/2015 26 / 27
  • 29. Conclusion The algorithm for parsing of regular approximation of dynamically generated string which constructs the finite representation of parse forest is developped Its termination and correctness are proved The algorithm is implemented as a part of YaccConstructor project The evaluation demonstrated it could be used for complex tasks Ekaterina Verbitskaia (SPbSU) 26/08/2015 27 / 27