Formal Semantics of SQL
(and Cypher)
Paolo Guagliardo
SQL
• Standard query language for relational databases
• $30B/year business
• Implemented in all major RDBMSs (free and commercial)
• First standardized in 1986 (ANSI) and 1987 (ISO)
• Several revisions afterwards
(SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011, SQL:2016)
“The nice thing about standards is that you have
so many to choose from” — Andrew S. Tanenbaum
How standard is SQL?
SELECT * FROM R WHERE EXISTS (
SELECT * FROM (
SELECT R.A, R.A FROM R ) S )
Both PostgreSQL and Oracle output R
SELECT * FROM ( SELECT R.A, R.A FROM R ) S
PostgreSQL outputs a table with two columns named “A”
Oracle throws an ERROR: reference to column “A” is ambiguous
Who is right?
Let’s have a look at the standard!
A. If the <select list> * is simply contained in a <subquery> that is
immediately contained in an <exists predicate>, then the <select list>
is equivalent to a <value expression> that is an arbitrary <literal>.
B. Otherwise, the <select list> * is equivalent to a <value expression>
sequence in which each <value expression> is a column reference
that references a column of T and each column of T is referenced
exactly once. The columns are referenced in the ascending sequence
of their ordinal position within T.
… which means

SELECT * FROM R
WHERE EXISTS (
  SELECT *
  FROM ( SELECT R.A, R.A FROM R ) S
)
≡
SELECT R.A FROM R
WHERE EXISTS (
  SELECT 1
  FROM ( SELECT R.A, R.A FROM R ) S
)

SELECT *
FROM ( SELECT R.A, R.A FROM R ) S
≡
SELECT S.A, S.A
FROM ( SELECT R.A, R.A FROM R ) S
The Need for a Formal Semantics
• Avoid ambiguity of natural language
• Clearly defined and not subject to interpretation
• Easy to understand and implement
Previous attempts
• Many simplifying assumptions: no bags, no nulls
• No justification of correctness
Instance:
R.A: 1, NULL
S.A: NULL

SELECT R.A FROM R
WHERE R.A NOT IN (
SELECT S.A FROM S)
Answer: (empty)

SELECT R.A FROM R
EXCEPT
SELECT S.A FROM S
Answer: 1

SELECT R.A FROM R
WHERE NOT EXISTS (
SELECT S.A FROM S
WHERE S.A=R.A )
Answer: 1, NULL
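The three answers can be checked on a real engine. The sketch below uses Python's built-in sqlite3 module as a stand-in system (an assumption: the slides do not say which engine produced the answers) and loads the instance R = {1, NULL}, S = {NULL}. The names QUERIES and run_queries are illustrative only.

```python
import sqlite3

# The three queries from the slide, keyed by the construct they use.
QUERIES = {
    "NOT IN": "SELECT R.A FROM R WHERE R.A NOT IN (SELECT S.A FROM S)",
    "EXCEPT": "SELECT R.A FROM R EXCEPT SELECT S.A FROM S",
    "NOT EXISTS": "SELECT R.A FROM R WHERE NOT EXISTS "
                  "(SELECT S.A FROM S WHERE S.A = R.A)",
}


def run_queries():
    """Run each query on R = {1, NULL}, S = {NULL}; NULL comes back as None."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE R (A INTEGER);
        CREATE TABLE S (A INTEGER);
        INSERT INTO R VALUES (1), (NULL);
        INSERT INTO S VALUES (NULL);
    """)
    answers = {name: [row[0] for row in conn.execute(sql)]
               for name, sql in QUERIES.items()}
    conn.close()
    return answers


if __name__ == "__main__":
    for name, rows in run_queries().items():
        print(name, rows)
```

The three formulations look interchangeable, yet they return an empty table, a single row, and two rows respectively. The differences come entirely from how NULL is treated by IN, by set difference, and by the correlated comparison.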
Core SQL fragment
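\[
\begin{aligned}
\tau &:= (T_1, \dots, T_k), \quad \beta := (N_1, \dots, N_k), \quad k > 0\\
\alpha &:= (A_1, \dots, A_m), \quad \beta' := (N'_1, \dots, N'_m), \quad m > 0
\end{aligned}
\]
Queries:
\[
\begin{aligned}
Q := {} & \texttt{SELECT}\ [\,\texttt{DISTINCT}\,]\ (\alpha:\beta' \mid \texttt{*})\ \texttt{FROM}\ \tau:\beta\ \texttt{WHERE}\ \theta\\
\mid {} & Q\ (\texttt{UNION} \mid \texttt{INTERSECT} \mid \texttt{EXCEPT})\ [\,\texttt{ALL}\,]\ Q
\end{aligned}
\]
Conditions:
\[
\begin{aligned}
\theta := {} & \texttt{TRUE} \mid \bar t\ (= \mid \neq)\ \bar t \mid t\ \texttt{IS}\ [\,\texttt{NOT}\,]\ \texttt{NULL}\\
\mid {} & \bar t\ [\,\texttt{NOT}\,]\ \texttt{IN}\ Q \mid \texttt{EXISTS}\ Q\\
\mid {} & \theta\ \texttt{AND}\ \theta \mid \theta\ \texttt{OR}\ \theta \mid \texttt{NOT}\ \theta
\end{aligned}
\]
Figure 1: Syntax of core SQL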
[Excerpt from the paper, partly cropped on the slide: "… or a pair of names (e.g., R1.A), and values are either constants in C or NULL."]
Essentially SQL without arithmetic, grouping and aggregation
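One way to make the grammar of Figure 1 concrete is as an abstract syntax tree. The following is a minimal sketch under simplifying assumptions: all class names (Eq, IsNull, In, Exists, Not, BoolOp, Select, SetOp) and the to_sql helper are hypothetical, terms are plain strings, and tuple comparisons are restricted to single terms.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Eq:                      # t = t
    left: str
    right: str


@dataclass(frozen=True)
class IsNull:                  # t IS [NOT] NULL
    term: str
    negated: bool = False


@dataclass(frozen=True)
class In:                      # t [NOT] IN Q
    term: str
    query: "Query"
    negated: bool = False


@dataclass(frozen=True)
class Exists:                  # EXISTS Q
    query: "Query"


@dataclass(frozen=True)
class Not:                     # NOT theta
    cond: "Cond"


@dataclass(frozen=True)
class BoolOp:                  # theta AND theta | theta OR theta
    op: str
    left: "Cond"
    right: "Cond"


@dataclass(frozen=True)
class Select:
    columns: Tuple[str, ...]               # empty tuple stands for *
    tables: Tuple[Tuple[str, str], ...]    # (table, alias) pairs
    where: Optional["Cond"] = None         # None stands for WHERE TRUE
    distinct: bool = False


@dataclass(frozen=True)
class SetOp:                   # Q (UNION | INTERSECT | EXCEPT) [ALL] Q
    op: str
    left: "Query"
    right: "Query"
    is_all: bool = False


Cond = Union[Eq, IsNull, In, Exists, Not, BoolOp]
Query = Union[Select, SetOp]


def to_sql(node) -> str:
    """Render an AST node back to SQL text."""
    if isinstance(node, Select):
        head = "SELECT DISTINCT" if node.distinct else "SELECT"
        cols = ", ".join(node.columns) or "*"
        tables = ", ".join(f"{t} AS {n}" for t, n in node.tables)
        where = "TRUE" if node.where is None else to_sql(node.where)
        return f"{head} {cols} FROM {tables} WHERE {where}"
    if isinstance(node, SetOp):
        op = node.op + (" ALL" if node.is_all else "")
        return f"{to_sql(node.left)} {op} {to_sql(node.right)}"
    if isinstance(node, Eq):
        return f"{node.left} = {node.right}"
    if isinstance(node, IsNull):
        return f"{node.term} IS {'NOT ' if node.negated else ''}NULL"
    if isinstance(node, In):
        return f"{node.term} {'NOT ' if node.negated else ''}IN ({to_sql(node.query)})"
    if isinstance(node, Exists):
        return f"EXISTS ({to_sql(node.query)})"
    if isinstance(node, Not):
        return f"NOT ({to_sql(node.cond)})"
    if isinstance(node, BoolOp):
        return f"({to_sql(node.left)}) {node.op} ({to_sql(node.right)})"
    raise TypeError(f"unknown node: {node!r}")
```

For example, the NOT IN query from the earlier slide becomes `Select(("R.A",), (("R", "R"),), In("R.A", Select(("S.A",), (("S", "S"),)), negated=True))`. Every table carries an explicit alias and every query an explicit WHERE clause, mirroring the τ:β and WHERE θ parts of the grammar.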
Formal Semantics: Challenges
Data model
• Base relations / query outputs / intermediate results
• Primitive data manipulation operations
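Because SQL tables are bags (multisets) rather than sets, the primitive operations have to track multiplicities. A minimal sketch, assuming bags are represented with Python's collections.Counter, rows are tuples, and NULL is None; the representation and the function names are illustrative, not the authors' implementation.

```python
from collections import Counter


def union_all(b1, b2):
    """UNION ALL: multiplicities add up."""
    return b1 + b2


def intersect_all(b1, b2):
    """INTERSECT ALL: minimum of the two multiplicities."""
    return b1 & b2


def except_all(b1, b2):
    """EXCEPT ALL: multiplicities are subtracted, never going below zero."""
    return b1 - b2


def distinct(b):
    """Duplicate elimination: every row that occurs is kept exactly once."""
    return Counter({row: 1 for row, n in b.items() if n > 0})


def except_distinct(b1, b2):
    """EXCEPT without ALL: deduplicate the left operand, then subtract."""
    return distinct(b1) - b2


if __name__ == "__main__":
    R = Counter({(1,): 2, (None,): 1})
    S = Counter({(None,): 1})
    print(except_all(R, S))       # keeps both copies of (1,)
    print(except_distinct(R, S))  # keeps a single copy of (1,)
```

Since None compares equal to None in Python, this representation treats two NULLs as the same row, which is how SQL set operations behave, in contrast to the comparison `NULL = NULL` inside a WHERE clause.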
Attribute references
• Binding rules in subqueries
• Environment collects and propagates bindings
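The role of the environment can be illustrated on the correlated NOT EXISTS query from the earlier slide. In this sketch the environment is a plain dictionary from qualified names such as "R.A" to values; the helper names extend, eq3 and not_exists_query are hypothetical, and NULL is modelled as None.

```python
def extend(env, alias, row):
    """Return a new environment: env plus bindings 'alias.attr' -> value.

    Bindings from the new row shadow older bindings with the same name.
    """
    new_env = dict(env)
    new_env.update({f"{alias}.{attr}": val for attr, val in row.items()})
    return new_env


def eq3(x, y):
    """SQL equality: unknown (None) as soon as either side is NULL."""
    if x is None or y is None:
        return None
    return x == y


def not_exists_query(r_rows, s_rows):
    """SELECT R.A FROM R WHERE NOT EXISTS (SELECT S.A FROM S WHERE S.A = R.A)"""
    answer = []
    for r in r_rows:
        outer_env = extend({}, "R", r)             # the outer query binds R.A
        matches = []
        for s in s_rows:
            inner_env = extend(outer_env, "S", s)  # the subquery sees R.A and S.A
            if eq3(inner_env["S.A"], inner_env["R.A"]) is True:
                matches.append(s)
        if not matches:                            # NOT EXISTS holds
            answer.append(outer_env["R.A"])
    return answer


if __name__ == "__main__":
    R = [{"A": 1}, {"A": None}]
    S = [{"A": None}]
    print(not_exists_query(R, S))
```

The subquery is evaluated once per outer row, each time under an environment that already contains the outer row's bindings. This is what lets the inner condition refer to R.A.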
Proposed Semantics
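\[
\begin{aligned}
[\![R]\!]_{D,\eta} &= R^{D}\\
[\![\tau:\beta]\!]_{D,\eta} &= [\![(T_1,\dots,T_k):(N_1,\dots,N_k)]\!]_{D,\eta} = N_1.[\![T_1]\!]_{D,\eta} \times \cdots \times N_k.[\![T_k]\!]_{D,\eta}\\
[\![\texttt{FROM } \tau:\beta \texttt{ WHERE } \theta]\!]_{D,\eta} &= \{\, \bar a \in [\![\tau:\beta]\!]_{D,\eta} \mid [\![\theta]\!]_{D,\eta;\eta_{\bar a}} = \mathbf{t} \,\}\\
[\![\texttt{SELECT * FROM } \tau:\beta \texttt{ WHERE } \theta]\!]_{D,\eta} &= [\![\texttt{FROM } \tau:\beta \texttt{ WHERE } \theta]\!]_{D,\eta} : \langle\text{symbol lost}\rangle 1\\
[\![\texttt{SELECT } \alpha:\beta' \texttt{ FROM } \tau:\beta \texttt{ WHERE } \theta]\!]_{D,\eta} &= \pi_{\alpha}\bigl([\![\texttt{FROM } \tau:\beta \texttt{ WHERE } \theta]\!]_{D,\eta}\bigr) : \beta'\\
[\![\texttt{SELECT DISTINCT } \alpha:\beta' \mid \texttt{*} \texttt{ FROM } \tau:\beta \texttt{ WHERE } \theta]\!]_{D,\eta} &= \varepsilon\bigl([\![\texttt{SELECT } \alpha:\beta' \mid \texttt{*} \texttt{ FROM } \tau:\beta \texttt{ WHERE } \theta]\!]_{D,\eta}\bigr)
\end{aligned}
\]
\[
\begin{aligned}
[\![\texttt{TRUE}]\!]_{D,\eta} &= \mathbf{t}\\
[\![t]\!]_{D,\eta} &= \begin{cases} \eta(A) & \text{if } t = A\\ t & \text{if } t \in C \text{ or } t = \texttt{NULL} \end{cases}\\
[\![t_1 = t_2]\!]_{D,\eta} &= \begin{cases}
\mathbf{t} & \text{if } [\![t_1]\!]_{D,\eta} = [\![t_2]\!]_{D,\eta} \text{ and } [\![t_1]\!]_{D,\eta} \neq \texttt{NULL} \text{ and } [\![t_2]\!]_{D,\eta} \neq \texttt{NULL}\\
\mathbf{f} & \text{if } [\![t_1]\!]_{D,\eta} \neq [\![t_2]\!]_{D,\eta} \text{ and } [\![t_1]\!]_{D,\eta} \neq \texttt{NULL} \text{ and } [\![t_2]\!]_{D,\eta} \neq \texttt{NULL}\\
\mathbf{u} & \text{if } [\![t_1]\!]_{D,\eta} = \texttt{NULL} \text{ or } [\![t_2]\!]_{D,\eta} = \texttt{NULL}
\end{cases}\\
[\![t \texttt{ IS NULL}]\!]_{D,\eta} &= \begin{cases} \mathbf{t} & \text{if } [\![t]\!]_{D,\eta} = \texttt{NULL}\\ \mathbf{f} & \text{if } [\![t]\!]_{D,\eta} \neq \texttt{NULL} \end{cases}\\
[\![t \texttt{ IS NOT NULL}]\!]_{D,\eta} &= \neg [\![t \texttt{ IS NULL}]\!]_{D,\eta}\\
[\![(t_1,\dots,t_n) = (t'_1,\dots,t'_n)]\!]_{D,\eta} &= \bigwedge_{i=1}^{n} [\![t_i = t'_i]\!]_{D,\eta}\\
[\![(t_1,\dots,t_n) \neq (t'_1,\dots,t'_n)]\!]_{D,\eta} &= \bigvee_{i=1}^{n} [\![t_i \neq t'_i]\!]_{D,\eta}\\
[\![\bar t \texttt{ IN } Q]\!]_{D,\eta} &= \begin{cases}
\mathbf{t} & \text{if } \exists \bar r \in [\![Q]\!]_{D,\eta} : [\![\bar t = \nu(\bar r)]\!]_{D,\eta} = \mathbf{t}\\
\mathbf{f} & \text{if } \forall \bar r \in [\![Q]\!]_{D,\eta} : [\![\bar t = \nu(\bar r)]\!]_{D,\eta} = \mathbf{f}\\
\mathbf{u} & \text{if } \nexists \bar r \in [\![Q]\!]_{D,\eta} : [\![\bar t = \nu(\bar r)]\!]_{D,\eta} = \mathbf{t} \text{ and } \exists \bar r \in [\![Q]\!]_{D,\eta} : [\![\bar t = \nu(\bar r)]\!]_{D,\eta} \neq \mathbf{f}
\end{cases}\\
[\![\bar t \texttt{ NOT IN } Q]\!]_{D,\eta} &= \neg [\![\bar t \texttt{ IN } Q]\!]_{D,\eta}\\
[\![\texttt{EXISTS } Q]\!]_{D,\eta} &= \begin{cases} \mathbf{t} & \text{if } [\![Q]\!]_{D,\eta} \neq \emptyset\\ \mathbf{f} & \text{if } [\![Q]\!]_{D,\eta} = \emptyset \end{cases}\\
[\![\theta_1 \texttt{ AND } \theta_2]\!]_{D,\eta} &= [\![\theta_1]\!]_{D,\eta} \wedge [\![\theta_2]\!]_{D,\eta}\\
[\![\theta_1 \texttt{ OR } \theta_2]\!]_{D,\eta} &= [\![\theta_1]\!]_{D,\eta} \vee [\![\theta_2]\!]_{D,\eta}\\
[\![\texttt{NOT } \theta]\!]_{D,\eta} &= \neg [\![\theta]\!]_{D,\eta}
\end{aligned}
\]
\[
\begin{aligned}
[\![Q_1 \texttt{ UNION ALL } Q_2]\!]_{D,\eta} &= [\![Q_1]\!]_{D,\eta} \cup [\![Q_2]\!]_{D,\eta} : \ell([\![Q_1]\!])\\
[\![Q_1 \texttt{ INTERSECT ALL } Q_2]\!]_{D,\eta} &= [\![Q_1]\!]_{D,\eta} \cap [\![Q_2]\!]_{D,\eta} : \ell([\![Q_1]\!])\\
[\![Q_1 \texttt{ EXCEPT ALL } Q_2]\!]_{D,\eta} &= [\![Q_1]\!]_{D,\eta} - [\![Q_2]\!]_{D,\eta} : \ell([\![Q_1]\!])\\
[\![Q_1 \star Q_2]\!]_{D,\eta} &= \varepsilon\bigl([\![Q_1 \star \texttt{ALL } Q_2]\!]_{D,\eta}\bigr), \quad \star \in \{\texttt{UNION}, \texttt{INTERSECT}\}\\
[\![Q_1 \texttt{ EXCEPT } Q_2]\!]_{D,\eta} &= \varepsilon\bigl([\![Q_1]\!]_{D,\eta}\bigr) - [\![Q_2]\!]_{D,\eta} : \ell([\![Q_1]\!])
\end{aligned}
\]
Figure 2: Semantics of core SQL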
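The rules for conditions in Figure 2 translate almost line by line into code. The sketch below encodes the truth values t, f, u as Python's True, False and None (an encoding chosen here for illustration) and implements equality, the connectives, and IN for single values. The function names are hypothetical.

```python
def eq3(x, y):
    """t1 = t2: unknown if either side is NULL, otherwise ordinary equality."""
    if x is None or y is None:
        return None
    return x == y


def not3(v):
    """NOT: unknown stays unknown."""
    return None if v is None else not v


def and3(a, b):
    """AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def in3(value, rows):
    """t IN Q for a single-column query result given as a list of values."""
    results = [eq3(value, r) for r in rows]
    if any(r is True for r in results):
        return True       # some row is equal
    if all(r is False for r in results):
        return False      # every row is different (also when Q is empty)
    return None           # no match, but some comparison is unknown


if __name__ == "__main__":
    # 1 NOT IN {NULL} is unknown, so the row is not selected.
    print(not3(in3(1, [None])))
```

This explains the empty answer of the NOT IN query on the earlier slide: with S = {NULL}, the condition evaluates to unknown for every row of R, and a WHERE clause keeps a row only when its condition is true.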
• Fits in one page
• Non-ambiguous
• Easy to understand
• Easy to implement
• Easy to modify
Formal Semantics: Validation
• Cannot prove that the semantics is correct
• Provide sufficient experimental evidence
• Implemented in Python
• Validated on 100000+ random SQL queries
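The slides give no details of the validation harness, so the following is only a sketch of the general idea (differential testing) under stated assumptions: SQLite, via Python's sqlite3 module, plays the role of the reference engine, queries are restricted to `SELECT A FROM R WHERE θ` over a single column, and all function names are hypothetical.

```python
import random
import sqlite3
from collections import Counter


def eq3(x, y):
    return None if x is None or y is None else x == y


def not3(v):
    return None if v is None else not v


def and3(a, b):
    if a is False or b is False:
        return False
    return None if a is None or b is None else True


def or3(a, b):
    if a is True or b is True:
        return True
    return None if a is None or b is None else False


def random_condition(rng, depth=2):
    """Return (sql_text, evaluator) for a random condition over column A."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            c = rng.randint(0, 2)
            return f"(A = {c})", lambda a: eq3(a, c)
        return "(A IS NULL)", lambda a: a is None
    op = rng.choice(["AND", "OR", "NOT"])
    sql1, f1 = random_condition(rng, depth - 1)
    if op == "NOT":
        return f"(NOT {sql1})", lambda a: not3(f1(a))
    sql2, f2 = random_condition(rng, depth - 1)
    combine = and3 if op == "AND" else or3
    return f"({sql1} {op} {sql2})", lambda a: combine(f1(a), f2(a))


def validate(num_queries=200, seed=0):
    """Compare the Python semantics with SQLite; return the mismatch count."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (A INTEGER)")
    mismatches = 0
    for _ in range(num_queries):
        rows = [rng.choice([None, 0, 1, 2]) for _ in range(rng.randint(0, 5))]
        conn.execute("DELETE FROM R")
        conn.executemany("INSERT INTO R VALUES (?)", [(v,) for v in rows])
        sql, cond = random_condition(rng)
        # A row is kept only when the condition is true (not unknown).
        expected = Counter(a for a in rows if cond(a) is True)
        actual = Counter(r[0] for r in conn.execute(f"SELECT A FROM R WHERE {sql}"))
        if expected != actual:
            mismatches += 1
    conn.close()
    return mismatches


if __name__ == "__main__":
    print("mismatches:", validate())
```

Results are compared as bags (Counter) rather than lists, since the order of rows in a query answer is not fixed. A realistic harness would also generate subqueries, multiple tables and set operations.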
Formal Semantics of Cypher
• Collaboration between Neo Technology
and the University of Edinburgh
• Preliminary meeting in December
• Legal agreements finalized recently
• Neo Technology sponsors a researcher
(Nadime Francis)
Challenges
• Getting the (abstract) data model right
• Intermediate representation (QUIL?)
• Identify core fragment
• Language constantly evolving
• Follow the footsteps of SQL? (nulls)

Formal Semantics of SQL and Cypher

  • 1. Formal Semantics of SQL (and Cypher) Paolo Guagliardo
  • 2. SQL • Standard query language for relational databases • $30B/year business • Implemented in all major RDBMSs (free and commercial) • First standardized in 1986 (ANSI) and 1987 (ISO) • Several revision afterwards (SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011, SQL:2016) “The nice thing about standards is that you have so many to choose from” — Andrew S. Tanenbaum
  • 3. How standard is SQL? SELECT * FROM R WHERE EXISTS ( SELECT * FROM ( SELECT R.A, R.A FROM R ) S ) Both PostgreSQL and Oracle output R SELECT * FROM ( SELECT R.A, R.A FROM R ) S PostgreSQL outputs a table with two columns named “A” Oracle throws an ERROR: reference to column “A” is ambiguous
  • 4. Who is right? Let’s have a look at the standard! A. If the <select list> * is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>. B. Otherwise, the <select list> * is equivalent to a <value expression> sequence in which each <value expression> is a column reference that references a column of T and each column of T is referenced exactly once. The columns are referenced in the ascending sequence of their ordinal position within T.
  • 5. … which means SELECT * FROM R WHERE EXISTS ( SELECT * FROM ( SELECT R.A, R.A FROM R ) S ) SELECT * FROM ( SELECT R.A, R.A FROM R ) S SELECT S.A, S.A FROM ( SELECT R.A, R.A FROM R ) S SELECT R.A FROM R WHERE EXISTS ( SELECT 1 FROM ( SELECT R.A, R.A FROM R ) S ) ≡ ≡
  • 6. The Need for a Formal Semantics • Avoid ambiguity of natural language • Clearly defined and not subject to interpretation • Easy to understand and implement Previous attempts • Many simplifying assumptions: no bags, no nulls • No justification of correctness
  • 7. Input tables (single column A): R = {1, NULL}, S = {NULL}
      SELECT R.A FROM R WHERE R.A NOT IN ( SELECT S.A FROM S ) → Answer: empty table
      SELECT R.A FROM R EXCEPT SELECT S.A FROM S → Answer: {1}
      SELECT R.A FROM R WHERE NOT EXISTS ( SELECT S.A FROM S WHERE S.A=R.A ) → Answer: {1, NULL}
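The three queries on slide 7 can be replayed on a real engine. The following is a minimal sketch that uses SQLite through Python's standard `sqlite3` module; SQLite is an assumption made here for convenience (the slides only mention PostgreSQL and Oracle), and the `QUERIES` and `answers` names are purely illustrative.

```python
import sqlite3

# In-memory database holding the two tables from the slide:
# R = {1, NULL} and S = {NULL}, each with a single column A.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    INSERT INTO R VALUES (1), (NULL);
    INSERT INTO S VALUES (NULL);
""")

# The three queries, which coincide on databases without nulls.
QUERIES = {
    "NOT IN": "SELECT R.A FROM R WHERE R.A NOT IN (SELECT S.A FROM S)",
    "EXCEPT": "SELECT R.A FROM R EXCEPT SELECT S.A FROM S",
    "NOT EXISTS": "SELECT R.A FROM R WHERE NOT EXISTS "
                  "(SELECT S.A FROM S WHERE S.A = R.A)",
}

# Collect the single output column of each query as a Python list
# (SQL NULL is returned as None).
answers = {name: [row[0] for row in con.execute(sql)]
           for name, sql in QUERIES.items()}

for name, rows in answers.items():
    print(f"{name:10} -> {rows}")
```

Running the script prints one answer per query, so the divergence caused by the single NULL in S can be checked directly against the answers listed on the slide.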
  • 8. Core SQL fragment
      \[
      \begin{aligned}
      \tau &:= (T_1, \dots, T_k), \quad \beta := (N_1, \dots, N_k), \quad k > 0 \\
      \alpha &:= (A_1, \dots, A_m), \quad \beta' := (N'_1, \dots, N'_m), \quad m > 0 \\
      \text{Queries:} \quad Q &:= \texttt{SELECT [DISTINCT] } (\alpha : \beta' \mid \texttt{*}) \texttt{ FROM } \tau : \beta \texttt{ WHERE } \theta \\
      &\;\mid\; Q \; (\texttt{UNION} \mid \texttt{INTERSECT} \mid \texttt{EXCEPT}) \; [\,\texttt{ALL}\,] \; Q \\
      \text{Conditions:} \quad \theta &:= \texttt{TRUE} \mid \bar t \; (= \mid \neq) \; \bar t \mid t \texttt{ IS [NOT] NULL} \mid \bar t \texttt{ [NOT] IN } Q \mid \texttt{EXISTS } Q \\
      &\;\mid\; \theta \texttt{ AND } \theta \mid \theta \texttt{ OR } \theta \mid \texttt{NOT } \theta
      \end{aligned}
      \]
      Figure 1: Syntax of core SQL
      Essentially SQL without arithmetic, grouping and aggregation
  • 9. Formal Semantics: Challenges Data model • Base relations / query outputs / intermediate results • Primitive data manipulation operations Attribute references • Binding rules in subqueries • Environment collects and propagates bindings
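  • 10. Proposed Semantics
      \[
      \begin{aligned}
      \llbracket R \rrbracket_{D,\eta} &= R^D \\
      \llbracket \tau : \beta \rrbracket_{D,\eta} &= \llbracket (T_1,\dots,T_k) : (N_1,\dots,N_k) \rrbracket_{D,\eta} = N_1.\llbracket T_1 \rrbracket_{D,\eta} \times \cdots \times N_k.\llbracket T_k \rrbracket_{D,\eta} \\
      \llbracket \texttt{FROM } \tau : \beta \texttt{ WHERE } \theta \rrbracket_{D,\eta} &= \{\, \bar a \in \llbracket \tau : \beta \rrbracket_{D,\eta} \mid \llbracket \theta \rrbracket_{D,\eta;\eta_{\bar a}} = \mathbf{t} \,\} \\
      \llbracket \texttt{SELECT * FROM } \tau : \beta \texttt{ WHERE } \theta \rrbracket_{D,\eta} &= \llbracket \texttt{FROM } \tau : \beta \texttt{ WHERE } \theta \rrbracket_{D,\eta} : [\ldots 1] \\
      \llbracket \texttt{SELECT } \alpha : \beta' \texttt{ FROM } \tau : \beta \texttt{ WHERE } \theta \rrbracket_{D,\eta} &= \pi_{\alpha}\big( \llbracket \texttt{FROM } \tau : \beta \texttt{ WHERE } \theta \rrbracket_{D,\eta} \big) : \beta' \\
      \llbracket \texttt{SELECT DISTINCT } \alpha : \beta' \mid \texttt{*} \texttt{ FROM } \tau : \beta \texttt{ WHERE } \theta \rrbracket_{D,\eta} &= \varepsilon\big( \llbracket \texttt{SELECT } \alpha : \beta' \mid \texttt{*} \texttt{ FROM } \tau : \beta \texttt{ WHERE } \theta \rrbracket_{D,\eta} \big) \\[1ex]
      \llbracket \texttt{TRUE} \rrbracket_{D,\eta} &= \mathbf{t} \\
      \llbracket t \rrbracket_{D,\eta} &= \begin{cases} \eta(A) & \text{if } t = A \\ t & \text{if } t \in C \text{ or } t = \texttt{NULL} \end{cases} \\
      \llbracket t_1 = t_2 \rrbracket_{D,\eta} &= \begin{cases} \mathbf{t} & \text{if } \llbracket t_1 \rrbracket_{D,\eta} = \llbracket t_2 \rrbracket_{D,\eta} \text{ and } \llbracket t_1 \rrbracket_{D,\eta} \neq \texttt{NULL} \text{ and } \llbracket t_2 \rrbracket_{D,\eta} \neq \texttt{NULL} \\ \mathbf{f} & \text{if } \llbracket t_1 \rrbracket_{D,\eta} \neq \llbracket t_2 \rrbracket_{D,\eta} \text{ and } \llbracket t_1 \rrbracket_{D,\eta} \neq \texttt{NULL} \text{ and } \llbracket t_2 \rrbracket_{D,\eta} \neq \texttt{NULL} \\ \mathbf{u} & \text{if } \llbracket t_1 \rrbracket_{D,\eta} = \texttt{NULL} \text{ or } \llbracket t_2 \rrbracket_{D,\eta} = \texttt{NULL} \end{cases} \\
      \llbracket t \texttt{ IS NULL} \rrbracket_{D,\eta} &= \begin{cases} \mathbf{t} & \text{if } \llbracket t \rrbracket_{D,\eta} = \texttt{NULL} \\ \mathbf{f} & \text{if } \llbracket t \rrbracket_{D,\eta} \neq \texttt{NULL} \end{cases} \\
      \llbracket t \texttt{ IS NOT NULL} \rrbracket_{D,\eta} &= \neg \llbracket t \texttt{ IS NULL} \rrbracket_{D,\eta} \\
      \llbracket (t_1,\dots,t_n) = (t'_1,\dots,t'_n) \rrbracket_{D,\eta} &= \bigwedge_{i=1}^{n} \llbracket t_i = t'_i \rrbracket_{D,\eta} \\
      \llbracket (t_1,\dots,t_n) \neq (t'_1,\dots,t'_n) \rrbracket_{D,\eta} &= \bigvee_{i=1}^{n} \llbracket t_i \neq t'_i \rrbracket_{D,\eta} \\
      \llbracket \bar t \texttt{ IN } Q \rrbracket_{D,\eta} &= \begin{cases} \mathbf{t} & \text{if } \exists \bar r \in \llbracket Q \rrbracket_{D,\eta} : \llbracket \bar t = \nu(\bar r) \rrbracket_{D,\eta} = \mathbf{t} \\ \mathbf{f} & \text{if } \forall \bar r \in \llbracket Q \rrbracket_{D,\eta} : \llbracket \bar t = \nu(\bar r) \rrbracket_{D,\eta} = \mathbf{f} \\ \mathbf{u} & \text{if } \nexists \bar r \in \llbracket Q \rrbracket_{D,\eta} : \llbracket \bar t = \nu(\bar r) \rrbracket_{D,\eta} = \mathbf{t} \text{ and } \exists \bar r \in \llbracket Q \rrbracket_{D,\eta} : \llbracket \bar t = \nu(\bar r) \rrbracket_{D,\eta} \neq \mathbf{f} \end{cases} \\
      \llbracket \bar t \texttt{ NOT IN } Q \rrbracket_{D,\eta} &= \neg \llbracket \bar t \texttt{ IN } Q \rrbracket_{D,\eta} \\
      \llbracket \texttt{EXISTS } Q \rrbracket_{D,\eta} &= \begin{cases} \mathbf{t} & \text{if } \llbracket Q \rrbracket_{D,\eta} \neq \emptyset \\ \mathbf{f} & \text{if } \llbracket Q \rrbracket_{D,\eta} = \emptyset \end{cases} \\
      \llbracket \theta_1 \texttt{ AND } \theta_2 \rrbracket_{D,\eta} &= \llbracket \theta_1 \rrbracket_{D,\eta} \wedge \llbracket \theta_2 \rrbracket_{D,\eta} \\
      \llbracket \theta_1 \texttt{ OR } \theta_2 \rrbracket_{D,\eta} &= \llbracket \theta_1 \rrbracket_{D,\eta} \vee \llbracket \theta_2 \rrbracket_{D,\eta} \\
      \llbracket \texttt{NOT } \theta \rrbracket_{D,\eta} &= \neg \llbracket \theta \rrbracket_{D,\eta} \\[1ex]
      \llbracket Q_1 \texttt{ UNION ALL } Q_2 \rrbracket_{D,\eta} &= \big( \llbracket Q_1 \rrbracket_{D,\eta} \cup \llbracket Q_2 \rrbracket_{D,\eta} \big) : \ell(\llbracket Q_1 \rrbracket) \\
      \llbracket Q_1 \texttt{ INTERSECT ALL } Q_2 \rrbracket_{D,\eta} &= \big( \llbracket Q_1 \rrbracket_{D,\eta} \cap \llbracket Q_2 \rrbracket_{D,\eta} \big) : \ell(\llbracket Q_1 \rrbracket) \\
      \llbracket Q_1 \texttt{ EXCEPT ALL } Q_2 \rrbracket_{D,\eta} &= \big( \llbracket Q_1 \rrbracket_{D,\eta} - \llbracket Q_2 \rrbracket_{D,\eta} \big) : \ell(\llbracket Q_1 \rrbracket) \\
      \llbracket Q_1 \ \mathrm{op}\ Q_2 \rrbracket_{D,\eta} &= \varepsilon\big( \llbracket Q_1 \ \mathrm{op}\ \texttt{ALL } Q_2 \rrbracket_{D,\eta} \big), \quad \mathrm{op} \in \{\texttt{UNION}, \texttt{INTERSECT}\} \\
      \llbracket Q_1 \texttt{ EXCEPT } Q_2 \rrbracket_{D,\eta} &= \big( \varepsilon(\llbracket Q_1 \rrbracket_{D,\eta}) - \llbracket Q_2 \rrbracket_{D,\eta} \big) : \ell(\llbracket Q_1 \rrbracket)
      \end{aligned}
      \]
      Figure 2: Semantics of core SQL
      • Fits in one page • Non-ambiguous • Easy to understand • Easy to implement • Easy to modify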
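The condition rules of Figure 2 translate almost line by line into code. The sketch below is an illustration only, not the author's implementation: it represents the truth values t, f, u as strings, SQL NULL as Python's `None`, and rows as tuples; the names `eq`, `tuple_eq`, `in_query`, `exists`, `and3`, `or3` and `not3` are hypothetical helpers introduced here.

```python
from functools import reduce

T, F, U = "t", "f", "u"   # true, false, unknown
NULL = None               # stand-in for SQL NULL

def not3(x):
    """Three-valued negation: unknown stays unknown."""
    return {T: F, F: T, U: U}[x]

def and3(x, y):
    """Three-valued conjunction."""
    if x == F or y == F:
        return F
    if x == T and y == T:
        return T
    return U

def or3(x, y):
    """Three-valued disjunction, via De Morgan."""
    return not3(and3(not3(x), not3(y)))

def eq(v1, v2):
    """Rule for t1 = t2: unknown as soon as either side is NULL."""
    if v1 is NULL or v2 is NULL:
        return U
    return T if v1 == v2 else F

def tuple_eq(t1, t2):
    """Rule for tuple equality: conjunction of component equalities."""
    return reduce(and3, (eq(a, b) for a, b in zip(t1, t2)), T)

def in_query(t, rows):
    """Rule for IN: t if some row matches, f if all rows mismatch, else u."""
    results = [tuple_eq(t, r) for r in rows]
    if T in results:
        return T
    if all(x == F for x in results):   # vacuously true for an empty result
        return F
    return U

def exists(rows):
    """Rule for EXISTS: only ever t or f, never u."""
    return T if rows else F

# Replaying slide 7 with R = {1, NULL} and S = {NULL}.
R = [(1,), (NULL,)]
S = [(NULL,)]
not_in_answer = [r for r in R if not3(in_query(r, S)) == T]
not_exists_answer = [
    r for r in R
    if not3(exists([s for s in S if tuple_eq(s, r) == T])) == T
]
print(not_in_answer, not_exists_answer)
```

The design point worth noting is that `in_query` can return unknown while `exists` cannot, which is why NOT IN and NOT EXISTS stop agreeing once S contains a NULL.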
  • 11. Formal Semantics: Validation • Cannot prove that semantics is correct • Provide sufficient experimental evidence • Implemented in Python • Validated on 100000+ random SQL queries
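The validation idea, comparing the formal semantics against a real system on randomly generated inputs, can be illustrated on a much smaller scale. The sketch below is not the validation harness described on the slide: it fixes a single query shape (NOT IN), randomizes only the table contents, and uses SQLite as the reference engine, all of which are assumptions made to keep the example short. The function names are hypothetical.

```python
import random
import sqlite3
from collections import Counter

def not_in_reference(r_vals, s_vals):
    """Bag answer of SELECT A FROM R WHERE A NOT IN (SELECT A FROM S),
    computed from the three-valued rules: a row is kept only when every
    comparison with S is definitely false (vacuously so if S is empty)."""
    return Counter(
        a for a in r_vals
        if all(a is not None and b is not None and a != b for b in s_vals)
    )

def not_in_sqlite(r_vals, s_vals):
    """Bag answer of the same query as computed by SQLite."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE R (A INTEGER)")
    con.execute("CREATE TABLE S (A INTEGER)")
    con.executemany("INSERT INTO R VALUES (?)", [(v,) for v in r_vals])
    con.executemany("INSERT INTO S VALUES (?)", [(v,) for v in s_vals])
    rows = con.execute(
        "SELECT A FROM R WHERE A NOT IN (SELECT A FROM S)"
    ).fetchall()
    con.close()
    return Counter(v for (v,) in rows)

def random_column(rng, max_len=4):
    """A short random column over a tiny domain that includes NULL."""
    return [rng.choice([None, 1, 2, 3]) for _ in range(rng.randint(0, max_len))]

def count_mismatches(trials=200, seed=0):
    """Number of random databases on which the two answers differ."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        r, s = random_column(rng), random_column(rng)
        if not_in_reference(r, s) != not_in_sqlite(r, s):
            mismatches += 1
    return mismatches

print("mismatches:", count_mismatches())
```

Answers are compared as bags (`Counter`) rather than lists, since SQL leaves row order unspecified but does fix multiplicities. A small value domain with a high share of NULLs makes the corner cases far more likely to be hit than uniformly random integers would.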
  • 12. Formal Semantics of Cypher • Collaboration between Neo Technology and the University of Edinburgh • Preliminary meeting in December • Legal agreements finalized recently • Neo Technology sponsors a researcher (Nadime Francis)
  • 13. Challenges • Getting the (abstract) data model right • Intermediate representation (QUIL?) • Identify core fragment • Language constantly evolving • Follow the footsteps of SQL? (nulls)