Open Source
Relational Databases
Who am I, and what is this about?
• Emanuel Calvo, currently at
OnGres as a PostgreSQL
Consultant and ayres.io as
_root_.

• Working on modern techniques
for DBRE (Database Reliability
Engineering).

• What is the current status of
the Open Source SQL
databases per component?

• What’s the good, the bad and
the ugly in the market?
ER model
Entity-Relationship and why SQL isn’t considered so, at least in its pure state.
The ER Map
• Needs a First-Order logic
language for retrieving data.

• Relational Algebra

• Tuple and Domain Relational
Calculus.
The model example
• Obscures everything behind
the complexity of the storage.

• It is represented as relational
algebra, but is hidden from
you.

• How do you select the names of
the people on the "Black" team?
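The question above can be sketched in both notations. In relational algebra it is a projection over a selection over a join: π name (σ team.name = 'Black' (person ⋈ team)). The SQL below expresses the same three steps. The `team` and `person` tables and their rows are hypothetical, invented only for this illustration, and SQLite (through Python's built-in `sqlite3` module) stands in for any of the engines discussed here.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        team_id INTEGER NOT NULL REFERENCES team(id)
    );
    INSERT INTO team (id, name) VALUES (1, 'Black'), (2, 'White');
    INSERT INTO person (name, team_id) VALUES ('Ana', 1), ('Luis', 2), ('Mara', 1);
""")

# Projection (p.name), selection (t.name = 'Black') and join (person with team).
rows = conn.execute(
    """
    SELECT p.name
    FROM person AS p
    JOIN team AS t ON t.id = p.team_id
    WHERE t.name = ?
    ORDER BY p.name
    """,
    ("Black",),
).fetchall()

black_team = [name for (name,) in rows]
print(black_team)  # ['Ana', 'Mara']
```

The statement only declares what is wanted; the order in which the join and the filter are executed is left to the engine.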
Some SQL:2011 tangent
distinctions
• Supports NULLs

• Supports subqueries

• Column order (horizontal alignment)
matters, depending on the engine

• SQL/MED
• Is a declarative language

• Hides all the complexity of
execution from the end user

• Planners were very advanced
already.
The Transaction
Model
Concurrency, consistency and availability.
The Entity Consistency
• CAP Theorem (Consistency, Availability and Partition
Tolerance). PACELC adds the choice between [L]atency
and [C]onsistency when there is no partition.

• ACID (Atomicity, Consistency, Isolation and Durability)

• BASE (Basically Available, Soft State, Eventual
consistency)
The chosen
We grab them by the storage and use them wisely without paying money to Oracle.
• CockroachDB

• PostgreSQL

• MySQL / MariaDB

• ClickHouse

• MongoDB
Components
The Lego
• Storage Engine

• Planner

• Protocol

• Language

• Ecosystem

• Framework

• WAL

• Transaction Manager

• Source Code availability, documentation both user and internal,
community, etc.
• Buffer Management

• I/O method (direct I/O, fsync)

• Transaction Management (storage
layer)

• Point in Time Recovery and Undo Log

• For distributed engines you want to read the Jepsen tests.

• This is the sauce
Wide-range Storage Engine Map
• Columnar Based

• Tuple Based

• Log-Structured Merge Tree (LSM)
Quick cherry pick
Columnar based:
• Fast for aggregations

• Easy to parallelize

• Better compression due to the columnar layout

• Better for scaling massive amounts of data
Log-structured merge tree (LSM):
• Bloom filters

• Sparse indexes by design

• Avoid Write Amplification

• Index-based storage

• More disk efficient, more CPU
Tuple based:
• Better for concurrency

• Hard to scale

• Better when manipulating entities atomically

• Balance between performance and concurrency.
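A Bloom filter, mentioned in the list above, answers "is this key possibly in this file?" without reading the file, which is why engines with many immutable sorted files rely on it. What follows is a minimal sketch and not the implementation of any particular engine: the class name, the sizes and the use of SHA-256 as the hash are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


stored_keys = ["user:1", "user:2", "user:42"]
bloom = BloomFilter()
for key in stored_keys:
    bloom.add(key)

all_found = all(bloom.might_contain(key) for key in stored_keys)
print(all_found)
```

A negative answer lets a read skip the file entirely; a positive answer still requires looking the key up, since it may be a false positive.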
“Relational databases require a Query Optimizer/
Query Planner for translating the first-order logic
language to relational algebra and other
optimizations. The result is called the Execution Plan.”
– Jorge de Lanús Oeste (drives an Uber but knows a lot about databases)
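To make the idea of an execution plan concrete, here is a small sketch that asks an engine for the plan of a query instead of its rows. It uses SQLite through Python's built-in `sqlite3` module only because it needs no server; SQLite is not one of the engines compared in this talk, and the `person` table and `person_team_idx` index are hypothetical. PostgreSQL and MySQL expose the same idea through their own `EXPLAIN` statements, with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, team TEXT);
    CREATE INDEX person_team_idx ON person (team);
""")

# EXPLAIN QUERY PLAN returns one row per plan step; the human-readable
# description of the step is the fourth column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM person WHERE team = ?",
    ("Black",),
).fetchall()

plan_details = [row[3] for row in plan]
for detail in plan_details:
    print(detail)
```

The printed steps describe how the engine intends to reach the rows (for example, whether it scans the table or searches an index), which is exactly the information hidden behind the declarative query.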
Planner
Plan: {

…
• Heuristic

• Cost based {Parametric, MO, MOP}

• Mixed

• Planner, Resolver, Optimizer, Executor
• MySQL also has Condition Pushdown

• PostgreSQL has a rich planner

• MySQL plan output lacks detail

• PostgreSQL does not provide additional tools for plan reading.
Protocol
• Client Protocol

• Replication Protocol

• Logical/Binary

• Coordination Protocol

• HA protocol

• Gossip

• Consensus {Raft, Paxos}

• …
• No standard

• JSON is becoming more present (thankfully)

• Absence of internal consensus
Language
• Abstracts all the relational algebra

• SQL != Relational

• NULLs

• Column Alignment

• Subquery

• Mixed implementations

• The relational model is conceptually unable to return more than one result set.
What do we want?
• Standard

• Backward Compatibility

• Modern
Postgres95 -> PostgreSQL
“The original Postgres implementation used QUEL (as POSTQUEL), and
its organization resembles many of the concepts
of the original ER model. COPY is an inherited piece
from this prior implementation.”
Ecosystem
• Single provider or fake open source

• Community contribution or Social Entropy Experiment

• Satellite companies building tools

• Satellite companies building forks

• Satellite coders copy-pasting

• Tons of under-proven libraries
• Multi-database tools tend to fail awesomely

• Choose tools that are integrated with the core and that have frequent updates

• Bug fixing is tied to community timelines

• MySQL uses bugs.mysql.com

• Postgres uses mailing lists

• ClickHouse/CockroachDB use GitHub
Framework
• Core extensibility through plugins or extensions

• Customizing the planner

• Managing the protocol

• Creating workers

• Creating your own types
• Complex, generally in C.

• Multi-provider packages.
WAL
• WAL or Redo

• MySQL has an undo log, but only for rollback space.

• Postgres ships a tool for rewinding (pg_rewind)

• It can reside in the Storage Engine or in higher layers

• It’s local and provides consistency and durability

• Distributed WALs or a certification log could be in this group, although there will always be a WAL.
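The write-ahead rule behind the points above can be sketched in a few lines: a change is first appended to a log and forced to disk, and only then applied; after a restart the state is rebuilt by replaying the log. This is a toy model under stated assumptions, not how any of the engines discussed here lay out their logs: the `TinyWAL` class, the JSON-lines record format and the key/value state are all hypothetical.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy key/value store whose changes are durable once their log record is fsynced."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        # Recovery: re-apply every complete record found in the log.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn record at the tail: stop replaying here
                self.state[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. append to the log, 2. force it to disk, 3. only then change the state.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.state[key] = value

    def close(self):
        self.log.close()


wal_path = os.path.join(tempfile.mkdtemp(), "tiny.wal")

db = TinyWAL(wal_path)
db.put("a", 1)
db.put("b", 2)
db.close()

recovered = TinyWAL(wal_path)  # simulates a restart: state comes from the log
recovered_state = dict(recovered.state)
recovered.close()
print(recovered_state)
```

A real engine adds checkpoints, log sequence numbers and truncation of torn tails, all of which are left out here on purpose.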
Transaction Manager
• It can be at node level or cluster level

• Concept of source and origin

• Group Replication

• Logical Replication

• Concept of Global Id

• Centralized commits are possible through Kafka brokers

• Functional sharing must rely on node try level

• Between Postgres and MySQL, only Postgres implements Serializable as Serializable Snapshot Isolation (SSI)

• Read Uncommitted is only honored by InnoDB; Postgres treats it as Read Committed
Other components or
capabilities
• Access Methods (B-Tree, L-Tree, Reverse, Hash)

• FTS (Full Text Search) and advanced search

• Geo capabilities
Entity Consistency
at Scale
Replication, Sharding and HA.
What is in the land of single
leader engines?
• Asynchronous replication

• Semi-synchronous replication

• First node response, as in MySQL.

• Simple Synchronous replication

• Quorum Synchronous

• Postgres
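The difference between the modes listed above comes down to how many replica acknowledgements the leader waits for before confirming a commit to the client. The function below is a simplified model of that rule and not the behavior of any specific engine: the mode names are hypothetical labels, and "simple synchronous" is modeled as waiting for every listed replica, which is an assumption.

```python
def commit_acknowledged(mode, replica_acks, quorum=None):
    """Decide whether the leader may confirm a commit to the client.

    replica_acks holds one boolean per replica: True if it confirmed the write.
    """
    acked = sum(1 for ack in replica_acks if ack)
    if mode == "async":
        return True  # the leader does not wait for any replica
    if mode == "semi_sync":
        return acked >= 1  # the first replica response is enough
    if mode == "sync":
        return acked == len(replica_acks)  # assumption: every listed replica confirms
    if mode == "quorum":
        if quorum is None:
            raise ValueError("quorum mode needs a quorum size")
        return acked >= quorum
    raise ValueError(f"unknown mode: {mode}")


acks = [True, False, True]  # three replicas, one of them lagging
print(commit_acknowledged("async", acks))
print(commit_acknowledged("semi_sync", acks))
print(commit_acknowledged("sync", acks))
print(commit_acknowledged("quorum", acks, quorum=2))
```

The trade-off is visible in the model: the fewer acknowledgements required, the lower the commit latency and the larger the window in which a leader failure can lose confirmed writes.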
What is in the land of distributed/
multileader[less] engines?
• Asynchronous Multi Leader replication

• BDR

• Snapshot Isolation

• Galera (MySQL layer on top of InnoDB)

• Serializability

• CockroachDB (2PC to a consensus group, with Hybrid Logical Clock, not strict serial)

• VoltDB

• External consistency

• Google Spanner (through TrueTime clocks).
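The Hybrid Logical Clock named in the list above combines a physical timestamp with a logical counter, so that causally related events stay ordered even when node clocks disagree. The sketch below follows the generic published algorithm and is not taken from CockroachDB's source code; the class and method names are hypothetical, and the physical clocks are fixed values so the example is deterministic.

```python
class HybridLogicalClock:
    """Timestamps are (l, c): l tracks the largest physical time seen, c breaks ties."""

    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning an integer time
        self.l = 0
        self.c = 0

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node and timestamp the receive event."""
        remote_l, remote_c = remote
        pt = self.physical_clock()
        new_l = max(self.l, remote_l, pt)
        if new_l == self.l and new_l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)


node_a = HybridLogicalClock(lambda: 100)  # node A's wall clock
node_b = HybridLogicalClock(lambda: 90)   # node B's wall clock lags behind

sent = node_a.now()             # A sends a message
received = node_b.update(sent)  # B receives it despite its slower clock
later = node_b.now()            # a later local event on B

print(sent, received, later)
```

Even though node B's wall clock is behind, the timestamps compare as sent < received < later, which preserves the causal order of the three events.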
The [full] architecture
Service Check HTTP
Replication Worker / Certification / Tx Coordination
Client Worker
Internal Pooling / Thread Management / Process per worker
External Pooling
Executor
• Write Quorum

• Single Leader

• Multi Leader

• Group Replication

• Inter node coordination

• Distributed Transactions

• Conflict-Free Replicated Datatype (LWW, 2P-Set, etc.)

• Consensus for HA

• Also in the entry points if external

• Centralized Commit
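Two of the simplest conflict-free replicated data types can be sketched briefly: a last-writer-wins (LWW) register and a two-phase set. Both merge replicas without coordination, and merging in either order gives the same result. This is an illustrative sketch with hypothetical class names, and it assumes the writers supply comparable timestamps; it is not the data structure of any engine named in this talk.

```python
class LWWRegister:
    """Last-writer-wins register: the highest (timestamp, node_id) pair wins."""

    def __init__(self, value=None, timestamp=0, node_id=""):
        self.value = value
        self.stamp = (timestamp, node_id)

    def set(self, value, timestamp, node_id):
        if (timestamp, node_id) > self.stamp:
            self.value, self.stamp = value, (timestamp, node_id)

    def merge(self, other):
        winner = self if self.stamp >= other.stamp else other
        return LWWRegister(winner.value, winner.stamp[0], winner.stamp[1])


class TwoPhaseSet:
    """Set with an add set and a tombstone set; a removed element never comes back."""

    def __init__(self):
        self.added = set()
        self.removed = set()

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        if item in self.added:
            self.removed.add(item)

    def value(self):
        return self.added - self.removed

    def merge(self, other):
        merged = TwoPhaseSet()
        merged.added = self.added | other.added
        merged.removed = self.removed | other.removed
        return merged


# Two replicas accept writes independently, then exchange state.
replica_a, replica_b = TwoPhaseSet(), TwoPhaseSet()
replica_a.add("x")
replica_a.add("y")
replica_a.remove("y")
replica_b.add("y")
replica_b.add("z")

merged_ab = replica_a.merge(replica_b).value()
merged_ba = replica_b.merge(replica_a).value()

reg_a = LWWRegister("draft", timestamp=1, node_id="a")
reg_b = LWWRegister("final", timestamp=2, node_id="b")
merged_register = reg_a.merge(reg_b).value

print(sorted(merged_ab), merged_register)
```

The price of coordination-free merging shows in the example: the removal of "y" on one replica wins over the concurrent add on the other, and the register silently discards the older write.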
The status of horizontal
scalability in OSDBs
• No native support for distributed consensus.

• Only MySQL has global transaction identifiers (GTIDs) and,
recently, Group Replication.

• There are extensions/forks for providing sharding in
Postgres and MySQL.
SandBox
• https://gitlab.com/3manuek/HA_PoC

• https://gitlab.com/ongresinc/testing-pg-ha-solutions
References
• Designing Data-Intensive Applications (Martin
Kleppmann)

• Database Reliability Engineering (L. Campbell/C. Majors)
Thank you!
@3manuek

3manuek [at] gmail {dot} com
