© 2020 Percona.
Peter Zaitsev
17 Things Developers should know about
Open Source Databases
CEO, Percona
Open Source 101
Columbia, SC, March 3rd, 2020
Let's start with some Humor
About Percona
Open Source Database Solutions Company
Support, Managed Services, Consulting, Training, Engineering
Focus on MySQL, MariaDB, MongoDB, PostgreSQL
Support Cloud DBaaS Variants on major clouds
Develop Database Software and Tools
Release Everything as 100% Free and Open Source
5,000,000+ downloads 175,000+ downloads 4,500,000+ downloads
450,000+ downloads 2,000,000+ downloads 1,500,000+ downloads
Widely Deployed Open Source Software
Freshly Released
Percona Distribution for PostgreSQL
Who Are You?
More Developer?
More Ops?
Operations
Focused on Database (DBA etc)
Generalist (Sysadmin, SRE, etc)
Programming Language
What Programming Languages does your team use?
Devs vs Ops
Is there Separation between Devs and Ops in your Organization?
Any Tension ?
Any Tension Between
Devs and Ops ?
Devs vs Ops
DevOps was supposed to have solved it, but tension between Devs and Ops is still common
Especially with Databases, which are often special snowflakes
Especially in larger organizations
Large Organizations
Ops vs Ops conflict happens too
Devs vs Ops Conflict
Devs
• Why is this stupid database always the problem?
• Why can’t it just work and work fast?
Ops
• Why don’t you learn schema design?
• Why don’t you write optimized queries?
• Why don’t you think about capacity planning?
Database Responsibility
Shared Responsibility for
Ultimate Success
Top Recommendations for Developers
Learn Database Basics
You can’t build great database-powered applications if you do not understand how databases work
Schema Design
Power of the Database Language
How Database Executes the Query
Example of Relational Database Schema
Query Execution Diagram
EXPLAIN
https://dev.mysql.com/doc/refman/8.0/en/execution-plan-information.html
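A minimal sketch of reading an execution plan from application code. It uses SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module only so that it runs anywhere without a server; the `orders` table and the `query_plan` helper are hypothetical, and MySQL's EXPLAIN output (documented at the link above) has a different shape.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the readable description.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

# No index exists on customer_id, so the plan should report a full table scan.
plan = query_plan(conn, "SELECT * FROM orders WHERE customer_id = ?", (42,))
print(plan)
```

The habit matters more than the engine: look at the plan for every query that runs often or touches a large table.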
Visualizing PostgreSQL Plan with PEV
http://tatiyants.com/postgres-query-plan-visualization/
Which Queries are Causing the Load
Why Are they Causing this Load
How to Improve their Performance
Check out PMM
http://pmmdemo.percona.com
PMM v2 is now GA
How are Queries Executed ?
Single Threaded
Single Node
Distributed
Indexes
Indexes are a Must
Indexes are Expensive
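Both sides of the index trade-off can be seen in a small sketch. This one uses SQLite via Python's sqlite3 module as a stand-in for a server database; the `users` table, the index name, and the row count are hypothetical. It records the query plan and the database size in pages before and after adding an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)
conn.commit()


def plan(sql):
    """Join the readable plan steps into one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def pages():
    """Number of storage pages the database currently occupies."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


lookup = "SELECT id FROM users WHERE email = 'user500@example.com'"

plan_before, pages_before = plan(lookup), pages()
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.commit()
plan_after, pages_after = plan(lookup), pages()

print(plan_before, "->", plan_after)
print(pages_before, "->", pages_after)
```

The lookup stops scanning the whole table (the "must"), while the database grows and every later write has to maintain the extra structure (the "expensive").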
Capacity Planning
No Database can handle “unlimited scale”
Scalability is very application dependent
Trust Measurements more than Promises
Can it be done, or can it be done Efficiently?
Vertical and Horizontal Scaling
Scalable != Efficient
Systems promoted as scalable can be less efficient
Hadoop, Cassandra, TiDB are great examples
Looking only at the wrong thing can get you in trouble
TiDB Scalability (Single Node)
TiDB Efficiency
Throughput != Latency
If I tell you a system can do 100,000 queries/sec, would you say it is fast?
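One way to see why a throughput number alone says little is Little's Law: the average number of requests in flight equals throughput multiplied by latency. The sketch below applies it to the 100,000 queries/sec figure; the two latency values are hypothetical.

```python
def requests_in_flight(throughput_qps: float, latency_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate * time spent in the system."""
    return throughput_qps * latency_seconds


# The same throughput with two very different user experiences.
fast = requests_in_flight(100_000, 0.001)  # each query takes 1 ms
slow = requests_in_flight(100_000, 1.0)    # each query takes 1 s

print(f"1 ms latency needs about {fast:.0f} concurrent queries")
print(f"1 s latency needs about {slow:.0f} concurrent queries")
```

A system can reach the same queries/sec by answering quickly or by keeping a very large number of slow requests in flight, so ask for latency (ideally percentiles) alongside throughput.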
Latency Numbers to Know
Very cool Interactive Diagram
https://colin-scott.github.io/personal_website/research/interactive_latency.html
Speed of Light Limitations
High Availability Design Choices
Do you want instant durable replication over wide geography, or Performance?
Understand the Difference between High Availability and Disaster Recovery protocols
Network Bandwidth is not the same as Latency
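A back-of-the-envelope sketch of the speed-of-light limit. It assumes light travels through optical fiber at roughly 200,000 km/s (about two thirds of its speed in vacuum) and ignores routing, switching, and protocol overhead, so real round trips are longer. The 6,000 km distance is a hypothetical example.

```python
FIBER_KM_PER_SECOND = 200_000  # approximate speed of light in optical fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time over fiber, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000


def max_sync_commits_per_second(distance_km: float) -> float:
    """Upper bound on sequential commits that each wait for one remote acknowledgement."""
    return 1000 / min_round_trip_ms(distance_km)


rtt_ms = min_round_trip_ms(6_000)
commits = max_sync_commits_per_second(6_000)
print(f"round trip >= {rtt_ms:.0f} ms, so <= {commits:.1f} sequential sync commits/sec")
```

More bandwidth does not change this bound; only shorter distance or asynchronous replication does.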
Wide Area Networks
Tend not to be as stable as Local Area Networks (LAN)
Expect increased Jitter
Expect Short term unavailability
Also Understand
Connections to the database are expensive
Especially if doing TLS Handshake
Query Latency Tends to Add Up
Especially on real network and not your laptop
ORM (Object-Relational-Mapping)
Allows Developers to query the database without needing to understand SQL
Can create SQL which is very inefficient
Learn SQL Generation “Hints”, Learn JPQL/HQL advanced features
Be ready to manually write SQL if there is no other choice
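A common way ORM-generated SQL becomes inefficient is the "N+1" pattern: one query for the parent rows, then one more query per row for its children. The sketch below reproduces that pattern by hand, without any ORM, using Python's sqlite3 module and hypothetical `authors` and `books` tables, and counts the statements sent to the database for each approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1'), (4, 3, 'C1');
    """
)

statements = []
conn.set_trace_callback(statements.append)  # record every statement SQLite runs


def titles_n_plus_one():
    """One query for the authors, then one query per author."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_join():
    """The same data fetched with one hand-written JOIN."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY b.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result


lazy = titles_n_plus_one()
lazy_count = len(statements)
statements.clear()
joined = titles_single_join()
join_count = len(statements)
print(lazy_count, "statements vs", join_count)
```

On a laptop the difference is invisible; over a real network each extra statement costs a round trip.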
Do not Leave Transactions Open
Keeping a Connection Open is rather inexpensive
A Transaction left open for a Long Time can get very expensive (even if it performed no writes)
Isolation Mode Matters
SET AUTOCOMMIT=0 - Any SELECT query will Open Transaction
COMMIT/ROLLBACK closes transaction
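A minimal sketch of keeping transactions short by scoping them explicitly. It uses Python's sqlite3 module, which, unlike the MySQL AUTOCOMMIT=0 behavior described above, opens its implicit transaction on the first write rather than on a SELECT. The `accounts` table and the `withdraw` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.commit()

# A write silently opens a transaction that stays open until someone ends it.
conn.execute("INSERT INTO accounts VALUES (1, 100)")
open_after_write = conn.in_transaction
conn.commit()
open_after_commit = conn.in_transaction


def withdraw(conn, amount):
    """Run the update in a block that always ends the transaction."""
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = 1"
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
    return balance


print(open_after_write, open_after_commit, withdraw(conn, 30))
```

Most database drivers and frameworks offer an equivalent scoped construct; the point is that no code path leaves the transaction open.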
Understanding Optimal Concurrency
Embrace Queueing
Request Queueing is Normal
With requests coming at “Random Arrivals”, some queueing will happen at any system scale
Should not happen too often or for very long
Queueing is “Cheaper” Close to the User
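How quickly queueing grows with load can be sketched with the textbook M/M/1 model. This is an assumption made for illustration, not something the slides specify: a single server, random (Poisson) arrivals, and exponentially distributed service times. The 10 ms service time and the arrival rates are hypothetical.

```python
def mm1_mean_wait_ms(arrivals_per_sec: float, service_ms: float) -> float:
    """Mean time a request waits in the queue (excluding service) in an M/M/1 system."""
    service_rate = 1000.0 / service_ms          # requests/sec the server can complete
    utilization = arrivals_per_sec / service_rate
    if utilization >= 1:
        raise ValueError("arrival rate exceeds capacity; the queue grows without bound")
    # Wq = rho / (mu - lambda), converted from seconds to milliseconds
    return utilization / (service_rate - arrivals_per_sec) * 1000.0


for rate in (50, 90, 99):
    wait = mm1_mean_wait_ms(rate, service_ms=10)
    print(f"{rate} req/s on a 100 req/s server: mean wait about {wait:.0f} ms")
```

Some waiting appears even at half load, and it grows sharply as utilization approaches 100%, which is why capacity should be planned with headroom.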
Benefits of Connection Pooling
Avoiding Connection Overhead, especially TLS/SSL
Avoiding using an Excessive Number of Database Connections
Multiplexing/Load Management
Configuring Connection Pool
Default and Maximum Connection Pool Size
Scaling Parameters
Combined Connection Pool Max Size should be smaller than the number of connections the database can support
Waiting for free connection to become available is OK
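A minimal sketch of a bounded pool in which callers wait for a free connection instead of opening more. The `ConnectionPool` class is hypothetical and simplified: it opens all connections up front and does no health checking. In-memory SQLite connections stand in for real database connections; production code would normally use the pool that ships with its driver or framework.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: at most max_size connections ever exist."""

    def __init__(self, connect, max_size):
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty if the timeout expires.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), max_size=2)

with pool.connection() as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
print(value)
```

With several application servers, the sum of their `max_size` values is what has to stay below the database's connection limit.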
Law of Gravity
Shitty Application at
scale will bring down any
Database
Scale Matters
Developing and Testing with Toy Database is risky
Queries Do not slow down linearly
The slowest query may slow down most rapidly
Memory or Disk
Data Accessed in memory is much faster than on disk
It is true even with modern SSDs
SSD accesses data in large blocks, memory does not
Fitting data in Working Set
Newer is not Always Faster
Upgrading to new Software/Hardware does not always make things faster
Test it out
Default Changes are often to blame
Upgrades are needed but not seamless
Major Database Upgrades often require application changes
Having a Conversation about Application Lifecycle is key
Character Sets
Performance Impact
Pain to Change
Wrong Character Set can cause Data Loss
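A small sketch of both effects using only Python string encoding, with no database involved. The sample text is hypothetical and written with escape sequences so the example does not depend on the file's own encoding.

```python
# "naive" with a diaeresis on the i, followed by two Japanese characters
text = "na\u00efve \u65e5\u672c"

# Storage cost: the same characters take a different number of bytes per encoding.
utf8_bytes = text.encode("utf-8")
print(len(text), "characters ->", len(utf8_bytes), "bytes in UTF-8")

# Data loss: latin-1 cannot represent the Japanese characters, so they are
# replaced with '?' and the original text cannot be recovered.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)
```

A database column with too narrow a character set can do the same thing on write: depending on settings it either rejects the value or stores the damaged one.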
Character Sets
https://per.co.na/MySQLCharsetImpact
Less impact In MySQL 8
Operational Overhead
Operations Take Time, Cost Money, Cause Overhead
10TB Database Backup?
Adding an Index to a Large Table?
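A rough sketch of why size turns routine operations into long ones. The 200 MB/s sustained rate is a hypothetical figure; measure your own storage and network before planning around it.

```python
def transfer_hours(size_tb: float, throughput_mb_per_sec: float) -> float:
    """Hours needed to read or copy a dataset at a sustained rate (decimal units)."""
    size_mb = size_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return size_mb / throughput_mb_per_sec / 3600


hours = transfer_hours(10, 200)
print(f"Copying 10 TB at 200 MB/s takes about {hours:.1f} hours")
```

Restore usually takes at least as long as backup, so the same arithmetic bounds recovery time.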
Distributed Systems
10x+ More Complicated
Better High Availability
Many Failure Scenarios
Test how application performs
Risks of Automation
Automation is a Must
Mistakes can destroy databases at scale
Security
Database is where the most sensitive
data tends to live
Shared Devs and Ops Responsibility
What Else
What Would you Add ?
Choosing the Open Source Database
Open Source Databases
General Classes
License
Popularity
Choices
General Classes
Mobile/Embeddable
SQLite
Couchbase Lite
Client-Server
MySQL PostgreSQL MongoDB
Cassandra Redis ClickHouse
Other Names
Serverless Cloud Native
Purpose
Operational
Analytical
HTAP (Hybrid Transactional/Analytical Processing)
Purpose
Database of Record
Supplemental Database
Data Model
Relational (SQL)
Non-Relational (NoSQL)
NoSQL Models
Document/Object Model
Key-Value
Graph
Wide Column
Hybrid/Multi-Model
Data Structure
Consistency Model
“Strong Consistency” (ACID)
“Weak Consistency” (BASE etc)
License
Great Article to Read!
https://arstechnica.com/gadgets/2020/02/how-to-choose-an-open-source-license/
Open Source
There is a lot of confusion
around Open Source.
Intentional Strategy by
some Vendors
Open Source
Truly Open Source
Kind of Open Source
Open Source Compatible
Truly Open Source
OSI: https://opensource.org/osd-annotated
GNU: https://www.gnu.org/philosophy/free-sw.en.html
DEBIAN: https://www.debian.org/social_contract#guidelines
Peter’s Practical Question
What will I have to give up if I stop the commercial relationship with a vendor?
Kind of Open Source Software
Open Core
Shared Source Software
Eventually Open Source Software
Open Source Compatible Software
Honest Proprietary
Software claiming
compatibility with Open
Source Technology
“Hotel California Compatibility”
Types of Open Source Licenses
Copyleft
• GPL, AGPL
Permissive
• BSD, Apache, MIT
Public Domain
• Not Really a License
Percona’s Approach
All Software Percona Distributes is 100% Free and Open Source (except Percona Server for MongoDB, which is Source Available under the SSPL)
Popularity
Why is Popularity Important?
Easier to Find Talent and Information
Typically Higher Quality
Better Integration
More Vendor Choices (Easier to avoid vendor Lock-in)
Top Databases
Source: https://db-engines.com/en/ranking
Developers love Open Source Databases
Most Used databases per Stack Overflow Survey (2019)
Source: https://insights.stackoverflow.com/survey/2019
Loved Databases are not the same as Used
Databases Most Loved by Developers
Choices
Databases not Database
Note
Most Large Organizations Run Multiple Databases
Many Organizations Like to Limit the Sprawl
Grow Number of Databases Wisely
Popular NoSQL Categories
• Redis: Key-Value Store
• MongoDB: Document Store
• Cassandra: Wide Column Store
• Elasticsearch: Search Engine
• Neo4j: Graph
• InfluxDB: Time Series
Cross-Pollination
MySQL, PostgreSQL add Document Storage Support
Many databases provide SQL or SQL-like language support
Q1: Application Data Model
Which Database do your application's Data Model and Operations best map to?
Not-So-Obvious Things to Consider
Application Life Cycle and
Development Process
Q2: Scale
Understanding Scaling
Any Technology Scales,
Just with different
efficiency and amount of
pain
Q3: It is not just about development
Technology Long Term Viability
Availability of Commercial Support
Corporate Policies
Compliance (Security, Auditing etc)
Reasons to choose MySQL
Database which powers the Web
A lot of Operational Experience
A lot of Talent Available
Reasons to choose PostgreSQL
Permissive License
Supports advanced SQL Language, Simpler Oracle Migration
Has Great Momentum
Reasons to choose MongoDB
Developers want Document, not SQL Database
Great Programming Language Support
Built-in Sharding
Check Out http://per.co.na/careers
Thank You!
Twitter: @percona @peterzaitsev

17 Things Developers Should Know About Databases