Allyourbase

Adventures in Building a Database
or
Abstraction is All We Have
Alex Scotti
Bloomberg LP

• Comdb2 is
• A Highly Available Clustered Relational
Database System
• Developed at Bloomberg
• Uses much open source, portions of
BerkeleyDB, SQLite, others
• Much custom code
• Stores 95% of all the data in Bloomberg

• Looking back on what made Comdb2 a success at
Bloomberg, I saw 4 big abstractions that we got
right
• Interestingly enough, I see that only 1 of these
abstractions is commonplace in every system
today
• We started building a system to meet the goals I’ll
be referring to today as abstractions
• With the larger goal of letting application code be
simpler, faster to write, and more reliable

Abstraction is key
• Raise the level of abstraction as much as possible
(before performance becomes absolutely
unacceptable)
• There’s always a ‘different way’ to solve any problem
• Try to be in a spot where 95% of applications don't
need that different way
• Inflection point: a system can never be fast enough for
everything. Chasing the last percent involves usability
sacrifices that make things worse for the other 99%!

Abstraction 1: Relational Model
• We started off in 2004 building what would now
be recognized as a NoSQL system
• It had no schemas
• It had no data types
• It had almost nothing, aside from High Availability
• Clients could store anything, very “flexible”

• “Flexible” quickly becomes a euphemism for
“fragile” or “dangerous” or “a huge mess”
• Programs are storing binary data in the database?
• Do they all agree on how it’s encoded?
• Did the app writers understand endian issues?
• What about alignment issues?
• How do they change things when the apps are all
so tightly coupled now?

• How do you query your data?
• Write programs to navigate through the data row
by row using explicit indexes
• That’s hard
• Also fragile. What if we change the index
structure?
• That’s also slow. Lots of round trips to a remote
server

• We quickly realized the abstraction provided by
SQL was not to be ignored! (To be fair, all the
NoSQL systems are now realizing this also)
• Applications could be written with 1 line of code
expressing the same logic as what took hundreds
before
• Without errors!
• Without fragile coupling to database layout!
• With improved performance!

• “Rigid” schemas are a GOOD THING
• Some will have you believe “rigid” means “inflexible”
but it’s actually the opposite!
• Having strong typing and well defined schemas
allows for safe, flexible changes to running systems
without breaking things or downtime
• Pervasive type conversion whenever possible is key
• View “data types” as just another form of “constraint”
- The type has nothing to do with the data

• Abstract away the representation of data as
much as possible (Codd rules 8,9)
• Abstract away the physical locations of data as
much as possible (Codd rule 11)
• Support online schema changes for ALL
POSSIBLE CHANGES
• Don’t ever make a table unavailable, or read
only

Abstraction 2: Perfect Availability
• What does it mean to be highly available?
• Let’s start by understanding what it means to not
be highly available
• What things do we take for granted when working with
“non-distributed systems”?
• Let’s imagine the world’s worst programming
language

INTEGER I
ASSIGN: I = 5
RC = GETASSIGNMENTRC()
IF (RC = 0)
    PRINT I
ELSE
    GOTO ASSIGN
Output: 5

INTEGER I, J
ASSIGN: I = 4
RC = GETASSIGNMENTRC()
IF (RC = 0)
    J = I * 3
ELSE
    GOTO ASSIGN
PRINT J
Output: 12

INTEGER I, J
ASSIGN: I = 4
RC = GETASSIGNMENTRC()
IF (RC = 0)
    MULT: J = I * 3
    RC2 = GETMULTRC()
    IF RC2 != 0
        GOTO MULT
ELSE
    GOTO ASSIGN
PRINT J
Now we loop forever… Looks like multiplication is down again

INTEGER I, J
ASSIGN: I = 4
RC = GETASSIGNMENTRC()
IF (RC = 0)
    MULT: J = I * 3
    RC2 = GETMULTRC()
    IF RC2 != 0
        RETRY++
        IF (RETRY < 10)
            GOTO MULT // RETRY IT
        ELSE
            // IT’S DOWN, LET’S DO IT BY HAND
            J = 0
            X = 0
            WHILE (X < 3)
                J = J + I
                X++
            ENDWHILE
ELSE
    GOTO ASSIGN
PRINT J
Oh jeez.. I hope loops are working today. I forgot to check

• This is silly
• Nobody would use such a terrible language!
• Yet this is exactly what programs that use
unreliable services start to look like
• Not even exaggerating
• Error handling and “fall back strategies”
dominate even the simplest applications if they
need to be “robust.”

• How to simplify?
• Make systems more available
• Let’s make a guarantee that the system will come back if you retry
enough
• Eliminate need for “alternate strategies” in applications
• And that’s the current state of affairs of HA databases
• Good, but not great. Lots of error handling and retrying in every
app
• Great thing about code that doesn't run except rarely?
• It never works

• HA Database contract
• When you get a good return code from commit,
your data is stored ‘durably.’ (hopefully in more
than one place)
• If the server you are talking to goes down,
another one should be available (soon or now)
for you to connect to and hopefully you should
still find your data there

• Let’s simplify it further
• What if we transparently reconnect to other
servers when one fails and guarantee that the
data will be there?
• We just simplified the apps even more.
• Now they get occasional bad return codes and
call some database ‘retry’
• Still ugly code, but harder to screw up

• Can we do better?
• The “perfect availability” abstraction!
• Delete all the error handling and retrying from
your applications, let’s assume the DB is as
reliable as multiplication
• DB won’t give you an unexpected error from
server failure anymore (even when the server
fails)

• HA SQL
• Client transparently negotiates point in time when it
connects
• Client transparently reconnects to other node on
failure, using point in time to get back to exactly
where it was
• Client transparently requests in flight SQL (SELECT)
to be resumed after the EXACT ROW last delivered
• Client transparently re-issues writes (INSERT, etc)

• No bad return code back to the application from
any possible state of a transaction
• If in the middle of running a query
• If uncommitted writes
• If packet lost on COMMIT request/response
• No HA SQL database currently able to do this,
aside from Comdb2

• ACID
• What does the D mean?
• After you COMMIT you “can’t lose the data.”
• Are you “durable” if you need to wait for the system to
come back after a crash?
• What if you need to “swing” to a backup server?
• What if the data center exploded? Did you lose the
data?
• HA really is just another way of looking at D.

Abstraction 3: No Concurrency
• Concurrency causes many problems for
applications
• Very hard to reason about, very easy to make
errors
• The ideal system to program to (not the fastest!)
is one with no concurrency at all

• Serializability Theory
• We want to have concurrency in our database for
performance reasons
• We don’t want to have concurrency problems
• Systems that don't have concurrency by definition
have no concurrency problems
• If we can show a system that has concurrency to be
somehow “equivalent” to the one with no concurrency
then it too has no concurrency problems!

• Equivalent Histories
• Consider the “output” of the database a “history”
• A system with no concurrency can only produce a limited set of
possible histories from a given set of input
• Why more than one?
• Concurrent requests to the system may execute in a
nondeterministic order even though they run one at a time
• If the system WITH concurrency produces one of those histories, it
runs that workload with no concurrency problems
• If the system with concurrency produces a history that could have
come from the non concurrent system for ALL POSSIBLE INPUTS
then the concurrent system has no concurrency problems!

• A system like that is called “serializable”
• The system is concurrent but to the application
(user) acts like a system that has no
concurrency
• The abstraction is the system has no
concurrency

• Serializable systems are simple to reason about
and easy to write applications for
• Test / “prove” that your application works
correctly with one user - then it works fine as it
scales up
• at least it doesn’t have concurrency problems!

Starting state:
black white black white black white
Two transactions run at the same time:
UPDATE dots SET color = 'black' WHERE color = 'white'
UPDATE dots SET color = 'white' WHERE color = 'black'
Serializable result:
black black black black black black
OR
white white white white white white
VS non-serializable result:
white black white black white black
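The dots example can be checked mechanically. The sketch below is a toy model, not a database: each transaction is a function from what it reads to what it writes, and `run_on_shared_snapshot` is an assumed stand-in for two transactions that both read the starting state before either commits.

```python
from itertools import permutations

INITIAL = ["black", "white", "black", "white", "black", "white"]

def blacken(view):
    """UPDATE dots SET color = 'black' WHERE color = 'white'"""
    return {i: "black" for i, color in enumerate(view) if color == "white"}

def whiten(view):
    """UPDATE dots SET color = 'white' WHERE color = 'black'"""
    return {i: "white" for i, color in enumerate(view) if color == "black"}

def apply_writes(dots, writes):
    out = list(dots)
    for i, color in writes.items():
        out[i] = color
    return out

def run_serial(dots, txns):
    """One at a time: each transaction sees the previous one's result."""
    for txn in txns:
        dots = apply_writes(dots, txn(dots))
    return dots

def run_on_shared_snapshot(dots, txns):
    """Concurrent toy model: every transaction reads the same old snapshot."""
    snapshot = list(dots)
    for txn in txns:
        dots = apply_writes(dots, txn(snapshot))
    return dots

# Every history the non-concurrent system could produce.
serial_outcomes = {
    tuple(run_serial(INITIAL, order)) for order in permutations([blacken, whiten])
}
concurrent_outcome = tuple(run_on_shared_snapshot(INITIAL, [blacken, whiten]))
is_serializable = concurrent_outcome in serial_outcomes
```

The two serial orders give all white or all black. The shared-snapshot run just swaps the colors, which matches neither, so `is_serializable` is false.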

• Comdb2 not the only serializable SQL DB
• However…
• In a cluster of PostgreSQL, not serializable on
any nodes other than “master.”
• In a cluster of Percona MySQL, not serializable
on any node
• Full read/write serializability for any app
connected to any node in a Comdb2 cluster

Abstraction 4: One big single computer
• Single System Image (SSI)
• Distributed systems often “reveal” their nature to
application programmers
• Not desirable. Programmers don’t want to
think about the fact that software is running on
multiple machines
• What goes wrong?

• X=3
• Write X=4
• Read X
• Do you get 4?
• What if your read happened on a different
machine?
• What if you told someone else to do that read?
• What if they did that read on a different machine?

• Not going to torture you again with the WORLD’S
WORST PROGRAMMING LANGUAGE
• but you can guess where this would go
• degenerates into insanity quickly

• Most clustered database systems fail the simple
“read follows writes” test in one way or another
• PostgreSQL with full sync replication fails when
read occurs on other machine
• Percona MySQL Cluster fails when other
program told to perform read

• Comdb2 passes
• Clusterwide coherency model ensures that
after commit, data will always be visible to any
process on any computer
• How?

• On commit, wait for LSN to be acked by all nodes in
cluster
• Only ack an LSN after
• You have received the data
• You have processed lock list in the commit record
• Obtained all the locks needed to ensure that no
external observer can prove that this transaction
was NOT COMMITTED
• Racing reads will block on locks

• At the moment a node acks, the data is NOT yet committed on that node
• The btrees in memory containing the data were
NOT updated
• The log records containing the descriptions of the
changes to these btrees were NOT even
processed
• We ack back immediately after grabbing the
locks which are listed in the commit record
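The early-ack idea can be sketched as follows. This is a single-threaded toy, not Comdb2 internals: `ReplicaNode` and its methods are hypothetical, and a blocked read is reported as a status rather than actually blocking.

```python
class ReplicaNode:
    """Hypothetical replica that acks a commit as soon as it holds the locks."""
    def __init__(self, data):
        self.data = dict(data)
        self.locked = set()   # keys locked on behalf of unapplied commits
        self.backlog = []     # commit records received but not yet applied

    def receive_commit(self, lsn, writes):
        """Take the locks named in the commit record, then ack immediately."""
        self.locked.update(writes)          # keys of the write set
        self.backlog.append((lsn, writes))
        return lsn                          # the ack: nothing is applied yet

    def apply_backlog(self):
        """Apply the logged changes in order, then release the locks."""
        for _lsn, writes in self.backlog:
            self.data.update(writes)
        self.backlog.clear()
        self.locked.clear()

    def read(self, key):
        """Return (status, value); a real reader would wait on the lock."""
        if key in self.locked:
            return ("blocked", None)
        return ("ok", self.data[key])

node = ReplicaNode({"X": 3})
ack = node.receive_commit(lsn=10, writes={"X": 4})
racing_read = node.read("X")   # cannot observe the stale value 3
node.apply_backlog()
later_read = node.read("X")
```

Between the ack and the apply, no reader can see the old value of X, so no observer can prove the transaction was not committed, even though the replica has not processed the changes.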

• “On commit, wait for LSN to be acked by all nodes
in cluster”
• But…
• We claimed High (perfect even!) Availability as a
design goal
• This design causes endless blocking when
nodes are down
• And total performance to be equivalent to the
worst performing machine!

• Coherency Algorithm:
• Wait for Commit LSN to be acked by all nodes
• But don’t wait forever
• Just wait longer than heartbeat time
• Get back first ack.. get back more acks..
• Use the time each node has taken so far in a heuristic to calculate how
long this should take in total
• Cull the outliers, don’t wait for them
• Drop connection to them!
• Mark them ‘incoherent’ and don’t wait for them when in this state
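One way to picture the culling step is as a function over observed ack times. The slides do not spell out the actual heuristic, so the rule below (a deadline of at least the heartbeat interval, stretched to a multiple of the fastest ack) and the names `split_coherent`, `heartbeat` and `slack` are assumptions made purely for illustration.

```python
def split_coherent(ack_times, heartbeat=0.5, slack=4.0):
    """Decide which nodes to keep waiting for and which to mark incoherent.

    ack_times maps node name -> seconds until its ack arrived (None = no ack).
    The deadline is never shorter than the heartbeat interval; both knobs
    are made-up values, not Comdb2 settings.
    """
    received = [t for t in ack_times.values() if t is not None]
    if received:
        deadline = max(heartbeat, slack * min(received))
    else:
        deadline = heartbeat
    coherent = sorted(
        name for name, t in ack_times.items() if t is not None and t <= deadline
    )
    incoherent = sorted(name for name in ack_times if name not in coherent)
    return deadline, coherent, incoherent

deadline, coherent, incoherent = split_coherent(
    {"a": 0.01, "b": 0.02, "c": 3.0, "d": None}
)
```

Here nodes `c` (slow) and `d` (silent) are culled, so the commit returns after the deadline instead of waiting for the worst machine.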

• On node that lost connection:
• Mark yourself as ‘incoherent’
• Refuse to serve ANY requests
• not read only - no service
• “read only” tempting but wrong
• write followed by read gets stale data when the read ends up
on node that just got marked incoherent!
• If you’re crashed - well you are crashed, you don't do anything
• If you're alive, get back into the cluster. Follow protocol to
become coherent and serve requests again

• Becoming coherent
• Same protocol used when a node starts cold or when it
recovers from being marked incoherent (transient
timeout)
• Watch the LSNs the cluster is up to on the nodes
• When your LSN is getting ‘close’, request other node to
‘loop back’ an internal transaction, waiting for the LSN to
be acked
• If that succeeds (without timeout!) then you can be
marked coherent (you are processing live data) and you
now can service requests
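The node-side rules from the last two slides fit in a small state machine. This is a sketch with hypothetical names (`ClusterNode`, `Incoherent`, `close_enough`). The real protocol involves network round trips that are reduced here to boolean inputs.

```python
class Incoherent(Exception):
    """Raised when an incoherent node is asked to serve anything."""

class ClusterNode:
    def __init__(self, data):
        self.data = dict(data)
        self.coherent = True

    def lose_connection(self):
        self.coherent = False  # mark yourself incoherent

    def read(self, key):
        # Not "read only": an incoherent node might serve stale data,
        # so it refuses reads as well as writes.
        if not self.coherent:
            raise Incoherent("no service, not even reads")
        return self.data[key]

    def try_become_coherent(self, my_lsn, cluster_lsn, loopback_acked,
                            close_enough=100):
        """Rejoin only when nearly caught up AND a loopback commit acked in time."""
        if cluster_lsn - my_lsn > close_enough:
            return False       # still too far behind to bother asking
        if not loopback_acked:
            return False       # the loopback transaction timed out
        self.coherent = True
        return True
```

A node that starts cold and a node that was culled for a transient timeout go through the same `try_become_coherent` path.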

4 Great Abstractions
• Physical / Logical Independence (Relational
Model)
• Perfect Availability (HA SQL)
• No Concurrency (serializable)
• One single computer (SSI)
• Allows for simplified application logic, more
reliable applications, faster deployment
