In my talk I will focus on a practical use case: a task queue
application that uses Tarantool as both an application server and a
database.
The idea of a task queue is that producers put tasks (objects)
into a queue, and consumers take tasks, perform them, and mark them
as completed.
The queue must guarantee certain properties: if a consumer fails,
its task returns to the queue automatically; a task can't be
taken by more than one consumer at a time; and task priorities
are respected.
With Tarantool, a task queue is a distributed networked
application: there are multiple consumer/producer endpoints
(hosts) through which a user can interact with the queue.
The queue itself is a fault-tolerant distributed database:
every task is stored in a Tarantool database and replicated
in multiple copies.
If a machine goes down, the state of a task is tracked on a
replica, and the user can continue working with the
queue through a replica.
Total power failure is also not an issue, since tasks are stored
persistently on disk with transactional semantics.
Performance of such an application is in the hundreds of thousands
of transactions per second.
At the same time, the queue is highly customizable: it is
written entirely in Lua and distributed as a Lua rock, yet the code
runs inside the database. This is the strength of Lua:
one size doesn't have to fit all, and you don't have to sacrifice
performance if you need customization.
The second part of the talk will cover implementation details and
performance numbers, in particular a performance comparison with
other queue products (beanstalkd, RabbitMQ), and an overview
of the implementation from the language-bindings point of view: how
we make the database API available in Lua, and what the challenges
and performance hurdles of such a binding are.
My talk about Tarantool and Lua at Percona Live 2016
1. Tarantool - a Lua based database engine and in-memory execution grid
http://try.tarantool.org kostja@tarantool.org
2. Spoiler
● Tarantool is an open source in-memory database
● Try it at http://try.tarantool.org
● Read more at http://bit.ly/1ShfmZD and http://bit.ly/1QiGvcf
5. Maintaining ACID: isolation
● Isolation — concurrent execution of transactions
results in a state that would be obtained if
transactions were executed serially
● A schedule — a possible history of transaction
execution, establishing the order in which data
change operations occurred
Let x, y, z be data items; r1[x] is a read of x by transaction 1,
w2[y] is a write of y by transaction 2
E = r1[x] w1[x] w2[y] r2[z]
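To make the definition concrete, here is a small Python sketch that checks whether a schedule written in the r1[x] / w2[y] notation is conflict-serializable, i.e. equivalent to some serial order. It uses the textbook precedence-graph test, not anything Tarantool-specific; the function names are made up for this illustration.

```python
import re
from collections import defaultdict


def parse_schedule(text):
    """Parse 'r1[x] w1[x]' into a list of (op, txn, item) tuples."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([rw])(\d+)\[(\w+)\]", text)]


def precedence_graph(ops):
    """Add edge a -> b when an operation of a conflicts with a later one of b.

    Two operations conflict if they belong to different transactions,
    touch the same item, and at least one of them is a write.
    """
    edges = defaultdict(set)
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges[t1].add(t2)
    return edges


def is_conflict_serializable(ops):
    """A schedule is conflict-serializable iff its precedence graph is acyclic."""
    edges = precedence_graph(ops)
    visiting, done = set(), set()

    def has_cycle(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(has_cycle(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return not any(has_cycle(txn) for txn in {t for _, t, _ in ops})
```

For the schedule E above the check succeeds, because transactions 1 and 2 touch disjoint items and the graph has no edges. A schedule such as r1[x] w2[x] w1[x] fails it: transaction 2 must come both after and before transaction 1.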
6. Isolation: a classic answer
If t1 uses X, ensure X doesn't change until t1 ends
● Concurrent transactions work with disjoint sets of data
● The order in which a data item is concurrently modified
is restricted by locking
Two-Phase Locking Theorem: If all transactions in an
execution are two-phase locked, then the execution is
serializable.
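The "two-phase" condition in the theorem is easy to state in code: a transaction first only acquires locks (growing phase) and, once it has released one, never acquires another (shrinking phase). A minimal Python sketch of that check follows; the function name and the ('lock' | 'unlock', item) encoding are assumptions made for this example.

```python
def is_two_phase(actions):
    """Return True if no 'lock' action follows an 'unlock' action.

    actions is a list of ('lock' | 'unlock', item) pairs issued by one
    transaction, in the order they were issued.
    """
    shrinking = False
    for action, _item in actions:
        if action == "unlock":
            shrinking = True        # the growing phase is over
        elif action == "lock" and shrinking:
            return False            # acquired a lock after releasing one
    return True
```

A transaction that locks x and y, then unlocks both, passes. One that unlocks x before locking y does not, and the theorem no longer guarantees a serializable execution.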
7. Fallacy of caching
[Figure: on-disk page and row layout. Labels: page header, modification log, page directory, compressed data, BLOB pointers, empty space, page trailer, row offset array, rows; row fields: trx id, roll pointer, field pointers, field 1, field 2 ... field n]
8. Parallel computing is difficult
Classical databases use threads and locking for
concurrency:
● limited scalability
Classical clients use synchronous network protocols:
● but we need http/2.0 for databases
9. Solution
● make the database 100% RAM resident
● static transactions run serially in a dedicated
thread
● No need for locking, latching!
● 1024 cores-ready: begin sharding even on a single
host
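The "run transactions serially in a dedicated thread" idea can be sketched in a few lines of Python. This is an illustration of the principle only, not Tarantool's implementation. The SerialExecutor class and its methods are hypothetical names. The only synchronized object is the inbox queue; the data itself is touched by one thread and needs no locks or latches.

```python
import queue
import threading


class SerialExecutor:
    """Runs submitted transactions one at a time on a single dedicated thread."""

    def __init__(self):
        self._inbox = queue.Queue()
        self.data = {}  # accessed only by the worker thread, hence no locks
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:            # shutdown marker
                break
            txn, reply = item
            try:
                reply.put((True, txn(self.data)))
            except Exception as exc:    # report the failure to the caller
                reply.put((False, exc))

    def execute(self, txn):
        """Submit txn (a function of the data dict) and wait for its result."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((txn, reply))
        ok, result = reply.get()
        if not ok:
            raise result
        return result

    def close(self):
        self._inbox.put(None)
        self._worker.join()


def increment(data):
    """A read-modify-write transaction; safe because it never runs concurrently."""
    data["counter"] = data.get("counter", 0) + 1
    return data["counter"]


def run_demo(clients=8, per_client=1000):
    """Hammer the executor from several client threads and return the counter."""
    db = SerialExecutor()

    def client():
        for _ in range(per_client):
            db.execute(increment)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    total = db.execute(lambda data: data["counter"])
    db.close()
    return total
```

No increment is lost regardless of the number of client threads, because every transaction sees the state left by the previous one.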
10. Maintaining the write ahead log
● t1 wrote X and began commit I/O
● t2 starts, reads X and begins a commit
● t1 commit fails (I/O error)
→ we need to roll back t2 on roll back of t1
(cascading rollback)
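The scenario above can be reproduced with a toy in-memory store in Python. It is a simplified model for illustration, not Tarantool's write ahead log code. The Store class and the wal_ok / wal_failed callbacks are hypothetical names. Writes become visible immediately, the store remembers which transactions saw data that is not yet durable, and a failed log write rolls those dependents back first.

```python
MISSING = object()  # marks "key did not exist before the write"


class Store:
    def __init__(self):
        self.data = {}
        self.pending_writer = {}  # key -> txn whose write is not yet durable
        self.undo = {}            # txn -> [(key, old value, old pending writer)]
        self.dependents = {}      # txn -> txns that saw its non-durable writes

    def begin(self, txn):
        self.undo[txn] = []
        self.dependents[txn] = set()

    def _touch(self, txn, key):
        owner = self.pending_writer.get(key)
        if owner is not None and owner != txn:
            self.dependents[owner].add(txn)

    def read(self, txn, key):
        self._touch(txn, key)
        return self.data.get(key)

    def write(self, txn, key, value):
        self._touch(txn, key)
        self.undo[txn].append(
            (key, self.data.get(key, MISSING), self.pending_writer.get(key)))
        self.data[key] = value
        self.pending_writer[key] = txn

    def wal_ok(self, txn):
        """The log write succeeded: the changes of txn are now durable."""
        for key, _old, _writer in self.undo.pop(txn):
            if self.pending_writer.get(key) == txn:
                del self.pending_writer[key]
        self.dependents.pop(txn)

    def wal_failed(self, txn):
        """The log write failed: undo dependents first, then txn itself.

        Returns the ids of all rolled-back transactions, in rollback order.
        """
        rolled_back = []
        for dep in sorted(self.dependents.pop(txn, ())):
            if dep in self.undo:
                rolled_back += self.wal_failed(dep)
        for key, old, old_writer in reversed(self.undo.pop(txn)):
            if old is MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
            if old_writer in self.undo:       # previous writer still pending
                self.pending_writer[key] = old_writer
            else:
                self.pending_writer.pop(key, None)
        rolled_back.append(txn)
        return rolled_back
```

If t1 writes X, t2 reads X and writes Y, and then the log write of t1 fails, calling wal_failed for t1 undoes t2 and then t1, leaving the store as it was before either transaction started.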
14. Approaching concurrency
With shared state:
● locking (hello deadlocks, hotspots, convoying, starvation,
priority inversion) ← not composable
● wait-free algorithms ← limited parallelism
Without shared state:
● hardware transactional memory ← still is not here
● functional programming ← not for databases
● actor model ← yes!
16. Actor model in Tarantool
● green threads
● CPU efficient memory management
● memory efficient data structures
● complex indexing (B-tree, R-tree)
● the actor runtime is available to developers
17. Not-in-your-database features
● server side scripting in Lua, C
● rich standard library:
I/O, JSON, crypto, http, ...
● fibers & channels
● triggers
→ freedom to ship code to data
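To give a feel for the fibers-and-channels style of the actor model, here is a rough Python analogue built on asyncio, with coroutines standing in for fibers and asyncio.Queue standing in for a channel. It is not Tarantool's Lua API. The counter_actor and send names are invented for this sketch.

```python
import asyncio


async def counter_actor(mailbox):
    """An actor: owns its state and handles one message at a time."""
    count = 0
    while True:
        message, reply = await mailbox.get()
        if message == "stop":
            reply.set_result(count)
            return
        count += message            # no lock needed: only this actor sees count
        reply.set_result(count)


async def send(mailbox, message):
    """Put a message into the actor's mailbox and wait for its answer."""
    reply = asyncio.get_running_loop().create_future()
    await mailbox.put((message, reply))
    return await reply


async def main():
    mailbox = asyncio.Queue()
    actor = asyncio.create_task(counter_actor(mailbox))
    # Ten concurrent senders; the actor still processes messages one by one.
    await asyncio.gather(*(send(mailbox, i) for i in range(1, 11)))
    total = await send(mailbox, "stop")
    await actor
    return total
```

Running asyncio.run(main()) returns the sum of the ten messages. The state is never shared, only messages are, which is the property the actor model relies on.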
18. Application: queues
● A copycat of the beanstalkd API
● queue.put(), take(), ack(), delete(), release(),
bury(), kick(), peek()
● Important problems of queue management are taken care of:
– task priorities, timeouts, time to live
– poisoned tasks
– nested queues
Read the full docs at https://github.com/tarantool/queue
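The task life cycle behind these calls can be sketched as a small in-memory Python class. This is a toy model of beanstalkd-style semantics, not the tarantool/queue implementation or its signatures. TinyQueue, the explicit now argument and the ttr (time to run) parameter are assumptions of this example. Lower priority numbers are served first, as in beanstalkd, and only a subset of the calls listed above is modeled.

```python
import heapq
import itertools


class TinyQueue:
    """In-memory toy with three task states: ready, taken, buried."""

    def __init__(self):
        self._ready = []                # heap of (priority, task id)
        self._tasks = {}                # task id -> task record
        self._ids = itertools.count(1)

    def put(self, data, pri=0):
        task_id = next(self._ids)
        self._tasks[task_id] = {"state": "ready", "pri": pri,
                                "data": data, "deadline": None}
        heapq.heappush(self._ready, (pri, task_id))
        return task_id

    def _requeue(self, task_id):
        task = self._tasks[task_id]
        task["state"], task["deadline"] = "ready", None
        heapq.heappush(self._ready, (task["pri"], task_id))

    def _expire(self, now):
        """Requeue tasks whose consumer missed the deadline (presumably failed)."""
        for task_id, task in self._tasks.items():
            if task["state"] == "taken" and task["deadline"] <= now:
                self._requeue(task_id)

    def take(self, now, ttr=60):
        """Hand the most urgent ready task to exactly one consumer, or None."""
        self._expire(now)
        while self._ready:
            _pri, task_id = heapq.heappop(self._ready)
            task = self._tasks.get(task_id)
            if task and task["state"] == "ready":
                task["state"], task["deadline"] = "taken", now + ttr
                return task_id, task["data"]
        return None

    def _taken(self, task_id):
        task = self._tasks[task_id]
        if task["state"] != "taken":
            raise ValueError("task %d is not taken" % task_id)
        return task

    def ack(self, task_id):
        """The consumer finished the task: remove it for good."""
        self._taken(task_id)
        del self._tasks[task_id]

    def release(self, task_id):
        """The consumer gives the task back to the queue."""
        self._taken(task_id)
        self._requeue(task_id)

    def bury(self, task_id):
        """Park a poisoned task so that consumers stop receiving it."""
        self._taken(task_id)["state"] = "buried"

    def kick(self, count=1):
        """Return up to count buried tasks to the ready state."""
        buried = [task_id for task_id, task in self._tasks.items()
                  if task["state"] == "buried"][:count]
        for task_id in buried:
            self._requeue(task_id)
        return len(buried)
```

Time is passed in explicitly to keep the sketch deterministic. A taken task is invisible to other consumers until it is acknowledged, released, or its deadline passes, which covers the "consumer failed" guarantee.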
22. Summary
● an in-memory database is its own species
● it takes numerous insights and years of R&D to create
● in the end we have a fair 10x performance speedup in
certain types of workloads
● all of the above is spiced up in Tarantool with rich
application development functions
● the result is available at http://download.tarantool.org