RethinkDB is an open-source, distributed, document-oriented data store. It is aimed at building real-time data processing systems and lets a client application subscribe to changes in the data it cares about.
In this talk I would like to cover not only application development on top of RethinkDB but also how it all works. We will talk about ReQL (the query language), changefeeds, indexes, sharding, and replication, and touch on the specifics of designing databases for this platform.
2. What is RethinkDB?
• Document-oriented DBMS
• Data stored as JSON
• Distributed
• Open Source (AGPL v.3.0)
3. Why do we need yet another DBMS?
• A simple query language
• Real-time applications
• Map-Reduce
• Geospatial data
• Sharding and replication out of the box
• Fault tolerance
7. Simple queries
SELECT city, state
FROM zips
WHERE city LIKE 'C%'
ORDER BY state
r.db('samples')
.table('zips')
.filter(r.row('city')
.match('^C'))
.orderBy('state')
.pluck('city', 'state');
8. Indexes
SELECT *
FROM zips
WHERE state = 'MA'
r.db('samples').table('zips')
.indexCreate('state');
r.db('samples').table('zips')
.getAll('MA', {
index: 'state'
});
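To make the idea behind indexCreate and getAll more concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how RethinkDB implements indexes (the server keeps them in B-Trees); the helper names buildIndex and getAll and the sample rows are made up for illustration.

```javascript
// Hypothetical sample rows; the real 'zips' table is not shown in the slides.
const zips = [
  { id: '01001', city: 'AGAWAM', state: 'MA' },
  { id: '02101', city: 'BOSTON', state: 'MA' },
  { id: '10001', city: 'NEW YORK', state: 'NY' },
];

// Build a lookup structure: field value -> list of documents with that value.
function buildIndex(rows, field) {
  const index = new Map();
  for (const row of rows) {
    const key = row[field];
    if (key === undefined || key === null) continue; // this sketch skips rows without the field
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(row);
  }
  return index;
}

// Look up all documents for a key without scanning every row.
function getAll(index, key) {
  return index.get(key) || [];
}

const stateIndex = buildIndex(zips, 'state');
console.log(getAll(stateIndex, 'MA').map((row) => row.city));
```

The point of the sketch is the trade-off: the index costs extra work on every write, but a lookup by state no longer has to touch rows from other states.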
9. INNER JOIN
SELECT z.id
, z.city
, z.state_id
, s.name
FROM zips z
INNER JOIN states s ON
z.state_id = s.state_id
r.db('samples').table('zips')
.eqJoin('state',
r.db('samples')
.table('states'))
.pluck({ left: ['id', 'city', 'state' ],
right: ['name']
}).zip()
11. Data storage
• B-Tree/BTRFS
• Document-level ACID:
• write_acks: majority
• durability: hard
• read_mode: majority
• Changefeeds:
• read_mode is always single
12. Availability and sharding
• Sharding by primary key only
• A shard is available if more than half of its replicas are available
• 1 shard: at least 3 replicas
• At most 64 shards
• A cluster: at least 3 servers
• Primary replica election
• Shard rebalancing
18. Security
• NoSQL NoSecurity
• Use only the latest SDK versions
• Regular expression injection in match()
• JavaScript injection in js()
• Use TLS
• The admin password is empty by default
• There is no way to set a password on the web interface
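One common way to reduce the risk of regular expression injection in match() is to escape user input before building the pattern. The sketch below is plain JavaScript; escapeRegex and startsWithPattern are hypothetical helper names, and it assumes that the metacharacters escaped here cover the ones that matter for the regex syntax used by match().

```javascript
// Escape characters that have a special meaning in regular expressions,
// so that user input is matched literally.
function escapeRegex(input) {
  return String(input).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a "starts with" pattern from untrusted input.
function startsWithPattern(userInput) {
  return '^' + escapeRegex(userInput);
}

// Without escaping, '.*' would match every city; escaped, it only matches
// cities that literally start with the two characters ".*".
const pattern = startsWithPattern('.*');
console.log(pattern);
```

The resulting string could then be passed to match() in a filter like the one on slide 7 instead of concatenating raw user input into the pattern.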
19. Is the project shutting down???
• The company RethinkDB is shutting down
• The current team has moved to Stripe
• The project will remain open source
• A new team
• AGPL? Linux Foundation vs Apache Foundation?
• Compose (IBM) will support RethinkDB
• http://slack.rethinkdb.com/
The cluster must have three or more servers. The table must be configured to have three or more replicas. A majority (greater than half) of replicas for the table must be available.
Inner and outer joins are executed on the server side.
Lookup by primary key. Every record always has an id field.
Simple indexes based on the value of a single field.
Compound indexes based on multiple fields.
Multi indexes based on arrays of values, created when the multi optional argument is true.
Geospatial indexes based on indexes of geometry objects, created when the geo optional argument is true.
Indexes based on arbitrary expressions.
How does RethinkDB index data?
When the user creates a table, they have the option of specifying the attribute that will serve as the primary key (if the primary key attribute isn’t specified, it defaults to ‘id’). When the user inserts a document into the table, if the document contains the primary key attribute, its value is used to index the document. Otherwise, a random unique ID is generated for the index automatically.
The primary key of each document is used by RethinkDB to place the document into an appropriate shard, and index it within that shard using a B-Tree data structure. Querying documents by primary key is extremely efficient, because the query can immediately be routed to the right shard and the document can be looked up in the B-Tree.
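The primary key rule described above can be sketched in a few lines of plain JavaScript. This is only an illustration: a Map stands in for the B-Tree, a UUID stands in for the random unique ID, and the insert helper is a made-up name.

```javascript
const { randomUUID } = require('crypto');

// Store a document under its primary key; generate a key when it is missing.
function insert(table, doc, primaryKey = 'id') {
  const key = doc[primaryKey] !== undefined ? doc[primaryKey] : randomUUID();
  if (table.has(key)) throw new Error('Duplicate primary key: ' + key);
  const stored = { ...doc, [primaryKey]: key };
  table.set(key, stored);
  return stored;
}

const table = new Map(); // stand-in for the per-shard B-Tree
insert(table, { id: '02101', city: 'BOSTON' }); // key taken from the document
const generated = insert(table, { city: 'AGAWAM' }); // key generated automatically

// Lookup by primary key goes straight to the document.
console.log(table.get('02101').city, typeof generated.id);
```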
Does RethinkDB support secondary and compound indexes?
RethinkDB supports both secondary and compound indexes, as well as indexes that compute arbitrary expressions.
Mention innerJoin and outerJoin (LEFT OUTER JOIN).
Join tables using a field or function on the left-hand sequence matching primary keys or secondary indexes on the right-hand table. eqJoin is more efficient than other ReQL join types, and operates much faster. Documents in the result set consist of pairs of left-hand and right-hand documents, matched when the field on the left-hand side exists and is non-null and an entry with that field’s value exists in the specified index on the right-hand side.
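The matching rule for eqJoin can be illustrated with a small in-memory sketch in plain JavaScript. The eqJoin and zip functions below are stand-ins written for this example, not the driver API, and the sample data is hypothetical. The sketch joins against the primary key only and assumes that on a name clash zip lets the right-hand field win.

```javascript
// Hypothetical sample data.
const zips = [
  { id: '02101', city: 'BOSTON', state: 'MA' },
  { id: '10001', city: 'NEW YORK', state: 'NY' },
  { id: '99999', city: 'NOWHERE', state: null }, // null join field: skipped
  { id: '00000', city: 'UNKNOWN', state: 'ZZ' }, // no matching state: skipped
];
const states = [
  { id: 'MA', name: 'Massachusetts' },
  { id: 'NY', name: 'New York' },
];

// Pair each left document with the right document whose key equals left[field].
function eqJoin(left, field, right, rightKey = 'id') {
  const index = new Map(right.map((doc) => [doc[rightKey], doc]));
  const pairs = [];
  for (const doc of left) {
    const value = doc[field];
    if (value === undefined || value === null) continue; // field must exist and be non-null
    if (index.has(value)) pairs.push({ left: doc, right: index.get(value) });
  }
  return pairs;
}

// Merge each { left, right } pair into one flat document.
function zip(pairs) {
  return pairs.map(({ left, right }) => ({ ...left, ...right }));
}

const pairs = eqJoin(zips, 'state', states);
console.log(zip(pairs).map((doc) => doc.city + ', ' + doc.name));
```

Because both tables have an id field, merging the pairs loses one of them. That is why the slide plucks only the needed fields from each side before calling zip().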
r.js:
Google V8, no external modules (e.g. node.js require), not recommended because of performance. Default timeout 5 seconds.
Drawbacks:
1. If you need full ACID support or a data schema: MySQL, PostgreSQL.
2. If you write a lot of data: if you need high write availability and do not mind dealing with conflicts, use Dynamo-like systems such as Riak.
Btrfs (B-tree FS, «Better FS» or «Butter FS») is a copy-on-write file system for Linux built on B-tree structures. It was published by Oracle Corporation in 2007 under the GNU General Public License (GPL).
Data -> Cache -> Log -> Disk
durability: possible values are hard and soft. This option will override the table or query’s durability setting. In soft durability mode RethinkDB will acknowledge the write immediately after receiving it, but before the write has been committed to disk.
nonAtomic: if set to true, executes the update and distributes the result to replicas in a non-atomic fashion. This flag is required to perform non-deterministic updates, such as those that require reading data from another table.
Write acknowledgements are set per table with the write_acks setting, either using the config command or by writing to the table_config system table. The default is majority, meaning writes will be acknowledged when a majority of (voting) replicas have confirmed their writes. The other possible option is single, meaning writes will be acknowledged when a single replica acknowledges it.
Durability is set per table with the durability setting, again using either reconfigure or writing to the table_config system table. In hard durability mode, writes are committed to disk before acknowledgements are sent; in soft mode, writes are acknowledged immediately after being stored in memory. The soft mode is faster but slightly less resilient to failure. The default is hard.
Read mode is set per query via an optional argument, read_mode (or readMode), to table. It has three possible values:
single returns values that are in memory (but not necessarily written to disk) on the primary replica. This is the default.
majority will only return values that are safely committed on disk on a majority of replicas. This requires sending a message to every replica on each read, so it is the slowest but most consistent.
outdated will return values that are in memory on an arbitrarily-selected replica. This is the fastest but least consistent.
Note that changefeeds will ignore the read_mode flag, and will always behave as if it is set to single.
https://rethinkdb.com/docs/consistency/
Except for brief periods, a table will remain fully available as long as more than half of the voting replicas for each shard and for the table overall are available. If half or more of the voting replicas for a shard are lost, then read or write operations on that shard will fail. If half or more of the voting replicas of a shard are lost and cannot be reconnected, an emergency repair will need to be performed.
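The "more than half" rule is easy to get wrong for even replica counts, so here is a small plain JavaScript sketch of it. The function names and the shape of the shard objects are assumptions made for this example, and the check is done per shard only.

```javascript
// A shard is available only if strictly more than half of its voting replicas are up.
function isShardAvailable(votingReplicas, availableReplicas) {
  return availableReplicas > votingReplicas / 2;
}

// In this sketch a table is available when every one of its shards is available.
function isTableAvailable(shards) {
  return shards.every((shard) => isShardAvailable(shard.voting, shard.available));
}

console.log(isShardAvailable(3, 2)); // 2 of 3 is more than half
console.log(isShardAvailable(4, 2)); // exactly half is not enough
console.log(isTableAvailable([
  { voting: 3, available: 3 },
  { voting: 3, available: 1 },
]));
```

With three replicas a shard survives the loss of one server; adding a fourth replica does not raise that number, because losing two of four leaves exactly half.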
Reconfiguring a table (changing the number of shards, rebalancing, etc.) causes brief losses of availability at various points during the reconfiguration.
If the primary replica is lost but more than half of the voting replicas are still available, an arbitrary voting replica will be elected as primary. The new primary will appear in table_status, but the primary_replica field of table_config will not change. If the old primary ever becomes available again, the system will switch back. When the primary changes there will be a brief period of unavailability.
RethinkDB uses a range sharding algorithm parameterized on the table’s primary key to partition the data. When the user states they want a given table to use a certain number of shards, the system examines the statistics for the table and finds the optimal set of split points to break up the table evenly. All sharding is currently done based on the table’s primary key, and cannot be done based on any other attribute (in RethinkDB the primary key and the shard key are effectively the same thing).
Split points will not automatically be changed after table creation, which means that if the primary keys are unevenly distributed, shards may become unbalanced. However, the user can manually rebalance shards when necessary, as well as reconfigure tables with new sharding and replication settings. Users cannot set split points for shards manually.
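Range sharding by split points can be sketched as follows in plain JavaScript. The split points, the keys, and the function names are hypothetical, and the choice of which side of a boundary a key falls on is an assumption of this sketch, not something taken from the RethinkDB source.

```javascript
// Split points divide the key space into contiguous ranges. With ['G', 'N']:
// shard 0 holds keys < 'G', shard 1 holds 'G' <= key < 'N', shard 2 holds keys >= 'N'.
function shardFor(primaryKey, splitPoints) {
  let shard = 0;
  while (shard < splitPoints.length && primaryKey >= splitPoints[shard]) {
    shard += 1;
  }
  return shard;
}

// Count how many keys land in each shard for a fixed set of split points.
function shardSizes(keys, splitPoints) {
  const sizes = new Array(splitPoints.length + 1).fill(0);
  for (const key of keys) sizes[shardFor(key, splitPoints)] += 1;
  return sizes;
}

const splitPoints = ['G', 'N'];
// Keys inserted later are skewed towards the start of the alphabet.
const keys = ['ALBANY', 'AUSTIN', 'BOSTON', 'CHICAGO', 'DALLAS', 'HOUSTON', 'TULSA'];
console.log(shardSizes(keys, splitPoints));
```

Because the split points stay fixed after table creation, a skewed stream of new keys piles up in one shard until the table is rebalanced.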
Drawback: for distributed computation, use Hadoop.
If you need delivery acknowledgement, use RabbitMQ.
changefeedQueueSize: the number of changes the server will buffer between client reads before it starts dropping changes and generates an error (default: 100,000).
squash:
True - When multiple changes to the same document occur before a batch of notifications is sent, the changes are “squashed” into one change. The client receives a notification that will bring it fully up to date with the server.
False - All changes will be sent to the client verbatim. This is the default.
N - A numeric value (floating point). Similar to true, but the server will wait n seconds to respond in order to squash as many changes together as possible, reducing network traffic. The first batch will always be returned immediately.
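The effect of squash: true on a single batch can be approximated with a short plain JavaScript sketch. It assumes change objects of the form { old_val, new_val } and a made-up squashBatch helper; the server's actual batching logic is not shown in these notes, and edge cases such as an insert followed by a delete are ignored here.

```javascript
// Collapse multiple changes to the same document into one change per document:
// keep the oldest old_val and the newest new_val.
function squashBatch(changes) {
  const byId = new Map();
  for (const change of changes) {
    const id = (change.new_val || change.old_val).id;
    const previous = byId.get(id);
    byId.set(id, {
      old_val: previous ? previous.old_val : change.old_val,
      new_val: change.new_val,
    });
  }
  return [...byId.values()];
}

// Hypothetical batch: three updates to document 1 and one update to document 2.
const batch = [
  { old_val: { id: 1, score: 0 }, new_val: { id: 1, score: 1 } },
  { old_val: { id: 2, score: 5 }, new_val: { id: 2, score: 6 } },
  { old_val: { id: 1, score: 1 }, new_val: { id: 1, score: 2 } },
  { old_val: { id: 1, score: 2 }, new_val: { id: 1, score: 3 } },
];
console.log(squashBatch(batch));
```

With squash: false the client would receive all four changes; with squashing it receives two, and the one for document 1 still brings it fully up to date.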
Only changes to a document or a table. You cannot receive changes on an inner or outer join.