3. Who am I?
Name: Kazuho Oku (奥一穂)
Original Developer of Palmscape / Xiino
The oldest web browser for Palm OS
Working at Cybozu Labs since 2005
Research subsidiary of Cybozu, Inc. in Japan
Apr 22 2009 Using Q4M
4. About Cybozu, Inc.
Japan’s largest groupware vendor
Mostly provides software products, not services
Some of our apps bundle MySQL as storage
5. About Pathtraq
Started in Aug. 2007
Web ranking service
One of Japan’s largest
like Alexa, but semi-realtime, and per-page
running on MySQL
Need for a fast and reliable message relay
for communication between the main server and content analysis server(s)
6. Design Goals of Q4M
Robust
Do not lose data on OS crash or power failure
Fast
Transfer thousands of messages per second
Easy to Use
Use SQL for access / maintenance
Integration into MySQL
no more separate daemons to take care of
8. What is a Message Queue?
Middleware for persistent asynchronous communication
communicate between fixed pairs (parties)
a.k.a. Message Oriented Middleware
MQ is intermediate storage
RDBMS is persistent storage
Senders / receivers may go down
9. Minimal Configuration of a MQ
Senders and receivers access a single queue
Diagram: Sender → Queue → Receiver
10. MQ and Relays
Separate queue for sender and receiver
Messages relayed between queues
Diagram: Sender → Queue → Relay → Queue → Receiver
11. Merits of Message Relays
Destination can be changed easily
Relays may transfer messages to different locations depending on their headers
Robustness against network failure
no loss or duplicates when the relay fails
Logging and Multicasting, etc.
12. Message Brokers
Publish / subscribe model
Separation between components and their integration
Components read / write to predefined queues
Integration is defined as routing rules between the message queues
Messages are often transformed (filtered) within the relay agent
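To make the broker idea concrete, here is a minimal in-memory sketch in Python. It is not Q4M code: the `Broker` class, the message dicts with a `type` header, and the queue names are hypothetical. Components only see named queues; integration lives entirely in the routing rules, and a rule's optional transform can rewrite a message or drop it (filtering).

```python
class Broker:
    """Toy in-memory broker: routing rules map messages to destination queues."""

    def __init__(self):
        self.queues = {}   # queue name -> list of messages
        self.rules = []    # (predicate, destination, transform)

    def add_rule(self, predicate, dest, transform=None):
        """predicate(msg) -> bool; transform(msg) -> new msg, or None to drop it."""
        self.queues.setdefault(dest, [])
        self.rules.append((predicate, dest, transform))

    def relay(self, msg):
        """Deliver msg to every matching destination (multicast); return their names."""
        delivered = []
        for predicate, dest, transform in self.rules:
            if not predicate(msg):
                continue
            out = transform(msg) if transform else dict(msg)
            if out is None:            # the transform filtered the message out
                continue
            self.queues[dest].append(out)
            delivered.append(dest)
        return delivered


broker = Broker()
# Route by a "header" field, and keep a log copy of everything (multicast).
broker.add_rule(lambda m: m["type"] == "page", "analysis")
broker.add_rule(lambda m: True, "log")
print(broker.relay({"type": "page", "url": "http://example.com/"}))  # ['analysis', 'log']
```

Changing a destination then means editing a rule, not the sender or the receiver.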
13. What about Q4M?
Q4M itself is a message queue
Can connect Q4M instances to create a message relay
Provides API for creating message relays and brokers
14. Performance of Q4M
Over 7,000 messages/sec.
message size: avg. 512 bytes
syncing to disk
Outperforms most needs
if you need more, just scale out
Can coexist with other storage engines without sacrificing their performance
see http://labs.cybozu.co.jp/blog/kazuhoatwork/2008/06/q4m_06_release_and_benchmarks.php
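A quick sanity check of what these figures mean, taking the 512-byte average at face value and ignoring per-row storage overhead (an assumption):

$$7000\ \text{msg/s} \times 512\ \text{B/msg} = 3{,}584{,}000\ \text{B/s} \approx 3.4\ \text{MiB/s}$$

$$7000\ \text{msg/s} \times 86{,}400\ \text{s/day} = 604{,}800{,}000\ \text{msg/day}$$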
16. Asynchronous Updates
Mixi (one of Japan's largest SNSs)
uses Q4M to buffer writes to the DB, offloading peak demand
from http://alpha.mixi.co.jp/blog/?p=272
17. Connecting Distant Servers
Pathtraq uses Q4M to create a relay between its database and content analysis processes
Diagram: Pathtraq DB ⇄ Content Analysis Processes over a MySQL connection (SSL, gzip); contents to be analyzed flow to the analysis processes, and results of the analysis flow back to the DB
18. To Prefetch Data
livedoor Reader (web-based feed aggregator) uses Q4M to prefetch data from the database to memcached
uses Q4M for scheduling web crawlers as well
from http://d.hatena.ne.jp/mala/20081212/1229074359
19. Scheduling Web Crawlers
Web crawlers with retry-on-error
Sample code included in Q4M dist.
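If a fetch fails, store the URL in the retry queue
Diagram: Spiders read URLs from the Request Queue and store results in the DB; failed URLs go to the Retry Queue, from which the Re-scheduler puts them back into the Request Queue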
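A minimal in-memory sketch of the retry-on-error flow in Python. It is not the sample code shipped with Q4M: `crawl`, `fetch` and `max_retries` are hypothetical names, and two `deque`s stand in for the Request Queue and Retry Queue tables. A real deployment would keep both queues in Q4M and probably delay retries; this sketch retries immediately.

```python
from collections import deque


def crawl(urls, fetch, max_retries=3):
    """Simulate spiders plus a re-scheduler.

    fetch(url) returns the page body or raises on failure.
    Returns (results, given_up): bodies by URL, and URLs dropped after
    max_retries failed attempts.
    """
    request_q = deque((url, 0) for url in urls)   # (url, retry_count)
    retry_q = deque()
    results, given_up = {}, []
    while request_q or retry_q:
        # Spiders: read a URL, store the result, or push it to the retry queue
        while request_q:
            url, retries = request_q.popleft()
            try:
                results[url] = fetch(url)
            except Exception:
                retry_q.append((url, retries + 1))
        # Re-scheduler: move retryable URLs back to the request queue
        while retry_q:
            url, retries = retry_q.popleft()
            if retries < max_retries:
                request_q.append((url, retries))
            else:
                given_up.append(url)
    return results, given_up
```

Keeping a retry count with each URL is what lets a subscriber later filter on it, e.g. with a condition such as `retry_count<5`.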
23. Installing Q4M
Compatible with MySQL 5.1
Download from q4m.31tools.com
Binary releases available for some platforms
Installing from source:
requires source code of MySQL
./configure && make && make install
run support-files/install.sql
24. Configuration Options of Q4M
--with-sync=no|fsync|fdatasync|fcntl
Controls synchronization to disk
default: fdatasync on Linux
--enable-mmap
Mmap’ed reads lead to higher throughput
default: yes
--with-delete=pwrite|msync
msync recommended on Linux >= 2.6.20 if you need really high performance
26. The Model
Various publishers write to the queue
A set of subscribers consumes the entries in the queue
Diagram: several Publishers → Q4M table → Subscribers
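The model can be imitated inside one Python process, with `queue.Queue` standing in for the Q4M table and threads for publishers and subscribers. This is only an analogy: `run_model` and its parameters are hypothetical, and `Queue.get()` removes an entry at once, whereas a Q4M subscriber owns a row until it calls `queue_end()`. It does show the key property of the model: every published message is consumed by exactly one subscriber.

```python
import queue
import threading


def run_model(n_publishers=3, n_subscribers=2, per_publisher=100):
    q = queue.Queue()
    consumed = []
    lock = threading.Lock()

    def publisher(pid):
        for i in range(per_publisher):
            q.put((pid, i))                 # analogous to INSERT INTO queue ...

    def subscriber():
        while True:
            item = q.get()                  # take exactly one entry
            if item is None:                # shutdown sentinel
                break
            with lock:
                consumed.append(item)       # handle_row()

    pubs = [threading.Thread(target=publisher, args=(p,)) for p in range(n_publishers)]
    subs = [threading.Thread(target=subscriber) for _ in range(n_subscribers)]
    for t in pubs + subs:
        t.start()
    for t in pubs:
        t.join()
    for _ in subs:                          # FIFO: sentinels arrive after all data
        q.put(None)
    for t in subs:
        t.join()
    return consumed


print(len(run_model()))  # 300
```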
27. Creating a Q4M Table
ENGINE=QUEUE creates a Q4M table
No primary keys or indexes
Sorted by insertion order (it’s a queue)
mysql> CREATE TABLE qt (
-> id int(10) unsigned NOT NULL,
-> message varchar(255) NOT NULL
-> ) ENGINE=QUEUE;
Query OK, 0 rows affected (0.42 sec)
28. Modifying Data on a Q4M Table
No restrictions for INSERT and DELETE
No support for UPDATE
mysql> INSERT INTO qt (id,message)
-> VALUES
-> (1,'Hello'),
-> (2,'Bonjour'),
-> (3,'Hola');
Query OK, 3 rows affected (0.02 sec)
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
| 1 | Hello |
| 2 | Bonjour |
| 3 | Hola |
+----+---------+
3 rows in set (0.00 sec)
29. SELECT from a Q4M Table
Works the same as other storage engines
SELECT COUNT(*) is cached
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
| 1 | Hello |
| 2 | Bonjour |
| 3 | Hola |
+----+---------+
3 rows in set (0.00 sec)
mysql> SELECT COUNT(*) FROM qt;
+----------+
| COUNT(*) |
+----------+
| 3 |
+----------+
1 row in set (0.00 sec)
How to subscribe to a queue?
30. Calling queue_wait()
After calling, only one row becomes visible from the connection
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
| 1 | Hello |
| 2 | Bonjour |
| 3 | Hola |
+----+---------+
3 rows in set (0.00 sec)
mysql> SELECT queue_wait('qt');
+------------------+
| queue_wait('qt') |
+------------------+
| 1 |
+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
| 1 | Hello |
+----+---------+
1 row in set (0.00 sec)
31. OWNER Mode and NON-OWNER Mode
In OWNER mode, only the OWNED row is visible
OWNED row becomes invisible from other connections
rows of other storage engines are visible
Diagram: in NON-OWNER mode all rows (1,'Hello'), (2,'Bonjour'), (3,'Hola') are visible; queue_wait() switches to OWNER mode, where only (1,'Hello') is visible; queue_end() or queue_abort() returns to NON-OWNER mode
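The visibility rules behave like a small per-connection state machine. The Python class below is a toy, single-process model written for illustration only: the class and method names are hypothetical, and its `queue_wait` returns immediately instead of blocking up to a timeout. It is not how the storage engine is implemented. It encodes the rules stated in this deck: each connection owns at most one row, other connections cannot see that row, `queue_end` deletes it, `queue_abort` releases it, and calling `queue_wait` while already in OWNER mode deletes the previously owned row.

```python
class QueueTableSim:
    """Toy model of Q4M row visibility per connection (not the real engine)."""

    def __init__(self, rows):
        self.rows = list(rows)   # kept in insertion order
        self.owner = {}          # connection -> owned row (None if nothing owned)

    def _hidden_from(self, conn):
        # Rows OWNED by other connections are invisible to this one.
        return [r for c, r in self.owner.items() if c != conn and r is not None]

    def select_all(self, conn):
        """SELECT * FROM the table, as seen from `conn`."""
        if conn in self.owner:                      # OWNER mode
            row = self.owner[conn]
            return [] if row is None else [row]
        hidden = self._hidden_from(conn)            # NON-OWNER mode
        return [r for r in self.rows if r not in hidden]

    def queue_wait(self, conn):
        if conn in self.owner:       # called within OWNER mode: delete owned row
            self.queue_end(conn)
        hidden = self._hidden_from(conn)
        free = [r for r in self.rows if r not in hidden]
        # Enter OWNER mode even when no row is available.
        self.owner[conn] = free[0] if free else None
        return 1 if free else 0

    def queue_end(self, conn):
        """Delete the owned row (if any) and return to NON-OWNER mode."""
        row = self.owner.pop(conn, None)
        if row is not None:
            self.rows.remove(row)
        return 1

    def queue_abort(self, conn):
        """Release the owned row (if any) and return to NON-OWNER mode."""
        self.owner.pop(conn, None)
        return 1


t = QueueTableSim([(1, "Hello"), (2, "Bonjour"), (3, "Hola")])
t.queue_wait("conn_a")
print(t.select_all("conn_a"))   # [(1, 'Hello')]
print(t.select_all("conn_b"))   # [(2, 'Bonjour'), (3, 'Hola')]
```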
32. Returning to NON-OWNER mode
By calling queue_abort, the connection returns to NON-OWNER mode
mysql> SELECT QUEUE_ABORT();
+---------------+
| QUEUE_ABORT() |
+---------------+
| 1 |
+---------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
| 1 | Hello |
| 2 | Bonjour |
| 3 | Hola |
+----+---------+
3 rows in set (0.01 sec)
33. Consuming a Row
By calling queue_end, the OWNED row is deleted, and the connection returns to NON-OWNER mode
mysql> SELECT queue_wait('qt');
(snip)
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
| 1 | Hello |
+----+---------+
1 row in set (0.01 sec)
mysql> SELECT queue_end();
+-------------+
| queue_end() |
+-------------+
| 1 |
+-------------+
1 row in set (0.01 sec)
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
| 2 | Bonjour |
| 3 | Hola |
+----+---------+
2 rows in set (0.00 sec)
34. Writing a Subscriber
Call two functions: queue_wait, queue_end
Multiple subscribers can be run concurrently
each row in the queue is consumed only once
while (true) {
SELECT queue_wait('qt'); # switch to owner mode
rows := SELECT * FROM qt; # obtain data
if (count(rows) != 0) # if we have any data, then
handle_row(rows[0]); # consume the row
SELECT queue_end(); # erase the row from queue
}
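The same loop in Python against a DB-API (PEP 249) connection, as a sketch. Assumptions: a MySQL driver that uses `%s` placeholders (PyMySQL and mysqlclient both do), a server with Q4M installed, and a trusted table name, since it is concatenated into the SQL. `subscribe`, `handle_row` and the `iterations` parameter are hypothetical names used for illustration.

```python
def subscribe(conn, handle_row, table="qt", iterations=None):
    """queue_wait / SELECT / queue_end loop; iterations=None loops forever."""
    cur = conn.cursor()
    done = 0
    while iterations is None or done < iterations:
        cur.execute("SELECT queue_wait(%s)", (table,))   # switch to OWNER mode
        cur.fetchall()
        cur.execute("SELECT * FROM " + table)            # only the OWNED row is visible
        rows = cur.fetchall()
        if rows:                                         # nothing owned on timeout
            handle_row(rows[0])                          # consume the row
        cur.execute("SELECT queue_end()")                # erase the row from the queue
        cur.fetchall()
        done += 1
```

Running several such processes against the same table gives the concurrent subscribers described above; Q4M ensures each row is owned by only one of them.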
35. Writing a Subscriber (cont'd)
Or call queue_wait as a condition
Warning: conflicts with trigger-based insertions
while (true) {
rows := SELECT * FROM qt WHERE queue_wait('qt');
if (count(rows) != 0)
handle_row(rows[0]);
SELECT queue_end();
}
36. The Model – with code
Diagram: several Publishers, each running INSERT INTO queue ..., feed the Q4M table; Subscribers consume it with:
while (true) {
rows := SELECT * FROM qt
WHERE queue_wait('qt');
if (count(rows) != 0)
handle_row(rows[0]);
SELECT queue_end();
}
38. queue_wait(table)
Enters OWNER mode
0 or 1 row becomes OWNED
Enters OWNER mode even if no rows were available
Default timeout: 60 seconds
Returns 1 if a row is OWNED (0 on timeout)
If called within OWNER mode, the owned row is deleted
39. Revisiting Subscriber Code
Calls to queue_end just before queue_wait can be omitted
while (true) {
rows := SELECT * FROM qt WHERE queue_wait('qt');
if (count(rows) != 0)
handle_row(rows[0]);
SELECT queue_end(); # can be omitted: the next queue_wait deletes the owned row
}
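A Python sketch of the shortened loop. Assumptions: a DB-API connection from a MySQL driver with `%s` placeholders, a Q4M-enabled server, and a trusted table name; `subscribe_until` is a hypothetical name. Because each `queue_wait()` deletes the row owned in the previous iteration, only the last owned row needs an explicit `queue_end()`. If `handle_row` raises, that call is skipped and the row stays owned until the connection is closed, which releases it just like `queue_abort`.

```python
def subscribe_until(conn, handle_row, table="qt", iterations=None):
    """queue_wait in the WHERE clause, without per-iteration queue_end calls."""
    cur = conn.cursor()
    done = 0
    while iterations is None or done < iterations:
        # In OWNER mode, queue_wait() first deletes the previously owned row.
        cur.execute("SELECT * FROM " + table + " WHERE queue_wait(%s)", (table,))
        rows = cur.fetchall()
        if rows:
            handle_row(rows[0])
        done += 1
    # Leaving the loop normally: consume the last owned row explicitly.
    cur.execute("SELECT queue_end()")
    cur.fetchall()
```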
40. Conditional queue_wait()
Consume only rows matching a certain condition
Rows that do not match will be left untouched
Only numeric columns can be checked
Fast: condition tested once per row
examples:
SELECT queue_wait('table:(col_a*3)+col_b<col_c');
SELECT queue_wait('table:retry_count<5');
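Which row does a conditional `queue_wait()` pick? A toy Python model, not Q4M's implementation (`owned_row` and the dict-shaped rows are hypothetical): scan rows in insertion order and own the first one that satisfies the condition, leaving non-matching rows in the queue.

```python
def owned_row(rows, condition):
    """Return the first row, in insertion order, for which condition(row) holds.

    Rows that do not match are skipped but not removed from `rows`.
    """
    for row in rows:
        if condition(row):
            return row
    return None


rows = [{"id": 1, "retry_count": 7},
        {"id": 2, "retry_count": 2},
        {"id": 3, "retry_count": 0}]
# Mirrors SELECT queue_wait('table:retry_count<5')
print(owned_row(rows, lambda r: r["retry_count"] < 5))  # {'id': 2, 'retry_count': 2}
```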
41. queue_wait(tbl_cond,[tbl_cond…,timeout])
Accepts multiple tables and timeout
Data searched from leftmost table to right
Returns table index (the leftmost table is 1) of the newly owned row
Returns zero if no rows are being owned
example:
SELECT queue_wait('table_A','table_B',60);
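Mapping the returned index back to a table name is an easy place for an off-by-one error, since the leftmost table is 1 and 0 means nothing is owned. A sketch in Python, assuming a DB-API cursor from a MySQL driver with `%s` placeholders; `wait_any` is a hypothetical helper, and table arguments may also carry conditions such as `'table:retry_count<5'`.

```python
def wait_any(cur, tables, timeout=60):
    """Run queue_wait over several tables; return the name of the table whose
    row is now OWNED, or None if nothing is owned (e.g. on timeout)."""
    placeholders = ", ".join(["%s"] * (len(tables) + 1))
    cur.execute("SELECT queue_wait(" + placeholders + ")", (*tables, timeout))
    (index,) = cur.fetchone()
    return tables[index - 1] if index else None   # the leftmost table is 1
```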
42. Functions for Exiting OWNER Mode
queue_end
Deletes the owned row and exits OWNER mode
queue_abort
Releases (instead of deleting) the owned row and exits OWNER mode
Closing a MySQL connection does the same thing as queue_abort
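`queue_abort` gives a subscriber a natural error path: if processing fails, the row goes back to the queue for another attempt instead of being lost. A Python sketch, assuming a DB-API connection from a MySQL driver with `%s` placeholders, a Q4M-enabled server and a trusted table name; `consume_one` is a hypothetical helper.

```python
def consume_one(conn, handle_row, table="qt"):
    """Own one row and process it: queue_end on success, queue_abort on error.

    Returns False if no row was available before the timeout.
    """
    cur = conn.cursor()
    cur.execute("SELECT queue_wait(%s)", (table,))
    (got,) = cur.fetchone()
    if not got:
        cur.execute("SELECT queue_abort()")    # leave OWNER mode; nothing owned
        cur.fetchone()
        return False
    cur.execute("SELECT * FROM " + table)      # only the OWNED row is visible
    row = cur.fetchone()
    try:
        handle_row(row)
    except Exception:
        cur.execute("SELECT queue_abort()")    # release the row back to the queue
        cur.fetchone()
        raise
    cur.execute("SELECT queue_end()")          # success: delete the row
    cur.fetchone()
    return True
```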
44. The Problem
Relay (or router) consists of at least 3 processes and 2 connections
No losses, no duplicates on crash or disconnection
Diagram: Q4M Table (source) → Relay Program → Q4M Table (dest.)
45. Internal Row ID
Every row has an internal row ID
invisible in the Q4M table definition
monotonically increasing 64-bit integer
Used for detecting duplicates
Use two functions to skip duplicates
Data loss is prevented by using queue_wait / queue_end
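How can a row ID detect duplicates? A toy Python model under an explicit assumption: for each source slot the destination remembers the highest source row ID it has accepted, and because row IDs only increase, any ID at or below that mark is a re-sent row. Only the "a" (drop duplicates) mode is modeled; `DedupTable` and its methods are hypothetical and not Q4M's API.

```python
class DedupTable:
    """Toy destination queue that skips rows already relayed from a source slot."""

    def __init__(self):
        self.rows = []
        self.last_srcid = {}     # slot (0..63) -> highest accepted source row ID
        self._pending = None     # (slot, src_row_id) applied to the next insert

    def set_srcid(self, slot, src_row_id):
        """Rough analogue of queue_set_srcid(slot, 'a', src_row_id)."""
        if not 0 <= slot <= 63:
            raise ValueError("slot must be in 0..63")
        self._pending = (slot, src_row_id)

    def insert(self, row):
        """Store row; return False if it was skipped as a duplicate."""
        pending, self._pending = self._pending, None
        if pending is not None:
            slot, rid = pending
            last = self.last_srcid.get(slot)
            if last is not None and rid <= last:
                return False
            self.last_srcid[slot] = rid
        self.rows.append(row)
        return True
```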
46. queue_rowid()
Returns row ID of the OWNED row (if any)
Returns NULL if no row is OWNED
Call when retrieving data from source
47. queue_set_srcid(src_tbl_id, mode, src_row_id)
Call before inserting a row into the destination table
Checks whether the row has already been inserted into the table, and if so ignores the next INSERT
Parameters:
src_tbl_id - id to determine source table (0..63)
mode - "a" to drop duplicates, "w" to reset
src_row_id - row ID obtained from source table
48. Pseudo Code
Relays data from src_tbl to dest_tbl
while (true) {
# wait for data
SELECT queue_wait(src_tbl) =>src_db;
# read row and rowid
row := (SELECT * FROM src_tbl =>src_db);
rowid := (SELECT queue_rowid() =>src_db);
# insert the row after setting srcid
SELECT queue_set_srcid(src_tbl_id, 'a', rowid) =>dest_db;
INSERT INTO dest_tbl (row) =>dest_db;
# the next queue_wait deletes the relayed row from src_tbl
}
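The relay pseudo code translated into Python as a sketch, using two DB-API connections (one to the source server, one to the destination). Assumptions: a MySQL driver with `%s` placeholders, trusted table and column names, and the rule that calling `queue_wait()` in OWNER mode deletes the previously owned row. `relay` and its parameters are hypothetical. If the relay crashes after the INSERT but before that deletion, the row is relayed again on restart, and `queue_set_srcid` makes the destination ignore it.

```python
def relay(src, dest, src_tbl, dest_tbl, columns, src_tbl_id=0, iterations=None):
    """Relay rows from a Q4M table on `src` to a Q4M table on `dest`.

    iterations=None relays forever; a number is handy for testing.
    """
    s, d = src.cursor(), dest.cursor()
    col_list = ", ".join(columns)
    select_sql = "SELECT " + col_list + " FROM " + src_tbl
    insert_sql = ("INSERT INTO " + dest_tbl + " (" + col_list + ") VALUES ("
                  + ", ".join(["%s"] * len(columns)) + ")")
    done = 0
    while iterations is None or done < iterations:
        done += 1
        # wait for data; this also deletes the row relayed in the previous pass
        s.execute("SELECT queue_wait(%s)", (src_tbl,))
        (got,) = s.fetchone()
        if not got:
            continue                              # timed out, nothing owned
        # read row and rowid
        s.execute(select_sql)
        row = s.fetchone()
        s.execute("SELECT queue_rowid()")
        (rowid,) = s.fetchone()
        # insert the row after setting srcid ('a' drops duplicates)
        d.execute("SELECT queue_set_srcid(%s, 'a', %s)", (src_tbl_id, rowid))
        d.fetchone()
        d.execute(insert_sql, row)
    # leaving the loop: consume the last relayed row
    s.execute("SELECT queue_end()")
    s.fetchone()
```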
49. q4m-forward
Simple forwarder script
installed into mysql-dir/bin
usage: q4m-forward [options] src_addr dest_addr
example:
% support-files/q4m-forward
"dbi:mysql:database=db1;table=tbl1;user=foo;password=XXX"
"dbi:mysql:database=db2;table=tbl2;host=bar;user=foo"
options:
--reset reset duplicate check info.
--sender=idx slot no. used for checking duplicates (0..63, default: 0)
--help
51. Things that Need to be Fixed
Table compaction is a blocking operation
runs when live data falls below 25% of the log file
very bad, though not as bad as it seems
it's fast since it's a sequential write operation
Relays are slow
since transfer is done row-by-row
Binlog does not work
since MQ replication should be synchronous
52. Future of Q4M
2-phase commit with other storage engines (maybe)
queue consumption and InnoDB updates can become an atomic operation
53. Thank you
http://q4m.31tools.com/