PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
&
Zulip
Kouhei Sutou ClearCode Inc.
Zulip & PGroonga Night
2017-09-06
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
Pronunciation: píːzí:lúnɡά
読み方:ぴーじーるんが
PostgreSQL extension
PostgreSQLの拡張機能
Fast full text search
高速全文検索機能
All languages are supported!
全言語対応!
PGroonga & Zulip Powered by Rabbit 2.2.1
Fast?(高速?)
Need to measure to confirm
確認するには測定しないと
Targets(測定対象)
textsearch (built-in)(組み込み)
pg_bigm (third party)(外部プロダク
ト)
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and textsearch
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga textsearch
PGroonga & Zulip Powered by Rabbit 2.2.1
As fast as textsearch
textsearchと同じくらいの速さ
textsearch uses word based
full text search
textsearchは単語ベースの全文検索実装
PostgreSQL has enough
performance for the approach
PostgreSQLはこの方法では十分な性能を出せる
PGroonga & Zulip Powered by Rabbit 2.2.1
textsearch and Japanese
textsearchと日本語
Asian languages including
Japanese aren't supported
日本語を含むアジア圏の言語は非サポート
Need plugin(プラグインが必要)
Plugin exists but isn't
maintained
プラグインはあるがメンテナンスされていない
PGroonga & Zulip Powered by Rabbit 2.2.1
Japanese support
日本語対応
Need one of them(どちらかが必要)
N-gram approach support
N-gramというやり方のサポート
Japanese specific word based
approach support
日本語を考慮した単語ベースのやり方のサポート
PGroonga supports both of them
PGroonga & Zulip Powered by Rabbit 2.2.1
PostgreSQL and N-gram
PostgreSQLとN-gram
PostgreSQL is slow with N-
gram approach
PostgreSQLでN-gramというやり方を使うと遅い
N-gram approach:
pg_trgm (contrib)
Japanese isn't supported by default
デフォルトでは日本語に対応していない
pg_bigm (third-party)
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and pg_bigm
0
0.5
1
1.5
2
2.5
3
311 14706 20389
Data: Japanese Wikipedia
(Many records and large documents)
N records: About 0.9millions
Average text size: 6.7KiB
Fast Fast
Elapsedtime(sec)
(Lowerisbetter)
N hits
PGroonga pg_bigm
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga is fast stably
PGroongaは安定して速い
PostgreSQL needs "recheck"
for N-gram approach
PostgreSQLはN-gramのときは「recheck」が必要
Seq search after index search
インデックスサーチのあとにシーケンシャルサーチ
PGroonga doesn't need
PGroongaでは必要ない
Only index search
インデックスサーチだけでOK
PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
textsearch is fast but
Asian langs aren't supported
textsearchは速いけどアジア圏の言語を未サポート
pg_bigm supports Japanese
but is slow for large hits
pg_bigmは日本語対応だがヒット数が多くなると遅い
PGroonga is fast and
supports all languages
PGroongaは速くて全言語対応
PGroonga & Zulip Powered by Rabbit 2.2.1
FYI: textsearch, PGroonga
and Groonga
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Groonga is 30x faster than others
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga Groonga textsearch
PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip and PGroonga
Zulip uses textsearch by
default
Zulipはデフォルトでtextsearchを使用
Japanese isn't supported
日本語非対応
Zulip supports PGroonga as
option
ZulipでPGroongaも使うこともできる
Implemented by me
私が実装
PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip: full text search
Zulipと全文検索
Zulip is chat tool
Zulipはチャットツール
Latency is important for UX
UX的にレイテンシーは大事
Index update is heavy
インデックスの更新は重い
Delay index update
インデックスの更新を後回しにしている
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
CREATE TABLE zerver_message (
rendered_content text,
-- ... ↓Column for full text search
search_tsvector tsvector
); -- ↓Index for full text search
CREATE INDEX zerver_message_search_tsvector
ON zerver_message
USING gin (search_tsvector);
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute append_to_fts_update_log() on change
CREATE TRIGGER
zerver_message_update_search_tsvector_async
BEFORE INSERT OR UPDATE OF rendered_content
ON zerver_message
FOR EACH ROW
EXECUTE PROCEDURE append_to_fts_update_log();
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Insert ID to fts_update_log table
CREATE FUNCTION append_to_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO fts_update_log (message_id)
VALUES (NEW.id);
RETURN NEW;
END
$$;
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Keep ID to be updated
CREATE TABLE fts_update_log (
id SERIAL PRIMARY KEY,
message_id INTEGER NOT NULL
);
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute do_notify_fts_update_log()
-- on INSERT
CREATE TRIGGER fts_update_log_notify
AFTER INSERT ON fts_update_log
FOR EACH STATEMENT
EXECUTE PROCEDURE
do_notify_fts_update_log();
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- NOTIFY to fts_update_log channel!
CREATE FUNCTION do_notify_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
NOTIFY fts_update_log;
RETURN NEW;
END
$$;
PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
cursor.execute("LISTEN ftp_update_log") # Wait
cursor.execute("SELECT id, message_id FROM fts_update_log")
ids = []
for (id, message_id) in cursor.fetchall():
cursor.execute("UPDATE zerver_message SET search_tsvector = "
"to_tsvector('zulip.english_us_search', "
"rendered_content) "
"WHERE id = %s", (message_id,))
ids.append(id)
cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
(ids,))
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga: index update
PGroongaとインデックス更新
PGroonga's index update is
fast too
PGroongaはインデックス更新も速い
PGroonga's search while
index update is still fast
PGroongaはインデックス更新中の検索も速い
PGroonga & Zulip Powered by Rabbit 2.2.1
Perf characteristics
性能の傾向
Searchthroughput
Update throughput
PGroonga
Searchthroughput
Update throughput
GIN
Keep
search performance
while many updates
Decrease
search performance
while updating
PGroonga & Zulip Powered by Rabbit 2.2.1
Update and lock
更新とロック
Update without read locks
参照ロックなしで更新
Write locks are required
書き込みロックは必要
PGroonga & Zulip Powered by Rabbit 2.2.1
GIN: Read/Write
GIN:読み書き
Conn1
Conn2
INSERT
start
SELECT
start
Blocked
INSERT
finish
SELECT
finish
GIN
Slow down!
PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga: Read/Write
PGroonga:読み書き
Conn1
Conn2
INSERT
start
SELECT
start
INSERT
finish
SELECT
finish
PGroonga
No slow down!
PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
Zulip: Low latency for UX
ZulipはUXのために低レイテンシーをがんばっている
Delay index update
インデックスの更新は後回し
PGroonga: Keeps fast search
with update
PGroongaは更新しながらでも高速検索を維持
Chat friendly characteristics
チャット向きの特性
PGroonga & Zulip Powered by Rabbit 2.2.1
More PGroonga features
PGroongaの機能いろいろ
Query expansion(クエリー展開)
Support synonyms(同義語検索をサポート)
Similar search(類似文書検索)
Find similar messages
類似メッセージ検索
Fuzzy search(あいまい検索)
Stemming(ステミング)

PGroonga & Zulip

  • 1.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PGroonga & Zulip Kouhei Sutou ClearCode Inc. Zulip & PGroonga Night 2017-09-06
  • 2.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PGroonga Pronunciation: píːzí:lúnɡά 読み方:ぴーじーるんが PostgreSQL extension PostgreSQLの拡張機能 Fast full text search 高速全文検索機能 All languages are supported! 全言語対応!
  • 3.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Fast?(高速?) Need to measure to confirm 確認するには測定しないと Targets(測定対象) textsearch (built-in)(組み込み) pg_bigm (third party)(外部プロダク ト)
  • 4.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PGroona and textsearch 0 0.2 0.4 0.6 0.8 1 1.2 1.4 PostgreSQL OR MySQL database America Data: English Wikipedia (Many records and large docs) N records: About 5.3millions Average text size: 6.4KiB Elapsedtime(ms) (Shorterisbetter) Query PGroonga textsearch
  • 5.
    PGroonga & ZulipPowered by Rabbit 2.2.1 As fast as textsearch textsearchと同じくらいの速さ textsearch uses word based full text search textsearchは単語ベースの全文検索実装 PostgreSQL has enough performance for the approach PostgreSQLはこの方法では十分な性能を出せる
  • 6.
    PGroonga & ZulipPowered by Rabbit 2.2.1 textsearch and Japanese textsearchと日本語 Asian languages including Japanese aren't supported 日本語を含むアジア圏の言語は非サポート Need plugin(プラグインが必要) Plugin exists but isn't maintained プラグインはあるがメンテナンスされていない
  • 7.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Japanese support 日本語対応 Need one of them(どちらかが必要) N-gram approach support N-gramというやり方のサポート Japanese specific word based approach support 日本語を考慮した単語ベースのやり方のサポート PGroonga supports both of them
  • 8.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PostgreSQL and N-gram PostgreSQLとN-gram PostgreSQL is slow with N- gram approach PostgreSQLでN-gramというやり方を使うと遅い N-gram approach: pg_trgm (contrib) Japanese isn't supported by default デフォルトでは日本語に対応していない pg_bigm (third-party)
  • 9.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PGroona and pg_bigm 0 0.5 1 1.5 2 2.5 3 311 14706 20389 Data: Japanese Wikipedia (Many records and large documents) N records: About 0.9millions Average text size: 6.7KiB Fast Fast Elapsedtime(sec) (Lowerisbetter) N hits PGroonga pg_bigm
  • 10.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PGroonga is fast stably PGroongaは安定して速い PostgreSQL needs "recheck" for N-gram approach PostgreSQLはN-gramのときは「recheck」が必要 Seq search after index search インデックスサーチのあとにシーケンシャルサーチ PGroonga doesn't need PGroongaでは必要ない Only index search インデックスサーチだけでOK
  • 11.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Wrap up まとめ textsearch is fast but Asian langs aren't supported textsearchは速いけどアジア圏の言語を未サポート pg_bigm supports Japanese but is slow for large hits pg_bigmは日本語対応だがヒット数が多くなると遅い PGroonga is fast and supports all languages PGroongaは速くて全言語対応
  • 12.
    PGroonga & ZulipPowered by Rabbit 2.2.1 FYI: textsearch, PGroonga and Groonga 0 0.2 0.4 0.6 0.8 1 1.2 1.4 PostgreSQL OR MySQL database America Data: English Wikipedia (Many records and large docs) N records: About 5.3millions Average text size: 6.4KiB Groonga is 30x faster than others Elapsedtime(ms) (Shorterisbetter) Query PGroonga Groonga textsearch
  • 13.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Zulip and PGroonga Zulip uses textsearch by default Zulipはデフォルトでtextsearchを使用 Japanese isn't supported 日本語非対応 Zulip supports PGroonga as option ZulipでPGroongaも使うこともできる Implemented by me 私が実装
  • 14.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Zulip: full text search Zulipと全文検索 Zulip is chat tool Zulipはチャットツール Latency is important for UX UX的にレイテンシーは大事 Index update is heavy インデックスの更新は重い Delay index update インデックスの更新を後回しにしている
  • 15.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Delay index update インデックス更新を後回し CREATE TABLE zerver_message ( rendered_content text, -- ... ↓Column for full text search search_tsvector tsvector ); -- ↓Index for full text search CREATE INDEX zerver_message_search_tsvector ON zerver_message USING gin (search_tsvector);
  • 16.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Execute append_to_fts_update_log() on change CREATE TRIGGER zerver_message_update_search_tsvector_async BEFORE INSERT OR UPDATE OF rendered_content ON zerver_message FOR EACH ROW EXECUTE PROCEDURE append_to_fts_update_log();
  • 17.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Insert ID to fts_update_log table CREATE FUNCTION append_to_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN INSERT INTO fts_update_log (message_id) VALUES (NEW.id); RETURN NEW; END $$;
  • 18.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Keep ID to be updated CREATE TABLE fts_update_log ( id SERIAL PRIMARY KEY, message_id INTEGER NOT NULL );
  • 19.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- Execute do_notify_fts_update_log() -- on INSERT CREATE TRIGGER fts_update_log_notify AFTER INSERT ON fts_update_log FOR EACH STATEMENT EXECUTE PROCEDURE do_notify_fts_update_log();
  • 20.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Delay index update インデックス更新を後回し -- NOTIFY to fts_update_log channel! CREATE FUNCTION do_notify_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN NOTIFY fts_update_log; RETURN NEW; END $$;
  • 21.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Delay index update インデックス更新を後回し cursor.execute("LISTEN ftp_update_log") # Wait cursor.execute("SELECT id, message_id FROM fts_update_log") ids = [] for (id, message_id) in cursor.fetchall(): cursor.execute("UPDATE zerver_message SET search_tsvector = " "to_tsvector('zulip.english_us_search', " "rendered_content) " "WHERE id = %s", (message_id,)) ids.append(id) cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)", (ids,))
  • 22.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PGroonga: index update PGroongaとインデックス更新 PGroonga's index update is fast too PGroongaはインデックス更新も速い PGroonga's search while index update is still fast PGroongaはインデックス更新中の検索も速い
  • 23.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Perf characteristics 性能の傾向 Searchthroughput Update throughput PGroonga Searchthroughput Update throughput GIN Keep search performance while many updates Decrease search performance while updating
  • 24.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Update and lock 更新とロック Update without read locks 参照ロックなしで更新 Write locks are required 書き込みロックは必要
  • 25.
    PGroonga & ZulipPowered by Rabbit 2.2.1 GIN: Read/Write GIN:読み書き Conn1 Conn2 INSERT start SELECT start Blocked INSERT finish SELECT finish GIN Slow down!
  • 26.
    PGroonga & ZulipPowered by Rabbit 2.2.1 PGroonga: Read/Write PGroonga:読み書き Conn1 Conn2 INSERT start SELECT start INSERT finish SELECT finish PGroonga No slow down!
  • 27.
    PGroonga & ZulipPowered by Rabbit 2.2.1 Wrap up まとめ Zulip: Low latency for UX ZulipはUXのために低レイテンシーをがんばっている Delay index update インデックスの更新は後回し PGroonga: Keeps fast search with update PGroongaは更新しながらでも高速検索を維持 Chat friendly characteristics チャット向きの特性
  • 28.
    PGroonga & ZulipPowered by Rabbit 2.2.1 More PGroonga features PGroongaの機能いろいろ Query expansion(クエリー展開) Support synonyms(同義語検索をサポート) Similar search(類似文書検索) Find similar messages 類似メッセージ検索 Fuzzy search(あいまい検索) Stemming(ステミング)