PGroonga is fast and flexible full text search extension for PostgreSQL. Zulip is a chat tool that uses PostgreSQL and PGroonga. This talk describes why PGroonga is suitable for Zulip.
PGroonga 2 – Make PostgreSQL rich full text search system backend!Kouhei Sutou
PGroonga 2.0 has been released with 2 years development since PGroonga 1.0.0. PGroonga 1.0.0 just provides fast full text search with all languages support. It's important because it's a lacked feature in PostgreSQL. PGroonga 2.0 provides more useful features to implement rich full text search system with PostgreSQL. This session shows how to implement rich full text search system with PostgreSQL!
This talk describes about PGroonga that resolves these problems.
"Big Data made easy with a Spark" is the presentation I gave for ATO (AllThingsOpen) 2018.
In this hands-on session, you will learn how to do a full Big Data scenario from ingestion to publication. You will see how we can use Java and Apache Spark to ingest data, perform some transformations, save the data. You will then perform a second lab where you will run your very first Machine Learning algorithm!
Practical Elasticsearch - real world use casesItamar
Elasticsearch - a search and real-time analytics server based on Apache Lucene - is gaining a lot of popularity lately, and is being used world-wide to power many sophisticated systems. While many use it for the "standard" stuff (that is, simple full-text search and real-time log analysis), there are some really interesting usage patterns that can prove useful in many real-world scenarios. In this talk we will briefly talk about Elasticsearch and its common use-cases, and then showcase some less common use-cases leveraging Elasticsearch in an interesting and often times innovating ways.
API analytics with Redis and Google Bigquery. NoSQL matters editionjavier ramirez
At teowaki we have a system for API use analytics using Redis as a fast intermediate store and bigquery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in a few seconds and we can try and find for usage patterns that wouldn’t be obvious otherwise. In this session I will speak of the alternatives we evaluated and how we are using Redis and Bigquery to solve our problem.
Elasticsearch Distributed search & analytics on BigData made easyItamar
Elasticsearch is a cloud-ready, super scalable search engine which is gaining a lot of popularity lately. It is mostly known for being extremely easy to setup and integrate with any technology stack.In this talk we will introduce Elasticdearch, and start by looking at some of its basic capabilities. We will demonstrate how it can be used for document search and even log analytics for DevOps and distributed debugging, and peek into more advanced usages like the real-time aggregations and percolation. Obviously, we will make sure to demonstrate how Elasticsearch can be scaled out easily to work on a distributed architecture and handle pretty much any load.
PGroonga – Make PostgreSQL fast full text search platform for all languages!Kouhei Sutou
PostgreSQL has built-in full text search feature. But it supports only limited languages. For example, it doesn't support Japanese. pg_trgm bundled in PostgreSQL supports all languages including Japanese. But it has performance problems for large documents.
This talk describes about PGroonga that resolves these problems.
PGroonga 2 – Make PostgreSQL rich full text search system backend!Kouhei Sutou
PGroonga 2.0 has been released with 2 years development since PGroonga 1.0.0. PGroonga 1.0.0 just provides fast full text search with all languages support. It's important because it's a lacked feature in PostgreSQL. PGroonga 2.0 provides more useful features to implement rich full text search system with PostgreSQL. This session shows how to implement rich full text search system with PostgreSQL!
This talk describes about PGroonga that resolves these problems.
"Big Data made easy with a Spark" is the presentation I gave for ATO (AllThingsOpen) 2018.
In this hands-on session, you will learn how to do a full Big Data scenario from ingestion to publication. You will see how we can use Java and Apache Spark to ingest data, perform some transformations, save the data. You will then perform a second lab where you will run your very first Machine Learning algorithm!
Practical Elasticsearch - real world use casesItamar
Elasticsearch - a search and real-time analytics server based on Apache Lucene - is gaining a lot of popularity lately, and is being used world-wide to power many sophisticated systems. While many use it for the "standard" stuff (that is, simple full-text search and real-time log analysis), there are some really interesting usage patterns that can prove useful in many real-world scenarios. In this talk we will briefly talk about Elasticsearch and its common use-cases, and then showcase some less common use-cases leveraging Elasticsearch in an interesting and often times innovating ways.
API analytics with Redis and Google Bigquery. NoSQL matters editionjavier ramirez
At teowaki we have a system for API use analytics using Redis as a fast intermediate store and bigquery as a big data backend. As a result, we can launch aggregated queries on our traffic/usage data in a few seconds and we can try and find for usage patterns that wouldn’t be obvious otherwise. In this session I will speak of the alternatives we evaluated and how we are using Redis and Bigquery to solve our problem.
Elasticsearch Distributed search & analytics on BigData made easyItamar
Elasticsearch is a cloud-ready, super scalable search engine which is gaining a lot of popularity lately. It is mostly known for being extremely easy to setup and integrate with any technology stack.In this talk we will introduce Elasticdearch, and start by looking at some of its basic capabilities. We will demonstrate how it can be used for document search and even log analytics for DevOps and distributed debugging, and peek into more advanced usages like the real-time aggregations and percolation. Obviously, we will make sure to demonstrate how Elasticsearch can be scaled out easily to work on a distributed architecture and handle pretty much any load.
PGroonga – Make PostgreSQL fast full text search platform for all languages!Kouhei Sutou
PostgreSQL has built-in full text search feature. But it supports only limited languages. For example, it doesn't support Japanese. pg_trgm bundled in PostgreSQL supports all languages including Japanese. But it has performance problems for large documents.
This talk describes about PGroonga that resolves these problems.
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
Postgres is a powerful database, it continues to improve in terms of performance, extensibility, and more broadly in features. However it is not perfect.
Here I'll cover a highly opinionated view of all the areas Postgres falls flat, with some rough thought ideas on how we can make it better. Opinions are all informed by 10 years of interacting with customers running literally millions of databases for users.
Gophers Riding Elephants: Writing PostgreSQL tools in GoAJ Bahnken
This talk will start with an overview of Go, then dive into some examples of using it to work with Postgres. We'll show basics like running queries, then demonstrate how Go makes difficult things easy, such as inspecting Postgres's TCP wire protocol and providing retry mechanisms and monitoring for restores (including fun stories, tips and tricks).
Join us and see why Go - with it's powerful concurrency primitives and unbeatable performance - can be one of the most powerful tools in your toolbox.
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowKouhei Sutou
I introduced Ruby and Apache Arrow integration including the "super fast large data interchange and processing" Apache Arrow feature at RubyKaigi Takeout 2021.
This talk introduces how we can use the "super fast large data interchange and processing" Apache Arrow feature in Ruby. Here are some use cases:
* Fast data retrieval (fast pluck) from DB such as MySQL and PostgreSQL for batch processes in a Ruby on Rails application
* Fast data interchange with JavaScript for dynamic visualization in a Ruby on Rails application
* Fast OLAP with in-process DB such as DuckDB and Apache Arrow DataFusion in a Ruby on Rails application or irb session
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowKouhei Sutou
To use Ruby for data processing widely, Apache Arrow support is important. We can do the followings with Apache Arrow:
* Super fast large data interchange and processing
* Reading/writing data in several famous formats such as CSV and Apache Parquet
* Reading/writing partitioned large data on cloud storage such as Amazon S3
This talk describes the followings:
* What is Apache Arrow
* How to use Apache Arrow with Ruby
* How to integrate with Ruby 3.0 features such as MemoryView and Ractor
Apache Arrow 1.0 - A cross-language development platform for in-memory dataKouhei Sutou
Apache Arrow is a cross-language development platform for in-memory data. You can use Apache Arrow to process large data effectively in Python and other languages such as R. Apache Arrow is the future of data processing. Apache Arrow 1.0, the first major version, was released at 2020-07-24. It's a good time to know Apache Arrow and start using it.
Apache Arrow - A cross-language development platform for in-memory dataKouhei Sutou
Apache Arrow is the future for data processing systems. This talk describes how to solve data sharing overhead in data processing system such as Spark and PySpark. This talk also describes how to accelerate computation against your large data by Apache Arrow.
csv, one of the standard libraries, in Ruby 2.6 has many improvements:
* Default gemified
* Faster CSV parsing
* Faster CSV writing
* Clean new CSV parser implementation for further improvements
* Reconstructed test suites for further improvements
* Benchmark suites for further performance improvements
These improvements are done without breaking backward compatibility.
This talk describes details of these improvements by a new csv maintainer.
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensCitus Data
Postgres is a powerful database, it continues to improve in terms of performance, extensibility, and more broadly in features. However it is not perfect.
Here I'll cover a highly opinionated view of all the areas Postgres falls flat, with some rough thought ideas on how we can make it better. Opinions are all informed by 10 years of interacting with customers running literally millions of databases for users.
Gophers Riding Elephants: Writing PostgreSQL tools in GoAJ Bahnken
This talk will start with an overview of Go, then dive into some examples of using it to work with Postgres. We'll show basics like running queries, then demonstrate how Go makes difficult things easy, such as inspecting Postgres's TCP wire protocol and providing retry mechanisms and monitoring for restores (including fun stories, tips and tricks).
Join us and see why Go - with it's powerful concurrency primitives and unbeatable performance - can be one of the most powerful tools in your toolbox.
RubyKaigi 2022 - Fast data processing with Ruby and Apache ArrowKouhei Sutou
I introduced Ruby and Apache Arrow integration including the "super fast large data interchange and processing" Apache Arrow feature at RubyKaigi Takeout 2021.
This talk introduces how we can use the "super fast large data interchange and processing" Apache Arrow feature in Ruby. Here are some use cases:
* Fast data retrieval (fast pluck) from DB such as MySQL and PostgreSQL for batch processes in a Ruby on Rails application
* Fast data interchange with JavaScript for dynamic visualization in a Ruby on Rails application
* Fast OLAP with in-process DB such as DuckDB and Apache Arrow DataFusion in a Ruby on Rails application or irb session
RubyKaigi Takeout 2021 - Red Arrow - Ruby and Apache ArrowKouhei Sutou
To use Ruby for data processing widely, Apache Arrow support is important. We can do the followings with Apache Arrow:
* Super fast large data interchange and processing
* Reading/writing data in several famous formats such as CSV and Apache Parquet
* Reading/writing partitioned large data on cloud storage such as Amazon S3
This talk describes the followings:
* What is Apache Arrow
* How to use Apache Arrow with Ruby
* How to integrate with Ruby 3.0 features such as MemoryView and Ractor
Apache Arrow 1.0 - A cross-language development platform for in-memory dataKouhei Sutou
Apache Arrow is a cross-language development platform for in-memory data. You can use Apache Arrow to process large data effectively in Python and other languages such as R. Apache Arrow is the future of data processing. Apache Arrow 1.0, the first major version, was released at 2020-07-24. It's a good time to know Apache Arrow and start using it.
Apache Arrow - A cross-language development platform for in-memory dataKouhei Sutou
Apache Arrow is the future for data processing systems. This talk describes how to solve data sharing overhead in data processing system such as Spark and PySpark. This talk also describes how to accelerate computation against your large data by Apache Arrow.
csv, one of the standard libraries, in Ruby 2.6 has many improvements:
* Default gemified
* Faster CSV parsing
* Faster CSV writing
* Clean new CSV parser implementation for further improvements
* Reconstructed test suites for further improvements
* Benchmark suites for further performance improvements
These improvements are done without breaking backward compatibility.
This talk describes details of these improvements by a new csv maintainer.
PostgreSQL Conference Japan 2017の資料です。
PostgreSQLが苦手な全文検索機能を強力に補完する拡張機能PGroongaの最新情報を紹介します。PGroongaは単なる全文検索モジュールではありません。PostgreSQLで簡単にリッチな全文検索システムを実現するための機能一式を提供します。PGroongaは最新PostgreSQLへの対応も積極的です。例としてロジカルレプリケーションと組み合わせた活用方法も紹介します。
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 4
PGroonga & Zulip
1. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
&
Zulip
Kouhei Sutou ClearCode Inc.
Zulip & PGroonga Night
2017-09-06
2. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga
Pronunciation: píːzí:lúnɡά
読み方:ぴーじーるんが
PostgreSQL extension
PostgreSQLの拡張機能
Fast full text search
高速全文検索機能
All languages are supported!
全言語対応!
3. PGroonga & Zulip Powered by Rabbit 2.2.1
Fast?(高速?)
Need to measure to confirm
確認するには測定しないと
Targets(測定対象)
textsearch (built-in)(組み込み)
pg_bigm (third party)(外部プロダク
ト)
4. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and textsearch
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga textsearch
5. PGroonga & Zulip Powered by Rabbit 2.2.1
As fast as textsearch
textsearchと同じくらいの速さ
textsearch uses word based
full text search
textsearchは単語ベースの全文検索実装
PostgreSQL has enough
performance for the approach
PostgreSQLはこの方法では十分な性能を出せる
6. PGroonga & Zulip Powered by Rabbit 2.2.1
textsearch and Japanese
textsearchと日本語
Asian languages including
Japanese aren't supported
日本語を含むアジア圏の言語は非サポート
Need plugin(プラグインが必要)
Plugin exists but isn't
maintained
プラグインはあるがメンテナンスされていない
7. PGroonga & Zulip Powered by Rabbit 2.2.1
Japanese support
日本語対応
Need one of them(どちらかが必要)
N-gram approach support
N-gramというやり方のサポート
Japanese specific word based
approach support
日本語を考慮した単語ベースのやり方のサポート
PGroonga supports both of them
8. PGroonga & Zulip Powered by Rabbit 2.2.1
PostgreSQL and N-gram
PostgreSQLとN-gram
PostgreSQL is slow with N-
gram approach
PostgreSQLでN-gramというやり方を使うと遅い
N-gram approach:
pg_trgm (contrib)
Japanese isn't supported by default
デフォルトでは日本語に対応していない
pg_bigm (third-party)
9. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroona and pg_bigm
0
0.5
1
1.5
2
2.5
3
311 14706 20389
Data: Japanese Wikipedia
(Many records and large documents)
N records: About 0.9millions
Average text size: 6.7KiB
Fast Fast
Elapsedtime(sec)
(Lowerisbetter)
N hits
PGroonga pg_bigm
10. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga is fast stably
PGroongaは安定して速い
PostgreSQL needs "recheck"
for N-gram approach
PostgreSQLはN-gramのときは「recheck」が必要
Seq search after index search
インデックスサーチのあとにシーケンシャルサーチ
PGroonga doesn't need
PGroongaでは必要ない
Only index search
インデックスサーチだけでOK
11. PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
textsearch is fast but
Asian langs aren't supported
textsearchは速いけどアジア圏の言語を未サポート
pg_bigm supports Japanese
but is slow for large hits
pg_bigmは日本語対応だがヒット数が多くなると遅い
PGroonga is fast and
supports all languages
PGroongaは速くて全言語対応
12. PGroonga & Zulip Powered by Rabbit 2.2.1
FYI: textsearch, PGroonga
and Groonga
0
0.2
0.4
0.6
0.8
1
1.2
1.4
PostgreSQL OR MySQL database America
Data: English Wikipedia
(Many records and large docs)
N records: About 5.3millions
Average text size: 6.4KiB
Groonga is 30x faster than others
Elapsedtime(ms)
(Shorterisbetter)
Query
PGroonga Groonga textsearch
13. PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip and PGroonga
Zulip uses textsearch by
default
Zulipはデフォルトでtextsearchを使用
Japanese isn't supported
日本語非対応
Zulip supports PGroonga as
option
ZulipでPGroongaも使うこともできる
Implemented by me
私が実装
14. PGroonga & Zulip Powered by Rabbit 2.2.1
Zulip: full text search
Zulipと全文検索
Zulip is chat tool
Zulipはチャットツール
Latency is important for UX
UX的にレイテンシーは大事
Index update is heavy
インデックスの更新は重い
Delay index update
インデックスの更新を後回しにしている
15. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
CREATE TABLE zerver_message (
rendered_content text,
-- ... ↓Column for full text search
search_tsvector tsvector
); -- ↓Index for full text search
CREATE INDEX zerver_message_search_tsvector
ON zerver_message
USING gin (search_tsvector);
16. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute append_to_fts_update_log() on change
CREATE TRIGGER
zerver_message_update_search_tsvector_async
BEFORE INSERT OR UPDATE OF rendered_content
ON zerver_message
FOR EACH ROW
EXECUTE PROCEDURE append_to_fts_update_log();
17. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Insert ID to fts_update_log table
CREATE FUNCTION append_to_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO fts_update_log (message_id)
VALUES (NEW.id);
RETURN NEW;
END
$$;
18. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Keep ID to be updated
CREATE TABLE fts_update_log (
id SERIAL PRIMARY KEY,
message_id INTEGER NOT NULL
);
19. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- Execute do_notify_fts_update_log()
-- on INSERT
CREATE TRIGGER fts_update_log_notify
AFTER INSERT ON fts_update_log
FOR EACH STATEMENT
EXECUTE PROCEDURE
do_notify_fts_update_log();
20. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
-- NOTIFY to fts_update_log channel!
CREATE FUNCTION do_notify_fts_update_log()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
NOTIFY fts_update_log;
RETURN NEW;
END
$$;
21. PGroonga & Zulip Powered by Rabbit 2.2.1
Delay index update
インデックス更新を後回し
cursor.execute("LISTEN ftp_update_log") # Wait
cursor.execute("SELECT id, message_id FROM fts_update_log")
ids = []
for (id, message_id) in cursor.fetchall():
cursor.execute("UPDATE zerver_message SET search_tsvector = "
"to_tsvector('zulip.english_us_search', "
"rendered_content) "
"WHERE id = %s", (message_id,))
ids.append(id)
cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
(ids,))
22. PGroonga & Zulip Powered by Rabbit 2.2.1
PGroonga: index update
PGroongaとインデックス更新
PGroonga's index update is
fast too
PGroongaはインデックス更新も速い
PGroonga's search while
index update is still fast
PGroongaはインデックス更新中の検索も速い
23. PGroonga & Zulip Powered by Rabbit 2.2.1
Perf characteristics
性能の傾向
Searchthroughput
Update throughput
PGroonga
Searchthroughput
Update throughput
GIN
Keep
search performance
while many updates
Decrease
search performance
while updating
24. PGroonga & Zulip Powered by Rabbit 2.2.1
Update and lock
更新とロック
Update without read locks
参照ロックなしで更新
Write locks are required
書き込みロックは必要
27. PGroonga & Zulip Powered by Rabbit 2.2.1
Wrap up
まとめ
Zulip: Low latency for UX
ZulipはUXのために低レイテンシーをがんばっている
Delay index update
インデックスの更新は後回し
PGroonga: Keeps fast search
with update
PGroongaは更新しながらでも高速検索を維持
Chat friendly characteristics
チャット向きの特性
28. PGroonga & Zulip Powered by Rabbit 2.2.1
More PGroonga features
PGroongaの機能いろいろ
Query expansion(クエリー展開)
Support synonyms(同義語検索をサポート)
Similar search(類似文書検索)
Find similar messages
類似メッセージ検索
Fuzzy search(あいまい検索)
Stemming(ステミング)