What Is PipelineDB? (PipelineDBとは?)
2. Agenda
• Self-Introduction
• What’s PipelineDB
• What’s Continuous Query
• Continuous Transform/Trigger
• DEMO
• Enterprise Edition
• Tips
3. Self-Introduction
• Twitter: @tamtam180
• Work history
– Square Enix
• PlayOnline, FF-XIV
– Server Programmer
– SmartNews
• Advertising
– Software Engineer
5. What’s PipelineDB
• OSS database (plus an Enterprise Edition)
– GPLv3
• Supports Continuous Queries
• Built on top of PostgreSQL, extending it
– 0.8.x is based on 9.4, 0.9.x on 9.5
– No special client libraries needed
• Supports probabilistic data structures and algorithms
– Bloom filter, HyperLogLog, Count-Min Sketch,
– FSS Top-K, T-Digest
7. What’s Continuous Query
• RDB
Timestamp            ChannelID  CampaignID  UserID  Sales
2016/06/18 13:41:10  1000       10          100     30
2016/06/18 13:43:15  1000       10          101     20
2016/06/18 13:47:20  1001       11          123     15
2016/06/18 14:10:10  1000       12          100     30
2016/06/19 14:15:30  1002       14          101     20
2016/06/19 15:16:30  1003       11          100     15
2016/06/19 16:17:56  1001       14          123     30
Aggregate
8. What’s Continuous Query
• RDB
Aggregate
SELECT
  TO_CHAR(timestamp, 'YYYY-MM-DD') AS ymd,
  campaignId, SUM(sales)
FROM clicks
WHERE
  -- keep only rows from the last 3 days
  timestamp > NOW() - INTERVAL '3 days'
GROUP BY ymd, campaignId;
9. What’s Continuous Query
• PipelineDB
[Diagram: data records, a Stream, and Continuous Views (CV)]
CREATE STREAM stream_name (
timestamp TIMESTAMP,
channelId BIGINT,
campaignId BIGINT,
userId BIGINT,
sales BIGINT
);
CREATE CONTINUOUS VIEW cv_name WITH (max_age = '3 days') AS
SELECT
  TO_CHAR(timestamp, 'YYYY-MM-DD') AS ymd,
  campaignId, SUM(sales)
FROM
  stream_name
GROUP BY
  ymd, campaignId;
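The difference between the two slides above can be sketched in Python. This is only a conceptual model, not PipelineDB code: `batch_aggregate` and `ContinuousViewSketch` are hypothetical names, and the rows are the ones from the example table. The batch function rescans stored rows at query time, while the continuous version updates the grouped result as each row arrives and keeps no raw rows.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the example table:
# (timestamp, channel_id, campaign_id, user_id, sales)
ROWS = [
    ("2016/06/18 13:41:10", 1000, 10, 100, 30),
    ("2016/06/18 13:43:15", 1000, 10, 101, 20),
    ("2016/06/18 13:47:20", 1001, 11, 123, 15),
    ("2016/06/18 14:10:10", 1000, 12, 100, 30),
    ("2016/06/19 14:15:30", 1002, 14, 101, 20),
    ("2016/06/19 15:16:30", 1003, 11, 100, 15),
    ("2016/06/19 16:17:56", 1001, 14, 123, 30),
]


def group_key(row):
    """Return (ymd, campaign_id), mirroring GROUP BY ymd, campaignId."""
    ts = datetime.strptime(row[0], "%Y/%m/%d %H:%M:%S")
    return ts.strftime("%Y-%m-%d"), row[2]


def batch_aggregate(stored_rows):
    """RDB style: every row is stored and rescanned at query time."""
    totals = defaultdict(int)
    for row in stored_rows:
        totals[group_key(row)] += row[4]
    return dict(totals)


class ContinuousViewSketch:
    """Streaming style (hypothetical model): update the result per row."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_row(self, row):
        # Only the aggregate changes; the raw row is not retained.
        self.totals[group_key(row)] += row[4]


view = ContinuousViewSketch()
for row in ROWS:
    view.on_row(row)

result = dict(view.totals)
print(result[("2016-06-18", 10)])  # 30 + 20 = 50
```

Both approaches end with the same five groups; the continuous one simply never needs the raw table.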
10. What’s Continuous Query
• Continuous Query: STREAM
CREATE STREAM stream_name (
timestamp TIMESTAMP,
channelId BIGINT,
campaignId BIGINT,
userId BIGINT,
sales BIGINT
);
15. What’s Continuous Query
• Continuous View
CREATE CONTINUOUS VIEW cv_name AS
SELECT
  TO_CHAR(timestamp, 'YYYY-MM-DD') AS ymd,
  campaignId, SUM(sales)
FROM
  stream_name
WHERE
  arrival_timestamp > clock_timestamp() - interval '3 days'
GROUP BY
  ymd, campaignId;
Deprecated: this WHERE-clause form of the sliding window. Use the max_age option instead.
16. What’s Continuous Query
• Continuous View
CREATE CONTINUOUS VIEW cv_name WITH (max_age = '3 days')
AS
SELECT
  TO_CHAR(timestamp, 'YYYY-MM-DD') AS ymd,
  campaignId, SUM(sales)
FROM
  stream_name
GROUP BY
  ymd, campaignId;
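One way to picture what a max_age window does is the Python sketch below. It is a conceptual model under stated assumptions, not PipelineDB's implementation: `SlidingSumSketch` and its `step` parameter are hypothetical, and the slides do not describe how PipelineDB stores its partial results. The idea shown is that sums are kept per time step and whole steps are dropped once they fall outside the window.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


class SlidingSumSketch:
    """Sum over the last `max_age`, stored as per-step partial sums."""

    def __init__(self, max_age, step):
        self.max_age = max_age
        self.step = int(step.total_seconds())
        # start of step (epoch seconds) -> partial sum for that step
        self.partials = defaultdict(int)

    def insert(self, arrival, sales):
        epoch = int(arrival.timestamp())
        self.partials[epoch - epoch % self.step] += sales

    def read(self, now):
        cutoff = (now - self.max_age).timestamp()
        # A step is dropped only when it ends at or before the cutoff, so
        # the window edge is accurate to one step, not to the second.
        expired = [s for s in self.partials if s + self.step <= cutoff]
        for start in expired:
            del self.partials[start]
        return sum(self.partials.values())


view = SlidingSumSketch(max_age=timedelta(days=3), step=timedelta(hours=1))
t0 = datetime(2016, 7, 22, 11, 0, 1, tzinfo=timezone.utc)
view.insert(t0, 25)
view.insert(t0 + timedelta(days=2), 20)

total_after_2_days = view.read(t0 + timedelta(days=2, hours=1))
total_after_4_days = view.read(t0 + timedelta(days=4))
print(total_after_2_days, total_after_4_days)
```

After four days the first insert has aged out, so only the second one is still counted.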
18. What’s Continuous Query
• Data Insert
INSERT INTO stream_name (timestamp, campaignId, sales)
VALUES
  ('2016-07-22 11:00:01', 100, 25),
  ('2016-07-22 11:00:02', 101, 20),
  ('2016-07-22 11:00:03', 101, 22)
;
19. What’s Continuous Query
• The COPY statement also works
COPY stream_name (timestamp, campaignId, sales)
FROM '/some/path/file.csv';
COPY stream_name (timestamp, campaignId, sales)
FROM STDIN;
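For the STDIN variant, a client has to supply the rows as text. The Python sketch below only builds such a CSV payload in memory; `build_copy_payload` is a hypothetical helper, and the `WITH (FORMAT csv)` option and the choice of driver are assumptions, not something shown on the slides.

```python
import csv
import io

# Same rows as the INSERT example: (timestamp, campaignId, sales)
ROWS = [
    ("2016-07-22 11:00:01", 100, 25),
    ("2016-07-22 11:00:02", 101, 20),
    ("2016-07-22 11:00:03", 101, 22),
]

# Assumption: the stream accepts PostgreSQL's CSV format option.
COPY_SQL = (
    "COPY stream_name (timestamp, campaignId, sales) "
    "FROM STDIN WITH (FORMAT csv)"
)


def build_copy_payload(rows):
    """Serialize rows as CSV into a file-like object for COPY ... FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    buf.seek(0)  # rewind so a client can read from the beginning
    return buf


payload = build_copy_payload(ROWS)
print(payload.getvalue())

# With a PostgreSQL driver such as psycopg2 (an assumption here), the payload
# could then be sent with: cursor.copy_expert(COPY_SQL, payload)
```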
23. On SmartNews Ads
• A single JSONB column per stream
CREATE STREAM imp_stream ( item JSONB );
CREATE STREAM vimp_stream ( item JSONB );
CREATE STREAM click_stream ( item JSONB );
24. On SmartNews Ads
• Create a counter (continuous view) per stream
CREATE CONTINUOUS VIEW imp_count
WITH(max_age='7 day', step_factor=1)
AS
SELECT
(TO_CHAR(TO_TIMESTAMP((item::jsonb->>'timestamp')::bigint),
'YYYY-MM-DD HH24:00:00'))::timestamp as dt,
COUNT(*) as cnt
FROM imp_stream
GROUP BY dt;
If a STREAM has no CV attached to it, INSERTs produce a flood of warnings.
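The hourly bucketing in the view above can be mimicked in Python as follows. This is a sketch, assuming the `timestamp` field holds epoch seconds and that the database session formats times in UTC; `hour_bucket` and `epoch_utc` are hypothetical helpers.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def epoch_utc(*parts):
    """Epoch seconds for a UTC wall-clock time (helper for the sample data)."""
    return int(datetime(*parts, tzinfo=timezone.utc).timestamp())


def hour_bucket(item_json):
    """Truncate the item's epoch timestamp to the hour, like the dt column."""
    item = json.loads(item_json)
    ts = datetime.fromtimestamp(int(item["timestamp"]), tz=timezone.utc)
    return ts.strftime("%Y-%m-%d %H:00:00")


events = [
    json.dumps({"timestamp": epoch_utc(2016, 7, 22, 11, 0, 1)}),
    json.dumps({"timestamp": epoch_utc(2016, 7, 22, 11, 59, 59)}),
    json.dumps({"timestamp": epoch_utc(2016, 7, 22, 12, 0, 0)}),
]

# COUNT(*) ... GROUP BY dt
imp_count = Counter(hour_bucket(e) for e in events)
print(dict(imp_count))
```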
26. On SmartNews Ads
• The Consumer and other components need no changes;
• from then on you only keep defining CVs
[Diagram: Consumer, JSON, STREAMs, and continuous views CV-1, CV-2, CV-3]
29. HLL
• Distinct count => computed with HLL
CREATE CONTINUOUS VIEW imp_count
WITH(max_age='7 day', step_factor=1)
AS
SELECT
(TO_CHAR(TO_TIMESTAMP((item::jsonb->>'timestamp')::bigint),
'YYYY-MM-DD HH24:00:00'))::timestamp as dt,
COUNT(distinct (item->>'uuid')::text) as uuid_ucnt
FROM imp_stream
GROUP BY dt;
Exact values can also be computed by using exact_count_distinct.
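To show why an HLL count is approximate, here is a minimal HyperLogLog sketch in Python. It is illustrative only: the register count (p = 12), the SHA-1 hash, and the small-range correction are assumptions of this example and say nothing about PipelineDB's internal parameters.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash
        width = 64 - self.p
        index = h >> width                          # first p bits pick a register
        rest = h & ((1 << width) - 1)               # remaining bits
        rank = width - rest.bit_length() + 1        # leading zeros + 1
        self.registers[index] = max(self.registers[index], rank)

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)          # small-range correction
        return raw


hll = HyperLogLog()
for i in range(10000):
    uuid = f"user-{i}"
    hll.add(uuid)
    hll.add(uuid)  # a repeated uuid never raises a register

estimate = hll.cardinality()
print(round(estimate))
```

The estimate lands near 10,000 while the sketch stays at 4,096 small registers no matter how many values are added, which is the trade-off against an exact distinct count.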
31. HLL
• Keep the HLL itself, per hour
CREATE CONTINUOUS VIEW test_cv WITH(max_age='30 days')
AS
SELECT
  to_char(to_timestamp((item->>'timestamp')::bigint + 3600*9), 'YYYY-MM-DD') AS ymd_jst,
  date_part('hour', to_timestamp((item->>'timestamp')::bigint + 3600*9))::integer AS h_jst,
  hll_agg((item->>'uuid')::text) AS uuid_agg
FROM test_stream
GROUP BY
  ymd_jst, h_jst;
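The reason for storing the sketch instead of a count can be shown with a small Python example. Plain Python sets stand in for the HLL here (an exact but also mergeable structure), so this is an analogy and not HLL itself; `jst_bucket` and `epoch_utc` are hypothetical helpers, and the +9 hour shift assumes the session time zone is UTC, as the SQL above appears to.

```python
from collections import defaultdict
from datetime import datetime, timezone

JST_OFFSET = 3600 * 9  # same shift as in the view definition


def epoch_utc(*parts):
    """Epoch seconds for a UTC wall-clock time (helper for the sample data)."""
    return int(datetime(*parts, tzinfo=timezone.utc).timestamp())


def jst_bucket(epoch_seconds):
    """Return (ymd_jst, h_jst) for an epoch timestamp."""
    shifted = datetime.fromtimestamp(epoch_seconds + JST_OFFSET, tz=timezone.utc)
    return shifted.strftime("%Y-%m-%d"), shifted.hour


events = [
    (epoch_utc(2016, 7, 21, 15, 0, 0), "u1"),   # 2016-07-22 00:00 JST
    (epoch_utc(2016, 7, 21, 15, 30, 0), "u2"),  # 2016-07-22 00:30 JST
    (epoch_utc(2016, 7, 21, 16, 10, 0), "u1"),  # 01:10 JST, same user again
    (epoch_utc(2016, 7, 21, 16, 20, 0), "u3"),  # 01:20 JST
]

# One mergeable "sketch" per (ymd_jst, h_jst); a set stands in for the HLL.
hourly = defaultdict(set)
for ts, uuid in events:
    hourly[jst_bucket(ts)].add(uuid)

day = "2016-07-22"
day_sketches = [s for (ymd, _hour), s in hourly.items() if ymd == day]

sum_of_hourly_counts = sum(len(s) for s in day_sketches)  # counts u1 twice
daily_distinct = len(set().union(*day_sketches))          # merge, then count
print(sum_of_hourly_counts, daily_distinct)
```

Adding up hourly distinct counts overcounts users seen in more than one hour; merging the hourly sketches first and counting afterwards does not.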
37. Continuous Transform
• Define a TRANSFORM
CREATE CONTINUOUS TRANSFORM xxx_etl AS
SELECT item::jsonb
FROM xxx_stream
WHERE
  to_timestamp((item->'obj'->>'timestamp_sec')::bigint) > clock_timestamp() - interval '7 days'
  AND (item->'obj'->>'flag')::bigint = 1
THEN EXECUTE PROCEDURE
  pipeline_stream_insert('xxx_stream_etl');
pipeline_stream_insert is defined as a built-in.
You can also define your own procedure.
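In plain Python terms, a transform is a filter that forwards matching records to another destination. The sketch below mirrors the WHERE clause above; the function signature and the `emit` callback are hypothetical and only stand in for the role pipeline_stream_insert plays.

```python
from datetime import datetime, timedelta, timezone


def xxx_etl(items, now, emit):
    """Forward items newer than 7 days whose flag is 1 (mirrors the WHERE clause)."""
    cutoff = now - timedelta(days=7)
    for item in items:
        obj = item["obj"]
        ts = datetime.fromtimestamp(int(obj["timestamp_sec"]), tz=timezone.utc)
        if ts > cutoff and int(obj["flag"]) == 1:
            emit(item)  # stand-in for inserting into the output stream


def seconds(dt):
    return int(dt.timestamp())


now = datetime(2016, 7, 22, 12, 0, 0, tzinfo=timezone.utc)
items = [
    {"obj": {"timestamp_sec": seconds(now - timedelta(days=1)), "flag": 1}},  # kept
    {"obj": {"timestamp_sec": seconds(now - timedelta(days=8)), "flag": 1}},  # too old
    {"obj": {"timestamp_sec": seconds(now - timedelta(days=1)), "flag": 0}},  # flag off
]

xxx_stream_etl = []  # stand-in for the output stream
xxx_etl(items, now, xxx_stream_etl.append)
print(len(xxx_stream_etl))
```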
40. Continuous Trigger
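• Trigger
– A trigger in the usual sense
– Send an alert when a campaign's spend exceeds its daily budget
– Notify when Impression Smoothing is running behind
• and similar use cases are possible
– Notification
• Insert a record into another table
• Call a WebHook over HTTP
• Send an email (e.g. via Mailgun)
I didn't have time to build a sample!!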
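As a rough illustration of the daily-budget alert idea, here is a Python sketch. It is not PipelineDB trigger syntax: `BudgetAlertSketch`, its fields, and the `notify` callback are all hypothetical. In practice the callback could insert a row, call a WebHook, or send an email.

```python
from collections import defaultdict


class BudgetAlertSketch:
    """Fire `notify` once per campaign and day when spend exceeds the budget."""

    def __init__(self, daily_budget, notify):
        self.daily_budget = daily_budget  # campaign_id -> daily budget
        self.notify = notify              # callback taking a message string
        self.spend = defaultdict(int)     # (ymd, campaign_id) -> spend so far
        self.alerted = set()              # keys that already fired

    def on_spend(self, ymd, campaign_id, amount):
        key = (ymd, campaign_id)
        self.spend[key] += amount
        budget = self.daily_budget.get(campaign_id)
        if budget is None or key in self.alerted:
            return
        if self.spend[key] > budget:
            self.alerted.add(key)
            self.notify(
                f"campaign {campaign_id} exceeded its daily budget on {ymd}: "
                f"{self.spend[key]} > {budget}"
            )


alerts = []
trigger = BudgetAlertSketch({100: 40}, alerts.append)
trigger.on_spend("2016-07-22", 100, 25)  # 25, under budget
trigger.on_spend("2016-07-22", 100, 20)  # 45, crosses 40 and fires
trigger.on_spend("2016-07-22", 100, 5)   # already alerted, stays quiet
print(alerts)
```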
47. Tips: timestamp
• clock_timestamp()
– Current date and time (changes during statement execution)
• Can appear only once per statement (in the WHERE clause)
ERROR: clock_timestamp() may only appear once in a WHERE clause
49. We are hiring!
• SmartNews appears to be hiring engineers
– Ads engineers
– Front-end engineers
– iOS/Android engineers
– Productivity engineers
– Machine learning / NLP engineers
– and more
• http://about.smartnews.com/ja/careers/