The Big Reveal! Speeding Up Funliday's Recommended POI Calculation 350x with PostgreSQL
Kewang
● 王慕羣 Kewang
● Java / JavaScript
● HBase / PostgreSQL / MongoDB / ElasticSearch
● Git / DevOps
● Passionate about open source
LinkedIn: kewangtw
SlideShare: kewang
Gmail: cpckewang
Facebook: Kewang 的資訊進化論
GitHub: kewang
Funliday: kewang
Talks: DevOpsDays Taipei '17; HadoopCon '14, '15; JCConf '16, '17, '18; Modern Web '18, '19, '20; COSCUP '20; MOPCON '14, '20
Recommended POIs
Technical evolution
V1 2019-02
Summary
● GiST index
● Cluster
● Nearest-Neighbor Search
● Advisory lock
● Cache
Sequence diagram
Participants: client, AP, Redis, DB
● get POIs
● get cache from Redis
● return cache
● if hit, return POIs
● if miss, calculate POIs from DB
● return calculated results
● store cache to Redis
● store OK
● return POIs
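The V1 flow above is a cache-aside read. A minimal sketch of it, assuming a plain dict as a stand-in for Redis and a hypothetical `calculate_from_db` callback in place of the real query (neither name comes from the talk):

```python
from typing import Callable, Dict, List

# A plain dict stands in for Redis; all names here are illustrative only.
redis_cache: Dict[str, List[int]] = {}

def get_pois(city: str, calculate_from_db: Callable[[str], List[int]]) -> List[int]:
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    cached = redis_cache.get(city)       # get cache from Redis
    if cached is not None:               # hit: return POIs
        return cached
    pois = calculate_from_db(city)       # miss: calculate POIs from DB
    redis_cache[city] = pois             # store cache to Redis
    return pois

db_calls: List[str] = []

def fake_calculation(city: str) -> List[int]:
    db_calls.append(city)                # records every time the "DB" is queried
    return [101, 102, 103]

first = get_pois("taipei", fake_calculation)
second = get_pois("taipei", fake_calculation)   # served from the cache
```

Only the first call reaches the fake DB; the second is answered from the cache.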
Advisory lock
● T: client sends req A (search Taipei city); server starts to calculate
● T+1: client sends req B (search Taipei city)
● T+3: server sends res B (data processing...)
● T+10: server sends res A (calculated)
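The timeline above can be imitated with a non-blocking lock per search key. This is only a sketch: `threading.Lock` stands in for a PostgreSQL advisory lock, and it assumes the try-style acquisition that `pg_try_advisory_lock` offers, which the slides do not confirm. The function names are hypothetical.

```python
import threading
from typing import Dict

# One lock per search key; a stand-in for a PostgreSQL advisory lock.
_locks: Dict[str, threading.Lock] = {}
_guard = threading.Lock()

def try_lock(key: str) -> bool:
    """Non-blocking acquire: returns False at once if the key is already locked."""
    with _guard:
        lock = _locks.setdefault(key, threading.Lock())
    return lock.acquire(blocking=False)

def unlock(key: str) -> None:
    _locks[key].release()

def handle_request(key: str) -> str:
    if not try_lock(key):
        return "data processing..."      # another request is already calculating
    try:
        return "calculated"              # the expensive calculation would run here
    finally:
        unlock(key)

a_holds_lock = try_lock("taipei")        # req A at T takes the lock
res_b = handle_request("taipei")         # req B at T+1 finds the lock busy
unlock("taipei")                         # req A finishes at T+10
res_later = handle_request("taipei")     # a later request can calculate again
```

The point of the design is that duplicate requests for the same city do not start the same expensive calculation twice.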
Table DDL
Cluster
Before cluster
Before cluster - query plan
Statistical correlation
● The closer the correlation is to 1, the lower the cost of using the index
● Without basic operators, the correlation cannot be computed
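To make the correlation idea concrete, the sketch below computes the Pearson correlation between a row's physical position and the rank of its value. It is an illustration of the concept only, not PostgreSQL's actual statistics code; the function names are hypothetical. The `sorted` call needs an ordering on the values, which mirrors the point that no correlation exists without basic operators.

```python
from typing import List, Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def physical_correlation(values_in_heap_order: List[int]) -> float:
    """Correlation between physical row position and the rank of each value."""
    positions = list(range(len(values_in_heap_order)))
    order = sorted(positions, key=lambda i: values_in_heap_order[i])  # needs "<"
    ranks = [0] * len(order)
    for rank, pos in enumerate(order):
        ranks[pos] = rank
    return pearson(positions, ranks)

clustered = physical_correlation([10, 20, 30, 40, 50])   # rows stored in value order
shuffled = physical_correlation([30, 10, 50, 20, 40])    # rows stored out of order
```

Rows stored in value order give a correlation of 1, while the shuffled layout gives a much lower value.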
Running cluster
Running cluster - lock
● Rebuild table
– Access Exclusive Lock
● Rebuild index
– Access Share Lock
– Access Exclusive Lock
After cluster 1
After cluster 1 - query plan
After cluster 2
After cluster 2 - query plan
Nearest-Neighbor Search
Nearest-Neighbor Search 1
● Full table scan because of ST_Distance
Nearest-Neighbor Search 2
● Speeds up because of ST_Expand
Nearest-Neighbor Search 3
● <-> KNN
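The difference between the first two approaches can be illustrated outside the database. The sketch below uses planar points and hypothetical function names; it only shows why a bounding-box pre-filter (the role ST_Expand plays) leaves fewer rows to rank than computing a distance for every row. It does not use an index, so it says nothing about the `<->` KNN operator beyond the shared goal.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, x, y) in an arbitrary planar unit

def nearest_full_scan(pois: List[Point], x: float, y: float, k: int) -> List[str]:
    """Computes the distance to every row, then sorts: work grows with table size."""
    ranked = sorted(pois, key=lambda p: hypot(p[1] - x, p[2] - y))
    return [p[0] for p in ranked[:k]]

def nearest_with_box(pois: List[Point], x: float, y: float, k: int,
                     radius: float) -> List[str]:
    """Keeps only rows inside a square around the point before ranking them."""
    in_box = [p for p in pois
              if abs(p[1] - x) <= radius and abs(p[2] - y) <= radius]
    return nearest_full_scan(in_box, x, y, k)

pois = [("a", 0.0, 1.0), ("b", 2.0, 2.0), ("c", 50.0, 50.0),
        ("d", -1.5, 0.0), ("e", 90.0, 10.0)]
full = nearest_full_scan(pois, 0.0, 0.0, 2)
boxed = nearest_with_box(pois, 0.0, 0.0, 2, radius=5.0)
```

Both calls return the same two neighbours, but the boxed version ranks three candidates instead of five. The box must be large enough to contain at least k rows, which is the usual caveat of this approach.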
V2 2019-09 late
Summary
● Use POI history to make recommendations more precise
● Remove duplicate POIs from the KNN and POI history results via a uniq function
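A minimal sketch of the deduplication step, assuming both sources return lists of POI IDs and that POI history entries take priority over KNN results. The ordering and the `merge_unique` name are assumptions; the talk's actual uniq function is not shown in the slides.

```python
from typing import Iterable, List

def merge_unique(history_ids: Iterable[int], knn_ids: Iterable[int]) -> List[int]:
    """Concatenates two ID lists and keeps only the first occurrence of each ID."""
    seen = set()
    merged: List[int] = []
    for poi_id in [*history_ids, *knn_ids]:
        if poi_id not in seen:
            seen.add(poi_id)
            merged.append(poi_id)
    return merged

recommended = merge_unique([7, 3, 9], [3, 5, 7, 8])
```

IDs 3 and 7 appear in both inputs and are kept once, at their first position.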
V2.1 2020-06 late
Summary
● Extract city_data (2000M) from poi_data (25000M) to speed up queries
V2.2 2020-07 late
Summary
● Remove unnecessary OSM POIs
– drinking_water
– place_of_worship
– basketball, football, volleyball
– parking
● Expiration time
– KNN cache expires after 14d
– POI history cache expires after 1d
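The two V2.2 changes can be sketched as a category filter plus two TTL constants. The flat `category` field, the sample rows, and the helper names are hypothetical; real OSM data spreads these values across several tag keys.

```python
from typing import Dict, List

# Categories listed on the slide as unnecessary for recommendations.
EXCLUDED = {"drinking_water", "place_of_worship", "basketball",
            "football", "volleyball", "parking"}

DAY = 24 * 60 * 60
KNN_TTL = 14 * DAY         # KNN cache lifetime in seconds
HISTORY_TTL = 1 * DAY      # POI history cache lifetime in seconds

def keep_useful(pois: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drops POIs whose (hypothetical) category field is on the excluded list."""
    return [p for p in pois if p["category"] not in EXCLUDED]

def is_expired(stored_at: float, ttl: int, now: float) -> bool:
    return now - stored_at >= ttl

sample = [{"name": "Taipei 101", "category": "attraction"},
          {"name": "Lot B", "category": "parking"}]
kept = keep_useful(sample)
```

With these constants, an entry stored two days ago is stale for the POI history cache but still valid for the KNN cache.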
V3 2020-09-22 late
Summary
● Read Redis first; if the entry does not exist, set refresh to true
● Read the DB second; if the entry does not exist, calculate it and store it in the DB and Redis
● Use a Set instead of the uniq function
● L2 & L3 cache
● Refresher scans for refresh = true and recalculates
Sequence diagram - API
Participants: client, AP, Redis, DB
● get POIs
● get cache (Redis, read first)
● return cache
● if hit, return POIs
● if miss, set refresh=true
● set OK
● get cache (DB, read second)
● return cache
● if hit, return POIs
● if miss, calculate POIs
● return POI IDs
● store cache, store OK (twice: DB and Redis)
● return POIs
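A minimal sketch of the V3 read path, with dicts standing in for Redis and the DB cache table. Where the refresh flag lives is not stated on the slides, so the `needs_refresh` set is an assumption, as are all function and variable names.

```python
from typing import Callable, Dict, List, Optional, Set

redis: Dict[int, List[int]] = {}      # stand-in for the Redis cache
db_cache: Dict[int, List[int]] = {}   # stand-in for the cache stored in the DB
needs_refresh: Set[int] = set()       # stand-in for wherever refresh=true is kept

def get_pois(city_id: int, calculate: Callable[[int], List[int]]) -> List[int]:
    cached: Optional[List[int]] = redis.get(city_id)   # read Redis first
    if cached is not None:
        return cached
    needs_refresh.add(city_id)                         # Redis miss: set refresh=true
    stored = db_cache.get(city_id)                     # read the DB second
    if stored is not None:
        return stored
    poi_ids = calculate(city_id)                       # both missed: calculate
    db_cache[city_id] = poi_ids                        # store in the DB ...
    redis[city_id] = poi_ids                           # ... and in Redis
    return poi_ids

calls: List[int] = []

def fake_calculate(city_id: int) -> List[int]:
    calls.append(city_id)
    return [1, 2, 3]

r1 = get_pois(1, fake_calculate)   # misses both layers, calculates
r2 = get_pois(1, fake_calculate)   # Redis hit
db_cache[2] = [9]
r3 = get_pois(2, fake_calculate)   # Redis miss, DB hit, flagged for refresh
```

Only the first request triggers a calculation; the third is served from the DB copy while the city is flagged for a later refresh.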
Sequence diagram - refresher
Participants: city IDs, refresher, Redis, DB
● run
● calculate POIs
● return POI IDs
● store cache, store OK (twice)
● set refresh=false
● set OK
● done
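The refresher loop can be sketched as follows. The dict and set arguments are stand-ins for Redis, the DB, and the refresh flags, and `run_refresher` is a hypothetical name; how the real refresher is scheduled is not covered by the slides.

```python
from typing import Callable, Dict, Iterable, List, Set

def run_refresher(city_ids: Iterable[int],
                  needs_refresh: Set[int],
                  calculate: Callable[[int], List[int]],
                  db_cache: Dict[int, List[int]],
                  redis: Dict[int, List[int]]) -> List[int]:
    """Recalculates every city flagged refresh=true and clears the flag."""
    refreshed: List[int] = []
    for city_id in city_ids:
        if city_id not in needs_refresh:   # skip cities that are not flagged
            continue
        poi_ids = calculate(city_id)       # calculate POIs
        db_cache[city_id] = poi_ids        # store cache in the DB
        redis[city_id] = poi_ids           # store cache in Redis
        needs_refresh.discard(city_id)     # set refresh=false
        refreshed.append(city_id)
    return refreshed

flags = {2, 3}
db: Dict[int, List[int]] = {}
cache: Dict[int, List[int]] = {}
done = run_refresher([1, 2, 3], flags, lambda c: [c * 10], db, cache)
```

Moving the calculation into a background job keeps slow recalculation off the request path, at the cost of briefly serving older results.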
V3.1 2020-09-23 early
Summary
● Store existing cache entries from Redis back into the DB
Sequence diagram
Participants: client, AP, Redis, DB
● run
● get cache (Redis)
● return cache
● if hit, get cache (DB)
● return cache
● if miss, store cache (DB)
● store OK
● done
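A minimal sketch of the store-back step, assuming it is a one-off pass that copies entries present in Redis but missing from the DB. The dicts and the `store_back` name are illustrative stand-ins.

```python
from typing import Dict, List

def store_back(redis: Dict[int, List[int]],
               db_cache: Dict[int, List[int]]) -> List[int]:
    """Copies cache entries that exist in Redis but are missing from the DB."""
    copied: List[int] = []
    for city_id, poi_ids in redis.items():   # get cache from Redis
        if city_id in db_cache:              # already in the DB: nothing to do
            continue
        db_cache[city_id] = list(poi_ids)    # missing in the DB: store it
        copied.append(city_id)
    return copied

redis_entries = {1: [1, 2], 2: [3]}
db_entries = {2: [3]}
copied_ids = store_back(redis_entries, db_entries)
```

After the pass, the DB holds every entry that Redis already had, so the older cache is not lost when Redis entries expire.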
V3.2 2020-09-23 mid
Summary
● Extend the POI history cache TTL from 1d to 14d
– Balance diversity
– Flatten burst activities
Heatmap
V3.3 2020-09-24 early
Summary
● Measure execution time
– Add New Relic custom attribute
New Relic custom attributes
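The measurement idea can be sketched without any APM agent. In the snippet below, a plain dict stands in for the agent's custom attributes, and the `timed` helper and the attribute name are hypothetical; the real New Relic call is not shown on the slides and is deliberately left out here.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

attributes: Dict[str, float] = {}   # stand-in for the APM agent's custom attributes

@contextmanager
def timed(name: str) -> Iterator[None]:
    """Records the wall-clock duration of the wrapped block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        attributes[name] = (time.perf_counter() - start) * 1000.0

with timed("recommend.calculate_ms"):
    total = sum(range(100_000))      # placeholder for the real calculation
```

Recording one attribute per step makes it possible to see which part of a request dominates the response time.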
V3.4 2020-09-24 mid
Summary
● Add a result cache keyed by language code and city ID at the AP
● Measure execution time
– Add New Relic custom segment
Sequence diagram
Participants: client, AP, DB, LRU cache (at AP; with POI IDs, city ID, language)
● get results
● return results
● if hit, return POIs
● if miss, build results
● return results
● store results
● store OK
● return POIs
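A minimal sketch of an in-process LRU result cache keyed by POI IDs, city ID, and language. The `LRUCache` class, the key layout, and the `build` callback are assumptions made for illustration; the slides do not show which LRU implementation the AP uses.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Sequence, Tuple

Key = Tuple[Tuple[int, ...], int, str]

class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Key, List[str]]" = OrderedDict()

    def get(self, key: Key) -> Optional[List[str]]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)           # mark as most recently used
        return self._items[key]

    def put(self, key: Key, value: List[str]) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)    # evict the least recently used

def cache_key(poi_ids: Sequence[int], city_id: int, language: str) -> Key:
    return (tuple(poi_ids), city_id, language)

def get_results(cache: LRUCache, poi_ids: Sequence[int], city_id: int, language: str,
                build: Callable[[Sequence[int], str], List[str]]) -> List[str]:
    key = cache_key(poi_ids, city_id, language)
    hit = cache.get(key)
    if hit is not None:                        # hit: return POIs
        return hit
    results = build(poi_ids, language)         # miss: build results from the DB
    cache.put(key, results)                    # store results
    return results

builds: List[str] = []

def fake_build(poi_ids: Sequence[int], language: str) -> List[str]:
    builds.append(language)
    return [f"{language}:{i}" for i in poi_ids]

lru = LRUCache(capacity=2)
a = get_results(lru, [1, 2], 10, "zh-TW", fake_build)
b = get_results(lru, [1, 2], 10, "zh-TW", fake_build)   # served from the LRU cache
get_results(lru, [1, 2], 10, "en", fake_build)
get_results(lru, [1, 2], 10, "ja", fake_build)          # evicts the zh-TW entry
```

Including the language in the key matters because the same POI IDs produce different localized results.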
New Relic custom segments
Before custom segments
After custom segments
V3.5 2020-09-24 late
Summary
● Merge join: 70s
– WHERE and GROUP BY applied together
● Hash join: 10s
– WHERE first, then GROUP BY
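The sketch below illustrates one reading of "where first, then group": filtering rows before aggregating gives the same answer while aggregating far fewer rows. It does not model merge or hash joins, and the row layout and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (city_id, poi_id), a hypothetical POI history row

def group_then_filter(rows: List[Row], city_id: int) -> Dict[int, int]:
    """Aggregates every city first, then throws most of the groups away."""
    counts = Counter(rows)
    return {poi: n for (city, poi), n in counts.items() if city == city_id}

def filter_then_group(rows: List[Row], city_id: int) -> Dict[int, int]:
    """Keeps only the wanted city's rows, then aggregates the small remainder."""
    return dict(Counter(poi for city, poi in rows if city == city_id))

rows = [(1, 100), (1, 100), (1, 200), (2, 100), (2, 300)]
slow = group_then_filter(rows, 1)
fast = filter_then_group(rows, 1)
```

Both functions return the same counts; the second never builds groups for other cities.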
Merge join
Before optimization: 70s
https://explain.depesz.com/s/Om1c
Hash join
After optimization: 10s
https://explain.depesz.com/s/C9yZ
V3.6 2020-09-25 early
Summary
● Remove duplicate middleware: 1ms
V3.7 2020-09-27 early
Summary
● Find city from location via GiST index
● Query POI history via B-tree index
GiST index
Find city from location
Before optimization: 400ms
https://explain.depesz.com/s/y5Vv
Create GiST index
After optimization: 0.4ms
https://explain.depesz.com/s/QqmT
B-tree index
Query POI history
Before optimization: 10s
https://explain.depesz.com/s/7LHy
Create B-tree index
After optimization: 1s
https://explain.depesz.com/s/o59U
References
● digoal/blog
● PostgreSQL cluster table using index
● 27. Nearest-Neighbour Searching - Introduction to PostGIS