Redisconf19: Real-time spatiotemporal data utilization for future mobility services

Copyright © 2019 NTT Corp. All Rights Reserved.
Real-time spatiotemporal data utilization
for future mobility services
Atsushi ISOMURA
NTT (Nippon Telegraph and Telephone) Corp., Researcher

About me
- Atsushi Isomura
- Tokyo, Japan
- Work
- NTT Software Innovation Center
- OSS, in-memory, data storage, distribution, etc.
- “Spatio-Temporal data processing”
- Free time
- Developing smartphone apps
- Nintendo Switch (Splatoon 2: Rank X, Super Smash Bros. Ultimate: Elite)
- Baseball
- Driving cars (wish: driving without traffic jams)

INDEX
1. Motivation
2. Spatio-temporal data utilization in Redis
3. Proposal
4. Performance
5. ST-code generation tips / Sample code
(Links to the code and slides are at the end.)

1. Motivation

1-1. Background
These IoT devices keep INCREASING!
ref1 : 2016 estimation of Fuji Keizai Marketing Research & Consulting Group
ref2 : 2016 estimation of Yano Research Institute
[Charts (ref1, ref2): estimated number of devices in 2016 and 2020, in hundreds of millions: 2.6 → 5.4 and 1.1 → 3.2]

1-1. Background
IoT sensors vs. IoT devices: what's the difference?

1-1. Background
1. They MOVE every moment!
Latitude Longitude Time Value
37.800 -122.402 2019/4/2 12:30:15 ID:1234, 30km/h
37.798 -122.400 2019/4/2 12:30:16 ID:1234, 31km/h
… … … …
= ST (Spatio-Temporal) data

1-1. Background
2. The density CHANGES by location & time
[Figure: density by location and time: Metropolis high / Suburb low at one time; Metropolis low / Suburb high at another]

1-2. Future mobility services
Example 1: Nearby car crash alert
[Figure: a car crashes ("Crash!"); nearby cars receive a real-time "Alert!" view showing "Broken car on abc Street"]

Example 2: Optimal routing for taxis
[Figure: taxis, waiting people, and events such as "RedisConf19 ended", "train arrived", and "party closed"]
Input:
- location of waiting people
- event information
- traffic jams
- etc.
→ Calculate the optimal route automatically.

Example 3: Drone package delivery
"I need to send this package NOW!" → "Nearest drone available"

1-3. IoT devices’ features
Important features of IoT devices:
1. They MOVE.
2. Their density CHANGES.
Related services require:
- real-time response
- ST-data insert
- ST-data search

1-4. Requirements and current technology
Requirements (all must be satisfied at once):
1. Insert large volumes of ST-data in real time (<10 ms): over 20M records/s[1]
2. Search by ST-range query in real time (<100 ms): lng x1~x2, lat y1~y2, time t1~t2 → values
3. Distribute data equally regardless of density changes
[Figure: Cars → data store → Apps]
There is no mature technology that satisfies all of these requirements.
[1]: Fuji Keizai Marketing Research, "Connected car related markets and telematics strategy 2017" (estimate for Japan only)

1-5. Which data store to use?
We searched for:
- blazingly fast performance
- geo features
- secondary indexing
- data distribution
Of course, we selected Redis.
We learned from RedisConf17 that using Geohash encoding and sorted sets enables ST-data management in Redis.

2. ST-data utilization in redis

2-1. Related commands
https://redis.io/commands
- Geo-related commands: utilize the Geohash[1] encoding algorithm
- Sorted-set-related commands
[1]: http://geohash.org/

2-2. What’s “Geohash”?
2-dimensional: longitude (x), latitude (y)
⇒ 1-dimensional (Geohash): x1y1x2y2x3y3 … xnyn (n: length of each dimension)
Length: short = wide area, long = narrow area
[Figure: Morton curve[1] over a map of San Francisco (x, y). Level 1 splits each axis into 0/1; level 2 splits each cell again (00, 01, 10, 11, …), and the interleaved bits of a cell (e.g. 101100) form its Geohash.]
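A minimal Python sketch of the interleaving described above, assuming plain bisection of longitude over [-180, 180] and latitude over [-90, 90], with the longitude bit first. It returns the binary form only (geohash.org additionally packs 5 bits into each base32 character), and `geohash_bits` is a hypothetical helper name.

```python
def geohash_bits(lon: float, lat: float, n: int) -> str:
    """Return the interleaved bit string x1y1x2y2...xnyn (n bits per dimension).

    Each axis is bisected n times; a bit is 1 when the value lies in the
    upper half of the current interval, 0 otherwise.
    """
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    bits = []
    for _ in range(n):
        mid = (lon_lo + lon_hi) / 2      # x bit (longitude)
        if lon > mid:
            bits.append('1')
            lon_lo = mid
        else:
            bits.append('0')
            lon_hi = mid
        mid = (lat_lo + lat_hi) / 2      # y bit (latitude)
        if lat > mid:
            bits.append('1')
            lat_lo = mid
        else:
            bits.append('0')
            lat_hi = mid
    return ''.join(bits)


print(geohash_bits(-122.402, 37.800, 3))  # 010011
```

These 6 bits agree with the leading bits of the familiar base32 Geohash `9q…` for San Francisco (9 = 01001, q = 10110). Raising `n` narrows the cell; truncating the string widens it.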

2-2. What’s “Geohash”?
★ Useful feature: prefix match = range query on longitude and latitude
[Figure: nested cells 10… ⊃ 1001… ⊃ 100110…; each longer prefix selects a smaller rectangle inside the previous one]
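To see why a prefix match is a rectangle query, this sketch decodes an interleaved bit prefix back into its longitude/latitude bounding box. It assumes longitude in [-180, 180] and latitude in [-90, 90], with the longitude bit first. `prefix_to_box` is a hypothetical name.

```python
def prefix_to_box(prefix: str):
    """Decode an interleaved bit prefix (x1y1x2y2...) into its bounding box.

    Returns ((lon_min, lon_max), (lat_min, lat_max)). Even positions refine
    longitude, odd positions refine latitude.
    """
    box = [[-180.0, 180.0], [-90.0, 90.0]]   # [longitude, latitude]
    for i, bit in enumerate(prefix):
        axis = box[i % 2]
        mid = (axis[0] + axis[1]) / 2
        if bit == '1':
            axis[0] = mid                    # keep the upper half
        else:
            axis[1] = mid                    # keep the lower half
    return tuple(box[0]), tuple(box[1])


for p in ("10", "1001", "100110"):
    print(p, prefix_to_box(p))
```

Each longer prefix yields a box inside the previous one. Every point whose Geohash starts with `100110` therefore lies in that rectangle, which is what lets a prefix scan stand in for a 2-D range query.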

2-3. Insert/Search requirements
- Insert: longitude (x), latitude (y), time (t), and value
- Search: range query on location and time
x (longitude)   y (latitude)   t                         value
-122.402°       37.798°        April 2nd 2019 14:10:15   30 km/h
…               …              …                         …
Query: search all values with
- GEOHASH starting with the prefix 'x1y1…xqyq'
- TIMESTAMP between t1 and t2
(q: length of each dimension for the prefix search)
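As a reference for what this query must return, here is a brute-force filter over in-memory records. The record layout `(geohash, timestamp, value)`, the example timestamps, and the function name are illustrative assumptions. A real data store has to produce the same answer without scanning every record.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    geohash: str     # interleaved bit string x1y1...xnyn
    timestamp: int   # UNIX time in seconds
    value: str       # e.g. "ID:1234, 30km/h"


def st_range_query(records: List[Record], prefix: str, t1: int, t2: int) -> List[str]:
    """Values whose Geohash starts with `prefix` and whose time lies in [t1, t2]."""
    return [r.value for r in records
            if r.geohash.startswith(prefix) and t1 <= r.timestamp <= t2]


records = [
    Record("010011", 1554208215, "ID:1234, 30km/h"),
    Record("010011", 1554208216, "ID:1234, 31km/h"),
    Record("011000", 1554208216, "ID:5678, 12km/h"),
]
print(st_range_query(records, "0100", 1554208215, 1554208215))
```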

2-4. Possible key-value design
Pattern 1. Time key sorted by Geohash
- Key: timestamp (string), e.g. time_a, time_b, …
- Score: Geohash (int), e.g. geohash_a, …
- Value: "ID, …"
>ZADD time_a geohash_a "ID, …"
(integer) 1
>GEOADD time_a lng_a lat_a "ID, …"
(integer) 1
(GEOADD takes longitude and latitude and computes the Geohash score itself.)
Pattern 2. Geohash key sorted by time
- Key: Geohash (string), e.g. geohash_a, geohash_b, …
- Score: timestamp (int), e.g. time_a, …
- Value: "ID, …"
>ZADD geohash_a time_a "ID, …"
(integer) 1
Either of them works fine.

2-5. How to search by range
(q: length of each dimension for the query)
Pattern 1 (key: timestamp, score: Geohash): one ZRANGEBYSCORE per timestamp key from t1 to t2
>ZRANGEBYSCORE t1 x1y1…xqyq00…0 x1y1…xqyq11…1
>ZRANGEBYSCORE t1+1 x1y1…xqyq00…0 x1y1…xqyq11…1
…
For a circular query, use GEORADIUS instead of ZRANGEBYSCORE.
Pattern 2 (key: Geohash, score: timestamp): find all matching keys, then query each one
>KEYS x1y1…xqyq*
(returns list[i], all keys that start with x1y1…xqyq)
>ZRANGEBYSCORE list[0] t1 t2
…
>ZRANGEBYSCORE list[i] t1 t2
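The cost difference between the two patterns can be made concrete by counting commands. This sketch models each Redis key as a Python list of `(score, value)` pairs. It assumes one timestamp key per second for Pattern 1 and a toy 6-bit Geohash, and it counts the range lookups each pattern issues. It is an in-memory model, not Redis, and all helper names are hypothetical.

```python
from collections import defaultdict

GEO_BITS = 6  # total Geohash bits per record in this toy example (assumption)


def prefix_bounds(prefix, total_bits=GEO_BITS):
    """Integer score range [prefix00..0, prefix11..1] covered by a bit prefix."""
    pad = total_bits - len(prefix)
    return int(prefix + '0' * pad, 2), int(prefix + '1' * pad, 2)


def search_pattern1(store, prefix, t1, t2):
    """Pattern 1 (key = timestamp, score = Geohash): one range scan per second."""
    lo, hi = prefix_bounds(prefix)
    hits, commands = [], 0
    for t in range(t1, t2 + 1):            # ZRANGEBYSCORE t lo hi
        commands += 1
        hits += [v for score, v in store.get(t, []) if lo <= score <= hi]
    return hits, commands


def search_pattern2(store, prefix, t1, t2):
    """Pattern 2 (key = Geohash, score = timestamp): KEYS, then one scan per key."""
    keys = [k for k in store if k.startswith(prefix)]  # KEYS prefix* (visits every key)
    hits, commands = [], 1
    for k in keys:                          # ZRANGEBYSCORE k t1 t2
        commands += 1
        hits += [v for score, v in store[k] if t1 <= score <= t2]
    return hits, commands


data = [("010011", 100, "a"), ("010010", 105, "b"), ("110000", 103, "c")]
pattern1, pattern2 = defaultdict(list), defaultdict(list)
for geohash, t, value in data:
    pattern1[t].append((int(geohash, 2), value))   # ZADD t geohash value
    pattern2[geohash].append((t, value))           # ZADD geohash t value

print(search_pattern1(pattern1, "0100", 100, 109))  # (['a', 'b'], 10)
print(search_pattern2(pattern2, "0100", 100, 109))  # (['a', 'b'], 3)
```

Pattern 1's command count grows with the length of the time window: here it issues 10 one-second lookups to find 2 hits. Pattern 2's count grows with the number of matching Geohash keys, and its KEYS step has to visit the whole keyspace.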

2-6. Range query takes time
Simple test using 5 Redis servers (concurrent connections: 256, number of values: 10 million, search only)
                             Pattern 1   Pattern 2
Turnaround time per query    1.3 s       535 s
- Pattern 1: searches too many keys. Slow!
- Pattern 2: relies on KEYS (Danger![1]). Too slow!
[1]: https://redis.io/commands/KEYS

Pattern 1: turnaround time per query 1.3 s
Slow! It takes more than 1 s. Let's reduce the number of keys.
Wait! A problem remains!

2-8. Another problem?
Suppose that:
- tons of cars send data continuously
- applications require current data
- multiple Redis servers (redis1, redis2, redis3, …, redisN) are available
[Figure: Cars and Apps connected to redis1 … redisN, storing data with Pattern 1 (key: timestamp, score: Geohash, value: "ID, …")]
What will happen?

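2-8. Load concentration (intensive access)
Cars: "We send current data!" Apps: "We need current data!"
With Pattern 1, every write and every read targets the current timestamp key, which is stored on a single server. That one server is busy while the others stay idle.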
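Why does a single server receive all the traffic? With Pattern 1, every record written in the same second shares one key, and a key lives on exactly one node. The sketch below assumes a simple client-side `crc32(key) % N` node selection; Redis Cluster itself uses CRC16-based hash slots, a different scheme with the same one-key-one-node property. `node_for` is a hypothetical helper. It shows that one second of writes from many cars lands on one node.

```python
import zlib
from collections import Counter

N_NODES = 24  # number of Redis servers, as in the measurement


def node_for(key: str, n_nodes: int = N_NODES) -> int:
    """Hypothetical client-side sharding: stable hash of the key modulo node count."""
    return zlib.crc32(key.encode()) % n_nodes


now = 1554208215  # current timestamp in seconds (illustrative)
writes = [(str(now), f"car{i}") for i in range(10000)]  # Pattern 1: key = timestamp
load = Counter(node_for(key) for key, _ in writes)
print(len(load), max(load.values()))  # 1 10000: one node takes every write
```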

2-8. Load concentration
[Chart: CPU usage (%) of each of the 24 Redis servers, user/system vs. idle (concurrent connections: 256, data insertion only); usage shows a spike instead of an even load]
Cannot use CPU resources efficiently.

2-9. Problems we need to solve
Problem 1. ST-range query is slow due to
- searching too many keys
- using the KEYS command
Problem 2. ST-data insert is inefficient due to
- load concentration

3. Proposal

3-1. Applying “ST-code”
Morton-curve transform for longitude, latitude, and timestamp:
ST-code[1]: x1y1t1 x2y2t2 x3y3t3 … xnyntn
Prefix match = range query (on longitude, latitude, and time)
[Figure: the timestamp axis, from the minimum to the maximum timestamp, is split by the same bisection (0/1, 10/11, 100/101, …); the current time falls into a nested sequence of these time cells]
[1] Jan Jezek, "STCode: The Text Encoding Algorithm for Latitude/Longitude/Time", Springer International Publishing Switzerland, 2014
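Because time is simply a third interleaved axis, any ST-code prefix decodes to a box in longitude, latitude, and time. This sketch assumes value ranges of longitude [-180, 180], latitude [-90, 90], and UNIX time [0, 2018304000] s. `st_prefix_to_box` is a hypothetical helper.

```python
RANGES = [(-180.0, 180.0), (-90.0, 90.0), (0.0, 2018304000.0)]  # lon, lat, time (assumed)


def st_prefix_to_box(prefix: str):
    """Decode an ST-code prefix x1y1t1x2y2t2... into (lon, lat, time) intervals."""
    box = [list(r) for r in RANGES]
    for i, bit in enumerate(prefix):
        axis = box[i % 3]          # 0: longitude, 1: latitude, 2: time
        mid = (axis[0] + axis[1]) / 2
        if bit == '1':
            axis[0] = mid          # upper half
        else:
            axis[1] = mid          # lower half
    return [tuple(a) for a in box]


print(st_prefix_to_box("010"))
print(st_prefix_to_box("010011"))
```

Each additional triplet of bits halves the box along all three axes. A shorter prefix therefore means a wider area and a longer time window.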

ST-code: x1y1t1 x2y2t2 x3y3t3 … xnyntn is split at position s (s: where you split):
- PRE-code: x1y1t1 … xsysts (expresses a WIDE ST-range) ⇒ Key
- SUF-code: x(s+1)y(s+1)t(s+1) … xnyntn (expresses a NARROW ST-range) ⇒ Score
- Value: "ID, …"
>ZADD PRE-code_a SUF-code_a "ID5, …"
(integer) 1
Don't make me use the KEYS command!
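The split can be sketched as follows. Redis sorted-set scores are IEEE 754 double-precision floats, so integer scores are exact only up to 2^53. This sketch therefore rejects SUF-codes longer than 53 bits. The split position in triplets, the function name, and the check itself are assumptions for illustration.

```python
MAX_EXACT_SCORE_BITS = 53  # doubles represent integers exactly up to 2**53


def split_st_code(st_code: str, s: int):
    """Split an ST-code after s triplets (x, y, t) into (PRE-code key, SUF-code score).

    PRE-code: first 3*s bits, kept as a string and used as the Redis key.
    SUF-code: remaining bits, converted to an integer and used as the score.
    """
    cut = 3 * s
    pre, suf = st_code[:cut], st_code[cut:]
    if len(suf) > MAX_EXACT_SCORE_BITS:
        raise ValueError("SUF-code too long to be an exact sorted-set score")
    return pre, int(suf, 2) if suf else 0


code = "010011101100"           # a 12-bit (4-triplet) toy ST-code
print(split_st_code(code, 2))   # ('010011', 44)
```

With a 96-bit ST-code (32 bits per dimension), this constraint means s ≥ 15: the SUF-code is then at most 51 bits.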

Key: PRE-code_a, Score: SUF-code_a, Value: "ID5, …"
>ZRANGEBYSCORE PRE-code_a x(s+1)y(s+1)t(s+1)…xqyqtq00…0 x(s+1)y(s+1)t(s+1)…xqyqtq11…1
ST-range query with only one command! Very fast! Problem solved!?
(restriction: s < q)
s: where you split
q: length of each dimension for the prefix search
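The score bounds for that ZRANGEBYSCORE follow directly from the query prefix. Take the bits after the PRE-code, then pad them with 0s for the minimum and with 1s for the maximum. This is a minimal sketch; it assumes s and the full ST-code length n are given in triplets, and the names are hypothetical.

```python
def query_range(st_prefix: str, s: int, n: int):
    """Key and inclusive score bounds for one ZRANGEBYSCORE call.

    st_prefix: query prefix x1y1t1...xqyqtq (bits)
    s: split position in triplets (PRE-code = first 3*s bits)
    n: full ST-code length in triplets
    """
    if not 3 * s < len(st_prefix) <= 3 * n:
        raise ValueError("query prefix must be longer than the PRE-code (s < q)")
    key = st_prefix[:3 * s]           # PRE-code: the single key to search
    rest = st_prefix[3 * s:]          # x(s+1)y(s+1)t(s+1)...xqyqtq
    pad = 3 * n - len(st_prefix)      # bits left unspecified by the query
    low = int(rest + '0' * pad, 2)    # ...000
    high = int(rest + '1' * pad, 2)   # ...111
    return key, low, high


print(query_range("010011101", s=2, n=4))  # ('010011', 40, 47)
```

A stored SUF-code score lies in [low, high] exactly when its bits start with the query bits. That is why a single range scan on a single key answers the prefix query.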

Problems we need to solve
Problem 1. (Solved by ST-code!) Search only one key with "ZRANGEBYSCORE".
Problem 2. (Not yet)

Don’t forget about this…

3-2. Limited node distribution
• Select multiple nodes based on the hashed value of the ST-code (PRE-code).
• Insert into "one" of the selected nodes → avoids load concentration.
• Search from "all" of the selected nodes → efficient ST-range query.
[Figure: along the time axis, records for San Francisco at 7:00, 7:01, 7:02, 7:03, … are inserted into the selected nodes; the ST-range query "San Francisco, 7:00～7:01" searches only those selected nodes]
# Works as above when the ST-code (PRE-code) is applied as the key.
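Here is a minimal sketch of limited node distribution. The slide leaves two details open, so this sketch assumes them: the k selected nodes are the k consecutive nodes starting at `crc32(PRE-code) % N`, and an insert picks one of them at random. Function names are hypothetical. The node counts match the experimental conditions (24 Redis nodes, 8 selected nodes).

```python
import random
import zlib
from collections import Counter

N_NODES, K_SELECTED = 24, 8  # node counts from the experimental conditions


def selected_nodes(pre_code, n=N_NODES, k=K_SELECTED):
    """Hypothetical rule: k consecutive nodes starting at crc32(PRE-code) mod n."""
    start = zlib.crc32(pre_code.encode()) % n
    return [(start + i) % n for i in range(k)]


def insert_node(pre_code, rng=random):
    """Insert into ONE of the selected nodes (chosen at random in this sketch)."""
    return rng.choice(selected_nodes(pre_code))


def search_nodes(pre_code):
    """Search ALL of the selected nodes; the client merges the results."""
    return selected_nodes(pre_code)


rng = random.Random(0)
load = Counter(insert_node("010011", rng) for _ in range(8000))
print(len(load))                                 # 8: writes for one PRE-code spread over 8 nodes
print(set(load) == set(search_nodes("010011")))  # True
```

Random choice balances writes across the selected nodes; round-robin would also work. The cost is a fan-out of k nodes per search, which is what buys freedom from a single hot node.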

Problems we need to solve
Problem 1. (Solved by ST-code!) Search only one key with "ZRANGEBYSCORE".
Problem 2. (Solved by limited node distribution!) Load distribution.

3-3. Architecture Overview
(A) ST-code and (B) limited node distribution are applied.
Cars (insert) and applications (search) follow the same steps:
1. Calculate the ST-code from time, lat, and lng. (A)
2. Split the ST-code into PRE-code and SUF-code. (A)
3. Calculate the hashed value of the PRE-code. (B)
4. Calculate the insert/search node number. (B)
5. Access Redis:
 - insert: PRE-code ⇒ "Key", SUF-code ⇒ "Score", value ⇒ "Value"
 - search: PRE-code ⇒ "Key", SUF-code ⇒ range query of "Score"
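Putting steps 1 to 5 together, the sketch below runs the insert and search paths against a tiny in-memory stand-in for Redis. The stand-in is called the same way as redis-py's `zadd(name, mapping)` and `zrangebyscore(name, min, max)`. The encoder, the split position S = 15 (chosen so the 51-bit SUF-code fits a double exactly), the node-selection rule, and all names are assumptions. This is not the authors' code.

```python
import random
import zlib

RANGES = [(-180.0, 180.0), (-90.0, 90.0), (0.0, 2018304000.0)]  # lon, lat, time (assumed)
BITS = 32            # bits per dimension -> 96-bit ST-code
S = 15               # split after 15 triplets: PRE = 45 bits, SUF = 51 bits (< 53)
N_NODES, K = 24, 8   # 24 Redis nodes, 8 selected nodes


def st_code(lon, lat, t):
    """Step 1 (A): quantize each dimension and interleave x1y1t1x2y2t2..."""
    cells = []
    for v, (lo, hi) in zip((lon, lat, t), RANGES):
        c = min(max(int((v - lo) / (hi - lo) * 2 ** BITS), 0), 2 ** BITS - 1)
        cells.append(format(c, '0{}b'.format(BITS)))
    return ''.join(x + y + z for x, y, z in zip(*cells))


def nodes_for(pre):
    """Steps 3-4 (B): k consecutive nodes starting at crc32(PRE-code) mod N."""
    start = zlib.crc32(pre.encode()) % N_NODES
    return [(start + i) % N_NODES for i in range(K)]


class FakeRedis:
    """In-memory stand-in for one Redis node (sorted sets only)."""

    def __init__(self):
        self.zsets = {}

    def zadd(self, name, mapping):
        self.zsets.setdefault(name, {}).update(mapping)
        return len(mapping)

    def zrangebyscore(self, name, low, high):
        members = self.zsets.get(name, {})
        return [m for m, sc in sorted(members.items(), key=lambda kv: kv[1])
                if low <= sc <= high]


def insert(clients, lon, lat, t, value, rng=random):
    code = st_code(lon, lat, t)
    pre, suf = code[:3 * S], code[3 * S:]           # step 2 (A)
    node = rng.choice(nodes_for(pre))               # insert into ONE selected node
    clients[node].zadd(pre, {value: int(suf, 2)})   # step 5: key = PRE, score = SUF


def search(clients, lon, lat, t, q):
    """Values in the level-q ST cell containing (lon, lat, t); requires S < q <= BITS."""
    if not S < q <= BITS:
        raise ValueError("requires S < q <= BITS")
    prefix = st_code(lon, lat, t)[:3 * q]
    pre, rest = prefix[:3 * S], prefix[3 * S:]
    pad = 3 * BITS - len(prefix)
    low, high = int(rest + '0' * pad, 2), int(rest + '1' * pad, 2)
    hits = []
    for node in nodes_for(pre):                     # search ALL selected nodes
        hits += clients[node].zrangebyscore(pre, low, high)
    return hits


clients = [FakeRedis() for _ in range(N_NODES)]
rng = random.Random(1)
insert(clients, -122.402, 37.800, 1554208215, "ID:1234, 30km/h", rng)
insert(clients, -122.402, 37.800, 1554208216, "ID:1234, 31km/h", rng)
insert(clients, -73.985, 40.758, 1554208215, "ID:9999, 20km/h", rng)
print(sorted(search(clients, -122.402, 37.800, 1554208215, q=16)))
```

At q = 16 the time cell spans roughly 8.5 hours, so both San Francisco records match and the New York record, which has a different PRE-code, does not. With a real `redis.Redis` client, members come back as bytes unless `decode_responses=True` is set.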

4. Performance

4-1. Compared methods
1. ST-key method (ST-code & limited node distribution)
 - Key: PRE-code, Score: SUF-code, Value: …
2. Time-key method
 - Key: timestamp, Score: Geohash, Value: …

4-2. Experimental conditions
- Maximum concurrency: 640 (insert), 320 (search)
- Data size: 10 KB
- Redis server nodes: 24
- "Selected nodes" for the proposed method: 8
Data inserted (10 million records):
- time: current timestamp
- longitude, latitude: NY Taxi open data(1), dense or sparse depending on the area(2)
- value: ID, speed, etc.
Data searched (100,000 queries):
- time range: 15 min
- area: 3 km²
(1): http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
(2): referred from https://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/

4-3. System configuration
Client: 8 physical machines; Server: 4 physical machines (24 Redis processes)
Software versions:
- Client: Jedis 2.9.0
- Test Program (TP): -
- MW: Java 1.8.0
- Server: redis 5.0.3
- OS: Ubuntu 16.04.1 LTS
Specifications:
- Servers (all in common): Intel Xeon E5-2618L v4, 10 cores, 2.2 GHz, 25 MB cache ×1; 256 GB DDR4 1.2 V ECC REG DIMM (32 GB ×8); SSD: 2.5-inch 480 GB SATA3 ×2; HDD: 2.5-inch 1 TB 7200 rpm SATA3 ×2
- NW: InfiniBand switch [Mellanox IB Switch] MSB7800-ES2F Switch-IB™-2 based EDR InfiniBand 1U switch, 36 QSFP28 ports
[Figure: Client #1 … #8 (each running TP, Jedis, MW, OS) connected through the InfiniBand switch to Server #1 … #4 (each running OS and 6 Redis processes: Redis1 to Redis6 on Server #1, …, Redis19 to Redis24 on Server #4)]

4-4. Insert performance
• ST-key method is 13 times better in throughput and 12 times better in turnaround time (TAT).
[Charts (concurrency: 256): throughput (rec/s): ST-key 76,000 vs. Time-key 5,000; average TAT (ms/rec): ST-key 3 vs. Time-key 40]

4-4. Insert performance (CPU resource)
Average CPU usage of all servers: ST-key 81%, Time-key 5%
• ST-key method can fully use the CPU resources of the servers.
• ST-key method distributes the processing load across the servers equally.
[Charts: CPU usage (%) per server, user/system vs. idle. ST-key: fully used. Time-key: spike.]

4-5. Search performance
• ST-key method is 5 times better in both throughput and TAT.
[Charts (concurrency: 256): throughput (query/s): ST-key 3,500 vs. Time-key 680; average TAT (ms/query): ST-key 70 vs. Time-key 360]

4-5. Search performance (CPU resource)
• ST-key method achieves better performance with less CPU usage.
Average CPU usage of all servers: ST-key 51%, Time-key 65%
[Charts: CPU usage (%) per server, user/system vs. idle, for ST-key and Time-key; ST-key uses less CPU]

4-6. Results in summary
All requirements are satisfied by the ST-key method:
1. Insert large volumes of ST-data in real time (<10 ms) → 3.3 ms/insert
2. Search by ST-range query in real time (<100 ms) → 70 ms/query
3. Distribute data equally regardless of density changes → distributed
[Figure: Cars → Redis (ST-key method) → Apps, which receive values by ST-range query]

5. ST-code generation tips / Demo (console)

5-1. ST-code generation (stencode/stencode_naive.py)
def st_encode(lon_input, lat_input, time_input, precision=96):
    # Value ranges: longitude [-180, 180], latitude [-90, 90], UNIX time [0, 2018304000] s
    lon_interval, lat_interval, time_interval = (-180.0, 180.0), (-90.0, 90.0), (0.0, 2018304000.0)
    st_code = ''
    loop = 0
    while len(st_code) < precision:
        if loop % 3 == 0:
            # Longitude bit: halve the interval, 1 = upper half, 0 = lower half
            mid = (lon_interval[0] + lon_interval[1]) / 2
            if lon_input > mid:
                lon_interval = (mid, lon_interval[1])
                st_code += '1'
            else:
                lon_interval = (lon_interval[0], mid)
                st_code += '0'
        elif loop % 3 == 1:
            # Latitude bit
            mid = (lat_interval[0] + lat_interval[1]) / 2
            if lat_input > mid:
                lat_interval = (mid, lat_interval[1])
                st_code += '1'
            else:
                lat_interval = (lat_interval[0], mid)
                st_code += '0'
        else:
            # Time bit
            mid = (time_interval[0] + time_interval[1]) / 2
            if time_input > mid:
                time_interval = (mid, time_interval[1])
                st_code += '1'
            else:
                time_interval = (time_interval[0], mid)
                st_code += '0'
        loop += 1
    return st_code
Too naïve!!!
Too slow!!! (one Python-level bisection step per output bit)

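5-1. ST-code generation (stencode/stencode_fast.py)
# Value ranges of [longitude, latitude, time]; call with inputs = [lon_input, lat_input, time_input]
maxmin = [(-180.0, 180.0), (-90.0, 90.0), (0.0, 2018304000.0)]

def st_encode_FAST(inputs, maxmin, precision=96):
    bits = precision // 3  # bits per dimension
    bins = []
    for value, (lo, hi) in zip(inputs, maxmin):
        # Quantize directly to a cell index instead of bisecting bit by bit
        cell = int((value - lo) / (hi - lo) * (2 ** bits))
        cell = min(max(cell, 0), 2 ** bits - 1)  # keep value == hi inside the last cell
        bins.append(format(cell, '0{}b'.format(bits)))  # zero-padded binary string
    # Interleave the bits: x1 y1 t1 x2 y2 t2 ...
    st_code = ''.join(b1 + b2 + b3 for b1, b2, b3 in zip(bins[0], bins[1], bins[2]))
    return st_code
Much faster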

5-2. Demo (console)
- Data insert (st_insert.py)
- Data search (st_search.py)
[Figure: redis client machine (st_insert.py / st_search.py on the redis package from PyPI, MW, OS) → redis server machine (redis, OS)]
Data layout used by both scripts:
- key: PRE-code
- score: SUF-code
- value: ID, lat, lng, time
