Scylla Summit 2018: Nauto - An Online Method for Merging Time Ranges on Top of Scylla

An online method for merging
time ranges on top of Scylla
Rohit Saboo
ML Engineering Lead, Nauto Inc.

Presenter bio
▪ Leading an ML engineering team at Nauto, working on
problems such as finding trips, driver identification, and
lossless sensor data compression.
▪ Previously a founding engineer at a search startup, and in
Google search and robotics.
▪ M.S. and Ph.D. in Computer Science from UNC Chapel Hill, and
B.Tech. in Computer Science from IIT Madras.
▪ I enjoy hiking, photography, swimming, and biking in my
spare time.

Nauto
▪ Identifies risky driving using machine learning
▪ Enables fleet managers to correlate internal driver behaviour with external vehicle movements
▪ Coaches drivers for safer roads

Trips and Identifying the Driver
▪ Record the trip taken.
▪ Identify the driver taking the trip.
The device (N2) takes
● snapshot & crops image
● GPS, speed ...
● vehicle state
and uploads them.

Merging Time Ranges
(Building Trips)

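Data Flow
[Diagram: the N2 device sends GPS, moving/stopped, and speed data to Kafka. A pool of workers (worker 1 ... worker i ... worker n), sharded by device, consumes the data. Each worker gets the neighboring ranges, saves the merged range, and deletes the old ones.]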
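
The slide only says the workers are sharded by device. As a minimal sketch of what that could look like, assuming a stable hash of the device id picks the worker (the function name `worker_for` and the choice of SHA-256 are illustrative, not from the talk):

```python
import hashlib

def worker_for(device_id: str, num_workers: int) -> int:
    """Map a device id to a worker index using a stable hash.

    A stable hash (unlike Python's per-process randomized hash()) sends
    every message for one device to the same worker across restarts.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

# Hypothetical device ids; every message for "device-a" lands on one worker.
assignments = {d: worker_for(d, 4) for d in ["device-a", "device-b", "device-c"]}
```

With Kafka in front of the workers, keying each message by device id would have a similar effect, since messages with the same key go to the same partition under the default partitioner.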

Time Ranges (Trips)
[Figure: timeline from 8:00 to 9:05 in 5-minute ticks, with range boundaries labeled at 8:00am, 8:12am, 8:18am, 8:35am, 8:45am, and 8:55am.]

Time ranges (trips) table
CREATE TABLE device_trips (
    version int,
    id text,
    bucket timestamp,
    end_ms timestamp,
    start_ms timestamp,
    details blob,
    PRIMARY KEY ((version, id, bucket), end_ms, start_ms)
) WITH CLUSTERING ORDER BY (end_ms DESC, start_ms DESC);

id text
The id of the time series. In our case, the vehicle/device id.

bucket timestamp
For bucketing the time ranges into manageable partitions. Truncated from end_ms.
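
As a rough sketch of how a bucket could be derived from a range's end time: the talk does not state the bucket width, so the one-day `BUCKET_WIDTH` below is an assumption, and `bucket_for` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the bucket width is not given in the talk; one day is a placeholder.
BUCKET_WIDTH = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_for(end: datetime, width: timedelta = BUCKET_WIDTH) -> datetime:
    """Truncate a timezone-aware end timestamp down to the start of its bucket."""
    return end - (end - EPOCH) % width

# An arbitrary example date; a trip ending at 08:55 UTC falls in that day's bucket.
example_bucket = bucket_for(datetime(2018, 11, 6, 8, 55, tzinfo=timezone.utc))
```

One consequence worth keeping in mind under this scheme: a range that ends just after a bucket boundary may have its earlier neighbor in the previous bucket, so a neighbor lookup may need to read more than one partition.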

end_ms timestamp, start_ms timestamp
The end and start of the time range.

version int
To simplify running multiple versions of the algorithms or experiments.

details blob
Application-specific.
For us, a marshaled protobuf containing GPS, speed, etc. along the trip.
(With a JSON type now available, it may be preferable to use JSON.)

PRIMARY KEY and CLUSTERING ORDER
Partitioned by time series id and bucketed to manage partition size.
Clustering order chosen to make the common case of data arriving in-order optimal.
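
To illustrate why a descending clustering order suits in-order arrival, here is a small in-memory model. It is not Scylla code: a partition is modeled as a list of (end, start) rows sorted descending, times are minutes since midnight, and `hhmm` and `rows_with_end_at_least` are hypothetical helpers.

```python
from itertools import takewhile

def hhmm(hour, minute):
    """Minutes since midnight, to keep the example short."""
    return hour * 60 + minute

def rows_with_end_at_least(partition, lower_end):
    """Scan a partition stored as (end, start) rows in descending order.

    The scan stops at the first row whose end is too old, the way a
    clustering-range read only touches the head of the partition.
    """
    return list(takewhile(lambda row: row[0] >= lower_end, partition))

# Two stored ranges: 8:00-8:12 and 8:40-8:55, newest first.
partition = sorted(
    [(hhmm(8, 12), hhmm(8, 0)), (hhmm(8, 55), hhmm(8, 40))],
    reverse=True,
)
# Data arriving in order only needs the most recent rows.
latest = rows_with_end_at_least(partition, hhmm(8, 50))
```

When new data continues the most recent trip, the only candidate neighbor is the first row of the partition, so the lookup reads very little.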

Because merging runs on a sharded worker pool,
there are no races and no locks are necessary.
As a result, there will (almost) always be at most
2 neighboring time ranges:
one before and one after.

Merging
[Figure sequence on a timeline from 8:00 to 9:05 in 5-minute ticks:]
▪ One stored range: 8:00am to 8:12am.
▪ A new range, 8:40am to 8:55am, arrives. Neighbor lookup: end_ms ≥ 8:35am, start_ms ≤ 9:00am. The stored range does not match.
▪ Stored ranges: 8:00am to 8:12am and 8:40am to 8:55am.
▪ A new range, 8:15am to 8:35am, arrives. Neighbor lookup: end_ms ≥ 8:10am. Both stored ranges match.
▪ The three ranges merge into one: 8:00am to 8:55am.
▪ Stored range: 8:00am to 8:55am.
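
The merging sequence above can be replayed with a small in-memory sketch. It is not the production code: the stored ranges live in a Python set instead of Scylla, `merge_range` and `t` are hypothetical names, and the 5-minute `MAX_GAP` is inferred from the lookup bounds on the slides (8:40am gives 8:35am, 8:55am gives 9:00am) and is never stated outright.

```python
from datetime import datetime, timedelta

# Assumption: gap tolerance inferred from the slide's lookup bounds.
MAX_GAP = timedelta(minutes=5)

def t(hour, minute):
    """Timestamp on an arbitrary example day."""
    return datetime(2018, 11, 6, hour, minute)

def merge_range(stored, new_start, new_end, max_gap=MAX_GAP):
    """Merge one new range into `stored`, a set of (start, end) tuples, in place."""
    lo, hi = new_start - max_gap, new_end + max_gap
    # Get neighboring ranges: anything ending at or after lo and starting at or before hi.
    neighbors = [(s, e) for (s, e) in stored if e >= lo and s <= hi]
    merged = (
        min([new_start] + [s for s, _ in neighbors]),
        max([new_end] + [e for _, e in neighbors]),
    )
    stored.difference_update(neighbors)  # delete old ranges
    stored.add(merged)                   # save merged range
    return merged

stored = {(t(8, 0), t(8, 12))}
first = merge_range(stored, t(8, 40), t(8, 55))   # no neighbor within 5 minutes
second = merge_range(stored, t(8, 15), t(8, 35))  # bridges both stored ranges
```

After both calls, `stored` holds the single range 8:00am to 8:55am, matching the last slide. Because one worker owns each device, the read, delete, and write steps need no locking in this model.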

The Team
Rohit Saboo, Adam Sowinski, Christian Merkwirth
Yingyi Hu, Karol Kokoszka, Mykola Terelia
and many others

Thank You
Any Questions?
Please stay in touch
rohit@nauto.com
@NAUTODriver
