Nauto devices are installed in fleet vehicles to help improve driving behavior and to help fleet managers see who is driving well and who is not. One of the fundamental components around which everything revolves is the notion of “trips”: a trip spans from when the vehicle starts moving to when it comes to a full stop and is parked. Vehicle states such as moving and stopped are inferred from accelerometer data sampled at very short intervals, and are sent to the cloud servers over LTE connections.
An online algorithm then combines these states into trip segments, extending or merging them as more state updates arrive. Each trip segment also records attributes such as the route and speed. To do this at scale, we create, merge, and delete trips in a time-series store on Scylla as they grow, and the web servers serve these routes directly out of Scylla.
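The talk does not show how state updates become trip segments. As a minimal sketch, assuming updates arrive in time order as (timestamp_ms, state) pairs and that a pause shorter than a hypothetical MAX_STOP_MS does not end a trip, an online builder could extend the open segment or start a new one per update. TripBuilder, Segment, and MAX_STOP_MS are illustrative names, not Nauto's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical threshold: a stop shorter than this does not end the trip.
MAX_STOP_MS = 5 * 60 * 1000


@dataclass
class Segment:
    start_ms: int
    end_ms: int


class TripBuilder:
    """Online sketch: consumes (timestamp_ms, state) updates one at a time,
    assuming they arrive in time order."""

    def __init__(self, max_stop_ms: int = MAX_STOP_MS) -> None:
        self.max_stop_ms = max_stop_ms
        self.segments: List[Segment] = []

    def update(self, ts_ms: int, state: str) -> None:
        if state != "moving":
            return  # only "moving" samples create or extend segments
        last = self.segments[-1] if self.segments else None
        if last is not None and ts_ms - last.end_ms <= self.max_stop_ms:
            last.end_ms = max(last.end_ms, ts_ms)  # extend the open segment
        else:
            self.segments.append(Segment(ts_ms, ts_ms))  # start a new segment


updates: List[Tuple[int, str]] = [
    (0, "moving"), (60_000, "moving"), (120_000, "stopped"),
    (1_000_000, "moving"), (1_060_000, "moving"),
]
builder = TripBuilder()
for ts, state in updates:
    builder.update(ts, state)
print(builder.segments)
# [Segment(start_ms=0, end_ms=60000), Segment(start_ms=1000000, end_ms=1060000)]
```

Updates that arrive out of order, or that bridge two existing segments, are what make a separate merge step necessary; this sketch does not handle them.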
Scylla Summit 2018: Nauto - An Online Method for Merging Time Ranges on Top of Scylla
1. An online method for merging time ranges on top of Scylla
Rohit Saboo
ML Engineering Lead, Nauto Inc.
2. Presenter bio
▪ Leading an ML engineering team at Nauto, working on various things such as finding trips, driver identification, and lossless sensor data compression.
▪ Previously a founding engineer at a search startup, and before that worked on search and robotics at Google.
▪ M.S. and Ph.D. in Computer Science from UNC Chapel Hill; B.Tech. in Computer Science from IIT Madras.
▪ I enjoy hiking, photography, swimming, and biking in my spare time.
9. Time ranges (trips) table
CREATE TABLE device_trips (
  version int,
  id text,
  bucket timestamp,
  end_ms timestamp,
  start_ms timestamp,
  details blob,
  PRIMARY KEY ((version, id, bucket), end_ms, start_ms)
) WITH CLUSTERING ORDER BY (end_ms DESC, start_ms DESC);
10. Time ranges (trips) table: id
The id of the time series. In our case, the vehicle/device id.
11. Time ranges (trips) table: bucket
For bucketing the time ranges into manageable partitions. Truncated from end_ms.
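A minimal sketch of the bucket computation, assuming timestamps are handled as epoch milliseconds (the underlying representation of a CQL timestamp) and a hypothetical one-day bucket width; the talk does not state the width actually used. bucket_for and buckets_between are illustrative helpers.

```python
from typing import List

# Hypothetical bucket width; the talk does not say what width Nauto uses.
BUCKET_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


def bucket_for(end_ms: int, bucket_ms: int = BUCKET_MS) -> int:
    """Truncate end_ms (epoch milliseconds) down to the start of its bucket."""
    return end_ms - end_ms % bucket_ms


def buckets_between(from_ms: int, to_ms: int, bucket_ms: int = BUCKET_MS) -> List[int]:
    """Bucket values whose partitions hold ranges ending in [from_ms, to_ms]."""
    return list(range(bucket_for(from_ms, bucket_ms), to_ms + 1, bucket_ms))


print(bucket_for(1_536_000_000_000))  # 1535932800000, i.e. 2018-09-03T00:00:00Z
```

A reader fetching one device's trips over a time window would query one partition per value returned by buckets_between. Because the bucket is derived from end_ms, a trip that is extended past a bucket boundary ends up in a different partition.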
12. Time ranges (trips) table: start_ms and end_ms
The start and end of the time range.
13. Time ranges (trips) table: version
To simplify running multiple versions of the algorithms or experiments.
14. Time ranges (trips) table: details
Application-specific.
For us, a marshaled protobuf containing GPS, speed, etc. along the trip.
(With the JSON type now available, it may be preferable to use JSON.)
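Since the slide mentions JSON as a possible alternative to protobuf, here is a minimal sketch of a JSON-encoded details blob, using only the standard library. The layout (a time-ordered "points" list with "t", "lat", "lon", "speed" fields) is a hypothetical stand-in, not Nauto's schema. merge_details shows what combining the details of two merged time ranges could look like.

```python
import json
from typing import Any, Dict

# Hypothetical layout: {"points": [{"t": ms, "lat": ..., "lon": ..., "speed": ...}, ...]}.
# Field names are illustrative only; they are not Nauto's schema.


def encode_details(details: Dict[str, Any]) -> bytes:
    """Serialize details to compact UTF-8 JSON bytes for the details blob column."""
    return json.dumps(details, separators=(",", ":"), sort_keys=True).encode("utf-8")


def decode_details(blob: bytes) -> Dict[str, Any]:
    """Parse a details blob back into a dict."""
    return json.loads(blob.decode("utf-8"))


def merge_details(a: bytes, b: bytes) -> bytes:
    """Combine the samples of two time ranges being merged, ordered by time."""
    points = decode_details(a)["points"] + decode_details(b)["points"]
    points.sort(key=lambda p: p["t"])
    return encode_details({"points": points})


earlier = encode_details({"points": [{"t": 0, "lat": 37.44, "lon": -122.16, "speed": 0.0}]})
later = encode_details({"points": [{"t": 5000, "lat": 37.45, "lon": -122.17, "speed": 8.3}]})
merged = decode_details(merge_details(later, earlier))
print([p["t"] for p in merged["points"]])  # [0, 5000]
```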
15. Time ranges (trips) table: partition key and clustering order
Partitioned by version and time series id, and bucketed to keep partition sizes manageable.
Clustering order is chosen to make the common case of data arriving in order optimal: the most recent time range is the first row of its partition.
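To illustrate why descending clustering suits in-order arrival, the sketch below models one partition's clustering columns as Python tuples. It is a local simulation rather than driver code, and the CQL in the docstring is an illustrative query against the schema above. When data arrives in order, the existing range a new one is most likely to touch is the newest, which is the first row read.

```python
from typing import List, Optional, Tuple

# (end_ms, start_ms): the clustering columns of device_trips, in key order.
Row = Tuple[int, int]


def clustering_sorted(rows: List[Row]) -> List[Row]:
    """Order rows as CLUSTERING ORDER BY (end_ms DESC, start_ms DESC) stores them."""
    return sorted(rows, reverse=True)


def latest(rows: List[Row]) -> Optional[Row]:
    """First row of the partition, i.e. the time range that ends last.

    The CQL equivalent would be something like:
      SELECT start_ms, end_ms, details FROM device_trips
      WHERE version = ? AND id = ? AND bucket = ? LIMIT 1;
    """
    ordered = clustering_sorted(rows)
    return ordered[0] if ordered else None


rows = [(100, 50), (300, 200), (150, 120)]
print(clustering_sorted(rows))  # [(300, 200), (150, 120), (100, 50)]
print(latest(rows))             # (300, 200)
```

Because end_ms and start_ms are primary-key columns, CQL cannot change them with an UPDATE: extending a trip means deleting the old row and inserting a new one, which fits the create, merge, and delete pattern described at the start.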
16. Because merging is done by a sharded worker pool, in which a given time series is always handled by the same worker, there are no races and no locks are necessary.
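A minimal sketch of such a pool, assuming one in-process queue and one thread per shard and a hypothetical (device_id, start_ms, end_ms) update shape. Routing by a stable hash of the id means every update for one series is handled by a single thread in arrival order, which is what removes the need for locks around the merge. ShardedPool is an illustrative name, and a production system might shard across processes or machines instead.

```python
import queue
import threading
import zlib
from typing import Callable, List, Optional, Tuple

# Hypothetical update shape: (device_id, start_ms, end_ms).
Update = Tuple[str, int, int]


class ShardedPool:
    """One queue and one worker thread per shard. Every update for a given
    device_id goes to the same shard, so that series is merged by exactly
    one thread, in arrival order, and merge logic needs no locks."""

    def __init__(self, num_shards: int, handle: Callable[[Update], None]) -> None:
        self.queues: List["queue.Queue[Optional[Update]]"] = [
            queue.Queue() for _ in range(num_shards)
        ]
        self.threads = [
            threading.Thread(target=self._run, args=(q, handle), daemon=True)
            for q in self.queues
        ]
        for t in self.threads:
            t.start()

    @staticmethod
    def _run(q: "queue.Queue[Optional[Update]]", handle: Callable[[Update], None]) -> None:
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                return
            handle(item)

    def shard_for(self, device_id: str) -> int:
        # crc32 is stable across processes, unlike Python's randomized str hash().
        return zlib.crc32(device_id.encode("utf-8")) % len(self.queues)

    def submit(self, update: Update) -> None:
        self.queues[self.shard_for(update[0])].put(update)

    def close(self) -> None:
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()


log: List[Tuple[str, Update]] = []
log_lock = threading.Lock()


def record(update: Update) -> None:
    with log_lock:  # the shared log is the only cross-shard state in this demo
        log.append((threading.current_thread().name, update))


pool = ShardedPool(4, record)
for i in range(20):
    pool.submit((f"dev{i % 5}", i, i + 1))
pool.close()
```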
As a result, each new time range is merged into stored ranges that already do not overlap, so there will (almost) always be at most 2 neighboring time ranges to merge with:
one before and one after.
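A minimal in-memory sketch of the merge step, assuming a hypothetical merge gap and (start_ms, end_ms) pairs. It is not Nauto's implementation; the comments mark where the corresponding Scylla writes would go. Against the real table, the neighbors would be found with range queries on the clustering columns, possibly in an adjacent bucket partition. The example merges a late range with both its neighbor before and its neighbor after.

```python
import bisect
from typing import List, Tuple

Range = Tuple[int, int]  # (start_ms, end_ms)

# Hypothetical: ranges whose gap is at most this many ms are merged into one trip.
MERGE_GAP_MS = 5 * 60 * 1000


class SeriesRanges:
    """In-memory stand-in for one time series' rows in device_trips.

    Holds non-overlapping ranges sorted by start_ms (which, for non-overlapping
    ranges, is also end_ms order). Assumes a single writer per series, as the
    sharded worker pool provides, so there is no locking.
    """

    def __init__(self, gap_ms: int = MERGE_GAP_MS) -> None:
        self.gap_ms = gap_ms
        self.ranges: List[Range] = []

    def add(self, start_ms: int, end_ms: int) -> Range:
        i = bisect.bisect_left(self.ranges, (start_ms, end_ms))
        # Neighbor before: merge if it ends within gap_ms of the new start.
        if i > 0 and self.ranges[i - 1][1] + self.gap_ms >= start_ms:
            i -= 1
            start_ms = min(start_ms, self.ranges[i][0])
            end_ms = max(end_ms, self.ranges[i][1])
            del self.ranges[i]  # in Scylla: DELETE the old row
        # Neighbor after: usually at most one; the loop also covers a long
        # late range that spans several stored ranges.
        while i < len(self.ranges) and self.ranges[i][0] - self.gap_ms <= end_ms:
            end_ms = max(end_ms, self.ranges[i][1])
            del self.ranges[i]  # in Scylla: DELETE the old row
        self.ranges.insert(i, (start_ms, end_ms))  # in Scylla: INSERT the merged row
        return (start_ms, end_ms)


series = SeriesRanges(gap_ms=10)
series.add(0, 100)
series.add(200, 300)
print(series.add(105, 195))  # (0, 300): bridges the ranges before and after
print(series.ranges)         # [(0, 300)]
```

Each merge is a delete of the absorbed neighbor rows plus one insert of the combined row; with a single writer per series, no other worker can modify those rows in between.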