Uber operates in the complex physical world. One of the challenges of providing a reliable service is detecting highly geolocalized and dynamic scenarios in real time, such as spatial hotspots and neighborhoods with imbalanced supply and demand. The problem is hard because Uber operates at a massive global scale, neighborhood and traffic characteristics are localized, and detection latency must be low for the result to be actionable. To solve this problem, Uber engineers have built a situation detection platform powered by Apache Flink and its CEP library. In this talk, I will cover i) how we evolved our end-to-end solution to date by leveraging Apache Flink, Uber’s hexagonal spatial indexing system H3, and large-scale clustering algorithms; ii) how we aggregate billions of events across the globe and derive geospatial semantics through CEP pattern matching; and iii) the challenges involved in scaling the platform and the techniques we employed.
2. Uber | Geospatial Situation Detection through FlinkCEP
Outline
● Marketplace
● Observability Problem
● Large-Scale Clustering
● Situation Detection through Pattern Matching
● Tips, Tricks and Lessons Learned
3. Uber | Geospatial Situation Detection through FlinkCEP
Marketplace
● Modeling the physical world
● Global Logistics Network
● Real-Time Decision Engine
4. Uber | Geospatial Situation Detection through FlinkCEP
Marketplace
Dynamic Pricing
Forecasting
Driver Positioning
Intelligent Dispatch
Marketplace Health
Marketplace Platform & Data
Fares
Driver / Rider Pricing
5. Uber | Geospatial Situation Detection through FlinkCEP
Photo: Jessica Christian / The Chronicle
Observability Problem
6. Uber | Geospatial Situation Detection through FlinkCEP
Source: Giphy
[https://giphy.com/gifs/FmNXeuoadNTpe]
Scaling Observability
● 700+ Cities
● Local Heterogeneity
● Space and Time Dimensions
● Real-Time Constraints
7. Uber | Geospatial Situation Detection through FlinkCEP
Problem
EDGE ZOOM IN PATTERN MATCH OBSERVER
8. Uber | Geospatial Situation Detection through FlinkCEP
Detecting the Region
● Similar Characteristics
● Connected Region
● Arbitrary Shape
● Cheap in Computation
9. Uber | Geospatial Situation Detection through FlinkCEP
Detecting the Region through clustering
● K-means ?
● Density-based clustering ?
10. Uber | Geospatial Situation Detection through FlinkCEP
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
● Epsilon Ball Rule
● Worst Case
○ O(n²) !
○ O(n log n) (with an auxiliary data structure)
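The epsilon ball rule and its quadratic cost can be illustrated with a minimal sketch of textbook DBSCAN. This is not code from the talk: the function names `region_query` and `dbscan` and the parameters `eps` and `min_pts` are assumptions chosen for illustration. Each `region_query` scans every point, so n queries cost O(n²) overall; a spatial index is the kind of auxiliary data structure that can speed up each query.

```python
from math import dist

def region_query(points, i, eps):
    """Epsilon ball: indices of all points within eps of points[i] (linear scan)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)      # O(n) scan per point
        if len(neighbors) < min_pts:
            labels[i] = NOISE                         # may become a border point later
            continue
        labels[i] = cluster                           # i is a core point: start a cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster                   # noise reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:           # only core points expand the cluster
                frontier.extend(j_neighbors)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this toy input the two tight groups each form a cluster and the isolated point at (50, 50) is labeled noise.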
11. Uber | Geospatial Situation Detection through FlinkCEP
How can we do better ?
12. Uber | Geospatial Situation Detection through FlinkCEP
H3 : Hexagonify the World !
https://h3geo.org/#/
13. Uber | Geospatial Situation Detection through FlinkCEP
Credited to Nick Rabinowitz
14. Uber | Geospatial Situation Detection through FlinkCEP
Credited to Nick Rabinowitz
Uniform Adjacency
15. Uber | Geospatial Situation Detection through FlinkCEP
Credited to Nick Rabinowitz
DBSCAN in Hexagons
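One way to read "DBSCAN in Hexagons" is that, once events are bucketed into hexagonal cells, the epsilon ball query becomes a lookup of a cell's six adjacent cells. The sketch below is a simplified illustration under that assumption, not the talk's implementation: it uses plain axial coordinates `(q, r)` instead of real H3 indexes, treats a cell as dense when its event count reaches a hypothetical `min_count` threshold, and groups adjacent dense cells by flood fill. With H3, the neighbor lookup would come from the library's grid traversal functions instead of `hex_neighbors`.

```python
from collections import deque

# The six axial-coordinate offsets to a hexagon's adjacent cells.
HEX_NEIGHBOR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbors(cell):
    """Return the six cells adjacent to `cell`, given as axial (q, r)."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_NEIGHBOR_OFFSETS]

def cluster_dense_cells(counts, min_count):
    """Group adjacent dense cells into clusters.

    counts: dict mapping (q, r) -> number of events observed in that cell.
    Returns a list of sets of cells, one set per connected dense region.
    """
    dense = {cell for cell, n in counts.items() if n >= min_count}
    seen, clusters = set(), []
    for start in sorted(dense):                 # sorted only for deterministic output
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            cell = queue.popleft()
            for nb in hex_neighbors(cell):      # constant work: always six lookups
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        clusters.append(component)
    return clusters

counts = {(0, 0): 5, (1, 0): 7, (1, -1): 6, (5, 5): 9, (3, 3): 1}
clusters = cluster_dense_cells(counts, min_count=5)
```

Because every cell inspects exactly six neighbors through hash lookups, the work grows linearly with the number of occupied cells rather than quadratically with the number of raw points, and the resulting regions are connected and can take arbitrary shapes.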
16. Uber | Geospatial Situation Detection through FlinkCEP
Data SIO, NOAA, U.S. Navy, NGA, GEBCO,
Image Landsat / Copernicus
Image IBCAO
Credited to Nick Rabinowitz
17. Uber | Geospatial Situation Detection through FlinkCEP
Low-Latency Clustering on Streams
Junior, M.R., Souza, B.J., & Endler, M. (2019). DG2CEP: a near real-time on-line algorithm for detecting spatial clusters large data streams through complex event processing. Journal of Internet Services and Applications, 10, 1-28.
18. Uber | Geospatial Situation Detection through FlinkCEP
Low-Latency Clustering on Streams
19. Uber | Geospatial Situation Detection through FlinkCEP
Clustering on streaming data
● Create
● Update
● Merge (expensive!)
20. Uber | Geospatial Situation Detection through FlinkCEP
Clustering on streaming data
● Disperse
● Split (expensive!)
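The five operations listed on these two slides can be sketched as incremental maintenance of clusters over hexagonal cells. This is an illustrative reading, not the platform's code: the class `StreamingHexClusters`, its methods `activate` and `deactivate`, and the axial `(q, r)` cell coordinates are all hypothetical. A cell is activated when it becomes dense and deactivated when it stops being dense; the return value names which operation the change triggered.

```python
HEX_NEIGHBOR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbors(cell):
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_NEIGHBOR_OFFSETS]

class StreamingHexClusters:
    def __init__(self):
        self.cluster_of = {}   # active cell -> cluster id
        self.members = {}      # cluster id -> set of active cells
        self._next_id = 0

    def _new_cluster(self, cells):
        cid, self._next_id = self._next_id, self._next_id + 1
        self.members[cid] = set(cells)
        for c in cells:
            self.cluster_of[c] = cid

    def activate(self, cell):
        """A cell became dense. Returns 'create', 'update', 'merge' or 'noop'."""
        if cell in self.cluster_of:
            return "noop"
        touching = sorted({self.cluster_of[nb] for nb in hex_neighbors(cell)
                           if nb in self.cluster_of})
        if not touching:
            self._new_cluster([cell])
            return "create"
        target = touching[0]
        self.members[target].add(cell)
        self.cluster_of[cell] = target
        for other in touching[1:]:             # merge: relabel every absorbed cell
            for c in self.members.pop(other):
                self.cluster_of[c] = target
                self.members[target].add(c)
        return "update" if len(touching) == 1 else "merge"

    def deactivate(self, cell):
        """A cell stopped being dense. Returns 'disperse', 'update', 'split' or 'noop'."""
        cid = self.cluster_of.pop(cell, None)
        if cid is None:
            return "noop"
        remaining = self.members.pop(cid)
        remaining.discard(cell)
        if not remaining:
            return "disperse"
        components = self._components(remaining)   # split check: full flood fill
        self.members[cid] = components[0]           # first piece keeps the old id
        for comp in components[1:]:
            self._new_cluster(comp)
        return "update" if len(components) == 1 else "split"

    @staticmethod
    def _components(cells):
        """Connected components of a set of cells under hexagon adjacency."""
        unseen, comps = set(cells), []
        while unseen:
            start = unseen.pop()
            comp, stack = {start}, [start]
            while stack:
                for nb in hex_neighbors(stack.pop()):
                    if nb in unseen:
                        unseen.remove(nb)
                        comp.add(nb)
                        stack.append(nb)
            comps.append(comp)
        return comps

s = StreamingHexClusters()
events = [s.activate((0, 0)), s.activate((2, 0)), s.activate((1, 0)),
          s.activate((0, 1)), s.deactivate((1, 0)), s.deactivate((2, 0))]
```

The sketch also suggests why merge and split are flagged as expensive: a merge relabels every cell of the absorbed cluster, and a split cannot be decided locally, so the remaining cells of the cluster are traversed again to find their connected pieces.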
21. Uber | Geospatial Situation Detection through FlinkCEP
Static clusters are not good enough to capture marketplace dynamics
22. Uber | Geospatial Situation Detection through FlinkCEP
Situation
CLUSTERS IN MOTION
t1 t2 t3