Real-time analytics at IoT scale
Pramod Immaneni
Principal Architect, Data Platform
Our vehicles
An electric SUV to rule them all
• Vehicle of the Year; included in 10 Best Trucks and SUVs awards for 2023
• 2022 Truck of the Year
• Electric Vehicle of the Year
• "Best ownership experience" among premium battery electric vehicles; "R1T scored higher overall than any other vehicle in the study"
• R1T and R1S earned the highest safety rating from IIHS
Data and ML Landscape
• Analytical queries, near real-time streams, event processing, purpose-built pipelines
• Dashboards, models, applications
• Vehicle telemetry: hi/lo frequency, service, charging, …
• Data and AI Platform
Platform Architecture
• Real-time Platform: event processor, low-latency store, standardization pipelines
• Lakehouse Platform: Unity Catalog, data & ML pipelines, Delta Tables
Real-time Platform
Real-time Telemetry: availability of fresh vehicle telemetry data to applications.
Low-latency Queries: processing queries with sub-second latencies to power interactive dashboards; supports time-series and OLAP queries.
Timely Actions: taking action on data in motion for timely response.
Standardization Pipeline
• Standardization and validation of data schema, and adding vehicle context
• Data preparation for storage in the real-time distributed data store

Event Watch Service
• Filtering, dedup, sessions, aggregations
• Actions: Push Notification, PagerDuty, Event Bridge
• Event Watch action platform, some examples:
• When a critical event is detected, a PagerDuty alert is sent out.
• OTA status such as success/fail/ready-to-install is pushed to Mobile.
• A change event is sent when a controller is swapped on the vehicle, by detecting a change in device id – stateful (see the sketch below).

Custom pipelines
• Trip/session detection
• Geofence entry and exit detection

Real-time Processing
• Telemetry data lands in a distributed OLAP database behind a Query Service
• Real-time queries: time-series, bucketed aggregates, slicing and dicing
• Kafka message queue used for data handoff
• Streaming pipelines built on Apache Flink
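The controller-swap example is stateful change detection: remember the last device id seen per vehicle and controller, and emit an event when it changes. A minimal Python sketch of the pattern, with hypothetical event and emit shapes (the production version keeps this state in Flink keyed state):

# Minimal sketch of stateful change detection, as in the controller-swap
# example above. A dict keyed by (vehicle_id, controller) stands in for
# Flink keyed state; ChangeEvent fields and emit() are illustrative.

last_seen: dict[tuple[str, str], str] = {}

def on_telemetry(vehicle_id: str, controller: str, device_id: str):
    key = (vehicle_id, controller)
    previous = last_seen.get(key)
    last_seen[key] = device_id
    if previous is not None and previous != device_id:
        # Controller hardware was swapped: the same logical controller
        # now reports a different device id.
        emit({"type": "controller_swap", "vehicle": vehicle_id,
              "controller": controller, "old": previous, "new": device_id})

def emit(event: dict):
    print(event)  # stand-in for a PagerDuty/push/EventBridge action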
Streaming stack
• Business logic: Event Watch and streaming pipelines, defined via application specs and templates
• Stream processing engine: Apache Flink (streaming layer, data processing)
• Clustering platform: Kubernetes (EKS)
• Infrastructure/cloud: AWS
• Management layers run across the stack
Stream Processing
• Analyzing data and taking business actions as soon as data is produced or available.
• Message-by-message (event) processing, in contrast to batch processing.
• Enables large-scale real-time pipelines that potentially run forever*.
• Input from live sources or stores: MQ (Kafka), HTTP/socket, files, etc.
• Unbounded, continuous data streams: a batch can be processed as a stream, but a stream is not a batch.
• (In-memory) processing with temporal boundaries (windows), with support for event-time semantics.
• Stateful operations (aggregation, rules, …) -> analytics.
• Output to stores, live dashboards, downstream applications.
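A minimal Python sketch of the windowing and event-time ideas above; Flink provides windows and watermarks natively, so this is purely illustrative:

# Event-time tumbling-window aggregation over an unbounded stream.
# The event's own timestamp, not arrival time, decides the window.

WINDOW_MS = 60_000  # 1-minute tumbling windows
windows: dict[tuple[str, int], float] = {}  # (vehicle_id, window_start) -> sum

def on_event(vehicle_id: str, event_time_ms: int, value: float):
    window_start = event_time_ms - event_time_ms % WINDOW_MS
    key = (vehicle_id, window_start)
    windows[key] = windows.get(key, 0.0) + value  # stateful aggregation

def close_window(vehicle_id: str, window_start: int):
    # In a real engine a watermark triggers this: emit and clear the state.
    total = windows.pop((vehicle_id, window_start), 0.0)
    print(vehicle_id, window_start, total)

on_event("v1", 30_000, 2.0)
on_event("v1", 55_000, 3.0)
close_window("v1", 0)  # -> v1 0 5.0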
Directed Acyclic Graph (DAG)
• Application logic is broken down into stages – operators
• Multiple instances of each stage
• Data tuples are sent in a continuous stream between the operators
• Operators are connected with streams to form a DAG
(Diagram: operators and their instances, connected by streams of data tuples into a streaming pipeline/application; see the sketch below.)
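A toy sketch of the DAG model: three hypothetical operators (parse, enrich, sink) wired into a linear pipeline, with the instance of a stage chosen by key partitioning. Flink handles partitioning and data shipping for real; this only illustrates the shape.

PARALLELISM = 4

def partition(key: str) -> int:
    # Tuples with the same key always go to the same operator instance.
    return hash(key) % PARALLELISM

def parse(raw: str) -> dict:              # operator 1
    vehicle_id, value = raw.split(",")
    return {"vehicle": vehicle_id, "value": float(value)}

def enrich(t: dict) -> dict:              # operator 2
    t["instance"] = partition(t["vehicle"])
    return t

def sink(t: dict) -> None:                # operator 3
    print(t)

# The streams wire the operators into a linear DAG: parse -> enrich -> sink.
for raw in ["v1,3.2", "v2,1.5", "v1,0.7"]:
    sink(enrich(parse(raw)))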
Geofence Detection
(Pipeline: a vehicle stream and a geofence stream feed a chain of geohash match stages, level-1 through level-8, followed by a bounding-polygon match.)
https://www.geospatialworld.net/blogs/polygeohasher-an-optimized-way-to-create-geohashes/
Geohash
• Geohash is a hierarchical spatial mapping system
• Example: Sava Centar – srywbvnhkp9v
• Two locations close to each other share a common prefix
• The longer the match, the closer they are
• Geofence matching can be sped up by iteratively matching more characters of the prefix (see the sketch below)
• Iterations can be pipelined for higher throughput
• Each stage can be scaled to handle more vehicles
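A minimal sketch of the staged prefix matching; the geofence table and its geohashes are made up for illustration:

# Hierarchical geofence matching with geohash prefixes. Each stage
# compares one more character; only fences that survive all eight
# stages reach the exact bounding-polygon test.

GEOFENCES = {
    "srywbvnh": "sava_centar_area",   # 8-char geohash cell covering a fence
    "srywbvq2": "depot_north",
}

def match_stage(vehicle_hash: str, candidates: dict[str, str], level: int):
    # Keep only fences that agree with the vehicle up to `level` characters.
    return {h: name for h, name in candidates.items()
            if h[:level] == vehicle_hash[:level]}

def match(vehicle_hash: str) -> list[str]:
    candidates = GEOFENCES
    for level in range(1, 9):             # level-1 .. level-8 stages
        candidates = match_stage(vehicle_hash, candidates, level)
        if not candidates:
            return []
    return list(candidates.values())      # then: exact bounding-polygon match

print(match("srywbvnhkp9v"))              # -> ['sava_centar_area']

In the pipeline each level runs as its own operator, so the stages overlap in time (pipelining) and each can be scaled out independently.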
Event Watch
• A low-latency platform for event filtering, change detection, and actions
• Out-of-the-box supported specifications: sessionization, dedup, staleness, streaming SQL
• Notifications via PagerDuty, push notifications, EventBridge, or email
• Supports pluggable custom actions with BYOC
• Built on Apache Flink, a stateful, distributed, and fault-tolerant stream processing engine
• 2M events/sec peak, <100ms avg latency
Specifications

Staleness check
• Enable staleness to discard late-arriving data
• Keeps streams current and avoids outdated event detection

"staleness": {
  "signal": "sound_alarm",
  "ttl_ms": 3600000
}

Streaming SQL
• Describe a stream subscription using familiar SQL
• Output is a continuous stream of matching events

"query": "select `id`, `timestamp` from stream where `sound_alarm`='true'"
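A minimal sketch of the staleness check configured above, assuming a hypothetical process() downstream:

import time

# Events whose timestamp is older than ttl_ms are discarded instead of
# triggering (possibly outdated) detections. Illustrative only.

TTL_MS = 3_600_000  # matches "ttl_ms": 3600000 (1 hour)

def is_stale(event_time_ms: int, now_ms: int | None = None) -> bool:
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return now_ms - event_time_ms > TTL_MS

def on_event(event: dict):
    if is_stale(event["timestamp"]):
        return                  # late arrival: drop, keep the stream current
    process(event)              # hypothetical downstream detection

def process(event: dict):
    print("detected:", event)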
Specifications

Sessions
• Virtual session on event detection
• Query other signals in the context of the session
• Supports TTL on the dependency signal

"query": "select `id`, `range_threshold`, `pet_mode_status` from stream
  where `range_threshold` = 'VEHICLE_RANGE_CRITICALLY_LOW'",
…
"dependency": {
  "type": "thermal",
  "subtype": "hvac_settings",
  "signal": "pet_mode_status",
  "values": ["On"]
},

Deduplication
• Identify and discard duplicates
• Useful to avoid triggering duplicate notifications
• Provide TTL for deduplication at ms granularity

"dedup": true,
"dedup_ttl_ms": 86400000
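A minimal sketch of TTL-based dedup matching the spec above; the event key format is hypothetical:

import time

# A repeat of the same event key within the TTL window is suppressed,
# so the same notification is not fired twice. Illustrative, not the
# production Flink code.

DEDUP_TTL_MS = 86_400_000          # 24h, matching "dedup_ttl_ms" above
seen: dict[str, int] = {}          # event key -> last emission time (ms)

def should_emit(key: str, now_ms: int | None = None) -> bool:
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    last = seen.get(key)
    if last is not None and now_ms - last < DEDUP_TTL_MS:
        return False               # duplicate within TTL: suppress
    seen[key] = now_ms             # first occurrence (or TTL expired): emit
    return True

if should_emit("v123:VEHICLE_RANGE_CRITICALLY_LOW"):
    print("send notification")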
Query Service
Real-time Query Service
● Service and remote diagnostics
○ looking at telemetry data before an appointment can identify the issue
○ replacement parts can be ordered in advance
○ sample telemetry:
■ current OTA version
■ diagnostic error codes
■ core vehicle data
● Fleet management
○ fleet customers can view telemetry data for their fleet
○ aggregating data per day or per hour
○ sample telemetry:
■ state of charge
■ charging status
■ energy added in charge session
■ estimated range
■ odometer
■ current location
(Diagram: telemetry schema preparation feeds the distributed OLAP database behind the Query Service.)
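As an illustration of the fleet-management case, an hourly aggregation could be issued against Druid's SQL endpoint (POST /druid/v2/sql); the broker host, table, and column names below are hypothetical:

import json
import urllib.request

# Hourly per-vehicle aggregation over the last day. TIME_FLOOR and the
# /druid/v2/sql endpoint are standard Druid; the schema is assumed.

BROKER = "http://druid-broker:8082/druid/v2/sql"

query = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hour,
       vehicle_id,
       MAX(odometer)        AS odometer,
       AVG(state_of_charge) AS avg_soc
FROM telemetry
WHERE fleet_id = 'fleet-42' AND __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1, 2
"""

req = urllib.request.Request(
    BROKER,
    data=json.dumps({"query": query}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for row in json.load(resp):
        print(row)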
Apache Druid – Real-time OLAP

Performance
• Sub-second query responses enable interactive data exploration and reporting applications
• Query both real-time and historical data together
• OLAP and time-series queries

Scalability
• Horizontal data capacity scaling by adding data nodes
• Scale usage by adding query nodes
• Data can accumulate over several months or years, and tables can grow into billions of rows

Capability
• Flexible schema, keys, metrics, and rollups
• Built-in streaming ingestion of real-time data and batch ingestion from data lakes
• Data tiering and retention with QoS
• Fault tolerant, self-healing, and balancing

Extensibility
• Extensible with plugins – Kafka, S3, Parquet, sketches
• Highly configurable at the component level
• Apache open-source, community-driven model
Kubernetes Deployment
https://imply.io/druid-architecture-concepts/
• Data Nodes
  • Historicals hold the bulk of the data
  • Middle Managers (MMs) hold data during ingestion
• Query Nodes
  • Brokers process user queries by aggregating results from MMs and Historicals
  • Routers route queries to Brokers or Master Nodes
• Master Nodes
  • Coordinators manage Historicals, data assignment, and auto-compaction
  • The Overlord manages MMs and real-time ingestion tasks

Data Nodes – Store data, including real-time data being ingested, and respond to queries.
Query Nodes – Process user queries utilizing the Data Nodes.
Master Nodes – Manage coordination, ingestion tasks and work assignment, data distribution, and recovery.
Metadata Storage stores information about data segments and dynamic configuration.
Deep Storage is where data ends up for permanent storage – the Data Lake (S3).
Real-time Ingestion
(Diagram: producers write to Kafka partitions 1..N; Druid ingestion tasks 1..M consume them, serving partial segment data during ingestion; finalized segments go to Deep Storage and are served by Historicals; query brokers answer applications from both.)
• Druid continually ingests streaming telemetry data from Kafka in real time, ~2M events/sec
• Multiple tasks running in parallel ingest from different Kafka partitions and brokers
• Tasks create data segments, which are collections of events (rows) plus indexes for search
• Tasks publish finalized segments to Deep Storage, which are picked up and served by Historicals
• Tasks run on Middle Managers and are auto-recovered on failure
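For context, Druid's Kafka ingestion is started by POSTing a supervisor spec to the Overlord (/druid/indexer/v1/supervisor), which then schedules ingestion tasks on Middle Managers. The abbreviated sketch below uses hypothetical topic, datasource, and host names and omits most of the schema:

import json
import urllib.request

# Abbreviated Kafka supervisor spec: ioConfig names the topic and
# parallelism; dataSchema (trimmed here) defines the datasource.

spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "vehicle-telemetry",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "taskCount": 4,            # parallel tasks across partitions
        },
        "dataSchema": {"dataSource": "telemetry"},  # trimmed for brevity
    },
}

req = urllib.request.Request(
    "http://druid-overlord:8081/druid/indexer/v1/supervisor",
    data=json.dumps(spec).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).status)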
Thank you.
