Keepin’ It Real(-Time) With Nadine Farah | Current 2022
Let’s get real: companies are incorporating more streaming sources into their data stacks to surface customer and business trends in real time. While many data engineers ingest streaming data into their data warehouses or data lakes, they are not unlocking its full potential. To extract the most value from your streaming data, you’ll need to consider:
- data freshness
- query latency
- storage
- concurrency
- data mutability
- analyzing streaming data in context (i.e. JOINing) with data from other data sources
In this tech talk, we’ll cover each of these considerations in detail. We’ll show you how to build a SQL-based, real-time recommendation engine and customer 360 data application using Kafka, Rockset, and Retool. By the end, you’ll be equipped to evaluate databases and tools against your real-time streaming data needs.
1. Nadine Farah, Senior Developer Advocate
@heyerrrbody
in/nadinefarah
Current 2022, Austin
Keepin' it real(-time): How to generate
instant, actionable insights on streaming data
2. Agenda

Section | Duration
Streaming data is on the rise 📈 | 5 min
🤔 Real-time analytics on streaming data challenges | 20 min
🛠 Rockset deep dive + build a real-time customer 360 app with recommendations | 10 min
Q&A | 5 min
3. Soo.. about me
● 3+ years focusing on real-time analytics & streaming data
● Recently led a workshop on real-time analytics with Kafka
● Lead Rockset’s developer initiatives
● Best friends with Ferro and my dog, Romeo
in/nadinefarah/
@nfarah86
10. Less efficient ways of scaling bursty data traffic
● Manual reconfigurations
○ You create bottlenecks when you want to scale up because scaling isn’t triggered automatically
● Tightly coupled compute and storage
○ Ingesting and querying data affect each other: writing large amounts of data slows the reads, and vice versa
○ Data has to be moved closer to memory to make use of the available resources
14. Copy-on-write: less efficient for updating a field
● One way immutable databases (e.g., data warehouses) handle out-of-order events is copy-on-write
○ Any update requires both writing the new data and rewriting already-written adjacent data so that everything is stored on disk in the right time order
○ This requires a significant amount of processing power and time
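The cost above can be sketched in a few lines. This is a toy model of copy-on-write over an in-memory "partition" (not any specific warehouse’s file format): changing one field of one record forces every row in the partition to be rewritten.

```python
# Toy sketch of copy-on-write: updating one field rewrites the whole
# partition, because the "file" is immutable and must stay time-ordered.

def apply_update_copy_on_write(partition, record_id, field, value):
    """Return a fully rewritten partition plus the number of rows copied."""
    new_partition = []
    rows_rewritten = 0
    for row in partition:
        row = dict(row)                # copy even the unchanged rows
        if row["id"] == record_id:
            row[field] = value
        new_partition.append(row)
        rows_rewritten += 1
    return new_partition, rows_rewritten

partition = [{"id": i, "status": "ok"} for i in range(5)]
updated, rows_rewritten = apply_update_copy_on_write(partition, 2, "status", "late")
print(rows_rewritten)  # 5 rows rewritten to change a single field
```

A mutable database would touch one row here; copy-on-write touches all five, and real partitions hold millions of rows, not five.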
15. Problems with immutability for real-time analytics
● Usage and volume of streaming data are increasing
● Immutable databases can’t do in-place updates or deletes; they are append-only
● Shift from batch to streaming:
○ Data apps have tighter SLAs for query and data latency, so they need an efficient real-time database to handle the nuances and volume of streaming data
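To make the append-only constraint concrete, here is a minimal sketch (my own illustration, not any particular product’s mechanism) of how an immutable log can emulate updates: every change is appended as a new event, and a read-time "latest event timestamp wins" pass resolves the current value per key, even when events arrive out of order.

```python
# Minimal sketch: append-only event log + read-time "latest wins" resolution.
log = []

def append(key, value, event_ts):
    """Writes never mutate the log; they only append."""
    log.append({"key": key, "value": value, "ts": event_ts})

def latest_state():
    """Resolve the current value per key by event time, not arrival order."""
    state = {}
    for e in log:
        cur = state.get(e["key"])
        if cur is None or e["ts"] > cur["ts"]:
            state[e["key"]] = e
    return {k: v["value"] for k, v in state.items()}

append("user:1", "bronze", event_ts=100)
append("user:1", "gold", event_ts=300)
append("user:1", "silver", event_ts=200)   # late, out-of-order arrival
print(latest_state())  # {'user:1': 'gold'}
```

The catch is the cost: the log only grows, and every read has to pay for the resolution step, which is exactly why tight query-latency SLAs push toward databases with true field-level mutability.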
17. Challenge 3: Schema changes
● Rigid ETL jobs are hard to update when nested JSON objects get fields added, changed, or deleted
● Relational databases can take a performance hit if you query JSON data without converting it to SQL fields
● Nested JSON objects are hard to work with right away because you have to build processes to flatten them
18. Strong dynamic typing and indexing make it easier to work with schema changes
● A database that can index nested JSON docs
● A database that supports strong dynamic typing, so you can query fields with multiple data types
● A database that easily turns nested JSON into a SQL table at runtime, without prior transformations
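The flattening step that such a database does for you at runtime can be sketched by hand. This is an illustrative helper (names like `flatten` are my own), turning a nested JSON doc into dotted SQL-style column names; note the `price` arriving as a string, the mixed-type case strong dynamic typing is meant to absorb.

```python
# Sketch: flatten a nested JSON doc into dotted, SQL-queryable field names.
def flatten(doc, prefix=""):
    out = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))   # recurse into nesting
        else:
            out[path] = value
    return out

event = {"user": {"id": 7, "geo": {"city": "Austin"}}, "price": "9.99"}
print(flatten(event))
# {'user.id': 7, 'user.geo.city': 'Austin', 'price': '9.99'}
```

A rigid ETL pipeline hard-codes this mapping and breaks when a field appears or changes type; doing it dynamically per document is what keeps schema drift from becoming an outage.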
19. Challenge 4: running queries with low efficiency
● For user-facing analytics, where the access pattern can be unknown, defining all the indexes up front is a challenge
● Columnar stores that rely on brute-force scans are slow and not ideal when you are constantly querying data, because you have to throw more compute at them to get faster speeds
20. Auto-indexing reduces compute resources for querying data
● You’ll need a database that automatically creates indexes, so you don’t have to create or manage them manually when the streaming data changes
● With indexes in place, less compute is needed to serve each query
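The scan-versus-index difference is easy to see with any SQL engine’s query planner. A small sketch using SQLite as a stand-in (table and index names are illustrative): the same point query does a brute-force scan until an index exists, then becomes an index search.

```python
import sqlite3

# Sketch: the same point query with and without an index,
# inspected via SQLite's EXPLAIN QUERY PLAN.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i % 100, i * 0.5) for i in range(10_000)])

query = "SELECT * FROM events WHERE user_id = 42"

# Without an index: the planner falls back to a full scan.
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_before[0][3])   # typically something like: SCAN events

# With an index: the planner switches to an index search.
con.execute("CREATE INDEX idx_user ON events(user_id)")
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_after[0][3])    # typically: SEARCH events USING INDEX idx_user ...
```

Auto-indexing makes this switch happen without anyone having to anticipate the access pattern and run the `CREATE INDEX` by hand.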
21. SQL: best for complex analytics
● NoSQL databases:
○ Easy lookups
○ Have to learn a new language
○ No JOIN support at scale
○ Struggle with complex aggregations
● SQL databases:
○ Easy to JOIN (at scale), aggregate,
and search
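The "easy to JOIN and aggregate" claim is one line of SQL. A small sketch (SQLite in-memory, with illustrative table and column names) of the kind of JOIN-plus-aggregate query that is awkward to express in a lookup-oriented NoSQL store:

```python
import sqlite3

# Sketch: revenue per plan, computed as a JOIN + GROUP BY in plain SQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (id INTEGER, plan TEXT);
CREATE TABLE orders (user_id INTEGER, total REAL);
INSERT INTO users  VALUES (1, 'free'), (2, 'pro');
INSERT INTO orders VALUES (1, 10.0), (2, 25.0), (2, 15.0);
""")

rows = con.execute("""
    SELECT u.plan, SUM(o.total)
    FROM orders o
    JOIN users  u ON u.id = o.user_id
    GROUP BY u.plan
    ORDER BY u.plan
""").fetchall()
print(rows)  # [('free', 10.0), ('pro', 40.0)]
```

In a key-value store the same answer typically means fetching every order, looking up each user, and aggregating in application code, which is the "no JOIN support at scale" problem from the slide.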
22. Batch architecture: streaming data challenges
[Diagram: streaming sources (Kafka, Kinesis, Google Pub/Sub, Azure Event Hubs), operational databases (Microsoft SQL Server, Postgres, Oracle, MySQL, MongoDB, DynamoDB), and cloud storage/warehouses (Amazon S3, Google Cloud, Azure Blob, BigQuery, Snowflake, Redshift, Databricks) feeding slow, expensive user-facing analytics 🐌]
● Inefficient ingest: expensive MERGE operations for processing inserts, updates, deletes
● Time-consuming ETL jobs: e.g., pre-aggregations
● Inefficient queries: expensive full table scans, index tuning
● Data latency: > 1 hour
● Query latency: > 1 min
23. Real-time architecture: purpose-built for data applications
[Diagram: the same streaming sources, operational databases, and cloud storage/warehouses feeding fast, efficient user-facing analytics]
● Efficient upserts: mutable at the field level to avoid MERGE operations
● Ingest rollups: transform and pre-aggregate to reduce storage 10-100x
● Fast queries: a Converged Index avoids SCAN operations
● Cloud-native: compute-storage separation to avoid over-provisioning
● Data latency: < 2 seconds
● Query latency: < 1 second
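The ingest-rollup idea is worth a quick sketch. This is my own toy illustration (a hypothetical per-minute click counter, not any product’s rollup syntax): raw events are pre-aggregated into (page, minute) buckets at ingest time, so storage holds one row per bucket instead of one row per event.

```python
from collections import defaultdict

# Sketch of an ingest-time rollup: count clicks per (page, minute bucket)
# as events arrive, instead of storing every raw event.
rollup = defaultdict(int)

def ingest(page, event_ts):
    minute_bucket = event_ts - event_ts % 60   # truncate to 1-minute bucket
    rollup[(page, minute_bucket)] += 1

raw_events = [("home", t) for t in range(0, 120)]  # 2 minutes of clicks
for page, ts in raw_events:
    ingest(page, ts)

print(len(raw_events), "raw rows ->", len(rollup), "stored rows")
# 120 raw rows -> 2 stored rows
```

With one event per second per page, the rollup stores 60x fewer rows; heavier traffic on the same buckets is where the 10-100x storage reduction comes from.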
26. Real-time analytics at your fingertips
● Ability to handle bursty traffic
● Out-of-order data
● Strong dynamic schema
● No ETL jobs
● Serverless architecture
27. Rockset is the real-time analytics platform built for the cloud.
Rockset enables sub-second queries on real-time data.
Build user-facing analytics with surprising efficiency.
35. rockset.com/docs
Booth S3 at the Austin Convention Center
… Or come find me and let’s chat about Kafka and real-time analytics over a tasty beverage, on me!