Nadine Farah, Senior Developer Advocate
@heyerrrbody
in/nadinefarah
Current 2022, Austin
Keepin' it real(-time): How to generate
instant, actionable insights on streaming data
Agenda
Section (duration)
● Streaming data is on the rise 📈 (5 minutes)
● 🤔 Real-time analytics on streaming data challenges (20 minutes)
● 🛠 Rockset deep dive + build a real-time customer 360 app with recommendations (10 minutes)
● Q&A (5 minutes)
Soo.. about me
● 3+ years focusing on real-time analytics & streaming data
● Recently led a workshop on real-time analytics with Kafka
● Lead Rockset’s developer initiatives
● Best friends with Ferro and my dog, Romeo
in/nadinefarah/
@nfarah86
Streaming data is on the rise 📈
Kafka by the #’s
● 9% increase in downloads of open-source Kafka
● 40K companies using Kafka
Source: GH Insights
Impact more … with Kafka
🤔 Real-time analytics and streaming
data challenges
Real-time analytics and streaming data
challenges
● Handling bursty traffic
● Handling out-of-order events
● Schema changes
● Query speed and flexibility
Challenge 1: bursty traffic
Handle bursty traffic
Less efficient ways of scaling bursty data traffic
● Manual reconfigurations
○ Scaling up becomes a bottleneck because it isn’t triggered automatically
● Tightly coupled compute and storage
○ Ingesting data and querying data affect each other → writing large amounts of data slows down reads, and vice versa
○ Data has to be moved closer to memory to make use of the available resources
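As a rough illustration of scaling that is triggered automatically rather than by manual reconfiguration, here is a minimal Python sketch. The function name, the per-worker capacity, and the worker limits are hypothetical, not taken from any real system. It sizes an ingest pool from the incoming event rate alone, so a burst of writes does not require touching the resources that serve queries.

```python
import math


def desired_ingest_workers(events_per_sec, per_worker_capacity,
                           min_workers=1, max_workers=32):
    """Return how many ingest workers a given event rate calls for.

    All parameters are illustrative assumptions: per_worker_capacity is the
    number of events one worker can absorb per second, and the result is
    clamped to the [min_workers, max_workers] range.
    """
    needed = math.ceil(events_per_sec / per_worker_capacity)
    return max(min_workers, min(max_workers, needed))


# Quiet period versus a burst: only the ingest pool size changes.
quiet = desired_ingest_workers(500, per_worker_capacity=1_000)
burst = desired_ingest_workers(25_000, per_worker_capacity=1_000)
```

Because the decision is a pure function of the observed rate, it can be re-evaluated on every metrics tick instead of waiting for someone to reconfigure the cluster.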
ALT (Aggregator-Leaf-Tailer) architecture: more efficient at scaling bursty data traffic
ALT architecture review
Challenge 2: handling out-of-order events
Handle out-of-order events in real-time analytics
Solve out-of-order events
Different types of databases handle this differently:
● mutable
● immutable
Copy on write: less efficient for updating a field
● One way immutable databases (for example, data warehouses) handle out-of-order events is copy on write
○ Any update requires both writing the new data and rewriting already-written adjacent data, so that everything is stored on disk in the correct time order
○ This requires a significant amount of processing power and time
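To make the cost of copy on write concrete, here is a minimal Python sketch under simplifying assumptions: an immutable, time-ordered segment is modeled as a tuple of (timestamp, payload) pairs, and the helper name is hypothetical. One late event forces a whole new segment to be written.

```python
import bisect


def insert_late_event(segment, event):
    """Insert one late event into an immutable, time-sorted segment.

    The existing tuple cannot be modified, so a new segment is built and
    every row is written again. Returns (new_segment, rows_written).
    """
    timestamps = [ts for ts, _ in segment]
    pos = bisect.bisect_right(timestamps, event[0])
    new_segment = segment[:pos] + (event,) + segment[pos:]
    return new_segment, len(new_segment)


segment = ((1, "a"), (2, "b"), (4, "d"))
new_segment, rows_written = insert_late_event(segment, (3, "c"))
```

The number of rows written grows with the size of the segment, not with the size of the change, which is why this approach gets expensive as late events become common.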
Problems with immutability for real-time
analytics
● Usage and volume of streaming data are increasing
● Immutable databases can’t update, insert, or delete data in place:
○ they are append-only
● Shift from batch to streaming:
○ Data apps have tighter SLAs for query and data latency, so they need an efficient real-time database that can handle the nuances and volume of streaming data
Database mutability: make field changes in
place
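A minimal sketch of what field-level mutability can look like when events arrive out of order. It assumes each event carries an id, an event timestamp `ts`, and a dict of changed fields; these names, and the `_event_time` bookkeeping key, are illustrative and not any particular database's format. Each field keeps the value with the newest event time, so a late event only fills in what it is still the freshest source for.

```python
def apply_event(store, event):
    """Upsert one event into a mutable, in-memory document store.

    Only the fields present in the event are touched, and a field is
    overwritten only if the event is at least as new as the value already
    stored for that field (last-write-wins by event time).
    """
    doc = store.setdefault(event["id"], {"_event_time": {}})
    for field, value in event["fields"].items():
        last_seen = doc["_event_time"].get(field)
        if last_seen is None or event["ts"] >= last_seen:
            doc[field] = value
            doc["_event_time"][field] = event["ts"]
    return store


store = {}
# The "shipped" update (ts=3) arrives before the original "created" event (ts=1).
apply_event(store, {"id": "order-1", "ts": 3, "fields": {"status": "shipped"}})
apply_event(store, {"id": "order-1", "ts": 1,
                    "fields": {"status": "created", "email": "a@example.com"}})
```

After both events, the status stays at the newer value while the email from the late event is still recorded, and no other documents are rewritten.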
Challenge 3: Schema changes
● Rigid ETL jobs are hard to update when fields in nested JSON objects are added, changed, or deleted
● Relational databases can take a performance hit if you query JSON data without first converting it to SQL fields
● Nested JSON objects are hard to work with right away because you first have to build processes to flatten them
Strong dynamic typing and indexing make it easier to work with schema changes
● A database that can index nested JSON docs
● A database that supports strong dynamic typing, so you can query fields that hold multiple data types
● A database that easily turns nested JSON into a SQL table at runtime, without any prior transformations
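The idea behind these points can be sketched in a few lines of Python. This is an illustration only, not how any specific database implements it: nested documents are flattened into dotted column names when they are read, and because every value keeps its own type, a query can filter on the type of a field that is a string in one document and an integer in another. The field names are made up for the example.

```python
def flatten(doc, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs at read time."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


docs = [
    {"user": {"id": 1, "address": {"zip": "78701"}}, "total": 9.5},
    {"user": {"id": 2, "address": {"zip": 73301}}, "total": 4.0},  # zip is an int here
]
rows = [flatten(d) for d in docs]

# Each value carries its own type, so mixed-type fields stay queryable.
string_zips = [r["user.id"] for r in rows if isinstance(r["user.address.zip"], str)]
```

No schema was declared up front, and a new nested field in a later document would simply show up as one more dotted column.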
Challenge 4: running queries with low
efficiency
● For user-facing analytics, where the access pattern can be unknown, defining all the indexes up front can be a challenge
● Columnar stores that rely on brute-force scans are slow and are not ideal when you are constantly querying data, because you have to throw more compute resources at them to get faster queries
Auto-indexing reduces compute resources
for querying data
● You’ll need a database that creates indexes automatically, so you don’t have to create or manage them manually, including when the streaming data changes
● With indexes in place, less compute is needed to serve a query
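A toy Python comparison of the two access paths, assuming flat rows with hashable values. The function names are hypothetical and this is a plain inverted index, not a description of any product's index. Every field of every row is indexed as it is ingested, so a lookup goes straight to the matching row ids instead of scanning all rows.

```python
from collections import defaultdict


def build_index(rows):
    """Index every field of every row: field -> value -> set of row ids."""
    index = defaultdict(lambda: defaultdict(set))
    for row_id, row in enumerate(rows):
        for field, value in row.items():
            index[field][value].add(row_id)
    return index


def scan_lookup(rows, field, value):
    """Brute-force scan: work grows with the total number of rows."""
    return [i for i, row in enumerate(rows) if row.get(field) == value]


def index_lookup(index, field, value):
    """Index lookup: work grows with the number of matching rows."""
    return sorted(index.get(field, {}).get(value, set()))


rows = [
    {"city": "Austin", "plan": "pro"},
    {"city": "Boston", "plan": "free"},
    {"city": "Austin", "plan": "free"},
]
index = build_index(rows)
```

Both lookups return the same row ids; the difference is how much data has to be touched to find them, which is where the compute savings come from.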
SQL: best for complex analytics
● NoSQL databases:
○ Easy lookups
○ Have to learn a new language
○ No JOIN support at scale
○ Struggle with complex aggregations
● SQL databases:
○ Easy to JOIN (at scale), aggregate, and search
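For a sense of what "JOIN, aggregate, and search" looks like in practice, here is a small self-contained example using Python's built-in sqlite3 module as a stand-in for a SQL database. The table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# One declarative statement joins the two tables and aggregates per customer.
rows = conn.execute("""
SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total DESC
""").fetchall()
```

The same question asked of a key-value or document store without JOIN support would typically need two lookups and application code to combine and sum the results.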
Batch architecture: streaming data challenges
● Inefficient ingest, e.g. expensive MERGE operations for processing inserts, updates, and deletes
● Time-consuming ETL jobs, e.g. pre-aggregations
● Inefficient queries, e.g. expensive full table scans, index tuning
● Query latency: > 1 minute
● Data latency: > 1 hour
● Result: slow, expensive user-facing analytics 🐌
Systems shown in the diagram: Microsoft SQL Server, Postgres, Oracle, MySQL; Amazon S3, Google Cloud, Azure Blob; BigQuery, Snowflake, Redshift, Databricks; MongoDB, DynamoDB; Kafka, Kinesis, Google Pub/Sub, Azure Event Hubs
Real-time architecture: purpose-built for data applications
● Efficient upserts: mutable at the field level to avoid MERGE operations
● Ingest rollups: transform and pre-aggregate to reduce storage 10-100x
● Fast queries: Converged Index avoids SCAN operations
● Cloud-native: compute-storage separation to avoid over-provisioning
● Query latency: < 1 second
● Data latency: < 2 seconds
● Result: fast, efficient user-facing analytics
Systems shown in the diagram: Microsoft SQL Server, Postgres, Oracle, MySQL; Amazon S3, Google Cloud, Azure Blob; BigQuery, Snowflake, Redshift, Databricks; MongoDB, DynamoDB; Kafka, Kinesis, Google Pub/Sub, Azure Event Hubs
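As a sketch of the ingest rollup idea, the following Python aggregates raw events into one row per user per time bucket as they arrive. The event shape, the field names, and the 60-second bucket are assumptions chosen for the example; the real storage saving depends entirely on how many raw events fall into each bucket.

```python
from collections import defaultdict


def rollup(events, bucket_seconds=60):
    """Pre-aggregate events into (user_id, bucket_start) -> count and total."""
    buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
    for event in events:
        # Align the timestamp to the start of its bucket.
        bucket_start = event["ts"] - event["ts"] % bucket_seconds
        agg = buckets[(event["user_id"], bucket_start)]
        agg["count"] += 1
        agg["total"] += event["amount"]
    return dict(buckets)


events = [
    {"user_id": "u1", "ts": 0, "amount": 1.0},
    {"user_id": "u1", "ts": 30, "amount": 2.0},
    {"user_id": "u1", "ts": 61, "amount": 4.0},
]
rolled = rollup(events)
```

Three raw events become two stored rows here. Only the aggregates are kept, so queries that need individual events would have to go back to the source.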
Rockset deep dive & demo 🛠
Real-time analytics at your fingertips
● Ability to handle bursty traffic
● Out-of-order data
● Strong dynamic schema
● No ETL jobs
● Serverless architecture
Rockset is the real-time analytics platform built for the
cloud.
Rockset enables sub-second queries on real-time data.
Build user-facing analytics with surprising efficiency.
Whatnot: real-time personalization
https://rockset.com/blog/how-rockset-turbocharges-real-time-personalization-for-our-live-shopping/
QUICK DEMO
Real-time Customer 360 with Recommendations
Rockset workshop
Get started with $300 in free credits
https://console.rockset.com/create
Q&A
rockset.com/docs
Booth S3 at The Austin Convention
Center
… or come find me and let’s chat about Kafka and real-time analytics over a tasty beverage, on me!

Keepin’ It Real(-Time) With Nadine Farah | Current 2022

  • 1. Nadine Farah, Senior Developer Advocate @heyerrrbody in/nadinefarah Current 2022, Austin Keepin' it real(-time): How to generate instant, actionable insights on streaming data
  • 2. Agenda (section, duration): Streaming data is on the rise 📈, 5 min; 🤔 Real-time analytics on streaming data challenges, 20 minutes; 🛠 Rockset deep dive + build a real-time customer 360 app with recommendations, 10 minutes; Q&A, 5 minutes
  • 3. Soo.. about me ● 3+ years focusing on real-time analytics & streaming data ● Recently led a workshop on real-time analytics with Kafka ● Lead Rockset’s developer initiatives ● Best friends with Ferro and my dog, Romeo in/nadinefarah/ @nfarah86
  • 4. Streaming data is on the rise 📈
  • 5. Kafka by the #’s: 9% increase in downloads of open-source Kafka; 40K companies using Kafka. Source: GH Insights
  • 6. Impact more … with Kafka
  • 7. 🤔 Real-time analytics and streaming data challenges
  • 8. Real-time analytics and streaming data challenges ● Handling bursty traffic ● Handling out-of-order events ● Schema changes ● Query speed and flexibility
  • 9. Challenge 1: bursty traffic Handle bursty traffic
  • 10. Less efficient ways of scaling bursty data traffic ● Manual reconfigurations ○ You create bottlenecks when you want to scale up because scaling is not triggered automatically ● Tightly coupled compute and storage ○ Ingesting data and querying data affect each other → writing large amounts of data affects the reads and vice versa ○ Data has to be moved closer to memory to make use of the available resources
  • 11. ALT architecture: more efficient on scaling bursty data traffic ALT architecture review
  • 12. Challenge 2: handling out-of-order events Handle out-of-order events in real-time analytics
  • 13. Solve out-of-order events: different types of databases handle this differently (mutable vs. immutable)
  • 14. Copy on write: less efficient in updating a field ● One way to handle out-of-order events in immutable databases is copy on write. Example: data warehouses ○ Any update requires both writing new data and rewriting already-written adjacent data in order to store everything correctly to disk in the right time order ○ This requires a significant amount of processing power and time
  • 15. Problems with immutability for real-time analytics ● Usage and volume of streaming data is increasing ● Immutable databases can’t do updates, inserts, and deletes: ○ append-only ● Shift from batch to streaming: ○ Data apps are having tighter SLAs for query and data latency. So, apps need an efficient real-time database to handle the nuances and volume of streaming data
  • 16. Database mutability: make field changes in place
  • 17. Challenge 3: Schema changes ● Hard to update rigid ETL jobs on nested JSON objects with new fields being updated, added, or deleted ● Relational databases can take a performance hit if you query JSON data without conversion to SQL fields ● It’s hard to work with nested JSON objects right away because you have to build processes in order to flatten it
  • 18. Strong dynamic typing and indexing make it easier to work with schema changes ● Database that can index nested JSON docs ● Database that supports strong dynamic typing so you can query fields with multiple data types ● Database that easily turns nested JSON into a SQL table at runtime without having to do prior transformations
  • 19. Challenge 4: running queries with low efficiency ● For user-facing analytics, where the access pattern can be unknown, defining all the indexes can be a challenge ● Columnar stores that use brute-force scans are slow and are not ideal when you are constantly querying data because you have to throw more compute resources in order to get faster speeds
  • 20. Auto-indexing reduces compute resources for querying data ● You’ll need a database that automatically creates indexes, so you don’t have to create or manage them manually, including when the streaming data changes ● In the presence of indexes, less compute is needed to serve the query
  • 21. SQL: best for complex analytics ● NoSQL databases: ○ Easy lookups ○ Have to learn a new language ○ No JOIN support at scale ○ Struggles on complex aggregations ● SQL databases: ○ Easy to JOIN (at scale), aggregate, and search
  • 22. Batch architecture: streaming data challenges 🐌 ● Inefficient ingest, e.g. expensive MERGE operations for processing inserts, updates, deletes ● Time-consuming ETL jobs, e.g. pre-aggregations ● Inefficient queries, e.g. expensive full table scans, index tuning ● Slow, expensive user-facing analytics: query latency > 1 min, data latency > 1 hour ● Systems shown: Microsoft SQL Server, Postgres, Oracle, MySQL, Amazon S3, Google Cloud, Azure Blob, BigQuery, Snowflake, Redshift, Databricks, MongoDB, DynamoDB, Kafka, Kinesis, Google Pub/Sub, Azure Event Hubs
  • 23. Real-time architecture: purpose-built for data applications ● Efficient upserts: mutable at field level to avoid MERGE operations ● Cloud-native: compute-storage separation to avoid over-provisioning ● Fast queries: Converged Index avoids SCAN operations ● Ingest rollups: transform and pre-aggregate to reduce storage 10-100x ● Fast, efficient user-facing analytics: query latency < 1 second, data latency < 2 seconds ● Systems shown: Amazon S3, Google Cloud, Azure Blob, BigQuery, Snowflake, Redshift, Databricks, MongoDB, DynamoDB, Kafka, Kinesis, Google Pub/Sub, Azure Event Hubs, Microsoft SQL Server, Postgres, Oracle, MySQL
  • 24.
  • 25. Rockset deep dive & demo 🛠
  • 26. Real-time analytics at your fingertips ● Ability to handle bursty traffic ● Out-of-order data ● Strong dynamic schema ● No ETL jobs ● Serverless architecture
  • 27. Rockset is the real-time analytics platform built for the cloud. Rockset enables sub-second queries on real-time data. Build user-facing analytics with surprising efficiency.
  • 28.
  • 29.
  • 32. Real-time Customer 360 with Recommendations Rockset workshop
  • 33. Get started with $300 in free credits https://console.rockset.com/create
  • 34. Q&A
  • 35. rockset.com/docs Booth S3 at the Austin Convention Center. Or come find me and let’s chat about Kafka and real-time analytics over a tasty beverage, on me!
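
The contrast in slides 14 to 16 between copy on write and field-level mutability can be sketched in a few lines of Python. This is a toy illustration only, not Rockset's implementation or API: `MutableStore`, `upsert`, and `copy_on_write_insert` are hypothetical names, and it assumes each event carries an integer event time. The mutable store changes one field in place and ignores events that arrive late, while the copy-on-write path has to rewrite the whole time-ordered segment to slot a late event into position.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MutableStore:
    """Toy document store that updates individual fields in place (hypothetical)."""
    docs: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Latest event time applied per (doc_id, field_name); used to skip stale events.
    applied_at: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def upsert(self, doc_id: str, field_name: str, value: Any, event_time: int) -> bool:
        """Apply the event only if it is newer than what is already stored."""
        key = (doc_id, field_name)
        if event_time <= self.applied_at.get(key, -1):
            return False  # out-of-order (late) event: keep the newer value
        self.docs.setdefault(doc_id, {})[field_name] = value
        self.applied_at[key] = event_time
        return True


def copy_on_write_insert(
    segment: List[Tuple[int, str]], event: Tuple[int, str]
) -> Tuple[List[Tuple[int, str]], int]:
    """Build a new time-ordered segment containing the event.

    Returns the new segment and the number of rows written, which is the
    whole segment because the original is treated as immutable.
    """
    new_segment = sorted(segment + [event])
    return new_segment, len(new_segment)
```

With the mutable store, an event with time 1 that arrives after an event with time 3 for the same field is simply skipped, and only the one field is touched. With copy on write, inserting a single late row into a three-row segment writes four rows, and that cost grows with the segment size.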