1
Hadoop Made Fast
Why Virtual Reality
Needed Stream Processing
to Survive
Greg Fodor, Co-founder, AltspaceVR
Gehrig Kunz, Technical Product Marketing, Confluent
2
Streaming in Action Series
You are here!
August 16th
Pandora Plays Nicely
Everywhere with Real-Time
Data Pipelines
Watch on Confluent.io
3
A look at today
A Streaming Platform is Hadoop Made Fast
● Hadoop was a good idea, but it has its flaws
● How a streaming platform can look like Hadoop
● Companies are using a streaming platform
Stream Processing with Kafka for Virtual Reality
● An example of Kafka with VR
● Challenges VR has that require stream processing
● Examples where it helps
● Why stream processing with Kafka makes sense
4
Interest in Hadoop
5
Good idea, Hadoop is
● Get all the datas
● Perform analysis, explore data
● Perfect for understanding your business
6
But today is different
Star Wars is good, again.
And the apps we build require
constant data.
7
Bringing it to today
With Hadoop you wanted to
● Get all the datas
● Explore historical data
● Understand your business
git commit -m “Today you want to”
● Get all the datas
● Process data as it arrives
● Power your business
8
What this looks like in practice
9
What this looks like in practice
1. Ingest a stream of data.
2. Process and act on it as it arrives.
3. Power your business.
10
Kafka’s Streams API
● Kafka’s Streams API: a lightweight library for performing stream processing
• Aggregations, sessions, windowing, joins, et al.
● Build apps, not clusters
[Diagram labels: Client, Server. Runs outside Kafka brokers!]
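To make the aggregation and windowing bullet concrete, here is a minimal sketch in plain Python. It is not the Streams API; the function name `tumbling_counts`, the event shape (timestamp in milliseconds, key) and the one-second window are all assumptions chosen for illustration.

```python
from collections import defaultdict

def tumbling_counts(events, window_ms):
    """Count events per (window start, key) as they arrive.

    events: iterable of (timestamp_ms, key) pairs (hypothetical shape).
    Returns a dict {(window_start_ms, key): count}.
    """
    counts = defaultdict(int)
    for timestamp_ms, key in events:
        # Align each event to the start of its fixed-size window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(100, "room1"), (900, "room1"), (1200, "room1"), (1500, "room2")]
result = tumbling_counts(events, window_ms=1000)
```

A stream processing library does the same bookkeeping, but keeps the counts in a fault-tolerant state store and spreads the keys across application instances.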
11
Build scalable, fault-tolerant apps
Client
Server
12
Build today’s apps quicker
13
Kafka, stream processing for developers
Deploy apps – not clusters – that are:
● Real-time
● Elastic
● Fault-tolerant
● Teams can be more efficient
● Provide a better, new experience to users
14
Kafka, stream processing for developers
Deploy apps – not clusters – that are:
● Real-time
● Elastic
● Fault-tolerant
● Teams can be more efficient
● Provide a better, new experience to users
Virtual reality, anyone?
Psst, Greg.
15
The best shared VR platform
https://altvr.com/kafka
16
Use cases
17
VR Mirroring + Capture
18
“Real” Reggie
“VIP” Room
19
“Real” Reggie
“VIP” Room
“Mirrored” Reggies
Room 1 Room 2 Room 3 Room 4
20
Use cases for capture/replay
21
22
23
24
25
Kafka’s Streams API
26
Kafka’s Streams API
Stream processing: it’s not just for analytics!
27
Kafka’s Streams API
• Independent capacity
• Arbitrary transformations
• Flexible and simple ops
28
Kafka’s Streams API
• Build cohesive, re-usable topologies
• Design for extensibility
• Apply patterns + avoid pitfalls
29
Job #1: Game Streams
30
Game Streams
Create a logical stream across Photon servers
• Real-time netdata transformation
• Routing between Photon servers
• Stateful, due to Photon protocol
31
“Mirror User A to room R2”
32
6 months later: “Capture User A”
33
Job #2: Playbacks
34
Playbacks
Replays captured data
• Load capture data (Kafka/S3)
• Timed emission
• Checkpointing, looping, filtering
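The timed emission, checkpointing and looping bullets can be sketched as follows. This is a simplified illustration in plain Python, not the Playbacks job itself; the capture shape (offset in milliseconds, payload), the function name `due_events` and the use of an event count as the checkpoint are assumptions.

```python
def due_events(capture, elapsed_ms, checkpoint=0, loop=False):
    """Return (payloads to emit now, new checkpoint).

    capture: list of (offset_ms, payload) sorted by offset_ms (hypothetical shape).
    elapsed_ms: time since the playback started.
    checkpoint: number of events already emitted, so a restart can resume.
    """
    if not capture:
        return [], checkpoint
    duration = capture[-1][0] + 1  # length of one pass through the capture
    emitted = []
    while True:
        pass_no, index = divmod(checkpoint, len(capture))
        if pass_no > 0 and not loop:
            break  # a single pass has finished
        offset_ms, payload = capture[index]
        if pass_no * duration + offset_ms > elapsed_ms:
            break  # the next event is not due yet
        emitted.append(payload)
        checkpoint += 1
    return emitted, checkpoint

capture = [(0, "a"), (500, "b"), (900, "c")]
first = due_events(capture, elapsed_ms=600)
resumed = due_events(capture, elapsed_ms=1000, checkpoint=2, loop=True)
```

Persisting the returned checkpoint after each call is what would let a restarted job continue from the same point instead of replaying from the start.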
35
“Playback capture to room R2”
36
“Mirror User A to room R2”
37
Kafka’s Streams API
• Build cohesive, re-usable topologies
• Design for extensibility
• Apply patterns + avoid pitfalls
GameStreams job allows:
• User capture/mirroring
• Interactable object capture/mirroring
• VoIP, avatar transforms, VR emoji payloads
• Entire room capture/mirroring
38
Kafka’s Streams API
• Build cohesive, re-usable topologies
• Design for extensibility
• Apply patterns + avoid pitfalls
GameStreams job allows:
• Design names, record types generically
• Build in mechanisms for parameterization + control
• Use avro and schema registry
• Job code is not throwaway! Build accordingly
39
Patterns + Pitfalls
40
Patterns + Pitfalls
41
Config KTables
• Drive job behavior via OLTP state
• In our case, users interact with Rails API to control mirroring + captures
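As a rough illustration of a config table driving a job, the sketch below keeps a table of mirroring settings up to date from change records and consults it for every data record. It is plain Python standing in for a stream-table join; `apply_config_change`, `route` and the record shapes are hypothetical names, not part of any AltspaceVR or Kafka API.

```python
def apply_config_change(mirror_table, user_id, rooms):
    """Apply one change record to the config table; None deletes the key."""
    if rooms is None:
        mirror_table.pop(user_id, None)
    else:
        mirror_table[user_id] = set(rooms)

def route(mirror_table, record):
    """Fan a user's record out to every room the config table asks for."""
    user_id, payload = record
    return [(room, payload) for room in sorted(mirror_table.get(user_id, ()))]

mirror_table = {}
# A row written through the web API says "mirror user_a to R2 and R3".
apply_config_change(mirror_table, "user_a", ["R2", "R3"])
routed = route(mirror_table, ("user_a", "wave"))
# Deleting the row turns mirroring off without redeploying the job.
apply_config_change(mirror_table, "user_a", None)
routed_after_delete = route(mirror_table, ("user_a", "wave"))
```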
42
KIP-99 Global Tables
https://cwiki.apache.org/confluence/display/KAFKA/KIP-99%3A+Add+Global+Tables+to+Kafka+Streams
43
Prefer declarative OLTP table state
Database table state should describe “how the world should be”, not “steps to perform”
The job’s duty is to make the world look like the desired one
“A stream should exist from playback A to room B” not
“Right now, create a stream from playback A to room B”
Straightforward to test + verify: does desired world match up with reality?
Easier to reason about in failure cases
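One way to picture the declarative approach is a reconciliation step that compares the desired state with what is actually running. The sketch below is an assumption-laden illustration in plain Python: the function name `reconcile` and the (playback, room) pair shape are invented for the example.

```python
def reconcile(desired, actual):
    """Compare desired streams (from the database) with running streams.

    Both arguments are sets of (playback_id, room_id) pairs (hypothetical shape).
    Returns (to_start, to_stop), each a sorted list.
    """
    to_start = sorted(desired - actual)
    to_stop = sorted(actual - desired)
    return to_start, to_stop

desired = {("playback_a", "room_b"), ("playback_a", "room_c")}
actual = {("playback_a", "room_c"), ("playback_x", "room_b")}
to_start, to_stop = reconcile(desired, actual)
```

Because the step is idempotent, running it again after a crash is safe: once reality matches the desired state, both lists come back empty.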
44
Keep consistent topic naming
Kafka Streams jobs involve a lot of source + intermediate topics
We prefer:
[<data source>|<job application id>]-<avro record type>[_<specifier>]-<partition key>
Ex:
oltp_db-user-user_id
job_playbacks-photon_instantiations-game_stream_id
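A small helper can enforce the naming pattern above so topic names are never typed by hand. This is a sketch: the function name `topic_name` and the rule that individual parts may not contain a hyphen are assumptions, not something the slides state.

```python
def topic_name(origin, record_type, partition_key, specifier=None):
    """Build '<origin>-<record type>[_<specifier>]-<partition key>'.

    origin is either a data source or a job application id.
    """
    parts = [origin, record_type, partition_key]
    if specifier is not None:
        parts.append(specifier)
    for part in parts:
        # Assumption: hyphens separate the fields, so no field may contain one.
        if not part or "-" in part:
            raise ValueError(f"invalid topic name part: {part!r}")
    middle = record_type if specifier is None else f"{record_type}_{specifier}"
    return f"{origin}-{middle}-{partition_key}"

examples = [
    topic_name("oltp_db", "user", "user_id"),
    topic_name("job_playbacks", "photon_instantiations", "game_stream_id"),
]
```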
45
RocksDB range scans
Did you know that RocksDB stores keys lexicographically sorted?
Kafka Streams exposes range() queries on persistent state stores!
46
Example: Scheduled tasks
Keys in “tasks” topic are a composite key of <timestamp, id>
Allows range queries for upcoming tasks (local to partition, obviously)
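The composite key only supports time-ordered range scans if its byte encoding sorts the same way the timestamps do. The sketch below shows one such encoding in plain Python, with a sorted list standing in for the state store; `task_key`, `tasks_due` and the half-open range are assumptions for illustration.

```python
import bisect
import struct

def task_key(timestamp_ms, task_id):
    """Encode <timestamp, id> so byte order matches time order.

    A fixed-width big-endian timestamp sorts lexicographically in the same
    order as the numbers themselves (for non-negative values).
    """
    return struct.pack(">Q", timestamp_ms) + task_id.encode("utf-8")

def tasks_due(sorted_keys, from_ms, to_ms):
    """Return keys with from_ms <= timestamp < to_ms from a sorted key list."""
    lo = bisect.bisect_left(sorted_keys, struct.pack(">Q", from_ms))
    hi = bisect.bisect_left(sorted_keys, struct.pack(">Q", to_ms))
    return sorted_keys[lo:hi]

keys = sorted(task_key(ts, tid) for ts, tid in [(3000, "c"), (256, "a"), (1000, "b")])
due = tasks_due(keys, 0, 2000)
```

Encoding the timestamp as a decimal string would break this: as text, "1000" sorts before "256".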
47
Dark staging jobs
Eventually you will need to deploy a staging version of a job into prod for integration testing while the known-good version is serving users.
Ensure you bake in the necessary degrees of freedom! (Duplicate topics, application ids, etc.)
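One possible way to bake in that freedom is to derive every name a job uses from a single variant setting. The sketch below is hypothetical: `job_settings`, the `variant` knob and the `output_topic` key are invented for illustration, while `application.id` is the standard Kafka Streams configuration key.

```python
def job_settings(job_name, variant="live"):
    """Derive per-variant names so two copies of a job never share state.

    'variant' is a hypothetical knob: 'live' for the known-good job,
    anything else (e.g. 'dark') for a staging copy running in prod.
    """
    suffix = "" if variant == "live" else f"_{variant}"
    app_id = f"job_{job_name}{suffix}"
    return {
        # application.id also determines the consumer group and the
        # prefix of the job's internal topics.
        "application.id": app_id,
        # Hypothetical output topic, named after the application id.
        "output_topic": f"{app_id}-photon_instantiations-game_stream_id",
    }

live = job_settings("playbacks")
dark = job_settings("playbacks", variant="dark")
```

With distinct application ids, the dark copy reads the same source topics at its own offsets and writes to its own topics, so it cannot disturb the version serving users.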
48
Patterns + Pitfalls
49
KTable rematerialization
Cold nodes read *entire* KTable transaction log for each KTable on startup. (Of course!)
Not something you’re likely to experience except during a failure.
You could be in for a surprise!
Easy to force a rematerialization to test: stop job, remove state dir from job work directory, restart.
(But you should probably check your xlog topic sizes first)
In our case, AWS EBS I/O throttling caused us to be unable to bring a fresh node up!
Ensure xlog topics don’t grow unbounded:
- Ensure you delete dead keys explicitly and have proper compaction policies set on xlog topics
- Or, set up topic retention policies if data can be purged after a time duration
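To see why dead keys and compaction matter for cold starts, the sketch below replays a changelog the way a fresh node must, and compares it with a compacted copy. It is a simplified model in plain Python, not Kafka's actual compaction (which, for instance, keeps delete markers around for a while before dropping them); `compact` and `materialize` are illustrative names. In Kafka itself the relevant topic settings are `cleanup.policy` and `retention.ms`.

```python
def compact(changelog):
    """Keep only the latest record per key and drop deleted keys.

    changelog: list of (key, value); a value of None is a delete marker.
    """
    latest = {}
    for key, value in changelog:
        latest.pop(key, None)  # re-insert so order reflects the last write
        if value is not None:
            latest[key] = value
    return list(latest.items())

def materialize(changelog):
    """Rebuild table state by replaying every record, as a cold node must."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table

changelog = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u2", None), ("u3", "d")]
compacted = compact(changelog)
```

Both logs rebuild the same table, but the compacted one is shorter. A key that is never explicitly deleted stays in the log forever, which is how the log, and with it the restore time, grows without bound.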
50
Reset switches + flushing
Sometimes KTable topics or entries need to be forcibly rematerialized, flushed, or read from the beginning.
For example: KTable topic exists before first job run. Or, something broke.
Handy to build in mechanisms to:
- Reset consumer offsets to zero
- For OLTP/Connect-backed KTable data, force a no-op update to database record(s) to flush
- In Rails, ActiveRecord#flush
May be less necessary in newer versions of Kafka Streams (e.g. due to KAFKA-4114 + bugfixes)
A handy routine for resetting a topic’s consumer group offsets (pass in the job Properties):
https://gist.github.com/gfodor/a4f5e4721e959766e75e4c901bf42890
51
Streaming for VR
Kafka Streams has been amazing for us.
Shown so far, we have jobs for:
• VR Mirror/Capture/Playback
• Presence
• Scheduled tasks
We are also using it for:
• Real-time game telemetry ETL
• VR Capture archival to S3
• Real-time push messaging
52
From batch to real-time
● Provides similar concepts to Hadoop
● Streaming platform is right for today’s applications
○ Distributed storage, Stream processing, Publish/Subscribe model
53
A streaming platform can be ‘Hadoop Made Fast’
● Use Kafka as a ‘source of truth’
● Process data as it arrives
● Power real-time experiences (like VR)
54
Streaming in Action Series
You are here
August 16th
Pandora Plays Nicely
Everywhere with Real-Time
Data Pipelines
Watch on Confluent.io
55
Download Confluent Open Source
Join the Confluent Slack community
Check out Kafka Summit!
August 28th in San Francisco
Thanks!
