Twitch Plays Pokémon:
Twitch’s Chat Architecture
John Rizzo
Sr Software Engineer
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/twitch-pokemon
Presented at QCon San Francisco
www.qconsf.com
About Me
www.twitch.tv
• Over 800k concurrent users
• Tens of BILLIONS of daily messages
• ~10 Servers
• 2 Engineers
• 19 Amp Energy drinks per day
Twitch Introduction
Architecture Overview
[Diagram: Channel Page; Video Edge, API Edge, Chat Edge; Video Backend, Services, Chat]
Twitch Plays Pokémon Strikes!
Public Reaction
Media outlets have described the proceedings of the game as being "mesmerizing", "miraculous" and "beautiful chaos", with one viewer comparing it to "watching a car crash in slow motion" - Wikipedia
"Some users are hailing the development as part of internet history…" - BBC
–TwitchChatTeam
10:33pm, February 14: TPP hits 5k concurrent viewers
9:42am, February 15: TPP hits 20k
8:21pm, February 16: Time to take preemptive action
Twitch Plays Pokémon Strikes!
We separate servers into several clusters
• Robust in the face of failure
• Can tune each cluster for its specific purpose
Chat Server Organization
Main
Cluster
Event
Cluster
Group
Cluster
Test
Cluster
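One way to picture the cluster split is a lookup that pins a few named channels to a dedicated cluster and sends everything else to the main one. The sketch below is illustrative only: the cluster names come from the slide, but the `CLUSTERS` table, the host names, the override mechanism, and the function names are all hypothetical and not taken from the talk.

```python
# Cluster names follow the slide; the host names are invented for illustration.
CLUSTERS = {
    "main": ["main1", "main2"],
    "event": ["event1", "event2"],
    "group": ["group1"],
    "test": ["test1"],
}

# Channels pinned to a non-default cluster (an assumed mechanism).
CHANNEL_OVERRIDES = {"twitchplayspokemon": "event"}


def cluster_for(channel):
    """Return the name of the cluster that should serve this channel."""
    return CHANNEL_OVERRIDES.get(channel.lower(), "main")


def servers_for(channel):
    """Return the (hypothetical) server list for the channel's cluster."""
    return CLUSTERS[cluster_for(channel)]
```

With a table like this, moving a busy channel off the main cluster is a one-line change to the overrides, which is roughly the kind of operation the next step in the timeline describes.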
8:21pm, February 16: Move TPP onto the event cluster
Twitch Plays Pokémon Strikes!
5:43pm, February 16: TPP hits 50k users, chat system
starts to show signs of stress (but…it’s on the event cluster!)
8:01am, February 17: Twitch chat engineers panic, rush to
the office
9:35am, February 17: Investigation begins
Twitch Plays Pokémon Strikes!
Debugging principle: Start investigating upstream, move
downstream as necessary
Solving Twitch Plays Pokémon
Edge Server: Sits at the “edge” between the public internet and Twitch’s internal network
Chat Architecture Overview
User Chat Edge
Public Internet Internal Twitch Network
Flash socket
Ingestion
Distribution
Any user:room:edge tuple is valid
Solving Twitch Plays Pokémon
[Diagram: user9 connecting; three edge servers, each holding its own user:room connections]
Edge: user1:lirik, user2:degentp, user3:witwix, …
Edge: user4:witwix, user5:armorra, user6:degentp, …
Edge: user2:witwix, user7:cep21, user8:lirik, …
A Note on Instrumentation
Chat Service
Remote statsd
collector
Graphite
cluster
UDP UDP
HTTP
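The UDP hop from the chat service to the statsd collector can be illustrated in a few lines of Python. This is a minimal sketch, not Twitch's code: the function names and the metric name are assumptions, and the payload uses the plain-text statsd counter format `name:value|c`.

```python
import socket


def format_counter(name, value=1):
    """Encode a counter increment in the statsd line format."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, address, value=1):
    """Fire-and-forget a counter over UDP; no reply is expected or awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), address)
```

A call such as `send_counter("chat.edge.clue.timeout", ("127.0.0.1", 8125))` would increment a counter on a collector listening on the conventional statsd port. Because UDP is connectionless, a slow or missing collector does not block the chat service, at the cost of silently dropped samples.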
Solving Twitch Plays Pokémon
So let’s take a look at our edge server logs…
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Message successfully sent to #degentp
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:05 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
Feb 18 06:54:04 tmi_edge: [clue] Message successfully sent to #paragusrants
Solving Twitch Plays Pokémon
Let’s dissect one of these log lines
Feb 18 06:54:04 chat_edge: [clue] Timed out (after 5s) while writing request for: privmsg
• Timestamp (Feb 18 06:54:04) - when this action was recorded on the server
• Server (chat_edge) - the name of the server that is generating this log file
• Remote service ([clue]) - the name of the service that edge is connecting to
• Event detail (Timed out (after 5s) while writing request for: privmsg)
Why did the clue service take so long to respond? Also, what is the clue service?
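The four parts of the log line dissected above (timestamp, server, remote service, event detail) can be pulled apart mechanically. The following is a sketch under the assumption that every edge log line follows the shape shown on the slides; the regular expression and the function names are illustrative, not part of Twitch's tooling.

```python
import re
from collections import Counter

# timestamp, server name, bracketed remote service, then free-form detail
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<server>\w+): "
    r"\[(?P<service>\w+)\] "
    r"(?P<detail>.*)$"
)


def parse_line(line):
    """Return the four fields of a log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def timeouts_by_service(lines):
    """Count 'Timed out' events per remote service."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields and fields["detail"].startswith("Timed out"):
            counts[fields["service"]] += 1
    return counts
```

Grouping timeouts by remote service is one quick way to confirm that the failures point at a single dependency (here, clue) rather than at the edge servers themselves.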
Ingestion
Message Ingestion
User Chat Edge
Clue
Public Internet Internal Twitch Network
Distribution
HTTP
Solving Twitch Plays Pokémon
Let’s dissect one of these log lines
Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
The Clue server took longer than 5 seconds to process this message. Why?
Solving Twitch Plays Pokémon
Clue logs…
Feb 18 06:54:04 chat_clue: WARNING:tornado.general:Connect error on fd 10: ECONNREFUSED
Feb 18 06:54:04 chat_clue: WARNING:tornado.general:Connect error on fd 15: ECONNREFUSED
Feb 18 06:54:05 chat_clue: WARNING:tornado.general:Connect error on fd 9: ECONNREFUSED
Feb 18 06:54:05 chat_clue: WARNING:tornado.general:Connect error on fd 18: ECONNREFUSED
Not very useful…but we get some info. Clue’s connections are
being refused.
Which machine is clue failing to connect to? Why?
Let’s take a step back…these errors are happening on both main
AND the event clusters. Why?
Are there common services or dependencies?
• Databases (store chat color, badges, bans, etc)
• Cache (to speed up database access)
• Services (authentication, user data, etc)
Investigating Clue
Chat:Clue Rails Video
PGBouncer
Postgres
Redis
Can rule out databases and services – rest of site is functional
Let’s look closer at our cache – this one is specific to chat servers
Investigating Clue
Redis: where do we start investigating?
Strategy: start high-level then drill down
Investigating Redis
Redis server isn’t being stressed very hard
Investigating Redis
Let’s look at how Clue is using Redis…
Clue Configuration
DB_SERVER=db.internal.twitch.tv
DB_NAME=twitch_db
DB_TIMEOUT=1s
CACHE_SERVER=localhost
CACHE_PORT=2000
CACHE_MAX_CONNS=10
CACHE_TIMEOUT=100ms
…
…
Looks like our whole cache contains only one local instance?
Redis is single-process and single-threaded!
Redis Configuration
$ ps aux | grep redis
13909 0.0 0.0 2434840 796 s000 S+ grep redis
Redis doesn’t seem to be running locally - what listens on port 2000?
Redis Configuration
$ netstat -lp | grep 2000
tcp 0 0 localhost:2000 *:* LISTEN 2109/haproxy
HAProxy!
HAProxy
• Limits for #connections, requests, etc
• Robust instrumentation
Attribution:
Shareholic.com
Redis Configuration
Are we load balancing across many Redis instances?
class twitch::haproxy::listeners::chat_redis (
$haproxy_instance = 'chat-backend',
$proxy_name = 'chat-redis',
$servers = [
'redis2.internal.twitch.tv:6379',
],
...
...
...
We are not load balancing across several instances
Investigating Redis
Let’s take a look at the Redis box…
$ top
Tasks: 281 total, 1 running, 311 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.3%us, 10.5%sy, 0.0%ni, 95.4%id, 0.0%wa, 0.0%hi,
Mem: 24682328k total,6962336k used, 17719992k free, 13644k buffers
Swap: 7999484k total, 0k used, 7999484k free, 4151420k cached
PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26109 20 0 76048 128m 1340 S 99 0.2 6133:28 redis-server
3342 20 0 9040 1320 844 R 2 0.0 0:00.01 top
1 20 0 90412 3920 2576 S 0 0.0 103:45.82 init
2 20 0 0 0 0 S 0 0.0 0:05.17 kthreadd
Redis is unsurprisingly maxing out the CPU (99% in top)
Redis Optimization Options?
• Optimize Redis at the application-level
• Distribute Redis
2:00pm: Smarter caching?
[Diagram: Clue logic talking to Redis, then with a local cache added inside Clue]
• 2:23pm: There are some challenges (cache coherence, implementation difficulty)
• Is there any low-hanging fruit?
• Yes! Rate limiting code!
• 2:33pm: Change has little effect...
2:41pm: Distribute Redis?
[Diagram: Clue (logic plus local cache) in front of four Redis instances]
2:56pm: Yes! Sharding by key!
[Diagram: a distribution function routes each command to one of four Redis instances]
set “effinga.chat_color”
get “degentp.chat_color”
get “effinga.chat_color”
set “degentp.chat_color”
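A distribution function of this kind can be sketched as a stable hash of the key taken modulo the number of instances. The talk does not say which hash was used, so CRC32 here is an assumption, as are the names `SHARDS` and `shard_for`; the host and ports mirror the Redis instances that appear elsewhere in these slides.

```python
import zlib

# Four Redis instances on one host; host and ports mirror values seen in the slides.
SHARDS = [("redis2.internal.twitch.tv", port) for port in (6379, 6380, 6381, 6382)]


def shard_for(key, shards=SHARDS):
    """Pick the instance responsible for a key from a stable hash of that key."""
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per interpreter run.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]
```

Every `get` and `set` for a given key, such as `effinga.chat_color`, then lands on the same instance. The trade-off of plain modulo sharding is that changing the number of instances remaps most keys, which is tolerable for a cache that can be refilled from the database.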
3:03pm: How do we implement this?
• Puppet configuration changes
• HAProxy changes
• Redis deploy changes (copy/paste!)
• Do we need to modify any code?
• Can we let HAProxy load balance for us?
• No – the load balancer would need to be aware of the Redis protocol
• Changes required at the application level
Distributing Redis
3:52pm: Code surveyed – Two problematic patterns
Problematic pattern #1:
Problematic pattern #1 solution:
Problematic pattern #2:
What if we need these keys in different contexts?
Problematic pattern #2 solution:
7:21pm: Test (fingers crossed)
8:11pm: Deploy cache changes
9:29pm: Deploy chat code changes
Distributing Redis
10:10pm: Better, but still bad…
Solving Twitch Plays Pokémon
Solving Twitch Plays Pokémon
Let’s use some tools to dig deeper…
$ redis-cli -h redis2.internal.twitch.tv -p 6379 INFO
# Clients
connected_clients:3021
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
$ redis-cli -h redis2.internal.twitch.tv -p 6379 CLIENT LIST | grep idle | wc -l
2311
Lots of bad connections
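Each line of `CLIENT LIST` output describes one connection as space-separated `key=value` fields, including `addr` and `idle` (seconds since the last command). Since every line carries an `idle=` field, filtering on its value is more selective than matching the word itself. The sketch below works on captured output text; the function names and the idle threshold are assumptions chosen for illustration.

```python
from collections import Counter


def parse_client_list(text):
    """Turn CLIENT LIST output into one dict of fields per connection."""
    clients = []
    for line in text.splitlines():
        if line.strip():
            pairs = (field.split("=", 1) for field in line.split() if "=" in field)
            clients.append(dict(pairs))
    return clients


def idle_clients_by_host(text, min_idle_seconds):
    """Count connections idle for at least min_idle_seconds, grouped by client IP."""
    counts = Counter()
    for client in parse_client_list(text):
        if int(client.get("idle", 0)) >= min_idle_seconds:
            host = client.get("addr", "").rsplit(":", 1)[0]
            counts[host] += 1
    return counts
```

Grouping long-idle connections by client address gives the same kind of answer as the `lsof` pipeline that follows in the talk: which machines are holding connections open without using them.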
Solving Twitch Plays Pokémon
Let’s grab the pid of one Redis instance
$ sudo svstat /etc/service/redis_*
/etc/service/redis_6379: up (pid 26109) 3543 seconds
/etc/service/redis_6380: up (pid 26111) 3543 seconds
/etc/service/redis_6381: up (pid 26113) 3543 seconds
/etc/service/redis_6382: up (pid 26114) 3544 seconds
Solving Twitch Plays Pokémon
$ sudo lsof -p 26109 | grep chat | cut -d ' ' -f 32 | cut -d ':' -f 2 | sort | uniq -c
2012 6421->chat-testing.internal.twitch.tv
121 6421->chat1.internal.twitch.tv
101 6421->chat3.internal.twitch.tv
...
Solving Twitch Plays Pokémon
Lesson learned: Testing is bad
Solving Twitch Plays Pokémon
11:12pm: Shut off testing cluster completely
Users can connect to chat again!
Users can send messages again!
Chat team leaves the office at 11:31pm
Twitch Plays Pokémon is Solved(?)
Complaints that users don’t see the same messages
A New Bug Arises
Message Distribution
User Chat Edge
Public Internet Internal Twitch Network
Ingestion
Distribution
Message Distribution
User Chat Edge
Public Internet Internal Twitch Network
Clue
Exchange
Message Distribution
[Diagram: Clue, HAProxy, three Exchange instances, and six Edge servers]
• Edge/Clue instrumentation shows no errors
• Exchange isn’t even instrumented!
• Let’s fix that…
Solving the Distribution Problem
Solving the Distribution Problem
Let’s look at our new exchange logs
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] Exchange success
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] i/o timeout
Feb 19 14:11:06 exchange: [exchange] Exchange success
Ideas?
Solving the Distribution Problem
• These are extremely short and simple requests, but there are
many of them
• We aren’t using HTTP keepalives
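The cost of skipping keepalives is that every short request pays for a fresh TCP connection. The sketch below illustrates the difference with Python's standard library against a throwaway local server; it is not the Exchange code, and `CountingHandler` and `connections_used` are names invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def connections_used(reuse, requests=5):
    """Send several GETs to a local server; return how many TCP connections it accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        for _ in range(requests):
            if not reuse:
                # Without keepalive: drop the connection and dial again each time.
                conn.close()
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

Calling `connections_used(reuse=True)` sends all requests over one connection, while `connections_used(reuse=False)` opens one per request. With many tiny requests, the connection setup becomes a large share of the work, which fits the i/o timeouts seen in the exchange logs.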
Solving the Distribution Problem
Go makes this extremely simple
Lessons Learned
• Better logs/instrumentation to make debugging easier
• Generate fake traffic to force us to handle more load than we
need
• Make sure we use and configure our infrastructure wisely!
What have we done since?
• We now use AWS servers exclusively
• Better Redis infrastructure
• Python -> Go
• Lots of other big and small changes to support new products
and better QoS
Thank you.
Questions?
Video Architecture
PoP Origin
PoP
PoP
Ingest Distribution
Broadcaster
Viewers
Replication
Hierarchy
RTMP Protocol
Attribution:
MMick66 - Wikipedia
HLS Protocol
Video Ingest
Streaming
Encoder
Internet
PoP
Ingest Proxy
Ingest
Video Ingest
Ingest
Auth Service
DB Transcode
Queue
Video Ingest
Transcoder
Transcode/Transmux
Worker
gotranscoder
HLS Origin
Queue
RTMP Data
Video API
Replication
Hierarchy
VOD
Video Distribution
PoP
Video Edge
Protected Replication
HLS Proxy
HLS Cache HLS Cache
Find API
Ingest Proxy
Upstream PoP
Protected Replication
Video Distribution
PoP Replication Hierarchy
[Diagram: five PoPs, each with Edge and PR (Protected Replication); Transcoding; Tier 1 Cache]
More Related Content

What's hot

An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Istio presentation jhug
Istio presentation jhugIstio presentation jhug
Istio presentation jhug
Georgios Andrianakis
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
Jose De La Rosa
 
Redis Reliability, Performance & Innovation
Redis Reliability, Performance & InnovationRedis Reliability, Performance & Innovation
Redis Reliability, Performance & Innovation
Redis Labs
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Seunghyun Lee
 
Architecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker servicesArchitecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker services
LINE Corporation
 
RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...
NTT Software Innovation Center
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Thomas Graf
 
Nick Fisk - low latency Ceph
Nick Fisk - low latency CephNick Fisk - low latency Ceph
Nick Fisk - low latency Ceph
ShapeBlue
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
Jason Hubbard
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
Xiang Fu
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
Joel Koshy
 
Disaster Recovery and High Availability with Kafka, SRM and MM2
Disaster Recovery and High Availability with Kafka, SRM and MM2Disaster Recovery and High Availability with Kafka, SRM and MM2
Disaster Recovery and High Availability with Kafka, SRM and MM2
Abdelkrim Hadjidj
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 

What's hot (20)

An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Istio presentation jhug
Istio presentation jhugIstio presentation jhug
Istio presentation jhug
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Redis Reliability, Performance & Innovation
Redis Reliability, Performance & InnovationRedis Reliability, Performance & Innovation
Redis Reliability, Performance & Innovation
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
 
Architecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker servicesArchitecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker services
 
RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Nick Fisk - low latency Ceph
Nick Fisk - low latency CephNick Fisk - low latency Ceph
Nick Fisk - low latency Ceph
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
 
Disaster Recovery and High Availability with Kafka, SRM and MM2
Disaster Recovery and High Availability with Kafka, SRM and MM2Disaster Recovery and High Availability with Kafka, SRM and MM2
Disaster Recovery and High Availability with Kafka, SRM and MM2
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 

Viewers also liked

twitch tv presentation
twitch tv presentationtwitch tv presentation
twitch tv presentation
Galen Gong
 
State of the Video Game Industry
State of the Video Game IndustryState of the Video Game Industry
State of the Video Game Industry
Margaret Wallace
 
Twitch - Video Walkthrough Feature
Twitch - Video Walkthrough FeatureTwitch - Video Walkthrough Feature
Twitch - Video Walkthrough Feature
Carolyn Jao
 
大规模、高并发实时通信系统的挑战和思路
大规模、高并发实时通信系统的挑战和思路大规模、高并发实时通信系统的挑战和思路
大规模、高并发实时通信系统的挑战和思路
Junwen Feng
 
The Livestream Way - Building a Culture of Sales Excellence & Accountability
The Livestream Way - Building a Culture of Sales Excellence & AccountabilityThe Livestream Way - Building a Culture of Sales Excellence & Accountability
The Livestream Way - Building a Culture of Sales Excellence & Accountability
Sam Jacobs
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Amazon Web Services
 
实时全互动直播 毫秒延迟&多人连麦
实时全互动直播 毫秒延迟&多人连麦实时全互动直播 毫秒延迟&多人连麦
实时全互动直播 毫秒延迟&多人连麦
Hardway Hou
 
Twitch Plays Pokemon: Collaborative Learning Opportunities with Video Game S...
Twitch Plays Pokemon:  Collaborative Learning Opportunities with Video Game S...Twitch Plays Pokemon:  Collaborative Learning Opportunities with Video Game S...
Twitch Plays Pokemon: Collaborative Learning Opportunities with Video Game S...
Eric Sembrat
 
秒秒夺金:直播互动的最佳实践
秒秒夺金:直播互动的最佳实践秒秒夺金:直播互动的最佳实践
秒秒夺金:直播互动的最佳实践
Junwen Feng
 
Twitch - Project Presentation
Twitch - Project PresentationTwitch - Project Presentation
Twitch - Project Presentation
Daise Rocha
 
eSports: Quite Possibly the Next Big Thing in Media/Entertainment
eSports: Quite Possibly the Next Big Thing in Media/EntertainmenteSports: Quite Possibly the Next Big Thing in Media/Entertainment
eSports: Quite Possibly the Next Big Thing in Media/Entertainment
Colin Sebastian
 
Netflix - Freedom & Responsibility
Netflix - Freedom & ResponsibilityNetflix - Freedom & Responsibility
Netflix - Freedom & ResponsibilityChris Ellis
 
Twitch.tv presentation 2015 BS
Twitch.tv presentation 2015 BSTwitch.tv presentation 2015 BS
Twitch.tv presentation 2015 BS
fastjack25
 
Livehouse.in 直播影音趨勢
Livehouse.in 直播影音趨勢Livehouse.in 直播影音趨勢
Livehouse.in 直播影音趨勢
Kimmy Chen
 
Over the Top Content Delivery: State of the Art and Challenges Ahead
Over the Top Content Delivery: State of the Art and Challenges AheadOver the Top Content Delivery: State of the Art and Challenges Ahead
Over the Top Content Delivery: State of the Art and Challenges AheadAlpen-Adria-Universität
 
網路直播工作流程分享
網路直播工作流程分享網路直播工作流程分享
網路直播工作流程分享
Gaspar Hsieh
 
直播效益與趨勢
直播效益與趨勢直播效益與趨勢
直播效益與趨勢
Kimmy Chen
 
Facebook chat architecture
Facebook chat architectureFacebook chat architecture
Facebook chat architecture
Udaya Kiran
 
중국웨이보-2016년라이브방송통찰보고서
중국웨이보-2016년라이브방송통찰보고서중국웨이보-2016년라이브방송통찰보고서
중국웨이보-2016년라이브방송통찰보고서
Young Jong Sihn
 

Viewers also liked (20)

Twitch - an intro
Twitch - an introTwitch - an intro
Twitch - an intro
 
twitch tv presentation
twitch tv presentationtwitch tv presentation
twitch tv presentation
 
State of the Video Game Industry
State of the Video Game IndustryState of the Video Game Industry
State of the Video Game Industry
 
Twitch - Video Walkthrough Feature
Twitch - Video Walkthrough FeatureTwitch - Video Walkthrough Feature
Twitch - Video Walkthrough Feature
 
大规模、高并发实时通信系统的挑战和思路
大规模、高并发实时通信系统的挑战和思路大规模、高并发实时通信系统的挑战和思路
大规模、高并发实时通信系统的挑战和思路
 
The Livestream Way - Building a Culture of Sales Excellence & Accountability
The Livestream Way - Building a Culture of Sales Excellence & AccountabilityThe Livestream Way - Building a Culture of Sales Excellence & Accountability
The Livestream Way - Building a Culture of Sales Excellence & Accountability
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
实时全互动直播 毫秒延迟&多人连麦
实时全互动直播 毫秒延迟&多人连麦实时全互动直播 毫秒延迟&多人连麦
实时全互动直播 毫秒延迟&多人连麦
 
Twitch Plays Pokemon: Collaborative Learning Opportunities with Video Game S...
Twitch Plays Pokemon:  Collaborative Learning Opportunities with Video Game S...Twitch Plays Pokemon:  Collaborative Learning Opportunities with Video Game S...
Twitch Plays Pokemon: Collaborative Learning Opportunities with Video Game S...
 
秒秒夺金:直播互动的最佳实践
秒秒夺金:直播互动的最佳实践秒秒夺金:直播互动的最佳实践
秒秒夺金:直播互动的最佳实践
 
Twitch - Project Presentation
Twitch - Project PresentationTwitch - Project Presentation
Twitch - Project Presentation
 
eSports: Quite Possibly the Next Big Thing in Media/Entertainment
eSports: Quite Possibly the Next Big Thing in Media/EntertainmenteSports: Quite Possibly the Next Big Thing in Media/Entertainment
eSports: Quite Possibly the Next Big Thing in Media/Entertainment
 
Netflix - Freedom & Responsibility
Netflix - Freedom & ResponsibilityNetflix - Freedom & Responsibility
Netflix - Freedom & Responsibility
 
Twitch.tv presentation 2015 BS
Twitch.tv presentation 2015 BSTwitch.tv presentation 2015 BS
Twitch.tv presentation 2015 BS
 
Livehouse.in 直播影音趨勢
Livehouse.in 直播影音趨勢Livehouse.in 直播影音趨勢
Livehouse.in 直播影音趨勢
 
Over the Top Content Delivery: State of the Art and Challenges Ahead
Over the Top Content Delivery: State of the Art and Challenges AheadOver the Top Content Delivery: State of the Art and Challenges Ahead
Over the Top Content Delivery: State of the Art and Challenges Ahead
 
網路直播工作流程分享
網路直播工作流程分享網路直播工作流程分享
網路直播工作流程分享
 
直播效益與趨勢
直播效益與趨勢直播效益與趨勢
直播效益與趨勢
 
Facebook chat architecture
Facebook chat architectureFacebook chat architecture
Facebook chat architecture
 
중국웨이보-2016년라이브방송통찰보고서
중국웨이보-2016년라이브방송통찰보고서중국웨이보-2016년라이브방송통찰보고서
중국웨이보-2016년라이브방송통찰보고서
 

Similar to Twitch Plays Pokémon: Twitch's Chat Architecture

Openstack: An Open Source Cloud Framework
Openstack: An Open Source Cloud FrameworkOpenstack: An Open Source Cloud Framework
Openstack: An Open Source Cloud Framework
Andrew Shafer
 
Breaking Smart Speakers: We are Listening to You.
Breaking Smart Speakers: We are Listening to You.Breaking Smart Speakers: We are Listening to You.
Breaking Smart Speakers: We are Listening to You.
Priyanka Aash
 
Let's Get Real (time): Server-Sent Events, WebSockets and WebRTC for the soul
Let's Get Real (time): Server-Sent Events, WebSockets and WebRTC for the soulLet's Get Real (time): Server-Sent Events, WebSockets and WebRTC for the soul
Let's Get Real (time): Server-Sent Events, WebSockets and WebRTC for the soul
Swanand Pagnis
 
Microservices Antipatterns
Microservices AntipatternsMicroservices Antipatterns
Microservices Antipatterns
C4Media
 
Distributed "Web Scale" Systems
Distributed "Web Scale" SystemsDistributed "Web Scale" Systems
Distributed "Web Scale" Systems
Ricardo Vice Santos
 
WebRTC: A front-end perspective
WebRTC: A front-end perspectiveWebRTC: A front-end perspective
WebRTC: A front-end perspective
shwetank
 
WebRTC Reborn - Full Stack Toronto
WebRTC Reborn -  Full Stack TorontoWebRTC Reborn -  Full Stack Toronto
WebRTC Reborn - Full Stack Toronto
Dan Jenkins
 
Tips from Support: Always Carry a Towel and Don’t Panic!
Tips from Support: Always Carry a Towel and Don’t Panic!Tips from Support: Always Carry a Towel and Don’t Panic!
Twitch Plays Pokémon: Twitch's Chat Architecture

  • 1. Twitch Plays Pokémon: Twitch’s Chat Architecture John Rizzo Sr Software Engineer
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/twitch-pokemon
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 7. Twitch Introduction • Over 800k concurrent users • Tens of BILLIONS of daily messages • ~10 Servers • 2 Engineers • 19 Amp Energy drinks per day
  • 8. Architecture Overview [diagram labels: Video Edge, API Edge, Chat Edge; Channel Page; Video Backend Services Chat]
  • 9. Twitch Plays Pokémon Strikes!
  • 10. Public Reaction: Media outlets have described the proceedings of the game as being "mesmerizing", "miraculous" and "beautiful chaos", with one viewer comparing it to "watching a car crash in slow motion" - Wikipedia. "Some users are hailing the development as part of internet history…" – BBC. –TwitchChatTeam
  • 11. Twitch Plays Pokémon Strikes! 10:33pm, February 14: TPP hits 5k concurrent viewers; 9:42am, February 15: TPP hits 20k; 8:21pm, February 16: Time to take preemptive action
  • 12. Chat Server Organization: We separate servers into several clusters • Robust in the face of failure • Can tune each cluster for its specific purpose [diagram labels: Main Cluster, Event Cluster, Group Cluster, Test Cluster]
  • 13. Twitch Plays Pokémon Strikes! 8:21pm, February 16: Move TPP onto the event cluster
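
The cluster split and the move of a single channel onto the event cluster can be pictured with a small routing table. This is only a sketch: the four cluster names come from the slide, while the `ClusterRouter` class, the default of `main`, and the channel names are assumptions made for illustration, not Twitch's actual mechanism.

```python
# Hypothetical sketch: cluster names are from the slide; the router itself is assumed.
CLUSTERS = {"main", "event", "group", "test"}
DEFAULT_CLUSTER = "main"


class ClusterRouter:
    """Maps a channel to a chat cluster, with per-channel overrides."""

    def __init__(self):
        self._overrides = {}  # channel name -> cluster name

    def move(self, channel, cluster):
        """Pin one channel to a specific cluster (e.g. a very busy channel)."""
        if cluster not in CLUSTERS:
            raise ValueError(f"unknown cluster: {cluster}")
        self._overrides[channel] = cluster

    def cluster_for(self, channel):
        """Return the override if one exists, otherwise the default cluster."""
        return self._overrides.get(channel, DEFAULT_CLUSTER)


router = ClusterRouter()
router.move("twitchplayspokemon", "event")  # illustrative channel name
print(router.cluster_for("twitchplayspokemon"), router.cluster_for("lirik"))
```

Keeping the override separate from the default means one hot channel can be isolated without touching how every other channel is routed.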
  • 14. Twitch Plays Pokémon Strikes! 5:43pm, February 16: TPP hits 50k users, chat system starts to show signs of stress (but… it's on the event cluster!); 8:01am, February 17: Twitch chat engineers panic, rush to the office; 9:35am, February 17: Investigation begins
  • 15. Solving Twitch Plays Pokémon: Debugging principle: start investigating upstream, move downstream as necessary
  • 16. Chat Architecture Overview: Edge Server: sits at the "edge" of the public internet and Twitch's internal network [diagram labels: User, Chat Edge, Public Internet, Internal Twitch Network, Flash socket, Ingestion, Distribution]
  • 17. Solving Twitch Plays Pokémon: Any user:room:edge tuple is valid [diagram: user9; user1:lirik user2:degentp user3:witwix …; user4:witwix user5:armorra user6:degentp …; user2:witwix user7:cep21 user8:lirik …]
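
One consequence of "any user:room:edge tuple is valid" is that a single room's members can be spread across every edge. The sketch below indexes the slide's user:room pairs by room and edge; the edge names (`edge-a`, `edge-b`, `edge-c`) and the grouping of the pairs into three edges are assumptions read off the diagram residue, not something the slide states.

```python
from collections import defaultdict

# (user, room, edge) tuples. The user:room pairs are from the slide;
# the edge names and the three-way grouping are assumed for illustration.
CONNECTIONS = [
    ("user1", "lirik", "edge-a"), ("user2", "degentp", "edge-a"), ("user3", "witwix", "edge-a"),
    ("user4", "witwix", "edge-b"), ("user5", "armorra", "edge-b"), ("user6", "degentp", "edge-b"),
    ("user2", "witwix", "edge-c"), ("user7", "cep21", "edge-c"), ("user8", "lirik", "edge-c"),
]


def edges_by_room(connections):
    """Build room -> edge -> set of users from (user, room, edge) tuples."""
    index = defaultdict(lambda: defaultdict(set))
    for user, room, edge in connections:
        index[room][edge].add(user)
    return index


index = edges_by_room(CONNECTIONS)
# Every edge listed here holds at least one member of the room.
print(sorted(index["witwix"]))
```

Under these assumptions the room `witwix` has members on all three edges, and `user2` appears on two different edges in two different rooms.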
  • 18. A Note on Instrumentation [diagram labels: Chat Service, Remote statsd collector, Graphite cluster; links: UDP, UDP, HTTP]
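
As a rough illustration of the chat-service-to-statsd hop, the sketch below emits metrics in the plain-text statsd format (`name:value|type`) over UDP. The metric names, host, and port 8125 (statsd's conventional default) are assumptions; the slide does not show what Twitch actually recorded.

```python
import socket


def format_counter(name, value=1):
    """Encode a statsd counter increment, e.g. b'name:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def format_timing(name, millis):
    """Encode a statsd timing sample in milliseconds, e.g. b'name:12|ms'."""
    return f"{name}:{millis}|ms".encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget: UDP means a slow or dead collector never blocks chat."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names, for illustration only.
send_metric(format_counter("chat.edge.clue_timeout"))
send_metric(format_timing("chat.clue.redis_get", 12))
```

The trade-off of UDP here is that samples can be silently dropped, which is usually acceptable for aggregate graphs.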
  • 19. Solving Twitch Plays Pokémon: So let's take a look at our edge server logs…
      Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Feb 18 06:54:04 tmi_edge: [clue] Message successfully sent to #degentp
      Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Feb 18 06:54:05 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Feb 18 06:54:04 tmi_edge: [clue] Message successfully sent to #paragusrants
  • 20. Solving Twitch Plays Pokémon: Let's dissect one of these log lines
      Feb 18 06:54:04 chat_edge: [clue] Timed out (after 5s) while writing request for: privmsg
  • 21. Solving Twitch Plays Pokémon: Let's dissect one of these log lines
      Feb 18 06:54:04 chat_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Timestamp - when this action was recorded on the server
  • 22. Solving Twitch Plays Pokémon: Let's dissect one of these log lines
      Feb 18 06:54:04 chat_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Server - the name of the server that is generating this log file
  • 23. Solving Twitch Plays Pokémon: Let's dissect one of these log lines
      Feb 18 06:54:04 chat_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Remote service - the name of the service that edge is connecting to
  • 24. Solving Twitch Plays Pokémon: Let's dissect one of these log lines
      Feb 18 06:54:04 chat_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Event detail. Why did the clue service take so long to respond? Also, what is the clue service?
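
The four parts called out on these slides (timestamp, server, remote service, event detail) can be pulled apart mechanically. The regular expression below is a sketch fitted to the one log line shown; the field names and the assumption that every line follows this exact shape are mine, not the talk's.

```python
import re

# Pattern fitted to the sample line on the slides; real logs may vary.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "  # e.g. "Feb 18 06:54:04"
    r"(?P<server>\S+): "                              # e.g. "chat_edge"
    r"\[(?P<service>[^\]]+)\] "                       # e.g. "clue"
    r"(?P<detail>.*)$"                                # rest of the line
)


def parse_log_line(line):
    """Return the four fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


sample = ("Feb 18 06:54:04 chat_edge: [clue] "
          "Timed out (after 5s) while writing request for: privmsg")
fields = parse_log_line(sample)
print(fields["server"], fields["service"], fields["detail"])
```

With the fields separated, counting timeouts per remote service or per second becomes a simple aggregation rather than a visual scan of the log.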
  • 25. Message Ingestion [diagram labels: User, Chat Edge, Clue, Public Internet, Internal Twitch Network, Ingestion, Distribution, HTTP]
  • 26. Solving Twitch Plays Pokémon: Let's dissect one of these log lines
      Feb 18 06:54:04 tmi_edge: [clue] Timed out (after 5s) while writing request for: privmsg
      Clue server took longer than 5 seconds to process this message. Why?
  • 27. Solving Twitch Plays Pokémon: Clue logs…
      Feb 18 06:54:04 chat_clue: WARNING:tornado.general:Connect error on fd 10: ECONNREFUSED
      Feb 18 06:54:04 chat_clue: WARNING:tornado.general:Connect error on fd 15: ECONNREFUSED
      Feb 18 06:54:05 chat_clue: WARNING:tornado.general:Connect error on fd 9: ECONNREFUSED
      Feb 18 06:54:05 chat_clue: WARNING:tornado.general:Connect error on fd 18: ECONNREFUSED
      Not very useful… but we get some info. Clue's connections are being refused. Which machine is clue failing to connect to? Why?
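
ECONNREFUSED generally means the far end actively rejected the TCP connection attempt, most commonly because nothing is accepting connections on that address and port. The sketch below reproduces the error locally; the helper names are hypothetical, and it says nothing about which machine Clue was actually failing to reach.

```python
import errno
import socket


def try_connect(host, port, timeout=1.0):
    """Attempt a TCP connection and report the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Same errno that appears in the Clue log lines above.
        return errno.errorcode[errno.ECONNREFUSED]
    except OSError as exc:
        return f"other error: {exc}"


def unused_local_port():
    """Ask the OS for a free port, then release it so nothing is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


print(try_connect("127.0.0.1", unused_local_port()))
```

Logging the destination host and port alongside the error, as `try_connect` could easily be extended to do, would answer the "which machine?" question that the raw Tornado warning leaves open.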
  • 28. Investigating Clue: Let's take a step back… these errors are happening on both the main AND the event clusters. Why? Are there common services or dependencies? • Databases (store chat color, badges, bans, etc.) • Cache (to speed up database access) • Services (authentication, user data, etc.)
  • 29. Investigating Clue [diagram labels: Chat:Clue, Rails, Video, PGBouncer, Postgres, Redis]
  • 30. Investigating Clue: Can rule out databases and services – rest of site is functional. Let's look closer at our cache – this one is specific to chat servers
  • 31. Investigating Redis: Redis: where do we start investigating? Strategy: start high-level, then drill down. Redis server isn't being stressed very hard
  • 32. Investigating Redis: Let's look at how Clue is using Redis…
  • 35. Redis Configuration:
      $ ps aux | grep redis
      13909 0.0 0.0 2434840 796 s000 S+ grep redis
      Redis doesn't seem to be running locally - what listens on port 2000?
  • 36. Redis Configuration:
      $ netstat -lp | grep 2000
      tcp 0 0 localhost:2000 *:* LISTEN 2109/haproxy
      HAProxy!
  • 37. HAProxy • Limits for #connections, requests, etc. • Robust instrumentation (Attribution: Shareholic.com)
  • 38. Redis Configuration: Are we load balancing across many Redis instances?
      DB_SERVER=db.internal.twitch.tv
      DB_NAME=twitch_db
      DB_TIMEOUT=1s
      CACHE_SERVER=localhost
      CACHE_PORT=2000
      CACHE_MAX_CONNS=10
      CACHE_TIMEOUT=100ms
      … …
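
The configuration on this slide can be read as plain KEY=VALUE pairs. The sketch below parses the values shown and resolves the cache address, which turns out to be `localhost:2000` (the local HAProxy listener) rather than a Redis host. The file format, the parser, and the duration syntax handling are assumptions based only on what the slide displays.

```python
import re

# Values copied from the slide; the KEY=VALUE file format is assumed.
CONFIG_TEXT = """\
DB_SERVER=db.internal.twitch.tv
DB_NAME=twitch_db
DB_TIMEOUT=1s
CACHE_SERVER=localhost
CACHE_PORT=2000
CACHE_MAX_CONNS=10
CACHE_TIMEOUT=100ms
"""

_MILLIS_PER_UNIT = {"ms": 1, "s": 1000}


def parse_duration_ms(text):
    """Convert '100ms' or '1s' into integer milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _MILLIS_PER_UNIT[match.group(2)]


def parse_config(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and other lines."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key] = value
    return config


config = parse_config(CONFIG_TEXT)
cache_addr = (config["CACHE_SERVER"], int(config["CACHE_PORT"]))
print(cache_addr, parse_duration_ms(config["CACHE_TIMEOUT"]))
```

Note how tight the cache budget is relative to the database one: 100 ms against 1 s, with at most 10 connections.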
  • 39. Redis Configuration: Are we load balancing across many Redis instances?
      class twitch::haproxy::listeners::chat_redis (
        $haproxy_instance = 'chat-backend',
        $proxy_name = 'chat-redis',
        $servers = [ 'redis2.internal.twitch.tv:6379', ],
        ... ... ...
  • 40. Redis Configuration: Are we load balancing across many Redis instances?
      class twitch::haproxy::listeners::chat_redis (
        $haproxy_instance = 'chat-backend',
        $proxy_name = 'chat-redis',
        $servers = [ 'redis2.internal.twitch.tv:6379', ],
        ... ... ...
      We are not load balancing across several instances
  • 41. Investigating Redis: Let's take a look at the Redis box…
      $ top
      Tasks: 281 total, 1 running, 311 sleeping, 0 stopped, 0 zombie
      Cpu(s): 10.3%us, 10.5%sy, 0.0%ni, 95.4%id, 0.0%wa, 0.0%hi,
      Mem: 24682328k total, 6962336k used, 17719992k free, 13644k buffers
      Swap: 7999484k total, 0k used, 7999484k free, 4151420k cached
      PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      26109 20 0 76048 128m 1340 S 99 0.2 6133:28 redis-server
      3342 20 0 9040 1320 844 R 2 0.0 0:00.01 top
      1 20 0 90412 3920 2576 S 0 0.0 103:45.82 init
      2 20 0 0 0 0 S 0 0.0 0:05.17 kthreadd
  • 42. Investigating Redis: Redis is unsurprisingly maxing out the CPU
      $ top
      Tasks: 281 total, 1 running, 311 sleeping, 0 stopped, 0 zombie
      Cpu(s): 10.3%us, 10.5%sy, 0.0%ni, 95.4%id, 0.0%wa, 0.0%hi
      Mem: 24682328k total, 6962336k used, 17719992k free, 13644k buffers
      Swap: 7999484k total, 0k used, 7999484k free, 4151420k cached
      PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      26109 20 0 76048 128m 1340 S 99 0.2 6133:28 redis-server
      3342 20 0 9040 1320 844 R 2 0.0 0:00.01 top
      1 20 0 90412 3920 2576 S 0 0.0 103:45.82 init
      2 20 0 0 0 0 S 0 0.0 0:05.17 kthreadd
  • 43. Redis Optimization Options? • Optimize Redis at the application-level • Distribute Redis www.twitch.tv
  • 44. Clue Logic 2:00pm: Smarter caching? Redis Optimization Options? www.twitch.tv Redis
  • 45. Clue Logic 2:00pm: Smarter caching? Redis Optimization Options? www.twitch.tv Redis Clue local cache
  • 46. • 2:23pm: There are some challenges (cache coherence, implementation difficulty) • Is there any low-hanging fruit? • Yes! Rate limiting code! • 2:33pm: Change has little effect... Redis Optimization Options? www.twitch.tv
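The local-cache idea and its cache-coherence challenge can be illustrated with a minimal sketch. This is not the Clue code: `LocalTTLCache`, `load_from_backend`, the dict standing in for Redis, and the 30-second TTL are all hypothetical names and values chosen for illustration.

```python
import time


class LocalTTLCache:
    """Tiny in-process cache with per-entry expiry (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss or after expiry."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        value = load(key)  # stands in for a Redis GET
        self._entries[key] = (value, now + self.ttl)
        return value


backend = {"effinga.chat_color": "red"}  # stand-in for Redis, not a real client
calls = []


def load_from_backend(key):
    calls.append(key)  # record every trip to the backend
    return backend.get(key)


cache = LocalTTLCache(ttl_seconds=30)
first = cache.get("effinga.chat_color", load_from_backend)
backend["effinga.chat_color"] = "blue"  # another process changes the value
second = cache.get("effinga.chat_color", load_from_backend)
```

The second read is served locally, so the backend is hit only once, but it also returns the old color until the entry expires. That stale read is the coherence problem the slide mentions.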
  • 47. 2:41pm: Distribute Redis? Redis Optimization Options? www.twitch.tv Clue Logic Redis Clue Local Cache Redis Redis Redis
  • 48. 2:56pm: Yes! Sharding by key! Redis Optimization Options? www.twitch.tv set “effinga.chat_color” get “degentp.chat_color” get “effinga.chat_color” set “degentp.chat_color” Redis Redis Redis Redis Distribution Function
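A distribution function of this kind could look like the sketch below. The slides do not say which hash was used, so CRC32 with a modulo is an assumption, and the shard addresses in `SHARDS` are placeholders rather than Twitch hosts.

```python
import zlib

# Placeholder addresses for four Redis instances (assumed, not from the talk).
SHARDS = ["redis-a:6379", "redis-b:6379", "redis-c:6379", "redis-d:6379"]


def shard_for(key, shards=SHARDS):
    """Map a key to one shard so every get/set for that key lands on the same instance."""
    digest = zlib.crc32(key.encode("utf-8"))  # stable across processes and restarts
    return shards[digest % len(shards)]


# A set and a later get for the same key resolve to the same instance.
write_target = shard_for("effinga.chat_color")
read_target = shard_for("effinga.chat_color")
```

One trade-off of plain modulo hashing is that changing the number of shards remaps most keys, which matters less for a cache that can be refilled than for a primary store.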
  • 49. 3:03pm: How do we implement this? • Puppet configuration changes • HAProxy changes • Redis deploy changes (copy/paste!) • Do we need to modify any code? • Can we let HAProxy load balance for us? • No – LB needs to be aware of Redis protocol • Changes required at the application level Distributing Redis www.twitch.tv
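Why a protocol-unaware load balancer is not enough can be shown with a small simulation. Everything here is illustrative: `FakeRedis` is an in-memory stand-in, and the two router functions are hypothetical, with round-robin playing the role of a generic TCP load balancer.

```python
import itertools
import zlib


class FakeRedis:
    """In-memory stand-in for one Redis instance."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def round_robin_router(instances):
    """Ignore the key and hand out instances in turn, as a generic LB would."""
    cycle = itertools.cycle(instances)
    return lambda key: next(cycle)


def key_hash_router(instances):
    """Pick the instance from the key, which requires knowing the key."""
    return lambda key: instances[zlib.crc32(key.encode("utf-8")) % len(instances)]


def write_then_read(router, key, value):
    router(key).set(key, value)
    return router(key).get(key)


rr_result = write_then_read(
    round_robin_router([FakeRedis() for _ in range(4)]), "effinga.chat_color", "red"
)
hash_result = write_then_read(
    key_hash_router([FakeRedis() for _ in range(4)]), "effinga.chat_color", "red"
)
```

With round-robin the write lands on the first instance and the read goes to the second, so `rr_result` is `None`. The key-aware router reads back `"red"`, which is why the routing decision had to move into the application.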
  • 50. 3:52pm: Code surveyed – Two problematic patterns Distributing Redis www.twitch.tv
  • 53. Problematic pattern #2: Distributing Redis www.twitch.tv What if we need these keys in different contexts?
  • 55. 7:21pm: Test (fingers crossed) 8:11pm: Deploy cache changes 9:29pm: Deploy chat code changes Distributing Redis www.twitch.tv
  • 56. 10:10pm: Better, but still bad… Solving Twitch Plays Pokémon www.twitch.tv
  • 57. Solving Twitch Plays Pokémon Let’s use some tools to dig deeper… $ redis-cli -h redis2.internal.twitch.tv -p 6379 INFO # Clients connected_clients:3021 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0 $ redis-cli -h redis2.internal.twitch.tv -p 6379 CLIENT LIST | grep idle | wc -l 2311 www.twitch.tv
  • 58. Solving Twitch Plays Pokémon Lots of bad connections $ redis-cli -h redis2.internal.twitch.tv -p 6379 INFO # Clients connected_clients:3021 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0 $ redis-cli -h redis2.internal.twitch.tv -p 6379 CLIENT LIST | grep idle | wc -l 2311 www.twitch.tv
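The same kind of check can be scripted against saved `CLIENT LIST` output. The sketch below assumes the space-separated `field=value` layout that Redis prints for each client and filters on the value of the `idle` field rather than its presence. The function names, the 300-second threshold, and the sample lines are made up for illustration.

```python
from collections import Counter


def parse_client_line(line):
    """Split one CLIENT LIST line into a dict of field name -> string value."""
    fields = {}
    for token in line.split():
        name, _, value = token.partition("=")
        fields[name] = value
    return fields


def idle_clients_by_host(client_list_text, min_idle_seconds):
    """Count clients idle for at least min_idle_seconds, grouped by remote host."""
    counts = Counter()
    for line in client_list_text.splitlines():
        if not line.strip():
            continue
        fields = parse_client_line(line)
        if int(fields.get("idle", 0)) >= min_idle_seconds:
            host = fields.get("addr", "").rsplit(":", 1)[0]  # drop the port
            counts[host] += 1
    return counts


# Fabricated sample in the CLIENT LIST layout; not output from the talk.
SAMPLE = """\
id=3 addr=10.0.0.5:52555 fd=8 name= age=855 idle=800 flags=N db=0 cmd=get
id=4 addr=10.0.0.5:52556 fd=9 name= age=900 idle=850 flags=N db=0 cmd=get
id=5 addr=10.0.0.9:40100 fd=10 name= age=12 idle=0 flags=N db=0 cmd=set
"""
result = idle_clients_by_host(SAMPLE, 300)
```

Grouping by host points at which client machine is holding the idle connections, which is the question the following slides answer with `lsof`.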
  • 59. Solving Twitch Plays Pokémon Let’s grab the pid of one Redis instance $ sudo svstat /etc/service/redis_* /etc/service/redis_6379: up (pid 26109) 3543 seconds /etc/service/redis_6380: up (pid 26111) 3543 seconds /etc/service/redis_6381: up (pid 26113) 3543 seconds /etc/service/redis_6382: up (pid 26114) 3544 seconds www.twitch.tv
  • 61. Solving Twitch Plays Pokémon $ sudo lsof -p 26109 | grep chat | cut -d ' ' -f 32 | cut -d ':' -f 2 | sort | uniq -c 2012 6421->chat-testing.internal.twitch.tv 121 6421->chat1.internal.twitch.tv 101 6421->chat3.internal.twitch.tv ... www.twitch.tv
  • 62. Solving Twitch Plays Pokémon $ sudo lsof -p 26109 | grep tmi | cut -d ' ' -f 32 | cut -d ':' -f 2 | sort | uniq -c 2012 6421->chat-testing.internal.twitch.tv 121 6421->chat1.internal.twitch.tv 101 6421->chat3.internal.twitch.tv ... www.twitch.tv
  • 63. Solving Twitch Plays Pokémon Lesson learned: Testing is bad www.twitch.tv
  • 64. Solving Twitch Plays Pokémon 11:12pm: Shut off testing cluster completely www.twitch.tv
  • 65. Users can connect to chat again! Users can send messages again! Chat team leaves the office at 11:31pm Twitch Plays Pokémon is Solved(?) www.twitch.tv
  • 66. Complaints that users don’t see the same messages A New Bug Arises www.twitch.tv
  • 67. Message Distribution www.twitch.tv User Chat Edge Public Internet Internal Twitch Network Ingestion Distribution
  • 68. Message Distribution www.twitch.tv User Chat Edge Public Internet Internal Twitch Network Clue Exchange
  • 70. • Edge/Clue instrumentation shows no errors • Exchange isn’t even instrumented! • Let’s fix that… Solving the Distribution Problem www.twitch.tv
  • 71. Solving the Distribution Problem Let’s look at our new exchange logs Feb 19 14:11:06 exchange: [exchange] i/o timeout Feb 19 14:11:06 exchange: [exchange] Exchange success Feb 19 14:11:06 exchange: [exchange] i/o timeout Feb 19 14:11:06 exchange: [exchange] i/o timeout Feb 19 14:11:06 exchange: [exchange] i/o timeout Feb 19 14:11:06 exchange: [exchange] i/o timeout Feb 19 14:11:06 exchange: [exchange] i/o timeout Feb 19 14:11:06 exchange: [exchange] i/o timeout Feb 19 14:11:06 exchange: [exchange] Exchange success www.twitch.tv Ideas?
  • 72. Solving the Distribution Problem • These are extremely short and simple requests, but there are many of them • We aren’t using HTTP keepalives www.twitch.tv
  • 73. Solving the Distribution Problem Go makes this extremely simple www.twitch.tv
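The effect of keepalives on many short requests can be measured with a small self-contained demo. The slides credit Go for the actual fix; the sketch below uses Python's standard library purely for illustration and is not the exchange code. `CountingHandler`, `send_requests`, and `connections_used` are hypothetical names, and the demo server runs on a loopback port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Demo handler that counts TCP connections rather than requests."""

    protocol_version = "HTTP/1.1"  # required for persistent connections
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def send_requests(host, port, count, reuse):
    """Send `count` GETs, reusing one connection when `reuse` is true."""
    conn = None
    for _ in range(count):
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/")
        conn.getresponse().read()  # drain the body before reusing the socket
        if not reuse:
            conn.close()
            conn = None
    if conn is not None:
        conn.close()


def connections_used(count, reuse):
    """Start a throwaway server, send the requests, report connections opened."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    CountingHandler.connections = 0
    try:
        send_requests("127.0.0.1", server.server_address[1], count, reuse)
        return CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()
```

Calling `connections_used(5, reuse=True)` opens a single connection for all five requests, while `connections_used(5, reuse=False)` opens one per request. For very short requests, the connection setup can dominate the cost, which fits the i/o timeouts seen in the exchange logs.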
  • 74. Lessons Learned • Better logs/instrumentation to make debugging easier • Generate fake traffic to force us to handle more load than we need • Make sure we use and configure our infrastructure wisely! www.twitch.tv
  • 75. What have we done since? • We now use AWS servers exclusively • Better Redis infrastructure • Python -> Go • Lots of other big and small changes to support new products and better QoS www.twitch.tv
  • 77. Video Architecture www.twitch.tv PoP Origin PoP PoP Ingest Distribution Broadcaster Viewers Replication Hierarchy
  • 83. Video Distribution www.twitch.tv PoP Video Edge Protected Replication HLS Proxy HLS Cache HLS Cache Find API Ingest Proxy Upstream PoP Protected Replication
  • 84. Video Distribution www.twitch.tv PoP Replication Hierarchy PoP Edge PR PoP Edge PR PoP Edge PR PoP Edge PR PoP Edge PR Transcoding Tier 1 Cache