All Things Open 2014 - Day 2
Thursday, October 23rd, 2014
Peter Herndon
Senior Application Engineer for Bitly
DevOps
Streaming Way to Webscale: How We Scale Bitly via Streaming
1. Streaming Your Way to Web Scale: Scaling Bitly via Stream-Based Processing
All Things Open
October 23, 2014, 4:15pm
85 slides, 23 images, 21 diagrams
26. Basic Web App
[Diagram: Web App, Database]
Basic web app. In Python: Django + Postgres, Flask + Postgres, or Tornado + Postgres.
27. Scaling the Mountain (of Load)
[Diagram: three Web Apps, one Database]
First bottleneck: the web layer.
28. Cache Rules Everything Around Me
[Diagram: three Web Apps, Cache, Database]
With the web layer bottleneck removed, the next bottleneck is the database, so add a caching layer.
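The caching step can be sketched as a cache-aside lookup. This is only an illustration and not Bitly's code: SlowDatabase and CacheAside are hypothetical names, and the in-process dictionary stands in for a real cache server.

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for a real database; counts actual queries."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self.rows.get(key)


class CacheAside:
    """Check the cache first; on a miss, read the database and remember the result."""

    def __init__(self, database, ttl_seconds=60.0, clock=time.monotonic):
        self.database = database
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self.database.get(key)  # miss or expired entry
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

Repeated reads of the same key within the TTL reach the database only once, which is what takes load off the database tier.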
29. You Want Me to Replicate You
[Diagram: three Web Apps, Cache, two Databases]
This works for a while, but database requests still take too long, so replicate.
30. Shards Here, Shards There
[Diagram: three Web Apps, Cache, six Databases]
...and then shard.
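One common way to decide which shard holds a record is to hash its key. The slides do not say how the shards are chosen, so the sketch below is an assumption for illustration only; shard_for is a hypothetical helper.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a string key to a shard index in range(shard_count).

    Uses MD5 purely as a stable hash (not for security): Python's built-in
    hash() is salted per process, so it would route the same key to
    different shards after a restart.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

With plain modulo hashing, changing shard_count moves most keys to a different shard, which is why resharding tends to be expensive.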
31. It’s Off to Work I Go!
[Diagram: Web App, Cache, Database, Queue, Worker]
But individual requests still take too long because each one does too much work. So add a message queue and a worker. In Python: Celery.
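The queue-and-worker pattern can be sketched with the standard library alone. This is not Celery and not Bitly's code: handle_request, worker_loop and run_demo are hypothetical names, and upper-casing the payload stands in for the slow work.

```python
import queue
import threading


def handle_request(work_queue, payload):
    """Web handler: enqueue the slow work and return right away."""
    work_queue.put(payload)
    return "accepted"


def worker_loop(work_queue, results):
    """Worker: pull messages off the queue until a None sentinel arrives."""
    while True:
        payload = work_queue.get()
        if payload is None:
            break
        results.append(payload.upper())  # stand-in for the slow work


def run_demo(payloads):
    work_queue = queue.Queue()
    results = []
    worker = threading.Thread(target=worker_loop, args=(work_queue, results))
    worker.start()
    responses = [handle_request(work_queue, p) for p in payloads]
    work_queue.put(None)  # tell the worker to stop
    worker.join()
    return responses, results
```

The handler's response time no longer depends on how long the work takes; it depends only on how quickly the message can be enqueued.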
32. Message From a Bottle(.py)
[Diagram: Web App, Cache, Database, Queue, Worker]
The web app sends messages to the queue.
34. Write Here, Write Now
[Diagram: Web App, Cache, Database, Queue, Worker]
The worker writes results to the database, file system, etc.
35. Write Here, Write Now (redux)
[Diagram: Web App, Database, Worker, Queue]
Instead, imagine the worker writes results
36. Sending Out an SMS
[Diagram: Web App, Database, Worker, Queue]
and the web app writes event messages to a queue local to the web service
37. Listen, listen, LISTEN
[Diagram: Web App, Database, Worker, two Queues]
but the worker is listening to a queue running on another server
38. Workin’ On a Chain(ed) Gang
[Diagram: three services in a chain, each with Web App, Database, Worker, Queue]
39. Look it up!
[Diagram: Web App, Database, Worker, two Queues]
The worker finds the queue that has the topic via nsqlookupd.
40.
    if __name__ == "__main__":
        tornado.options.parse_command_line()
        logatron_client.setup()
        Reader(
            topic=settings.get('nsqd_output_topic'),
            channel='queuereader_spam_metrics',
            validate_method=validate_message,
            message_handler=count_spam_actions,
            lookupd_http_addresses=settings.get('nsq_lookupd')
        )
        run()
/<service>/queuereader_<service>.py
How do I find things?
41. Sending Out an SMS
[Diagram: Web App, Database, Worker, Queue (topic: 'spam_api')]
The app writes to a topic in the local nsqd for the first time.
42. Sending Out an SMS
[Diagram: Web App, Database, Worker, Queue (topic: 'spam_api'), nsqlookupd]
nsqd creates the topic and registers it with nsqlookupd.
43. Where Am I Again?
[Diagram: Web App, Database, Worker, nsqd (topic: 'spam_counter'), nsqd (topic: 'spam_api'), nsqlookupd (asked: topic 'spam_api'?)]
A worker in another service that is looking for a topic asks nsqlookupd, which replies with the address.
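The register-then-look-up flow from the last three slides can be modelled in memory. This is a toy model, not NSQ's API or wire protocol: ToyLookup and ToyQueueDaemon are hypothetical classes, and any address passed in is made up.

```python
class ToyLookup:
    """Toy stand-in for nsqlookupd: remembers which daemons hold each topic."""

    def __init__(self):
        self._producers = {}  # topic -> set of daemon addresses

    def register(self, topic, address):
        self._producers.setdefault(topic, set()).add(address)

    def lookup(self, topic):
        """Return the addresses of every daemon that has the topic."""
        return sorted(self._producers.get(topic, ()))


class ToyQueueDaemon:
    """Toy stand-in for nsqd: creates a topic on first publish and registers it."""

    def __init__(self, address, lookup):
        self.address = address
        self.lookup = lookup
        self.topics = {}  # topic -> list of messages

    def publish(self, topic, message):
        if topic not in self.topics:
            self.topics[topic] = []
            self.lookup.register(topic, self.address)  # first write only
        self.topics[topic].append(message)
```

A worker needs to know only the lookup service's address and a topic name; it does not need to know in advance which hosts publish that topic.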
44. Talkin’ ‘Bout Something
[Diagram: Web App, Database, Worker, nsqd (topic: 'spam_counter'), nsqd (topic: 'spam_api', channel: 'spam_counter')]
The queuereader connects to nsqd and registers a channel.
46. Channeling the Ghost
[Diagram: nsqd (topic: 'spam_api'); three Workers on channel 'spam_counter'; one Worker on channel 'nsq_to_file']
Each channel receives a full copy of all messages.
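The fan-out can be modelled in a few lines. ToyTopic is a hypothetical in-memory class, not NSQ code. It also reflects NSQ's documented behaviour that workers on the same channel share that channel's messages, which the slide's diagram suggests but the note does not state.

```python
from collections import deque


class ToyTopic:
    """Toy model of a topic: every channel gets its own copy of each message."""

    def __init__(self, name):
        self.name = name
        self.channels = {}  # channel name -> deque of pending messages

    def subscribe(self, channel):
        """Create the channel if it does not exist yet."""
        self.channels.setdefault(channel, deque())

    def publish(self, message):
        for pending in self.channels.values():
            pending.append(message)  # one copy per channel

    def receive(self, channel):
        """Hand the next pending message to one worker on this channel.

        Workers sharing a channel call this in turn, so each message on the
        channel goes to exactly one of them. Returns None when nothing is pending.
        """
        pending = self.channels[channel]
        return pending.popleft() if pending else None
```

Adding workers to an existing channel spreads the same stream over more hands, while adding a new channel gives a new consumer its own complete stream.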
61.
    if __name__ == "__main__":
        tornado.options.parse_command_line()
        logatron_client.setup()
        Reader(
            topic=settings.get('nsqd_output_topic'),
            channel='queuereader_spam_metrics',
            validate_method=validate_message,
            message_handler=count_spam_actions,
            lookupd_http_addresses=settings.get('nsq_lookupd')
        )
        run()
/<service>/queuereader_<service>.py, 1 of 4
63.
    def validate_message(message):
        if message.get('o') == '+' and message.get('l'):
            return True
        if message.get('o') == '-' and message.get('l') and message.get('bl'):
            return True
        return False
/<service>/queuereader_<service>.py, 2 of 4
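To see which messages the validator accepts, it can be run on its own against a few dictionaries. The slides do not explain what the 'o', 'l' and 'bl' fields mean, so the sample values below are made up. The sketch also assumes the second operator value on the slide is a plain '-'.

```python
def validate_message(message):
    """Accept '+' messages that carry 'l', and '-' messages that carry 'l' and 'bl'."""
    if message.get('o') == '+' and message.get('l'):
        return True
    if message.get('o') == '-' and message.get('l') and message.get('bl'):
        return True
    return False


# Hypothetical sample messages; every field value here is invented.
samples = [
    {'o': '+', 'l': 'abc123'},                  # accepted
    {'o': '-', 'l': 'abc123', 'bl': 'list-1'},  # accepted
    {'o': '-', 'l': 'abc123'},                  # rejected: '-' without 'bl'
    {'o': '+'},                                 # rejected: no 'l'
    {'o': '?', 'l': 'abc123'},                  # rejected: unknown 'o'
]
results = [validate_message(m) for m in samples]
```

Anything that is not a well-formed '+' or '-' message is rejected before it reaches the message handler.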
80. Streaming Architecture
Easy to build new services
Easy to scale individual components horizontally
Durable in the face of single component failure
Distributed
81. THINGS TO THINK ABOUT
Monitoring, monitoring, monitoring
Failure modes — how can things fail? How does your application as a whole handle the failure of individual components?
Measurement — metrics show the range
Timeouts — connection timeouts, DNS timeouts — a slow network is the same as a failed service
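The timeout point can be made concrete with a small socket sketch. fetch_with_timeout is a hypothetical helper, not part of any Bitly service; it treats a service that answers too slowly exactly like one that is down.

```python
import socket


def fetch_with_timeout(host, port, timeout_seconds):
    """Return the first bytes read from the service, or None on any failure.

    The timeout covers both connecting and reading, so a service that accepts
    the connection but never answers is reported the same way as one that
    refuses the connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
            return conn.recv(1024)
    except OSError:  # covers timeouts, refused connections and DNS failures
        return None
```

The caller then needs only one failure path, whether that is a retry, a fallback or an error response, regardless of why the dependency did not answer in time.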
84. Photo Credits
Web Scale - http://www.mongodb-is-web-scale.com
Waterfall - https://www.flickr.com/photos/desatur8/14949285342
Tornado - https://www.flickr.com/photos/indigente/798304
John de Lancie - https://www.flickr.com/photos/cayusa/1394930005
Ben Whishaw - https://www.flickr.com/photos/rossendalewadey/6032496676
Command Key - https://www.flickr.com/photos/klash/3175479797
iPhone6 Event - https://www.flickr.com/photos/notionscapital/15067798867
Wait for iPhone - https://www.flickr.com/photos/josh_gray/662814907
NSQ Logo - http://nsq.io
All other photos by T. Peter Herndon