SCALING
Òscar Vilaplana
@grimborg
http://oscarvilaplana.cat
WHAT’S THIS ABOUT?
People
Technology
Tools
PEOPLE
Care
Focus
Automate & Test.
Shared brain
Finish & DRY.
TECH
Design to clone
Separate pieces
API
Offload everything
Measure
VIRTUAL QUEUE
Queue
Instance
Queue
Instance
Queue
Instance
Queue
Instance
VIRTUAL QUEUE
Queue
Instance
Queue
Instance
Queue
Instance
VIRTUAL QUEUE
Queue
Instance
Queue
Instance
Queue
Instance
VIRTUAL QUEUE
Queue
Instance
Queue
Instance
Queue
Instance
Queue
Instance
TECH
• Design to clone
• Separate pieces
• API
• Offload everything
• Measure
TYPES OF TASKS
• Realtime
• ASAP
• When you have time } Async!
INSTAGRAM’S FEED
• Redis queue per follower.
• New media: push to queues
• Small chained tasks
INSTAGRAM’S FEED
harro wouter orestis siebejan oscar
Schedule next batch
SMALL TASKS
• 10k followers per task
• < 2s
• Finer-grained load balancing
• Lower penalty of failure/reload
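A minimal sketch of the small-task idea above, assuming a plain in-process loop rather than a real task queue. The names `batches`, `fan_out` and `push` are hypothetical and do not come from the talk; only the 10k batch size is the slide's figure. In a real deployment each batch would be its own queued task that schedules the next one.

```python
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 10_000  # the slide's figure: 10k followers per task


def batches(follower_ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` follower ids."""
    for start in range(0, len(follower_ids), size):
        yield follower_ids[start:start + size]


def fan_out(media_id: int,
            follower_ids: Sequence[int],
            push: Callable[[int, int], None],
            size: int = BATCH_SIZE) -> int:
    """Push media_id to every follower's feed, one small batch at a time.

    Returns the number of batches processed. Here the batches run in a
    simple loop; with a task queue, each batch would be a separate task,
    so a failure only costs one batch of work.
    """
    count = 0
    for batch in batches(follower_ids, size):
        for follower_id in batch:
            push(follower_id, media_id)
        count += 1
    return count
```

Because each batch is independent, a worker that dies mid-way only forces a retry of one batch, which is the "lower penalty of failure/reload" point.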
CELERY: REDIS
• Good: Fast
• Bad:
• Polling for task distribution
• Messy non-synchronous replication
• Memory limits task capacity
CELERY: BEANSTALK
• Good:
• Fast
• Push to consumers
• Writes to disk
• Bad:
• No replication
• Only useful for Celery
CELERY: RABBITMQ
• Fast
• Writes to disk
• Low-maintenance synchronous replication
• Excellent Celery compatibility
• Supports other use cases
RESERVATIONS
• UI
• Room locking
• Room availability
• Registration manager
• Email, PDF invoice
• Payment
• Login
• …
WE DON’T DO THIS
def do_everything(request):
    hotel_id = request.GET.hotel_id
    room_number = request.GET.room_number
    with ro...
BUT WE DO THIS
• Frontend UI
• Locking rooms
• Calculating room availability
• Temporarily locking rooms
• Payment processing
• Mail
• PDF invoice generation
BUT WE CAN SCALE!
SCALE DB: HARD
• Slaves
• Master-master?
• Sharding?
SCALING
MINOR SCALE
MAJOR SCALE
FRONTEND
Everything
Frontend
External payment providers
User
Everything
Frontend
Master
Read slaves
SPLIT
• Responsibility
• Stateful/stateless
• Type of system
TYPES OF SYSTEMS
• Unique (mutex, datastore)
• Multiple
TYPES OF TASKS
• Realtime
• ASAP
• When you have time } Async!
SPLIT THIS
Everything
Frontend
External payment providers
User
Everything
Frontend
Master
Read slaves
AUTONOMOUS SYSTEMS
Payment
External payment providers
Locking
Invoice PDF
Mailer
UI
Reservations
Manager
User
Session Storage
Data warehouse
Reporting
Configuration
Payout
CLONABILITY
CLONABILITY
CLONABILITY
Frontend
CLONABILITY
Everything
Frontend
External payment providers
User
Everything
Frontend
Master
Read slaves
WHAT’S IN AN EASY STEP
As little change as possible.
Reuse.
Unintrusive.
Measure.
Go in the right direction.
SMALL STEPS
PROBLEMS?
!
Oversells
Configuration
Reporting
Payout
Everything
Frontend
Everything
Frontend
Everything
Fronte...
SMALL STEPS
PROBLEMS?
!
Oversells
Configuration
Reporting
Payout
Sessions
Room Availability
Lock
Read
Everything
Frontend
E...
ISOLATED SYSTEM
Best technology
Decoupled
API
Testable
SMALL STEPS
PROBLEMS?
!
Oversells
Configuration
Reporting
Payout
Sessions
Everything
Frontend
Config
Backend
Settings
Every...
INITIAL SYSTEM
Everything
Frontend
INITIAL SYSTEM (MODIFIED)
Everything
Frontend Sales
Sync
INITIAL SYSTEM (MODIFIED)
Sales Backend
SMALL STEPS
PROBLEMS?
!
Oversells
Configuration
Reporting
Payout
Sessions
Everything
Frontend
Sales
Backend
Sales
Main DB
...
SMALL STEPS
PROBLEMS?
!
Oversells
Configuration
Reporting
Payout
Sessions
Session Storage
Everything
Frontend
Everything
Fr...
WHEN?
• Difficult.
• Measure everything.
• Find patterns.
• Define thresholds.
• Design: address as risk.
• Don’t overengineer, don’t ignore.
EVENTBRITE
• 2012: $600M ticket sales
• Accumulated: $1B
TECHNOLOGY
• Monitoring: nagios, ganglia, pingdom
• Email: offloaded to StrongMail
• Load-balanced read slave pool
• Feature flags
• Automated server configuration and release with Puppet and Jenkins
TECHNOLOGY
• Feature flags
• Develop on Vagrant
• Celery + RabbitMQ
• Virtual customer queue
• Big data for reporting, fraud, spam, event recommendations
TECHNOLOGY
• Hadoop
• Cassandra
• HBase
• Hive
• Separated into independent services
TIPS
• Instrument and monitor everything
• Lean
HOW BIG?
• 2Gb/day database transactions
• 3.5Tb/day social data analyzed
• 15Gb/day logs
ORDER PROCESSOR
• Pub/sub queue with Cassandra and ZooKeeper
PUBLISHING
Publisher
Get queue lock+last batch id
Create new batch
“process orders 10, 11, 12”
Store batch id, release lock
SUBSCRIBING
Subscriber
Get my latest processed batch id
Store result
Update my latest processed batch id
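The publishing and subscribing steps above can be sketched as follows. This is an in-memory illustration only: a `threading.Lock` stands in for the queue lock and a dict stands in for the shared store (the slides name Cassandra and ZooKeeper for those roles). The class and method names (`BatchLog`, `Subscriber`, `poll`) are hypothetical, not from the talk.

```python
import threading
from typing import Callable, Dict, List, Tuple


class BatchLog:
    """In-memory stand-in for the shared batch store and the queue lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._batches: Dict[int, List[int]] = {}
        self._last_batch_id = 0

    def publish(self, order_ids: List[int]) -> int:
        # Get queue lock + last batch id, create the new batch,
        # store the batch id, release the lock.
        with self._lock:
            batch_id = self._last_batch_id + 1
            self._batches[batch_id] = list(order_ids)
            self._last_batch_id = batch_id
            return batch_id

    def batches_after(self, batch_id: int) -> List[Tuple[int, List[int]]]:
        """Return every batch published after `batch_id`, in order."""
        with self._lock:
            return [(i, self._batches[i])
                    for i in range(batch_id + 1, self._last_batch_id + 1)]


class Subscriber:
    """Tracks its own latest processed batch id, independently of others."""

    def __init__(self, log: BatchLog, handler: Callable[[List[int]], object]) -> None:
        self.log = log
        self.handler = handler
        self.last_processed = 0
        self.results: Dict[int, object] = {}

    def poll(self) -> int:
        """Process all unseen batches; return how many were handled."""
        processed = 0
        for batch_id, order_ids in self.log.batches_after(self.last_processed):
            self.results[batch_id] = self.handler(order_ids)  # store result
            self.last_processed = batch_id  # update my latest processed batch id
            processed += 1
        return processed
```

Each subscriber keeps its own cursor, so a slow or restarted subscriber simply resumes from its last processed batch id without affecting the others.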
SCALING STORAGE
• Move to NoSQL
• Aggressively move queries to slaves
• Different indexes per slave
• Better hardware
• Most optimal tables for large and highly-utilized datasets
EMAIL ADDRESSES
• Users have many email addresses.
• Lookup by email, join to users table
FIRST ATTEMPT
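CREATE TABLE `user_emails` (
  `id` int NOT NULL AUTO_INCREMENT,
  `email_address` varchar(255) NOT NULL,
  ... -- other columns about the user
  `user_id` int, -- foreign key to users
  KEY (`email_address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;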
FIRST ATTEMPT
LOOKUP
CAN IT BE IMPROVED?
INDEX VS PK
• InnoDB: B+trees, O(log n)
• Known user id: index on email not needed.
• Small win on lookup: O(1)
• Big win on not storing the index.
INNODB INDEXES
HASH TABLE
DISQUS
• >165K messages per second
• <10ms latency
• 1.3B unique visitors
• 10B page views
• 500M users in discussions
• 3M communities
• 25M comments
ORIGINAL REALTIME BACKEND
• Python + gevent
• NginxPushStream
• Network IO: great
• CPU: choking at peaks
• <15ms latency
CURRENT REALTIME BACKEND
• Go
• Handles all users
• Normal load: 3200 connections/machine/sec
• <10ms latency
• Only 10%-20% CPU
CURRENT REALTIME BACKEND
Workers
Subscribed to results
Push result to user
NginxPushStream
TESTING
• Test with real traffic
• Measure everything
LESSONS
• Do work once, distribute results.
• Most likely to fail: your code. Don’t reinvent. Keep team small.
• End-to-end ACKs are expensive. Avoid.
• Understand use cases when load testing.
• Tune architecture to scale.
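A rough sketch of the first lesson, "do work once, distribute results", assuming a single process with in-memory inboxes. The `ResultFanout` class and its methods are hypothetical illustrations, not Disqus code: the expensive step runs once per message and the same result is then handed to every subscriber.

```python
from typing import Callable, Dict, List


class ResultFanout:
    """Compute each result once, then copy it to every subscriber's inbox."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute
        self._inboxes: Dict[str, List[str]] = {}
        self.compute_calls = 0  # counts how often the expensive step ran

    def subscribe(self, user_id: str) -> None:
        self._inboxes.setdefault(user_id, [])

    def publish(self, raw_message: str) -> int:
        """Process the message once and deliver it; return subscriber count."""
        result = self._compute(raw_message)  # the work happens exactly once
        self.compute_calls += 1
        for inbox in self._inboxes.values():
            inbox.append(result)  # distribution is a cheap copy
        return len(self._inboxes)

    def inbox(self, user_id: str) -> List[str]:
        return list(self._inboxes[user_id])
```

The cost of `compute` stays constant as subscribers are added; only the cheap delivery loop grows with the audience.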
LEARN MORE
• Instagram
• Braintree
• highscalability.com
• VelocityConf (youtube, nov 2014 @ bcn?)
QUESTIONS?
ANSWERS?
THANKS!
Òscar Vilaplana
@grimborg
http://oscarvilaplana.cat
Scaling: a naïve approach

A look at how to scale an existing monolithic system, and how companies such as Disqus and Eventbrite have done it.

Published in: Technology
Scaling

  1. 1. SCALING Òscar Vilaplana @grimborg http://oscarvilaplana.cat
  2. 2. WHAT’S THIS ABOUT? People Technology Tools
  3. 3. PEOPLE Care Focus Automate & Test. Shared brain Finish & DRY.
  4. 4. TECH Design to clone Separate pieces API Offload everything Measure
  5. 5. VIRTUAL QUEUE Queue Instance Queue Instance Queue Instance Queue Instance
  6. 6. VIRTUAL QUEUE Queue Instance Queue Instance Queue Instance
  7. 7. VIRTUAL QUEUE Queue Instance Queue Instance Queue Instance
  8. 8. VIRTUAL QUEUE Queue Instance Queue Instance Queue Instance Queue Instance
  9. 9. TECH • Design to clone • Separate pieces • API • Offload everything • Measure
  10. 10. TYPES OF TASKS • Realtime • ASAP • When you have time } Async!
  11. 11. INSTAGRAM’S FEED • Redis queue per follower. • New media: push to queues • Small chained tasks
  12. 12. INSTAGRAM’S FEED harro wouter orestis siebejan oscar Schedule next batch
  13. 13. SMALL TASKS • 10k followers per task • < 2s • Finer-grained load balancing • Lower penalty of failure/reload
  14. 14. CELERY: REDIS • Good: Fast • Bad: • Polling for task distribution • Messy non-synchronous replication • Memory limits task capacity
  15. 15. CELERY: BEANSTALK • Good: • Fast • Push to consumers • Writes to disk • Bad: • No replication • Only useful for Celery
  16. 16. CELERY: RABBITMQ • Fast • Writes to disk • Low-maintenance synchronous replication • Excellent Celery compatibility • Supports other use cases
  17. 17. RESERVATIONS • UI • Room locking • Room availability • Registration manager • Email, PDF invoice • Payment • Login • …
  18. 18. WE DON’T DO THIS
        def do_everything(request):
            hotel_id = request.GET.hotel_id
            room_number = request.GET.room_number
            # Hold the room lock for the whole request
            with room_mutex(hotel_id, room_number):
                room = (session.query(Room)
                        .filter(Room.hotel_id == hotel_id)
                        .filter(Room.room_number == room_number)
                        .one())
                if not room.available:
                    return Response("Room not available", template=room_template)
                reservation = Reservation(client=request.client, room=room)
                session.add(reservation)
                room.available = False
                price = ...  # price calculation (elided on the slide)
                payment = Payment(reservation=reservation, price=price)
                session.add(payment)
                session.commit()
                url = payment.get_psp_url()
                return Redirect(url)
  19. 19. BUT WE DO THIS • Frontend UI • Locking rooms • Calculating room availability • Temporarily locking rooms • Payment processing • Mail • PDF invoice generation
  20. 20. BUT WE CAN SCALE!
  21. 21. SCALE DB: HARD • Slaves • Master- Master? • Sharding?
  22. 22. SCALING
  23. 23. MINOR SCALE
  24. 24. MAJOR SCALE
  25. 25. FRONTEND Everything Frontend External payment providers User Everything Frontend Master Read slaves
  26. 26. SPLIT • Responsibility • Stateful/stateless • Type of system
  27. 27. TYPES OF SYSTEMS • Unique (mutex, datastore) • Multiple
  28. 28. TYPES OF TASKS • Realtime • ASAP • When you have time } Async!
  29. 29. SPLIT THIS Everything Frontend External payment providers User Everything Frontend Master Read slaves
  30. 30. AUTONOMOUS SYSTEMS Payment External payment providers Locking Invoice PDF Mailer UI Reservations Manager User Session Storage Data warehouse Reporting Configuration Payout
  31. 31. CLONABILITY
  32. 32. CLONABILITY
  33. 33. CLONABILITY Frontend
  34. 34. CLONABILITY Everything Frontend External payment providers User Everything Frontend Master Read slaves
  35. 35. WHAT’S IN AN EASY STEP As little change as possible. Reuse. Unintrusive. Measure. Go in the right direction.
  36. 36. SMALL STEPS PROBLEMS? ! Oversells Configuration Reporting Payout Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend
  37. 37. SMALL STEPS PROBLEMS? ! Oversells Configuration Reporting Payout Sessions Room Availability Lock Read Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend
  38. 38. ISOLATED SYSTEM Best technology Decoupled API Testable
  39. 39. SMALL STEPS PROBLEMS? ! Oversells Configuration Reporting Payout Sessions Everything Frontend Config Backend Settings Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend
  40. 40. INITIAL SYSTEM Everything Frontend
  41. 41. INITIAL SYSTEM (MODIFIED) Everything Frontend Sales Sync
  42. 42. INITIAL SYSTEM (MODIFIED) Sales Backend
  43. 43. SMALL STEPS PROBLEMS? ! Oversells Configuration Reporting Payout Sessions Everything Frontend Sales Backend Sales Main DB Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend
  44. 44. SMALL STEPS PROBLEMS? ! Oversells Configuration Reporting Payout Sessions Session Storage Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend Everything Frontend
  45. 45. WHEN? • Difficult. • Measure everything. • Find patterns. • Define thresholds. • Design: address as risk. • Don’t overengineer, don’t ignore.
  46. 46. EVENTBRITE • 2012: $600M ticket sales • Accumulated: $1B
  47. 47. TECHNOLOGY • Monitoring: nagios, ganglia, pingdom • Email: offloaded to StrongMail • Load-balanced read slave pool • Feature flags • Automated server configuration and release with Puppet and Jenkins
  48. 48. TECHNOLOGY • Feature flags • Develop on Vagrant • Celery + RabbitMQ • Virtual customer queue • Big data for reporting, fraud, spam, event recommendations
  49. 49. TECHNOLOGY • Hadoop • Cassandra • HBase • Hive • Separated into independent services
  50. 50. TIPS • Instrument and monitor everything • Lean
  51. 51. HOW BIG? • 2Gb/day database transactions • 3.5Tb/day social data analyzed • 15Gb/day logs
  52. 52. ORDER PROCESSOR • Pub/sub queue with Cassandra and Zookeeper
  53. 53. PUBLISHING Publisher Get queue lock+last batch id Create new batch “process orders 10, 11, 12” Store batch id, release lock
  54. 54. SUBSCRIBING Subscriber Get my latest processed batch id Store result Update my latest processed batch id
  55. 55. SCALING STORAGE • Move to NoSQL • Aggressively move queries to slaves • Different indexes per slave • Better hardware • Most optimal tables for large and highly-utilized datasets
  56. 56. EMAIL ADDRESSES • Users have many email addresses. • Lookup by email, join to users table
  57. 57. FIRST ATTEMPT
        CREATE TABLE `user_emails` (
          `id` int NOT NULL AUTO_INCREMENT,
          `email_address` varchar(255) NOT NULL,
          ... -- other columns about the user
          `user_id` int, -- foreign key to users
          KEY (`email_address`)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  58. 58. FIRST ATTEMPT
  59. 59. LOOKUP
  60. 60. CAN IT BE IMPROVED?
  61. 61. INDEX VS PK • InnoDB: B+trees, O(log n) • Known user id: index on email not needed. • Small win on lookup: O(1) • Big win on not storing the index.
  62. 62. INNODB INDEXES
  63. 63. HASH TABLE
  64. 64. DISQUS • >165K messages per second • <10ms latency • 1.3B unique visitors • 10B page views • 500M users in discussions • 3M communities • 25M comments
  65. 65. ORIGINAL REALTIME BACKEND • Python + gevent • NginxPushStream • Network IO: great • CPU: choking at peaks • <15ms latency
  66. 66. CURRENT REALTIME BACKEND • Go • Handles all users • Normal load: 3200 connections/machine/sec • <10ms latency • Only 10%-20% CPU
  67. 67. CURRENT REALTIME BACKEND Workers Subscribed to results Push result to user NginxPushStream
  68. 68. TESTING • Test with real traffic • Measure everything
  69. 69. LESSONS • Do work once, distribute results. • Most likely to fail: your code. Don’t reinvent. Keep team small. • End-to-end ACKs are expensive. Avoid. • Understand use cases when load testing. • Tune architecture to scale.
  70. 70. LEARN MORE • Instagram • Braintree • highscalability.com • VelocityConf (youtube, nov 2014 @ bcn?)
  71. 71. QUESTIONS? ANSWERS? THANKS! Òscar Vilaplana @grimborg http://oscarvilaplana.cat