David Cramer: Building to scale

Transcript

  • 1. BUILDING TO SCALE David Cramer (twitter.com/zeeg)
  • 2. The things we build will not and can not last
  • 3. Who am I?
  • 4. (image slide)
  • 5. (image slide)
  • 6. (image slide)
  • 7. What do we mean by scale?
  • 8. DISQUS: massive traffic with a long tail. Sentry: counters and event aggregation. tenXer: more stats than we can count.
  • 9. Does one size fit all?
  • 10. Practical Storage
  • 11. Postgres is the foundation of DISQUS
  • 12. MySQL powers the tenXer graph store
  • 13. Sentry is built on SQL
  • 14. Databases are not the problem
  • 15. Compromise
  • 16. Scaling is about Predictability
  • 17. Augment SQL with [technology]
  • 18. (image slide)
  • 19. Simple solutions using Redis (I like Redis)
  • 20. Counters
  • 21. Counters are everywhere
  • 22. Counters in SQL: UPDATE table SET counter = counter + 1;
  • 23. Counters in Redis: INCR counter (returns 1), or from Python: >>> redis.incr('counter')
  • 24. Counters in Sentry (diagram): event ID 1, event ID 2, and event ID 3 each trigger a Redis INCR, followed by a single SQL UPDATE.
  • 25. Counters in Sentry
    ‣ INCR event_id in Redis
    ‣ Queue buffer incr task
    ‣ 5-10s explicit delay
    ‣ Task does atomic GET event_id and DEL event_id (Redis pipeline)
    ‣ No-op if GET is not > 0
    ‣ One SQL UPDATE per unique event per delay
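
    A minimal sketch of the flow on slide 25, assuming redis-py and Celery; the key scheme, the flush_counter task, and the Event.times_seen column are illustrative, not Sentry's actual code:

        import redis
        from celery import Celery
        from django.db.models import F

        r = redis.Redis()
        app = Celery('counters', broker='redis://localhost:6379/0')

        @app.task
        def flush_counter(event_id):
            key = 'buffer:%s' % event_id
            # atomically read and clear the pending count (Redis pipeline)
            with r.pipeline() as pipe:
                pipe.get(key)
                pipe.delete(key)
                count, _ = pipe.execute()
            if not count or int(count) <= 0:
                return  # no-op: an earlier flush already claimed these increments
            # one SQL UPDATE per unique event per delay window
            # (Event is an assumed Django model with a times_seen counter)
            Event.objects.filter(id=event_id).update(
                times_seen=F('times_seen') + int(count))

        def on_event(event_id):
            # fast path: bump the pending count in Redis ...
            r.incr('buffer:%s' % event_id)
            # ... and queue the buffered flush with an explicit 5-10s delay
            flush_counter.apply_async(args=[event_id], countdown=10)
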
  • 26. Counters in Sentry (cont.)
    Pros
    ‣ Solves database row lock contention
    ‣ Redis nodes are horizontally scalable
    ‣ Easy to implement
    Cons
    ‣ Too many dummy (no-op) tasks
  • 27. Alternative Counters (diagram): event ID 1, event ID 2, and event ID 3 each trigger a Redis ZINCRBY, followed by a single SQL UPDATE.
  • 28. Sorted Sets in Redis
      > ZINCRBY events 1 ad93a
      {ad93a: 1}
      > ZINCRBY events 1 ad93a
      {ad93a: 2}
      > ZINCRBY events 1 d2ow3
      {ad93a: 2, d2ow3: 1}
  • 29. Alternative Counters
    ‣ ZINCRBY events event_id in Redis
    ‣ Cron buffer flush
    ‣ ZRANGE events to get pending updates
    ‣ Fire individual task per update
    ‣ Atomic ZSCORE events event_id and ZREM events event_id to get and flush the count
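
    A sketch of the sorted-set variant on slide 29, again assuming redis-py (3.x argument order for zincrby); update_event_counter stands in for the SQL UPDATE, and in practice the flush would fire one task per pending update rather than doing the work inline:

        import redis

        r = redis.Redis()

        def on_event(event_id):
            # all pending updates accumulate under a single sorted-set key
            r.zincrby('events', 1, event_id)  # ZINCRBY events 1 <event_id>

        def flush_pending():
            # run periodically, e.g. from cron
            for event_id in r.zrange('events', 0, -1):
                # atomic ZSCORE + ZREM via a pipeline to claim this member's count
                with r.pipeline() as pipe:
                    pipe.zscore('events', event_id)
                    pipe.zrem('events', event_id)
                    count, removed = pipe.execute()
                if count and removed:
                    # one SQL UPDATE per pending event (hypothetical helper)
                    update_event_counter(event_id, int(count))
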
  • 30. Alternative Counters (cont.)
    Pros
    ‣ Removes (most) no-op tasks
    ‣ Works without a complex queue, since jobs need no delay
    Cons
    ‣ A single Redis key stores all pending updates
  • 31. Activity Streams
  • 32. Streams are everywhere
  • 33. Streams in SQL
      class Activity:
          SET_RESOLVED = 1
          SET_REGRESSION = 6
          TYPE = (
              (SET_RESOLVED, 'set_resolved'),
              (SET_REGRESSION, 'set_regression'),
          )

          event = ForeignKey(Event)
          type = IntegerField(choices=TYPE)
          user = ForeignKey(User, null=True)
          datetime = DateTimeField()
          data = JSONField(null=True)
  • 34. Streams in SQL (cont.)
      >>> Activity(event, SET_RESOLVED, user, now)
      "David marked this event as resolved."

      >>> Activity(event, SET_REGRESSION, datetime=now)
      "The system marked this event as a regression."

      >>> Activity(type=DEPLOY_START, datetime=now)
      "A deploy started."

      >>> Activity(type=SET_RESOLVED, datetime=now)
      "All events were marked as resolved."
  • 35. Stream == View == Cache
  • 36. Views as a Cache
      TIMELINE = []
      MAX = 500

      def on_event_creation(event):
          global TIMELINE
          TIMELINE.insert(0, event)
          TIMELINE = TIMELINE[:MAX]

      def get_latest_events(num=100):
          return TIMELINE[:num]
  • 37. Views in Redis
      class Timeline(object):
          def __init__(self):
              self.db = Redis()

          def add(self, event):
              score = float(event.date.strftime('%s.%m'))
              self.db.zadd('timeline', event.id, score)

          def list(self, offset=0, limit=-1):
              return self.db.zrevrange('timeline', offset, limit)
  • 38. Views in Redis (cont.)
      MAX_SIZE = 10000

      def add(self, event):
          score = float(event.date.strftime('%s.%m'))
          # add the event and trim the set to avoid
          # data bloat in a single key
          with self.db.pipeline() as pipe:
              pipe.zadd(self.key, event.id, score)
              pipe.zremrangebyrank(self.key, MAX_SIZE, -1)
              pipe.execute()
  • 39. Queuing
  • 40. Introducing Celery
  • 41. RabbitMQ or Redis
  • 42. Asynchronous Tasks
      # Register the task
      @task(exchange="event_creation")
      def on_event_creation(event_id):
          counter.incr('events', event_id)

      # Delay execution
      on_event_creation.delay(event.id)
  • 43. Fanout
      @task(exchange="counters")
      def incr_counter(key, id=None):
          counter.incr(key, id)

      @task(exchange="event_creation")
      def on_event_creation(event_id):
          incr_counter.delay('events', event_id)
          incr_counter.delay('global')

      # Delay execution
      on_event_creation.delay(event.id)
  • 44. Object Caching
  • 45. Object Cache Prerequisites
    ‣ Your database can't handle the read load
    ‣ Your data changes infrequently
    ‣ You can handle slightly worse performance
  • 46. Distributing Load with Memcache (diagram): event IDs 01-15 are partitioned across Memcache 1, Memcache 2, and Memcache 3.
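
    The partitioning in the diagram can be approximated by hashing the cache key onto one of N nodes; a toy sketch (the node addresses are made up, and real memcached clients such as pylibmc handle this routing for you):

        import zlib

        NODES = ['memcache-1:11211', 'memcache-2:11211', 'memcache-3:11211']

        def node_for_key(key):
            # a stable hash of the key always picks the same node, so reads
            # and writes for a given object land on the same server
            return NODES[zlib.crc32(key.encode('utf-8')) % len(NODES)]

        # e.g. node_for_key('Event:07') and node_for_key('Event:12') may land
        # on different nodes, spreading read load across the cluster
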
  • 47. Querying the Object Cache
      def make_key(model, id):
          return '{}:{}'.format(model.__name__, id)

      def get_by_ids(model, id_list):
          keys = [make_key(model, id) for id in id_list]
          res = cache.get_multi(keys)

          # ids whose objects were not found in the cache
          pending = set(
              id for id in id_list
              if res.get(make_key(model, id)) is None
          )

          if pending:
              mres = model.objects.in_bulk(pending)
              cache.set_multi(
                  {make_key(model, o.id): o for o in mres.itervalues()})
              res.update(
                  {make_key(model, o.id): o for o in mres.itervalues()})

          return res
  • 48. Pushing State
      def save(self):
          cache.set(make_key(type(self), self.id), self)

      def delete(self):
          cache.delete(make_key(type(self), self.id))
  • 49. Redis for Persistence (diagram): event IDs 01-15 are partitioned across Redis 1, Redis 2, and Redis 3.
  • 50. Routing with Nydus
      # create a cluster of Redis connections which
      # partition reads/writes by (hash(key) % size)
      from nydus.db import create_cluster

      redis = create_cluster({
          'engine': 'nydus.db.backends.redis.Redis',
          'router': 'nydus.db...redis.PartitionRouter',
          'hosts': {n: {'db': n} for n in xrange(10)},
      })

    github.com/disqus/nydus
  • 51. Planning for the Future
  • 52. One of the largest problems for Disqus is network-wide moderation
  • 53. Be Mindful of Features
  • 54. Sentry's Team Dashboard
    ‣ Data limited to a single team
    ‣ Simple views which could be materialized
    ‣ Only entry point for "data for team"
  • 55. Sentry's Stream View
    ‣ Data limited to a single project
    ‣ Each project could map to a different DB
  • 56. Preallocate Shards
  • 57. (diagram) redis-1 holds shards DB0 through DB9.
  • 58. (diagram) When a physical machine becomes overloaded, migrate a chunk of shards to another machine: redis-1 keeps DB0-DB4 and redis-2 takes DB5-DB9.
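
    One way to read slides 56-58: allocate more logical shards than machines up front (here, Redis databases 0-9), route every key to a fixed shard, and when a box gets hot only the shard-to-host map changes, never the key-to-shard assignment. A toy sketch assuming redis-py; the host names and shard count are illustrative:

        import redis
        import zlib

        NUM_SHARDS = 10

        # after migrating half the shards off an overloaded redis-1:
        SHARD_HOSTS = {n: 'redis-1' for n in range(0, 5)}
        SHARD_HOSTS.update({n: 'redis-2' for n in range(5, 10)})

        def connection_for(key):
            # keys always hash to the same logical shard (a Redis DB index);
            # only the host serving that shard ever changes
            shard = zlib.crc32(key.encode('utf-8')) % NUM_SHARDS
            return redis.Redis(host=SHARD_HOSTS[shard], db=shard)
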
  • 59. Takeaways
  • 60. Enhance your database, don't replace it
  • 61. Queue Everything
  • 62. Learn to say no (to features)
  • 63. Complex problems do not require complex solutions
  • 64. QUESTIONS?