Learning to Build Distributed Systems the Hard Way

I’ve learned how to build distributed systems the hard way; I’ve failed, and failed again. I’ve made many of the common mistakes and tried a few other things that turned out to be a disappointment. You shouldn’t have to make those mistakes too. In this talk I’ll tell the story of how I built a real time advertising analytics platform that tracks and reports on millions of impressions every day, and all the things I did wrong before I got it to work. I’ll also tell you what I did right, and the choices I don’t regret.

  1. LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY @iconara
  2. speakerdeck.com/u/iconara (real time!)
  3. Theo / @iconara
  4. Chief Architect at
  5. let’s make online advertising a great experience
  6. MAKING THIS
  7. INTO THIS
  8. HOW HARD CAN IT BE?
  9. TRACKING AD IMPRESSIONS: track page views and all their ads, track visibility and send updates on changes, track events, track activity, sync cookies, and track visits
  10. [Diagram: an ad going LOADED, VISIBLE, HIDDEN, LOADED, VISIBLE] track page views and all their ads, track visibility and send updates on changes, track events, track activity, sync cookies, and track visits
  11. ASSEMBLING SESSIONS: assemble ad impressions, page views and visits, to be able to calculate things like total visible duration; mix in demographics, revenue, and third-party data
  12. [Diagram: events such as WAS HIDDEN, BECAME ACTIVE, A CLICK!, BECAME VISIBLE, WAS VISIBLE and LOADED AGAIN, plus 3rd PARTY DATA & OTHER GOODIES, assembled into one session record]
      {
        "user_id": "M9L6R5TD0YXK",
        "session_id": "MAI3QAGNAIYT",
        "timestamp": 1347896675038,
        "placement_name": "example",
        "category": "frontpage",
        "embed_url": "http://example.com/",
        "visible_duration": 1340,
        "browser": "Chrome",
        "device_type": "computer",
        "click": true,
        "ad_dimensions": "980x300"
      }
      assemble ad impressions, page views and visits, to be able to calculate things like total visible duration; mix in demographics, revenue, and third-party data
  13. ANALYTICS: precompute metrics, count uniques, build visitor histories for attribution
  14. precompute metrics, count uniques, build visitor histories for attribution
  15. HOW HARD CAN IT BE?
  16. 25K REQUESTS PER SECOND: ~1 billion requests per day, 1 TB raw data
  17. ONE VISIT CAN CHANGE UP TO 100K COUNTERS: hundreds of millions of individual counters per day, plus counting uniques and visitor histories
  18. IN REAL TIME: or near real time, if you want to be pedantic
  19. START WITH TWO OF EVERYTHING: going from one to two is the hardest
  20. GIVE A LOT OF THOUGHT TO YOUR KEYS AND IDS: it will save you lots of pain
  21. a timestamp, then something random (MANLO0 JME57Z): monotonically increasing, sorts nicely
  22. something random, then a timestamp (JME57Z MANLO0): uniformly distributed, works nicely with sharding
  23. PUT BUFFERS BETWEEN LAYERS: queues can even out peaks, let you scale layers independently, and let you restart services without losing data
  24. SEPARATE PROCESSING FROM STORAGE: that way you can scale each independently
  25. PLAN HOW TO GET RID OF YOUR DATA: deleting stuff is harder than you might think
  26. NoDB: keep things streaming
  27. STREAM PARTITIONING
  28. RANDOMLY: when you have no interdependencies between things it’s easy to scale out (or round robin, it’s basically the same)
  29. CONSISTENTLY: when there are interdependencies you need to route using some property of the objects, but make sure you get a uniform distribution
  30. NUMEROLOGY
  31. 12
  32. 2 | 12, 3 | 12, 4 | 12, 6 | 12
  33. 8 | 24, 5 | 60
  34–41. 12, 60, 120, 360: superior highly composite numbers
  42–43. for maximal flexibility, partition with multiples of 12
  44. A SHORT DIVERSION ABOUT COUNTING TO 60: the reason why there are 60 seconds to a minute, and 360 degrees to a circle
  45. 3 SEGMENTS ON EACH FINGER = 12
  46. 3 SEGMENTS ON EACH FINGER = 12, FIVE FINGERS ON OTHER HAND = 60
  47. log2(36^6) ≈ 31
  48. $-$(ASCII code 36)-----
  49. log2(36^6) ≈ 31
  50. log2(36^6) ≈ 31: six characters 0-9, A-Z can represent 31 bits, which is kind of almost very close to four bytes
  51. MANLO0
  52. Time.now.to_i.to_s(36).upcase => MANLO0 (a timestamp)
  53. DO YOU REALLY NEED A BACKUP? If you’ve got 3x replication over multiple availability zones, is that backup really worth it?
  54. PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT: when thousands of things happen every second, new, weird and unforeseen things happen all the time; no test can anticipate everything (but testing is good anyway, just don’t think you’ve got everything covered)
  55. KTHXBAI @iconara github.com/iconara architecturalatrocities.com burtcorp.com
  56. COME TO SWEDEN IN MARCH AND TALK ABOUT BIG DATA: scandevconf.se/2013/call-for-proposals
  57. IDEMPOTENCE
  58. f(f(x)) = f(x): doing something again doesn’t change the outcome
  59. IDEMPOTENCE: if you don’t have to worry about things accidentally happening twice, everything becomes much simpler
  60. COUNTING UNIQUES: when adding to a set it doesn’t matter how many times you do it, the end result is the same
  61. INC X VS SET X: increments are not idempotent, and very scary; if you can avoid non-idempotent operations, do
  62. KTHXBAI @iconara github.com/iconara architecturalatrocities.com burtcorp.com
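Slides 11 and 12 describe assembling an ad's LOADED/VISIBLE/HIDDEN updates into a single session record with a total visible duration. The Ruby below is a minimal sketch of that idea, not the author's implementation: the `[timestamp_ms, state]` event format, the methods `visible_duration` and `assemble_session`, and the sample events are all hypothetical, with field names borrowed from the session on slide 12.

```ruby
# Hypothetical event format: [timestamp_in_ms, state], sorted by timestamp.
# States follow the slides: "LOADED", "VISIBLE", "HIDDEN".
def visible_duration(events)
  total = 0
  visible_since = nil
  events.each do |timestamp, state|
    if state == "VISIBLE"
      visible_since ||= timestamp          # open a visible interval (or keep the open one)
    elsif visible_since
      total += timestamp - visible_since   # any other state closes the open interval
      visible_since = nil
    end
  end
  total # an interval still open at the end is ignored in this sketch
end

# Fold the derived metric and any extra data into one session record.
# Field names mirror slide 12; the method itself is illustrative.
def assemble_session(session_id, events, extra = {})
  {
    "session_id" => session_id,
    "timestamp" => events.first&.first,
    "visible_duration" => visible_duration(events)
  }.merge(extra)
end

# Made-up events for one ad: two visible stretches of 900 ms and 440 ms.
events = [
  [1_347_896_675_038, "LOADED"],
  [1_347_896_675_100, "VISIBLE"],
  [1_347_896_676_000, "HIDDEN"],
  [1_347_896_680_000, "LOADED"],
  [1_347_896_680_200, "VISIBLE"],
  [1_347_896_680_640, "HIDDEN"]
]

session = assemble_session("MAI3QAGNAIYT", events, { "click" => true })
puts session["visible_duration"] # => 1340
```

Third-party data and demographics would be merged in the same way, through the `extra` hash.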
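Slides 20 to 22 contrast putting the timestamp first in a key with putting the random part first. A minimal sketch, assuming 12-character keys made of six zero-padded base-36 timestamp characters and six random ones; the helpers `time_part`, `random_part`, `time_first_key`, `random_first_key` and the first-character sharding scheme `shard_of` are hypothetical illustrations, not the author's code.

```ruby
require "securerandom"

ALPHABET = [*"0".."9", *"A".."Z"].freeze # the 36 characters used on the slides

# Hypothetical helper: six random base-36 characters.
def random_part(length = 6)
  Array.new(length) { ALPHABET[SecureRandom.random_number(36)] }.join
end

# Six-character base-36 timestamp, zero-padded so that keys sort by time.
def time_part(time)
  time.to_i.to_s(36).upcase.rjust(6, "0")
end

# Timestamp first: keys are monotonically increasing and sort nicely.
def time_first_key(time)
  time_part(time) + random_part
end

# Random first: keys are uniformly distributed, which suits sharding.
def random_first_key(time)
  random_part + time_part(time)
end

times = [Time.at(1_348_153_200), Time.at(1_348_153_260), Time.at(1_348_153_320)]
time_keys = times.map { |t| time_first_key(t) }
p time_keys.sort == time_keys # => true, lexical order is time order

# Hypothetical sharding scheme: pick a shard from the key's first character.
SHARDS = 4
def shard_of(key)
  ALPHABET.index(key[0]) % SHARDS
end

# Keys generated in the same second: random-first keys spread over the shards,
# time-first keys all start with the same character and pile onto one shard.
now = Time.at(1_348_153_200)
p Array.new(1000) { shard_of(random_first_key(now)) }.uniq.size
p Array.new(1000) { shard_of(time_first_key(now)) }.uniq.size # => 1
```

This is the trade-off on the slides: time-first keys give chronological order for free, while random-first keys spread writes evenly when many keys are created at once.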
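Slide 23 recommends queues between layers. As a minimal in-process sketch, Ruby's built-in `SizedQueue` can stand in for a real message queue: a bounded buffer absorbs a burst from the producer and blocks it when full instead of dropping work. The capacity of 10 and the `:done` sentinel are arbitrary choices for the example.

```ruby
# A bounded in-memory queue standing in for a real message queue.
# The capacity is an arbitrary choice for this sketch.
queue = SizedQueue.new(10)
processed = []

# The next layer: takes items off the buffer at its own pace.
consumer = Thread.new do
  while (item = queue.pop) != :done
    processed << item * 2 # stand-in for real work
  end
end

# The producer sends a burst of 100 items. When the buffer is full,
# push blocks until the consumer catches up, so nothing is dropped.
100.times { |i| queue.push(i) }
queue.push(:done) # sentinel telling the consumer to stop
consumer.join

puts processed.size # => 100
```

An in-memory queue does not survive a restart, so the "restart services without losing data" part of the slide needs a durable queue, which this sketch does not model.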
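Slides 28 and 29 distinguish random (or round robin) routing from routing by a property of the message. A minimal sketch of both, assuming messages are hashes with a hypothetical `user_id` field; `RoundRobinRouter` and `route_by_key` are illustrative names. The keyed router uses plain hash-modulo partitioning with `Zlib.crc32`, one simple way to route consistently and not necessarily the scheme the author used.

```ruby
require "zlib"
require "securerandom"

PARTITIONS = 12 # illustrative count (a multiple of 12, see slides 30-43)

# Round robin: fine when messages have no interdependencies.
class RoundRobinRouter
  def initialize(partitions)
    @partitions = partitions
    @next = 0
  end

  def route(_message)
    partition = @next
    @next = (@next + 1) % @partitions
    partition
  end
end

# Route by a property of the message, here a hypothetical "user_id" field,
# so that everything for one user lands on the same partition.
# CRC32 is a stable hash; String#hash is seeded per process, so it would
# send the same user somewhere else after a restart.
def route_by_key(message, partitions)
  Zlib.crc32(message.fetch("user_id")) % partitions
end

router = RoundRobinRouter.new(PARTITIONS)
p Array.new(5) { router.route({}) } # => [0, 1, 2, 3, 4]

message = { "user_id" => "M9L6R5TD0YXK" }
p route_by_key(message, PARTITIONS) == route_by_key(message, PARTITIONS) # => true

# Check that the distribution is roughly uniform over many random user ids.
counts = Array.new(PARTITIONS, 0)
12_000.times do
  counts[route_by_key({ "user_id" => SecureRandom.hex(6) }, PARTITIONS)] += 1
end
p counts.minmax
```

The `counts.minmax` check is a cheap way to act on the slide's warning to make sure the keyed routing still gives a uniform distribution.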
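Slides 30 to 43 argue for partition counts that are multiples of 12 because they divide evenly in many ways. The sketch below lists the divisors of a few candidate counts; each divisor is a number of nodes that can take an equal share of the partitions. The method names are hypothetical, and 10 is included only for comparison.

```ruby
# Divisors of n: the node counts over which n partitions can be spread evenly.
def divisors(n)
  (1..n).select { |d| (n % d).zero? }
end

[10, 12, 60, 120, 360].each do |n|
  puts "#{n} partitions: #{divisors(n).size} even layouts #{divisors(n).inspect}"
end

# Partitions per node for an even layout; refuses layouts that don't divide.
def partitions_per_node(partitions, nodes)
  unless (partitions % nodes).zero?
    raise ArgumentError, "#{partitions} partitions don't split evenly over #{nodes} nodes"
  end
  partitions / nodes
end

p partitions_per_node(12, 4) # => 3
p partitions_per_node(12, 6) # => 2
```

With 12 partitions you can run on 1, 2, 3, 4, 6 or 12 nodes with an equal share each; with 10, only on 1, 2, 5 or 10.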
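Slides 47 to 52 explain why six base-36 characters are enough for a Unix timestamp. The arithmetic and the slide's one-liner can be checked in Ruby; `base36_timestamp` is a hypothetical wrapper around `Time.now.to_i.to_s(36).upcase` that takes a fixed time so the output is reproducible, and 1348153200 is simply the value that encodes to MANLO0.

```ruby
# Six characters from 0-9 and A-Z give 36**6 distinct values.
puts Math.log2(36**6) # about 31.02 bits
puts 36**6 > 2**31    # => true
puts 36**6 < 2**32    # => true, so not quite four bytes

# The one-liner from slide 52, wrapped so it can take a fixed time.
def base36_timestamp(time)
  time.to_i.to_s(36).upcase
end

stamp = base36_timestamp(Time.at(1_348_153_200))
puts stamp # => MANLO0

# String#to_i also accepts a base, so the key decodes back to the time.
puts stamp.to_i(36) # => 1348153200
puts Time.at(stamp.to_i(36)).utc
```

Six characters cover every second from late 1971 (36^5 seconds after the epoch) until late 2038 (36^6 seconds); outside that range the string gets shorter or longer, which matters if keys rely on a fixed width.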
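Slides 57 to 61 contrast idempotent operations with increments. A minimal sketch, assuming an in-memory stream of hypothetical events with an `id` field in which one event was delivered twice: adding to a set, and setting a counter from the set's size, give the same answer with or without the duplicate, while a plain increment over-counts.

```ruby
require "set"

# Hypothetical event stream in which event 2 was delivered twice,
# for example because a producer retried after a timeout.
events = [
  { "id" => 1, "user_id" => "user_a" },
  { "id" => 2, "user_id" => "user_b" },
  { "id" => 2, "user_id" => "user_b" }, # duplicate delivery
  { "id" => 3, "user_id" => "user_a" }
]

# f(f(x)) = f(x): adding the same element to a set twice equals adding it once.
add_user = ->(set, user) { set | [user] }
once = add_user.(Set.new, "user_a")
p add_user.(once, "user_a") == once # => true

# Counting uniques with a set: the duplicate delivery is harmless.
uniques = Set.new
events.each { |e| uniques << e["user_id"] }

# INC X: every delivery bumps the counter, so the duplicate is counted twice.
inc_count = 0
events.each { inc_count += 1 }

# SET X: store a value derived from idempotent state (the set of seen ids),
# so replaying an event just writes the same value again.
seen_ids = Set.new
set_count = 0
events.each do |e|
  seen_ids << e["id"]
  set_count = seen_ids.size
end

p uniques.size # => 2
p inc_count    # => 4
p set_count    # => 3
```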