Learning to Build Distributed Systems the Hard Way

I’ve learned how to build distributed systems the hard way; I’ve failed, and failed again. I’ve made many of the common mistakes and tried a few other things that turned out to be a disappointment. You shouldn't have to make those mistakes too. In this talk I'll tell the story of how I built a real time advertising analytics platform that tracks and reports on millions of impressions every day, and all the things I did wrong before I got it to work. I’ll also tell you what I did right, and the choices I don’t regret.

Transcript

  • 1. LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY @iconara
  • 2. speakerdeck.com/u/iconara (real time!)
  • 3. Theo / @iconara
  • 4. Chief Architect at Burt
  • 5. let’s make online advertising a great experience
  • 6. MAKING THIS
  • 7. INTO THIS
  • 8. HOW HARD CAN IT BE?
  • 9. TRACKING AD IMPRESSIONS: track page views and all their ads; track visibility and send updates on changes; track events, track activity, sync cookies, and track visits
  • 10. [diagram of ad states: LOADED, VISIBLE, HIDDEN, LOADED, VISIBLE] track page views and all their ads; track visibility and send updates on changes; track events, track activity, sync cookies, and track visits
  • 11. ASSEMBLING SESSIONS: assemble ad impressions, page views and visits, to be able to calculate things like total visible duration; mix in demographics, revenue, and third-party data
  • 12. [diagram of ad events: LOADED, WAS VISIBLE, WAS HIDDEN, BECAME VISIBLE AGAIN, BECAME ACTIVE, A CLICK!, 3rd PARTY DATA & OTHER GOODIES]
    {
      "user_id": "M9L6R5TD0YXK",
      "session_id": "MAI3QAGNAIYT",
      "timestamp": 1347896675038,
      "placement_name": "example",
      "category": "frontpage",
      "embed_url": "http://example.com/",
      "visible_duration": 1340,
      "browser": "Chrome",
      "device_type": "computer",
      "click": true,
      "ad_dimensions": "980x300"
    }
    assemble ad impressions, page views and visits, to be able to calculate things like total visible duration; mix in demographics, revenue, and third-party data
  • 13. ANALYTICS: precompute metrics, count uniques, build visitor histories for attribution
  • 15. HOW HARD CAN IT BE?
  • 16. 25K REQUESTS PER SECOND: ~1 billion requests per day, 1 TB raw data
  • 17. ONE VISIT CAN CHANGE UP TO 100K COUNTERS: hundreds of millions of individual counters per day, plus counting uniques and visitor histories
  • 18. IN REAL TIME: or near real time, if you want to be pedantic
  • 19. START WITH TWO OF EVERYTHING: going from one to two is the hardest
  • 20. GIVE A LOT OF THOUGHT TO YOUR KEYS AND IDS: it will save you lots of pain
  • 21. MANLO0 JME57Z: a timestamp, then something random; monotonically increasing, sorts nicely
  • 22. JME57Z MANLO0: something random, then a timestamp; uniformly distributed, works nicely with sharding
  • 23. PUT BUFFERS BETWEEN LAYERS: queues can even out peaks, let you scale layers independently, and let you restart services without losing data
  • 24. SEPARATE PROCESSING FROM STORAGE: that way you can scale each independently
  • 25. PLAN HOW TO GET RID OF YOUR DATA: deleting stuff is harder than you might think
  • 26. NoDB: keep things streaming
  • 27. STREAM PARTITIONING
  • 28. RANDOMLY: when you have no interdependencies between things it’s easy to scale out (or round robin, it’s basically the same)
  • 29. CONSISTENTLY: when there are interdependencies you need to route using some property of the objects, but make sure you get a uniform distribution
  • 30. NUMEROLOGY
  • 31. 12
  • 32. 2 | 12, 3 | 12, 4 | 12, 6 | 12
  • 33. 8 | 24, 5 | 60
  • 34. 12, 60, 120, 360: superior highly composite numbers
  • 42. for maximal flexibility, partition with multiples of 12
  • 44. A SHORT DIVERSION ABOUT COUNTING TO 60: the reason why there’s 60 seconds to a minute, and 360 degrees to a circle
  • 45. 3 SEGMENTS ON EACH FINGER = 12
  • 46. 3 SEGMENTS ON EACH FINGER = 12, FIVE FINGERS ON OTHER HAND = 60
  • 47. $\log_2(36^6) \approx 31$
  • 48. $ (ASCII code 36)
  • 50. $\log_2(36^6) \approx 31$: six characters from 0-9 and A-Z can represent 31 bits, which is kind of almost very close to four bytes (32 bits)
  • 51. MANLO0
  • 52. Time.now.to_i.to_s(36).upcase gives MANLO0, a timestamp
  • 53. DO YOU REALLY NEED A BACKUP? if you’ve got 3x replication over multiple availability zones, is that backup really worth it?
  • 54. PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT: when thousands of things happen every second, new, weird and unforeseen things happen all the time, and no test can anticipate everything (but testing is good anyway, just don’t think you’ve got everything covered)
  • 55. KTHXBAI @iconara github.com/iconara architecturalatrocities.com burtcorp.com
  • 56. COME TO SWEDEN IN MARCH AND TALK ABOUT BIG DATA: scandevconf.se/2013/call-for-proposals
  • 57. IDEMPOTENCE
  • 58. f(f(x)) = f(x): doing something again doesn’t change the outcome
  • 59. IDEMPOTENCE: if you don’t have to worry about things accidentally happening twice, everything becomes much simpler
  • 60. COUNTING UNIQUES: when adding to a set it doesn’t matter how many times you do it; the end result is the same
  • 61. INC X VS SET X: increments are not idempotent, and very scary; avoid non-idempotent operations wherever you can
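Slides 10 to 12 describe assembling a session from ad state changes so that values like the visible_duration field can be computed. A minimal sketch of that one calculation follows, assuming each ad reports events as [timestamp_ms, state] pairs with states :loaded, :visible and :hidden. The event shape and method name are hypothetical; the talk does not show its format or code.

```ruby
# Hypothetical event shape: [timestamp_in_ms, state], where state is one of
# :loaded, :visible or :hidden.
def total_visible_duration(events)
  total = 0
  visible_since = nil
  events.sort_by(&:first).each do |timestamp, state|
    if state == :visible
      # Open a visible interval unless one is already open.
      visible_since ||= timestamp
    elsif visible_since
      # Any other state closes the open visible interval.
      total += timestamp - visible_since
      visible_since = nil
    end
  end
  # An interval still open at the end is not counted in this sketch.
  total
end

events = [
  [1000, :loaded],
  [1200, :visible],
  [2000, :hidden],
  [5000, :visible],
  [5540, :hidden]
]
puts total_visible_duration(events) # => 1340
```

Events are sorted first because, in a distributed tracker, updates for the same ad can arrive out of order.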
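The figures on slide 16 can be sanity-checked with a bit of arithmetic. A sustained 25K requests per second would be about 2.16 billion requests per day, so if "~1 billion per day" is accurate, the 25K figure is presumably a peak, with an average closer to 11.6K per second. The sketch also assumes "1 TB raw data" is per day, which works out to roughly 1 KB per request; both readings are assumptions, not statements from the talk.

```ruby
seconds_per_day  = 24 * 60 * 60          # 86_400
peak_rate        = 25_000                # requests per second (slide 16)
requests_per_day = 1_000_000_000         # "~1 billion requests per day"
bytes_per_day    = 1_000_000_000_000     # 1 TB, assumed to be per day

sustained_per_day = peak_rate * seconds_per_day
average_rate      = requests_per_day / seconds_per_day.to_f
bytes_per_request = bytes_per_day / requests_per_day

puts sustained_per_day  # => 2160000000
puts average_rate.round # => 11574
puts bytes_per_request  # => 1000
```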
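Slide 17 says one visit can change up to 100K counters. The talk does not explain why, but one common way precomputed metrics blow up like this is keeping a counter for every combination of an event's dimensions. The sketch below assumes that scheme; the method name and dimension list are hypothetical.

```ruby
# Hypothetical: one counter key per subset of the event's dimensions,
# including the empty subset (the grand total).
def counter_keys(event, dimensions)
  keys = []
  (0..dimensions.size).each do |n|
    dimensions.combination(n) do |combo|
      keys << combo.map { |d| "#{d}=#{event[d]}" }.join("&")
    end
  end
  keys
end

event = {
  "category"      => "frontpage",
  "browser"       => "Chrome",
  "device_type"   => "computer",
  "ad_dimensions" => "980x300"
}
keys = counter_keys(event, event.keys)
puts keys.size # => 16, i.e. 2**4 combinations
```

Under this scheme the number of counters doubles with every dimension added: 17 dimensions would already mean 131,072 keys per event, before multiplying by time buckets or metrics.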
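Slides 21 and 22 contrast two ID layouts: timestamp first (sorts by time) and random part first (spreads evenly across shards). The sketch below builds both from six base-36 characters each, mirroring the MANLO0 and JME57Z examples. The helper names, and routing to one of 12 shards by the first character, are illustrative assumptions.

```ruby
require "securerandom"

# Fixed-width, upper-case base 36, so string order matches numeric order.
def base36(n, width = 6)
  n.to_s(36).upcase.rjust(width, "0")
end

def random_part
  base36(SecureRandom.random_number(36**6))
end

def time_first_id(time)
  base36(time.to_i) + random_part
end

def random_first_id(time)
  random_part + base36(time.to_i)
end

t0 = Time.at(1_348_153_200)
times = (0...1_000).map { |i| t0 + i }

time_first = times.map { |t| time_first_id(t) }
puts time_first == time_first.sort # => true, string order is time order

# Hypothetical sharding rule: first character modulo 12 shards.
shard = ->(id) { id[0].to_i(36) % 12 }
puts time_first.map(&shard).uniq.size # => 1, every ID starts with "M"
puts times.map { |t| shard.(random_first_id(t)) }.uniq.size # all 12 shards, barring extreme bad luck
```

The trade-off is visible in the output: time-first IDs put a whole time window on one shard, while random-first IDs lose the ordering but spread the load.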
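Slide 23 recommends buffers between layers. The sketch below shows the "even out peaks" part with Ruby's built-in SizedQueue, a bounded thread-safe queue: the producer can burst ahead until the buffer is full, then blocks until the consumer catches up. This is an in-process stand-in. Restarting services without losing data, as the slide describes, needs a durable external queue, which this sketch does not model.

```ruby
buffer = SizedQueue.new(100) # bounded: push blocks while 100 items are waiting

producer = Thread.new do
  1.upto(1_000) { |i| buffer << i } # a burst of incoming work
  buffer << :done                   # sentinel telling the consumer to stop
end

processed = []
consumer = Thread.new do
  while (item = buffer.pop) != :done
    processed << item * 2 # stand-in for real processing
  end
end

[producer, consumer].each(&:join)
puts processed.size # => 1000
```

Because the two sides only share the queue, each could be scaled or replaced independently.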
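Slide 25 warns that deleting data is harder than you might think. One common way to plan for it, not spelled out in the talk, is to partition storage by time, so that expiring data means dropping whole partitions instead of deleting records one by one. The class below is a hypothetical in-memory illustration of that idea.

```ruby
require "date"

# Hypothetical store with one partition per UTC day.
class DailyPartitionedStore
  def initialize
    @partitions = Hash.new { |hash, day| hash[day] = [] }
  end

  def write(time, record)
    # getutc returns a copy; Time#utc would modify the caller's object.
    @partitions[time.getutc.to_date] << record
  end

  # Drop every partition older than retention_days before today.
  def expire(today, retention_days)
    cutoff = today - retention_days
    @partitions.delete_if { |day, _records| day < cutoff }
  end

  def days
    @partitions.keys.sort
  end
end

store = DailyPartitionedStore.new
t0 = Time.utc(2012, 9, 1)
10.times { |i| store.write(t0 + i * 86_400, { n: i }) }
store.expire(Date.new(2012, 9, 10), 3)
puts store.days.map(&:to_s).join(", ") # => 2012-09-07, 2012-09-08, 2012-09-09, 2012-09-10
```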
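Slides 28 and 29 distinguish random (or round-robin) routing from routing by a property of the message. A minimal sketch of both follows, assuming 12 partitions and using the session ID as the routing key; both choices are illustrative. It uses Zlib.crc32 rather than String#hash because Ruby seeds String#hash per process, so the same key could land on a different partition after a restart. Note that this is plain modulo hashing, not a consistent-hashing ring: changing the partition count remaps most keys.

```ruby
require "zlib"

PARTITIONS = 12 # assumption, in line with the "multiples of 12" advice

# Round robin: fine when messages don't depend on each other.
class RoundRobinRouter
  def initialize(partitions)
    @partitions = partitions
    @next = 0
  end

  def route(_message)
    partition = @next
    @next = (@next + 1) % @partitions
    partition
  end
end

# Key-based routing: every message with the same key lands on the same
# partition, and a good hash keeps the distribution roughly uniform.
def route_by_key(key, partitions)
  Zlib.crc32(key) % partitions
end

rr = RoundRobinRouter.new(PARTITIONS)
p 13.times.map { rr.route(:message) } # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0]
puts route_by_key("MAI3QAGNAIYT", PARTITIONS) == route_by_key("MAI3QAGNAIYT", PARTITIONS) # => true
```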
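Slides 30 to 42 argue for partition counts like 12, 60, 120 and 360. The reasoning can be checked directly. Such numbers have many divisors, so the partitions divide evenly across many different cluster sizes. The helper names below are illustrative.

```ruby
def divisors(n)
  (1..n).select { |d| (n % d).zero? }
end

[12, 60, 120, 360].each do |partitions|
  puts "#{partitions}: #{divisors(partitions).size} divisors"
end
# 12: 6 divisors, 60: 12 divisors, 120: 16 divisors, 360: 24 divisors

# How many partitions each node gets when assigned round robin.
def assignment(partitions, nodes)
  (0...partitions).group_by { |partition| partition % nodes }.transform_values(&:size)
end

p divisors(12)                 # => [1, 2, 3, 4, 6, 12], every even cluster size
p assignment(12, 4).values     # => [3, 3, 3, 3]
p assignment(10, 4).values     # => [3, 3, 2, 2], uneven
```

With 12 partitions, a cluster of 1, 2, 3, 4, 6 or 12 nodes gets a perfectly even split, while 10 partitions on 4 nodes leaves two nodes doing 50% more work than the others.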
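The arithmetic on slides 47 and 50 can be checked directly. Six characters drawn from 0-9 and A-Z give $36^6$ combinations, a bit more than $2^{31}$ and less than $2^{32}$. Interpreted as seconds since the Unix epoch, which is how slide 52 uses them, six characters are enough until 2038.

```ruby
combinations = 36**6
puts combinations                    # => 2176782336
puts Math.log2(combinations).round(2) # => 31.02
puts combinations > 2**31            # => true
puts combinations < 2**32            # => true

# Six base-36 characters of seconds since the epoch run out in:
puts Time.at(combinations).utc.year  # => 2038
```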
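Slide 52's one-liner encodes the current time in base 36. The sketch below uses a fixed timestamp instead of Time.now so the output is reproducible, and shows the reverse direction: String#to_i(36) turns the first six characters of a time-first ID back into a creation time.

```ruby
# The slide's encoding, with a fixed timestamp instead of Time.now.
encoded = 1_348_153_200.to_s(36).upcase
puts encoded # => MANLO0

# Decoding recovers the creation time from the ID prefix.
decoded = Time.at("MANLO0".to_i(36)).utc
puts decoded # => 2012-09-20 15:00:00 UTC

# The original one-liner, for the current time:
puts Time.now.to_i.to_s(36).upcase
```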
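Slide 58 defines idempotence as f(f(x)) = f(x). The sketch below checks that property for a few functions: normalizing a string and deduplicating an array are idempotent, while incrementing a number is not. The example functions are chosen for illustration.

```ruby
# True when applying f twice gives the same result as applying it once.
def idempotent?(f, x)
  f.call(f.call(x)) == f.call(x)
end

normalize = ->(s) { s.strip.downcase }
dedupe    = ->(a) { a.uniq.sort }
increment = ->(n) { n + 1 }

puts idempotent?(normalize, "  Chrome ") # => true
puts idempotent?(dedupe, [3, 1, 3, 2])   # => true
puts idempotent?(increment, 0)           # => false
```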
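Slides 60 and 61 contrast set-based and increment-based operations. The sketch below replays a message stream in which one message is delivered twice, as can happen with at-least-once delivery. The message shape is hypothetical, and the talk does not say how its pipeline handled redelivery. The increment double-counts, while adding to a set or writing a value under the message's ID gives the same result however often a message is replayed.

```ruby
require "set"

# Hypothetical impression messages with unique ids; message 2 arrives twice.
messages = [
  { id: 1, user_id: "A" },
  { id: 2, user_id: "B" },
  { id: 3, user_id: "A" }
]
delivered = [messages[0], messages[1], messages[1], messages[2]]

# INC X: every delivery bumps the counter, duplicates included.
impression_count = 0
delivered.each { impression_count += 1 }

# Adding to a set: replaying a message leaves the set unchanged.
unique_users = Set.new
delivered.each { |m| unique_users << m[:user_id] }

# SET X: writing a value under the message id is safe to repeat, and
# counting the keys afterwards gives the true number of impressions.
seen = {}
delivered.each { |m| seen[m[:id]] = m[:user_id] }

puts impression_count  # => 4, although only three impressions happened
puts unique_users.size # => 2
puts seen.size         # => 3
```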