Learning to build distributed systems the hard way
JDays 2012


Published in: Technology

Transcript

  • 1. LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY @iconara
  • 2. LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY BIG DATA @iconara
  • 3. speakerdeck.com/u/iconara (real time!)
  • 4. Theo / @iconara
  • 5. chief architect at BURT
  • 6. let’s make online advertising a great experience
  • 7. MAKING THIS
  • 8. INTO THIS
  • 9. HOW HARD CAN IT BE?
  • 10. 30K REQUESTS PER SECOND more than a billion requests per day, over 1 TB raw data
  • 11. ONE VISIT CAN CHANGE UP TO 100K COUNTERS hundreds of millions of individual counters per day, plus counting uniques and visitor histories
  • 12. IN REAL TIME or near real time, if you want to be pedantic
  • 13. HOW HARD CAN IT BE?
  • 14. START WITH TWO OF EVERYTHING going from one to two is the hardest, solve the scaling problem up front
  • 15. START WITH THREE (NOT TWO) OF EVERYTHING you’ll solve the scaling problem, and need less overcapacity
  • 16. GIVE A LOT OF THOUGHT TO KEYS AND IDS and think about your queries first
  • 17. MEIHO0 JME57Z (a timestamp, then something random): monotonically increasing, sorts nicely
  • 18. JME57Z MEIHO0 (something random, then a timestamp): uniformly distributed, works nicely with sharding
  • 19. CONSISTENCY IS OVERRATED don’t fear R + W < N
  • 20. PRECOMPUTE ALL THE THINGS your users most likely don’t know what they want, so why let them do ad hoc queries?
  • 21. SEPARATE PROCESSING FROM STORAGE that way you can scale each independently
  • 22. PLAN HOW TO GET RID OF YOUR DATA deleting stuff is harder than you might think
  • 23. NoDB keep things streaming
  • 24. DIVIDE THE LOAD big data systems are all about routing and partitioning
  • 25. RANDOM when you have no interdependencies between things it’s easy to scale out
  • 26. CONSISTENT when there are interdependencies you need to route using some property of the objects, but make sure you get a uniform distribution
  • 27. NUMEROLOGY
  • 28. 12
  • 29. 2 | 12, 3 | 12, 4 | 12, 6 | 12
  • 30. 8 | 24, 5 | 60
  • 31. A DIVERSION ABOUT COUNTING TO 60 the reason why there are 60 seconds to a minute, and 360 degrees to a circle
  • 32. 3 SEGMENTS ON EACH FINGER = 12
  • 33. 3 SEGMENTS ON EACH FINGER = 12 FIVE FINGERS ON OTHER HAND = 60
  • 34-45. 12, 60, 120, 360 superior highly composite numbers
  • 46. use multiples of 12 to scale without always having to double
  • 47. BLAH BLAH BLAH use multiples of 12 to scale without always having to double
  • 48. log2(36^6) ≈ 31
  • 49. $ (ASCII code 36)
  • 50. log2(36^6) ≈ 31
  • 51. log2(36^6) ≈ 31 six characters 0-9, A-Z can represent 31 bits, which is kind of almost very close to four bytes
  • 52. MEIHO0
  • 53. MEIHO0 a timestamp: Time.now.to_i.to_s(36).upcase
  • 54. YOU CAN’T SCALE TO REAL TIME and don’t trust code that doesn’t run continuously
  • 55. DO YOU REALLY NEED A BACKUP? if you’ve got 3x replication over multiple availability zones, is that backup really worth it?
  • 56. PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT when thousands of things happen every second, new, weird and unforeseen things happen all the time; your tests can only cover the foreseeable
  • 57. GÖTEBORG, DISTRIBUTED @gbgdistr
  • 58. KTHXBAI @iconara github.com/iconara architecturalatrocities.com burtcorp.com
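
Slides 16-18 and 52-53 describe keys built from a base-36 timestamp and a random part, in either order. A minimal sketch of both layouts follows, assuming a six-character random part and zero padding; the helper names `base36`, `random_part`, `time_ordered_key` and `shard_friendly_key` are hypothetical and not taken from the talk.

```ruby
require "securerandom"

# Encode a non-negative integer as upper-case base 36, left-padded with zeros.
# The fixed width of 6 is an assumption that matches the keys on the slides.
def base36(value, width = 6)
  value.to_s(36).upcase.rjust(width, "0")
end

# Hypothetical helper: six base-36 characters of randomness.
def random_part
  base36(SecureRandom.random_number(36**6))
end

# Timestamp first: keys created later sort after earlier ones (slide 17).
def time_ordered_key(time = Time.now)
  base36(time.to_i) + random_part
end

# Random part first: keys spread evenly over a sharded key space (slide 18).
def shard_friendly_key(time = Time.now)
  random_part + base36(time.to_i)
end

base36(1_354_633_200) # => "MEIHO0", the timestamp shown on slide 52
```

Which layout fits depends on the queries: the first favours range scans over time, the second favours an even write load across shards.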
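
Slide 19 mentions R + W < N without spelling out the notation. Assuming the usual quorum reading (N replicas, W acknowledgements per write, R replicas consulted per read), the brute-force sketch below checks whether some read can miss the latest write entirely. The method name `stale_read_possible?` is hypothetical.

```ruby
# Returns true if there is at least one way to pick W written replicas and
# R read replicas out of N such that the two sets do not overlap, meaning
# a read may not see the most recent write.
def stale_read_possible?(n, r, w)
  replicas = (0...n).to_a
  replicas.combination(w).any? do |written|
    replicas.combination(r).any? { |read| (written & read).empty? }
  end
end

stale_read_possible?(3, 2, 2) # => false, because R + W > N forces an overlap
stale_read_possible?(3, 1, 1) # => true, reads are cheap but may be stale
```

The second configuration is the kind the slide argues is often acceptable: when counters are continuously recomputed, a briefly stale read costs little.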
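
Slides 24-26 contrast random routing with routing on a property of the object. A minimal sketch of both follows, assuming a plain list of partition names and CRC32-modulo hashing as the routing function. This is simple hash partitioning, not a consistent-hash ring, and the method names are hypothetical.

```ruby
require "zlib"

# No interdependencies between items: any partition will do.
def route_randomly(partitions)
  partitions.sample
end

# Interdependent items: hash a property of the object so that everything
# sharing that property lands on the same partition.
def route_by_key(key, partitions)
  partitions[Zlib.crc32(key) % partitions.size]
end

partitions = %w[p0 p1 p2 p3]
first  = route_by_key("visitor-42", partitions)
second = route_by_key("visitor-42", partitions)
first == second # => true, the same visitor always goes to the same partition
```

Whether the resulting distribution is uniform depends on the key chosen, which is why slide 26 warns about it; a skewed property (for example a handful of very large sites) would overload some partitions.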
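
Slide 46 suggests multiples of 12 so that scaling does not always mean doubling. The sketch below lists the node counts over which a given number of partitions divides evenly; the method name `even_node_counts` is hypothetical.

```ruby
# Node counts that can share the given number of partitions with no remainder.
def even_node_counts(partitions)
  (1..partitions).select { |nodes| (partitions % nodes).zero? }
end

even_node_counts(12) # => [1, 2, 3, 4, 6, 12]
even_node_counts(16) # => [1, 2, 4, 8, 16]
```

With 12 partitions a cluster can grow from 2 to 3 to 4 to 6 nodes and stay balanced, whereas 16 partitions only balance when the node count doubles.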
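
The arithmetic on slides 48-51 can be checked directly. This is a small sketch; the constant and variable names are illustrative only.

```ruby
ALPHABET_SIZE = 36 # digits 0-9 plus letters A-Z
KEY_LENGTH = 6

combinations = ALPHABET_SIZE**KEY_LENGTH # 2_176_782_336 distinct strings
bits = Math.log2(combinations)           # a little over 31

# The largest six-character value, "ZZZZZZ", still fits in four bytes.
largest = ("Z" * KEY_LENGTH).to_i(36)
fits_in_four_bytes = largest < 2**32     # true
```

Six characters therefore cover slightly more than the positive range of a signed 32-bit integer, which is why a Unix timestamp in seconds fits in them today.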