Deploying large payloads at scale
Ramon van Alteren
Friday, November 11, 2011
"Deploying large payloads at scale", a talk given at the Velocity Europe conference in Berlin, November 2011.


  1. Deploying large payloads at scale
     Ramon van Alteren
  2. Hyves
     • 9.7M Dutch members (16.7M population)
     • ~7M unique visitors / month (Comscore 09/2011)
     • ~2.3M unique visitors / day
     • 800,000 photo uploads / day
     • 7M chat messages / day
  3. Hyves - Operational environment
     • 3500-node server park in 3 datacenters
     • 6 Gbps daily outgoing traffic
     • System Engineering team: 12
     • Development team: 33
  4. Weekend project
     Result: 4.5x speed/throughput increase
  5. Weekend -> Company project
  6. A few minor problems
     • Compilation took ~40-60 minutes
     • Resulting binary was 750 MB
     • Code issues
     • gcc 4.5.2 required + added deps
  7. Part I - Solving the build problem
     Jenkins to the rescue
     Add distcc to speed up compilation times
  8. Part I - Solving the build problem
     Add some serious hardware
     Compile / build in < 6 mins
  9. Part II: Deploying
     Sequential?
     • 500 MB @ 1 Gb/s = 4 seconds
     • 500 MB @ 500 Mb/s = 8 seconds
     • 500 MB @ 200 Mb/s = 20 seconds
     • 450 servers * 8 seconds = 3600 seconds == 1 hour
     • 450 servers * 20 seconds = 9000 seconds == 2.5 hours
     Diffs?
     • A binary diff would be between 10 KB and 400 MB, even on consecutive runs without changes
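The sequential-deploy arithmetic on slide 9 can be reproduced with a few lines of Python. This is only a sketch of the slide's back-of-the-envelope numbers: it assumes decimal units (1 MB = 8 Mbit, 1 Gb/s = 1000 Mb/s) and ignores protocol overhead, and the function names are made up for illustration.

```python
def transfer_seconds(payload_mb: float, link_mbit_s: float) -> float:
    """Seconds needed to push payload_mb megabytes over a link_mbit_s link."""
    return payload_mb * 8 / link_mbit_s


def sequential_deploy_seconds(servers: int, payload_mb: float, link_mbit_s: float) -> float:
    """Total time when servers receive the payload one after another."""
    return servers * transfer_seconds(payload_mb, link_mbit_s)


if __name__ == "__main__":
    for rate in (1000, 500, 200):
        per_host = transfer_seconds(500, rate)
        total = sequential_deploy_seconds(450, 500, rate)
        print(f"{rate} Mb/s: {per_host:.0f} s per host, "
              f"{total / 3600:.1f} h for 450 servers")
```

The total grows linearly with the number of servers, which is why a one-by-one copy does not scale to a farm of this size.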
  10. Part II: Deploying - Bittorrent
  11. Bittorrent - Previous experiences
      Naive run using bittorrent to transport 300 MB throughout our server park:
      • Near-complete network outage due to bandwidth starvation
      • Several crucial subsystems delayed or unreachable due to network bandwidth shortage
  12. Bittorrent - Previous experiences
  13. Bittorrent - The Problem
      Bittorrent assumes every server has a 1 Gb/s link to every other server.
      They don't.
  14. Bittorrent - Actual bandwidth available
      [Diagram: core network; links labelled 1-4 Gb/s]
  15. Bittorrent - Actual bandwidth available
      [Diagram: core network carrying production traffic and administration traffic]
  16. Murder - Why not?
      Murder uses two tricks:
      • Clients (including the seeder) capped to 1 upload peer
      • Every client receives every peer from the tracker
      • No download bandwidth cap (easy to add though)
      Peers will still connect all over the place
      It's slow: a timing run over 25 peers took 7 mins
  17. DIY: Location-aware tracker
  18. SMDB - Location metadata
      We have bandwidth information available in our server management database
      Build two-tier bittorrent swarms:
      • 1 swarm with 2 peers / rack (uplink)
      • 1 additional swarm per rack (uplink)
      • Cap every client @ 96 Mbit/s
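A minimal sketch of how the two-tier layout on slide 18 could be derived from host-to-rack metadata. The slides do not show the actual code, so everything here is an assumption: the `host_racks` mapping stands in for whatever the SMDB returns, the swarm names are invented, and the choice to keep the tier-1 peers inside their rack swarm (so they can seed locally) is a guess.

```python
from collections import defaultdict
from typing import Dict, List

CLIENT_CAP_MBIT = 96  # per-client bandwidth cap quoted on the slide


def build_swarms(host_racks: Dict[str, str], peers_per_rack: int = 2) -> Dict[str, List[str]]:
    """Return swarm name -> member hosts for a two-tier layout."""
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for host, rack in sorted(host_racks.items()):
        by_rack[rack].append(host)

    swarms: Dict[str, List[str]] = {"tier1": []}
    for rack, hosts in by_rack.items():
        # First tier: a couple of hosts per rack fetch across the rack uplinks.
        swarms["tier1"].extend(hosts[:peers_per_rack])
        # Second tier: all hosts in the rack exchange pieces within the rack.
        swarms[f"rack:{rack}"] = list(hosts)
    return swarms


if __name__ == "__main__":
    racks = {"web1": "r1", "web2": "r1", "web3": "r1", "web4": "r2"}
    for name, members in build_swarms(racks).items():
        print(name, members)
```

With this layout only `peers_per_rack` hosts per rack ever pull data over the shared uplink; the rest of the rack is served from inside the rack.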
  19. DIY: 2-tier swarms
  20. DIY - Tracker
      Tracker in Python + Flask:
      • 1100 lines of code (1900 with tests)
      • Stores transfer metadata in redis
      • Connects to our SMDB using REST
      • Exposes REST interface
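The core idea of a location-aware tracker is that an announcing client only learns about peers in its own swarms, instead of every peer as with Murder. The sketch below illustrates that idea only; it is not the Hyves tracker. It deliberately leaves out Flask and the bencoded announce response, uses an in-memory dict where the real tracker used redis, and the class and method names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class LocationAwareTracker:
    def __init__(self, swarms: Dict[str, List[str]]):
        # host -> names of the swarms that host belongs to
        self.memberships: Dict[str, Set[str]] = {}
        for name, hosts in swarms.items():
            for host in hosts:
                self.memberships.setdefault(host, set()).add(name)
        # (release, swarm) -> hosts that have announced so far
        # (in-memory stand-in for the redis store mentioned on the slide)
        self.announced: Dict[Tuple[str, str], Set[str]] = {}

    def announce(self, release: str, host: str) -> List[str]:
        """Register host for a release and return the peers it may contact."""
        peers: Set[str] = set()
        for swarm in self.memberships.get(host, ()):
            members = self.announced.setdefault((release, swarm), set())
            peers |= members
            members.add(host)
        peers.discard(host)  # never hand a client its own address
        return sorted(peers)


if __name__ == "__main__":
    tracker = LocationAwareTracker(
        {"tier1": ["web1", "web4"], "rack:r1": ["web1", "web2"], "rack:r2": ["web4"]}
    )
    for h in ("web1", "web2", "web4"):
        print(h, "->", tracker.announce("101482", h))
```

A host that sits in both the first-tier swarm and its rack swarm sees peers from both, which is what lets it bridge data from the core into the rack.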
  21. DIY - Client
      We use a slightly modified rtorrent client
      Same things Twitter modified:
      • Remove features related to operating on the big bad internet
      • Make various timeouts more aggressive
      • No DHT, UPnP etc.
      Nice bonus: RPC remote API
  22. Results - single deploy ~300 hosts
      [Bar chart: classic deploy vs. Bittorrent deploy, without build and with build; y-axis 0 to 2000, unit not shown]
  23. Graphs - full webfarm cluster
  24. Graphs - single node
  25. Graphs - single rack (24 nodes)
  26. Bittorrent - Statistics
      == release: 101482 expected: 287 actual: 107 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 ==
      == release: 101482 expected: 287 actual: 267 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 ==
      == release: 101482 expected: 287 actual: 286 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 1 progress: 0.35% start: 12:04:20 last_completed: none failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 2 progress: 42.11% start: 12:04:20 last_completed: 12:06:01 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 5 progress: 44.48% start: 12:04:20 last_completed: 12:06:05 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 34 progress: 45.80% start: 12:04:20 last_completed: 12:06:09 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 95 progress: 48.17% start: 12:04:20 last_completed: 12:06:10 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 95 progress: 48.17% start: 12:04:20 last_completed: 12:06:10 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 97 progress: 48.46% start: 12:04:20 last_completed: 12:06:15 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 101 progress: 49.23% start: 12:04:20 last_completed: 12:06:21 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 108 progress: 50.72% start: 12:04:20 last_completed: 12:06:24 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 118 progress: 52.95% start: 12:04:20 last_completed: 12:06:27 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 131 progress: 55.88% start: 12:04:20 last_completed: 12:06:30 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 172 progress: 67.77% start: 12:04:20 last_completed: 12:06:34 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 213 progress: 80.53% start: 12:04:20 last_completed: 12:06:37 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 239 progress: 87.36% start: 12:04:20 last_completed: 12:06:40 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 246 progress: 91.24% start: 12:04:20 last_completed: 12:06:43 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 265 progress: 97.55% start: 12:04:20 last_completed: 12:06:46 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 274 progress: 98.79% start: 12:04:20 last_completed: 12:06:49 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 280 progress: 99.07% start: 12:04:20 last_completed: 12:06:52 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 283 progress: 99.26% start: 12:04:20 last_completed: 12:06:55 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 284 progress: 99.36% start: 12:04:20 last_completed: 12:06:56 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 284 progress: 99.36% start: 12:04:20 last_completed: 12:06:56 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 285 progress: 99.52% start: 12:04:20 last_completed: 12:07:04 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 286 progress: 99.85% start: 12:04:20 last_completed: 12:07:06 failed: 0 ==
      == release: 101482 expected: 287 actual: 287 seeders: 287 progress: 100.00% start: 12:04:20 last_completed: 12:07:37 failed: 0 ==
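The status records on slide 26 are regular enough to parse mechanically. The sketch below pulls the fields out of such a record and computes the time between `start` and `last_completed`. The field names are taken from the slide; the parser itself, its function names, and the assumption that both timestamps fall on the same day are illustrative only.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional

FIELDS = ["release", "expected", "actual", "seeders",
          "progress", "start", "last_completed", "failed"]

# One "name: value" pair per field, separated by any whitespace.
RECORD = re.compile(r"\s+".join(rf"{f}:\s+(?P<{f}>\S+)" for f in FIELDS))


def parse_records(text: str) -> List[Dict[str, str]]:
    """Return one dict of raw string fields per status record found in text."""
    return [m.groupdict() for m in RECORD.finditer(text)]


def elapsed_seconds(record: Dict[str, str]) -> Optional[int]:
    """Seconds from start to last_completed, or None if nothing completed yet."""
    if record["last_completed"] == "none":
        return None
    fmt = "%H:%M:%S"
    delta = (datetime.strptime(record["last_completed"], fmt)
             - datetime.strptime(record["start"], fmt))
    return int(delta.total_seconds())


if __name__ == "__main__":
    line = ("== release: 101482 expected: 287 actual: 287 seeders: 287 "
            "progress: 100.00% start: 12:04:20 last_completed: 12:07:37 failed: 0 ==")
    print(elapsed_seconds(parse_records(line)[0]))
```

For the final record on the slide this gives 197 seconds: all 287 hosts had the release 3 minutes and 17 seconds after the 12:04:20 start.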
  27. Next Steps
      Enable more projects
      • Currently only three main projects
      • One codebase
      • Multi-transfer
      Move to Continuous Delivery
      • Currently doing continuous deployment
      • 6-12 deploys a day
      • Want to allow feature teams to deploy
  28. Next Steps
      Open source the tracker:
      • Very closely tied to our infra
      • Not overly clean code
      We will open source it; however, some refactoring is needed first. Contact me if you're interested.
      Watch our GitHub repository: https://github.com/organizations/hyves-org
  29. Next Steps - Deployment Glue
      TODO:
      [] prepare stuff
      [] transport stuff
      [] do some more stuff
      [] activate payload
      [] post check
  30. Next Steps - Deployment Glue
      Simple from a single-host perspective
      Complex when executed in parallel, remotely, with failure handling and proper reporting
      Fabric
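To illustrate why the glue gets complex, here is a rough standard-library sketch of running the TODO steps from slide 29 across many hosts in parallel, with per-host failure handling and a report at the end. It is not Fabric and not the Hyves tooling: the step callables are hypothetical placeholders that run locally, whereas a real deploy would execute them on the remote hosts.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[str], None]]  # (step name, action taking a hostname)


def deploy_host(host: str, steps: List[Step]) -> str:
    """Run the steps in order on one host; stop at the first failure."""
    for name, step in steps:
        try:
            step(host)
        except Exception as exc:
            return f"failed at {name}: {exc}"
    return "ok"


def deploy_all(hosts: List[str], steps: List[Step], workers: int = 16) -> Dict[str, str]:
    """Deploy to all hosts in parallel and return host -> outcome."""
    report: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(deploy_host, h, steps): h for h in hosts}
        for future in as_completed(futures):
            report[futures[future]] = future.result()
    return report


if __name__ == "__main__":
    def noop(host: str) -> None:
        pass  # placeholder for a real step

    steps = [("prepare", noop), ("transport", noop),
             ("activate", noop), ("post check", noop)]
    print(deploy_all(["web1", "web2", "web3"], steps))
```

A failure on one host is recorded in the report and does not stop the other hosts; deciding what to do with a partially deployed farm is the part that remains hard.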
  31. Thank you, questions?
      • Bram Cohen: http://www.bittorrent.com/
      • Flask: http://flask.pocoo.org/
      • Rtorrent: http://libtorrent.rakshasa.no/
      • Twitter Murder: https://github.com/lg/murder
      • Boris, Cor, Lorenzo & others at hyves.nl
      • Michael Tekel: https://github.com/mtekel/
      • http://wiki.theory.org/BitTorrentSpecification
      • You?
      We're hiring: http://werkenbijhyves.hyves.nl
      email: ramon@hyves.nl
      twitter: @ramonvanalteren
