StumbleUpon UK Hadoop Users Group 2011

Transcript

  • 1. A Sneak Peek into StumbleUpon’s Infrastructure
  • 2. Quick SU Intro
  • 3. Our Traffic
  • 4. Our Stack: 100% Open-Source
       • MySQL (legacy source of truth)
       • Memcache (lots)
       • HBase (most new apps / features; in prod since ’09): see the client sketch below
       • Hadoop (DWH, MapReduce, Hive, ...)
       • elasticsearch (“you know, for search”)
       • OpenTSDB (distributed monitoring)
       • Varnish (HTTP load-balancing)
       • Gearman (processing off the fast path)
       • ... etc.
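Since most new features sit on HBase, a minimal point read against the Java client of that era (0.90.x) helps ground the stack list. The deck shows no code; the table, row key, and column names below are hypothetical.

```java
// Minimal sketch of an HBase point read with the classic (0.90-era)
// Java client. Table name, row key, and column are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
    HTable table = new HTable(conf, "users");           // hypothetical table
    Get get = new Get(Bytes.toBytes("user12345"));      // hypothetical row key
    get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("email"));
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("email"));
    System.out.println(value == null ? "(not found)" : Bytes.toString(value));
    table.close();
  }
}
```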
  • 5. The Infrastructure
       [Network diagram: two 1U Arista 7050 core switches (52 x 10GbE SFP+ each), connected via L3 ECMP to 1U Arista 7048T top-of-rack switches (48 x 1GbE copper down, 4 x 10GbE SFP+ up), with 2U thick and thin nodes below; MTU=9000 throughout.]
  • 6. The Infrastructure
       • SuperMicro half-width motherboards
       • 2 x Intel L5630 (40W TDP; 16 hardware threads total)
       • 48GB RAM
       • Commodity disks (consumer-grade SATA, 7200rpm)
       • 1x2TB per “thin node” (4-in-2U): web/app servers, Gearman, etc.
       • 6x2TB per “thick node” (2-in-2U): Hadoop/HBase, elasticsearch, etc.
       • 86 nodes = 1PB (see the arithmetic below)
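The deck does not break the 1PB headline down, but it checks out if the 86 nodes are counted at thick-node density (6 x 2TB each), which is an assumption on my part:

$$86 \times 6 \times 2\,\mathrm{TB} = 1032\,\mathrm{TB} \approx 1\,\mathrm{PB}$$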
  • 7. The Infrastructure
       • No virtualization
       • No oversubscription
       • Rack locality doesn’t matter much (sub-100µs RTT across racks)
       • cgroups / Linux containers to keep MapReduce under control (see the sketch below)
       Two production HBase clusters per colo:
       • Low-latency (user-facing services)
       • Batch (analytics, scheduled jobs, ...)
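The deck does not say how the cgroups are wired up; the sketch below only illustrates the cgroup v1 mechanism for capping a task's memory: create a group, set a limit, and move a PID into it. The mount point and group name are assumptions (the mount point varies by distro), and in practice this is typically done by init scripts or the task launcher rather than application code.

```java
// Hypothetical illustration of cgroup v1 memory capping: make a group,
// set memory.limit_in_bytes, and move a process in by writing its PID
// to the group's "tasks" file. Mount point and group name are assumed.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupSketch {
  public static void capMemory(String group, long limitBytes, long pid) throws IOException {
    Path dir = Paths.get("/sys/fs/cgroup/memory", group);  // cgroup v1 memory hierarchy
    Files.createDirectories(dir);                          // mkdir creates the cgroup
    Files.write(dir.resolve("memory.limit_in_bytes"),
                Long.toString(limitBytes).getBytes());     // hard memory cap
    Files.write(dir.resolve("tasks"),
                Long.toString(pid).getBytes());            // move the process into it
  }

  public static void main(String[] args) throws IOException {
    // e.g. cap a (hypothetical) MapReduce child JVM at 1GB
    capMemory("mapreduce", 1L << 30, Long.parseLong(args[0]));
  }
}
```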
  • 8. Low-Latency Cluster
       • Workload mostly driven by HBase
       • Very few scheduled MR jobs
       • HBase replication to the batch cluster
       • Most queries come from PHP over Thrift
       Challenges:
       • Tuning Hadoop for low latency (one illustrative knob set below)
       • Taming the long latency tail
       • Quickly recovering from failures
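The deck lists the low-latency challenges without the actual settings. A minimal sketch of the kind of client-side tuning involved, using standard HBase client configuration keys of the era; the values are illustrative, not StumbleUpon's.

```java
// Sketch: cap RPC timeout and retries so a slow region server fails fast
// instead of stalling a user-facing request. The keys are standard HBase
// client settings; the values are illustrative only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LowLatencyClientConfig {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 200);          // ms budget per RPC
    conf.setInt("hbase.client.retries.number", 2);  // give up quickly
    conf.setInt("hbase.client.pause", 50);          // ms between retries
    return conf;
  }
}
```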
  • 9. Batch Cluster
       • 2x more capacity
       • Wildly changing workload (e.g. swings from 40K to 14M QPS)
       • Lots of scheduled MR jobs
       • Frequent ad-hoc jobs (MR/Hive)
       • OpenTSDB’s data: >800M data points added per day, 133B data points total (see the arithmetic below)
       Challenges:
       • Resource isolation
       • Tuning for larger scale
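Those two OpenTSDB figures put a rough bound on the history: if ingest had always run at today's rate (it presumably ramped up, so the real window is longer), the total would correspond to about five and a half months of data:

$$\frac{133 \times 10^{9}\ \mathrm{points}}{8 \times 10^{8}\ \mathrm{points/day}} \approx 166\ \mathrm{days}$$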
  • 10. Questions? Think this is cool? We’re hiring!