StumbleUpon UK Hadoop Users Group 2011
StumbleUpon UK Hadoop Users Group 2011 Presentation Transcript

  • 1. A Sneak Peek into StumbleUpon’s Infrastructure
  • 2. Quick SU Intro
  • 3. Our Traffic
  • 4. Our Stack: 100% Open-Source
    • MySQL (legacy source of truth), in prod since ’09
    • Memcache (lots)
    • HBase (most new apps / features)
    • Hadoop (DWH, MapReduce, Hive, ...)
    • elasticsearch (“you know, for search”)
    • OpenTSDB (distributed monitoring)
    • Varnish (HTTP load-balancing)
    • Gearman (processing off the fast path)
    • ... etc.
  • 5. The Infrastructure (network diagram)
    • 2 core switches: 1U Arista 7050, 52 x 10GbE SFP+, L3 ECMP
    • Top of rack: 1U Arista 7048T, 48 x 1GbE copper + 4 x 10GbE SFP+ uplinks
    • MTU = 9000
    • Racks hold 2U thick nodes and 2U thin nodes
  • 6. The Infrastructure
    • SuperMicro half-width motherboards
    • 2 x Intel L5630 (40W TDP; 16 hardware threads total)
    • 48GB RAM
    • Commodity disks (consumer-grade SATA, 7200rpm)
    • 1 x 2TB per “thin node” (4-in-2U): web/app servers, Gearman, etc.
    • 6 x 2TB per “thick node” (2-in-2U): Hadoop/HBase, elasticsearch, etc.
    • 86 nodes = 1PB
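    (The 1PB figure presumably counts raw disk across the thick nodes: 86 nodes × 6 drives × 2TB = 1,032TB ≈ 1PB.)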
  • 7. The Infrastructure
    • No virtualization
    • No oversubscription
    • Rack locality doesn’t matter much (sub-100µs RTT across racks)
    • cgroups / Linux containers to keep MapReduce under control (sketch below)
    Two production HBase clusters per colo:
    • Low-latency (user-facing services)
    • Batch (analytics, scheduled jobs, ...)
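A minimal sketch of the cgroups idea, assuming a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory and a pre-created "mapreduce" group; the group name, limit, and PID handling are illustrative, not StumbleUpon's actual setup:

```java
// Minimal sketch, assuming a cgroup v1 memory controller mounted at
// /sys/fs/cgroup/memory and a pre-created "mapreduce" group. The group
// name, the 8 GiB limit, and the PID handling are illustrative only.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CgroupAttach {
    public static void main(String[] args) throws IOException {
        String group = "/sys/fs/cgroup/memory/mapreduce"; // assumed mount point

        // Cap the group so runaway MR task JVMs can't starve HBase:
        Files.write(Paths.get(group, "memory.limit_in_bytes"),
                    "8589934592".getBytes(StandardCharsets.UTF_8)); // 8 GiB

        // Move a task JVM (PID passed on the command line) into the group;
        // the kernel then enforces the limit on everything it allocates.
        Files.write(Paths.get(group, "tasks"),
                    args[0].getBytes(StandardCharsets.UTF_8));
    }
}
```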
  • 8. Low-Latency Cluster
    • Workload mostly driven by HBase
    • Very few scheduled MR jobs
    • HBase replication to batch cluster
    • Most queries from PHP over Thrift
    Challenges:
    • Tuning Hadoop for low latency
    • Taming the long latency tail (see the client-side sketch below)
    • Quickly recovering from failures
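A minimal sketch of the fail-fast approach to the latency tail, using the HBase Java client API of that era; the table name, column names, and timeout values are assumptions for illustration, not SU's real settings:

```java
// Minimal sketch: a fail-fast HBase read for a user-facing path, using the
// 0.90-era Java client. Table/column names and timeout values are assumed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class LowLatencyRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Give up quickly rather than stall a page render:
        conf.set("hbase.rpc.timeout", "100");          // ms per RPC (assumed)
        conf.set("hbase.client.pause", "50");          // ms between retries
        conf.set("hbase.client.retries.number", "2");  // few, fast retries

        HTable table = new HTable(conf, "users");      // hypothetical table
        try {
            Get get = new Get(Bytes.toBytes("user:12345"));
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("prefs"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("info"),
                                           Bytes.toBytes("prefs"));
            System.out.println(value == null ? "miss" : Bytes.toString(value));
        } finally {
            table.close();
        }
    }
}
```

With settings like these a slow region server costs a few hundred milliseconds at most, and the application can fall back (e.g. to Memcache or a default) instead of hanging on the tail.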
  • 9. Batch Cluster
    • 2x more capacity
    • Wildly changing workload (e.g. 40K → 14M QPS)
    • Lots of scheduled MR jobs
    • Frequent ad-hoc jobs (MR/Hive)
    • OpenTSDB’s data: >800M data points added per day, 133B data points total (sketch below)
    Challenges:
    • Resource isolation
    • Tuning for larger scale
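For a sense of how those data points arrive, here is a minimal sketch of a write through OpenTSDB's telnet-style "put" protocol on its default port 4242; the host name, metric, value, and tags are illustrative assumptions:

```java
// Minimal sketch: pushing one data point into OpenTSDB over its telnet-style
// "put" interface. Host, metric, value, and tags are illustrative only.
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class TsdbPut {
    public static void main(String[] args) throws Exception {
        Socket sock = new Socket("tsd.example.com", 4242); // assumed TSD host
        try {
            Writer out = new OutputStreamWriter(sock.getOutputStream(), "UTF-8");
            long now = System.currentTimeMillis() / 1000L;  // seconds since epoch
            // Line format: put <metric> <timestamp> <value> <tag=value ...>
            out.write("put web.requests.qps " + now + " 14000000"
                      + " colo=example cluster=batch\n");
            out.flush();
        } finally {
            sock.close();
        }
    }
}
```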
  • 10. Questions? Think this is cool? We’re hiring