4. Facebook Scale
• 82% of users are outside of the U.S.
• 4 domestic regions today. Europe region will come online later this year.
5. Facebook Stats
• 1 billion users
• 350+ million photos added per day
• 4.2 billion likes, posts and comments per day
• 140+ billion friend connections
• 240+ billion photos
• 17 billion check-ins
6. Cost and Efficiency
• From our 10-Q filed with the SEC in October 2012:
  • “The first nine months of 2012 ... $1.0 billion for capital expenditures related to the purchase of servers, networking equipment, storage infrastructure, and the construction of data centers.”
• At this size, we spend a lot of time thinking about efficiency and costs.
7. Architecture
[Diagram: three cluster types.
Front-End Cluster: Web (250 racks), Ads (30 racks), Cache (~144TB), Multifeed (9 racks), other small services.
Service Cluster: Search, Photos, Msg, Others.
Back-End Cluster: UDB, ADS-DB, Tao Leader.]
9. Multifeed rack
• The rack is our unit of capacity
• All 40 servers work together
• Leaf + agg code runs on all servers
• Leaf has most of the RAM
• Aggregator uses most of the CPU
• Lots of network BW within the rack
[Diagram: the servers in the rack, each running an Aggregator (A) and a Leaf (L).]
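The leaf/aggregator split above resembles a scatter-gather pattern. The following is a minimal sketch of that pattern, not Multifeed's actual code. It assumes (the slide does not say this) that each leaf holds a shard of scored stories in RAM and that an aggregator queries every leaf and merges the results. The names `Leaf` and `aggregate` and the sample data are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class Leaf:
    """Hypothetical leaf: holds one in-memory shard of (score, story_id) pairs."""

    def __init__(self, stories):
        self.stories = list(stories)

    def top(self, k):
        # The leaf only scans its own shard and returns its best k entries.
        return heapq.nlargest(k, self.stories)


def aggregate(leaves, k):
    """Hypothetical aggregator: fan out to every leaf, then merge and rank."""
    with ThreadPoolExecutor(max_workers=len(leaves)) as pool:
        partials = list(pool.map(lambda leaf: leaf.top(k), leaves))
    # Merge the partial results from all leaves and keep the overall best k.
    return heapq.nlargest(k, (s for part in partials for s in part))


leaves = [Leaf([(0.9, "a"), (0.2, "b")]), Leaf([(0.7, "c"), (0.8, "d")])]
print(aggregate(leaves, 3))
```

Because every server runs both roles, any server can act as the aggregator for a request while all servers answer as leaves, which is one reason in-rack bandwidth matters.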
10. Life of a “hit”
[Diagram: a request traced over time across the Front-End and Back-End, from "request starts" to "request completes". Components shown: Web, MC (x4), Ads, Database, Feed agg, and L (x6).]
11. Five Standard Servers

Type            Memory          Disk                      Services
I    Web        Low, 16GB       Low, 250GB                Web, Chat
III  Database   High, 144GB     High IOPS, 3.2 TB Flash   Database
IV   Hadoop     Medium, 48GB    High, 12 x 3TB SATA       Hadoop
V    Haystack   Low, 18GB       High, 12 x 3TB SATA       Photos, Video
VI   Feed       High, 144GB     Medium, 2TB SATA          Multifeed, Search, Ads

CPU (four entries survive for the five types, in source order): High, 2 x E5-2670; Med, 2 x X5650; Low, 1 x L5630; High, 2 x E5-2660.
12. Five Server Types
Advantages:
• Volume pricing
• Re-purposing
• Easier operations - simpler repairs, drivers, DC headcount
• New servers allocated in hours rather than months
Drawbacks:
• 40 major services; 200 minor ones - not all fit perfectly
• Service needs change over time.
14. Server Processors
• Servers in datacenters use processors that were designed for desktop computers.
• Intel and AMD have dominated this market with big x86 processors.
15. Mobile Processors
• Smaller processors for smart phones will pass two criteria by 2014:
  • 64-bit instructions
  • High clock speed - ~2.4 GHz
• It is now reasonable to consider ARM, Atom and even MIPS processors for big compute jobs.
19. The Problem
• Big processors provide a cost advantage by amortizing fixed costs in the servers.
• If all other costs remain the same then wimpy cores (ARM, MIPS, Atom) will effectively triple the price of fixed resources:
  • Rack, chassis, disk, RAM, NIC, etc.
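The amortization argument can be checked with simple arithmetic. This sketch assumes a wimpy-core server delivers one third of the throughput of a big-processor server while carrying the same fixed costs. The one-third figure is inferred from the "triple" claim, and the dollar amount is purely hypothetical.

```python
def fixed_cost_per_unit(fixed_cost, throughput):
    """Fixed cost (rack, chassis, disk, RAM, NIC, ...) per unit of work."""
    return fixed_cost / throughput


# Hypothetical numbers, chosen only to illustrate the ratio.
FIXED_COST = 3000.0       # assumed fixed cost per server, in dollars
BIG_THROUGHPUT = 3.0      # assumed work units/sec for a big-processor server
WIMPY_THROUGHPUT = 1.0    # assumed: one third of the big server

big = fixed_cost_per_unit(FIXED_COST, BIG_THROUGHPUT)
wimpy = fixed_cost_per_unit(FIXED_COST, WIMPY_THROUGHPUT)

# Same fixed cost spread over a third of the work: three times the cost per unit.
print(wimpy / big)
```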
20. Our Solution: Group Hug
• Facebook is driving a solution through the Open Compute initiative:
• Group Hug server board:
  • Allows up to 10 individual compute boards
  • Single-processor PCIe-like cards
  • 1 Gbps interfaces muxed up to a 10 Gbps NIC
  • No drives, flash, or peripherals
• ==> 3 to 5x the processors compared to a dual-socket system
• ==> About the same throughput and power
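The ratios on this slide follow from the board counts. A small sketch, reading the interface speeds as 1 Gbps per board and 10 Gbps for the shared NIC (an interpretation of the slide's units); the constant names are illustrative.

```python
BOARDS_PER_GROUP_HUG = 10       # from the slide: up to 10 compute boards
SOCKETS_PER_DUAL_SOCKET = 2     # a conventional dual-socket server
BOARD_LINK_GBPS = 1             # per-board interface, read as 1 Gbps
SHARED_NIC_GBPS = 10            # shared uplink, read as 10 Gbps

# A fully populated board carries 10 single-processor cards versus 2 sockets.
processor_ratio = BOARDS_PER_GROUP_HUG / SOCKETS_PER_DUAL_SOCKET

# Ten 1 Gbps links into one 10 Gbps NIC: the uplink is not oversubscribed.
oversubscription = BOARDS_PER_GROUP_HUG * BOARD_LINK_GBPS / SHARED_NIC_GBPS

print(processor_ratio, oversubscription)
```

A fully populated board gives the 5x end of the range; a board with 6 cards would give the 3x end.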
22. Disaggregated Rack Challenge
• Can we build hardware that will fit more services and still do
well in terms of serviceability and cost?
• Can we build hardware that will grow with services over time?
• What might it look like to support Group Hug?
23. Server/Service Fit - across services
[Diagram: the same TYPE-6 server (CPU + RAM) running MultiFeed and running Other Service A; in one case part of the CPU is marked WASTED CPU RESOURCE.]
24. Server/Service Fit - over time
[Diagram: the same TYPE-6 server (CPU + RAM) in Year 1 and Year 2. In Year 2 the service needs more CPU than the server provides: NOT ENOUGH CPU.]
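The fit problem on these two slides can be made concrete. This is a rough sketch, assuming a fixed server shape of 16 cores and 144 GB (a dual 8-core Type-6 configuration) and a hypothetical RAM-heavy service. `Shape`, `servers_needed` and `wasted` are illustrative names, and the demand figures are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Shape:
    cpu: float  # cores
    ram: float  # GB


def servers_needed(demand: Shape, server: Shape) -> int:
    # The scarcer resource decides how many fixed-shape servers must be bought.
    return max(math.ceil(demand.cpu / server.cpu),
               math.ceil(demand.ram / server.ram))


def wasted(demand: Shape, server: Shape) -> Shape:
    # Whatever the other resource does not use is stranded capacity.
    n = servers_needed(demand, server)
    return Shape(cpu=n * server.cpu - demand.cpu,
                 ram=n * server.ram - demand.ram)


type6 = Shape(cpu=16, ram=144)          # assumed server shape
ram_heavy = Shape(cpu=160, ram=5760)    # hypothetical service demand

print(servers_needed(ram_heavy, type6), wasted(ram_heavy, type6))
```

Here RAM forces 40 servers while only 10 servers' worth of CPU is used, so 480 cores sit idle. If the same service later needs more CPU per GB, the imbalance flips, which is the "over time" case.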
25. In-Rack Resources
Building blocks:
• CPU
• RAM (key/value pairs)
• Disk IOPS
• Disk space
• Flash IOPS
• Flash space
Common resource pairs: [figure]
26. Disaggregated Rack
How can we build hardware that is highly configurable
and re-configurable but still cost effective?
27. A rack of multifeed servers...
40 Feed servers (Type-6) per rack plus a network switch, each server with:
• 2 x E5-2660
• 144GB RAM
• 2TB hard drives
• 760GB of flash
=> Rack totals:
• COMPUTE: 80 processors, 640 cores
• RAM: 5.8 TB
• STORAGE: 80 TB
• FLASH: 30 TB
* We assume full line-rate network within the rack.
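The rack totals follow directly from the per-server figures. A quick check, assuming 8 cores per E5-2660 (consistent with 640 cores across 80 processors), one 2TB drive per server, and decimal units (1 TB = 1000 GB):

```python
SERVERS_PER_RACK = 40
PROCESSORS_PER_SERVER = 2
CORES_PER_PROCESSOR = 8      # assumed: E5-2660 is an 8-core part
RAM_GB_PER_SERVER = 144
DISK_TB_PER_SERVER = 2       # assumed: one 2TB drive per server
FLASH_GB_PER_SERVER = 760

processors = SERVERS_PER_RACK * PROCESSORS_PER_SERVER
cores = processors * CORES_PER_PROCESSOR
ram_tb = SERVERS_PER_RACK * RAM_GB_PER_SERVER / 1000
disk_tb = SERVERS_PER_RACK * DISK_TB_PER_SERVER
flash_tb = SERVERS_PER_RACK * FLASH_GB_PER_SERVER / 1000

print(processors, cores, ram_tb, disk_tb, flash_tb)
```

The RAM and flash totals on the slide are these values rounded (5.76 TB to 5.8 TB, 30.4 TB to 30 TB).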
28. Compute
• Standard Server
  • 2 processors
  • 8 or 16 DIMM slots
  • no hard drive - small flash boot partition
  • big NIC - 10 Gbps or more
• Group Hug
  • 10 individual single-proc servers
  • a few DIMMs
  • no hard drive - small flash boot partition
  • smaller NICs, up to 10 Gbps
29. RAM Sled
• Hardware
  • 128GB to 512GB
  • compute: FPGA, ASIC, mobile processor or desktop processor
• Performance
  • 450k to 1 million key/value gets/sec
• Cost
  • Excluding RAM cost: $500 to $700 or a few dollars per GB
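The "few dollars per GB" figure can be bracketed from the ranges on the slide. A small sketch; pairing the cheapest sled with the largest capacity (and the reverse) is an assumption made only to show the extremes.

```python
def overhead_per_gb(sled_cost_usd, capacity_gb):
    """Non-RAM sled cost spread over the RAM it hosts, in dollars per GB."""
    return sled_cost_usd / capacity_gb


best_case = overhead_per_gb(500, 512)    # cheapest sled, largest capacity
worst_case = overhead_per_gb(700, 128)   # priciest sled, smallest capacity

print(round(best_case, 2), round(worst_case, 2))
```

Under these pairings the overhead ranges from roughly $1 to about $5.50 per GB.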
30. Storage Sled (Knox)
• Hardware
  • 15 drives
  • Replace SAS expander w/ small server
• Performance
  • 3k IOPS
• Cost
  • Excluding drives: $500 to $700 or less than $0.01 per GB
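The sled-level figures can be broken down per drive. A brief sketch using only the numbers on the slide; the per-GB figure depends on drive capacity, which the slide does not give, so it is left out.

```python
DRIVES_PER_SLED = 15
SLED_IOPS = 3000                          # "3k IOPS" for the whole sled
SLED_COST_USD = (500, 700)                # cost range excluding drives

iops_per_drive = SLED_IOPS / DRIVES_PER_SLED
overhead_per_drive = tuple(round(c / DRIVES_PER_SLED, 2) for c in SLED_COST_USD)

print(iops_per_drive, overhead_per_drive)
```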