Facebook
Haystack
Finding a needle in Haystack: Facebook's photo
storage. An Analysis of Facebook Photo Caching
PWL SF - June 30, 2015
Sargun Dhillon
@Sargun
Agenda
•The Haystack problem
•Design & Architecture
•Takeaways
What is
Haystack?
Storage System
Needle Storage & Serving
For Facebook
Workload
•Write Once
•Read Often
•Delete Rarely
Why Haystack
as a Paper?
Why Not?
Really BIG dataset
20+
Petabytes
120 million
new photos a day
Simple
Clever
optimizations
Where did
Haystack
come from?
History
Network Attached
Storage (NAS)
mounted over NFS
CDNs for low-latency
Pareto Distribution
Theoretical Image Access
CDF
Pareto
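A rough way to see what a long-tailed access pattern means for caching. This sketch assumes popularity follows a Zipf-like law (weight 1/rank), which is an illustrative assumption and not a measurement from the talk; `zipf_coverage` is a hypothetical helper name.

```python
def zipf_coverage(num_items: int, cached_items: int, exponent: float = 1.0) -> float:
    """Fraction of requests served when the `cached_items` most popular
    items are cached.

    Assumption: the item at popularity rank r is requested with weight
    1 / r**exponent (Zipf-like; chosen for illustration only).
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    return sum(weights[:cached_items]) / sum(weights)


if __name__ == "__main__":
    total = 100_000  # hypothetical catalogue size
    for fraction in (0.01, 0.10, 0.50):
        hit = zipf_coverage(total, int(total * fraction))
        print(f"cache top {fraction:.0%} of items: {hit:.1%} of requests")
```

Under this assumption a small cache absorbs a large share of requests, yet the remaining share is spread over so many items that growing the cache helps slowly. That remainder is the long tail the backing store has to serve.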
Fetches are expensive
• Multiple seeks:
• Directory metadata
• Inode
• File contents
• File metadata is 10s of kilobytes
• Long-tail uncacheable
Decision to Build
• Existing systems unable to be adapted
• Hadoop
• MySQL
• Traditional NAS appliances
• Don’t need to solve for the kitchen sink
• Log data
• Development work
Haystack
Design
Design
Constraints
High Throughput
&
Low Latency
Cost-effective*
*CDNs are Expensive
Simple
Separation of concerns
•Haystack Store
•Haystack Cache
•Haystack Directory
Read Path
Write Path
Store
Concerns:
Read, Write,
Delete Needles
How much
cache?
There is no right
ratio
Just enough
memory for
metadata
Store little
metadata
Volume Layer
Volume layer above filesystem
Volumes are
append-only
Arranged Into
Logical Volumes
Append-only Data File
Indexing
10 bytes of
metadata per
photo
2-bit overhead
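Back-of-the-envelope arithmetic for the "just enough memory for metadata" idea, using the 10 bytes per photo quoted above. The photo count per machine is a made-up number for illustration.

```python
BYTES_PER_PHOTO = 10  # figure quoted on the slide


def index_memory_bytes(photo_count: int, bytes_per_photo: int = BYTES_PER_PHOTO) -> int:
    """Memory needed to keep every photo's index entry in RAM."""
    return photo_count * bytes_per_photo


if __name__ == "__main__":
    photos_on_machine = 500_000_000  # hypothetical, not from the talk
    print(index_memory_bytes(photos_on_machine) / 10**9, "GB")  # prints "5.0 GB"
```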
Read by
<Key, Alt Key,
Cookie>
Checks cookie for
security
Modifications are
appends
Deletions change
offset to 0
Compaction for
reclamation
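The indexing slides above can be sketched as a toy volume. This is a minimal illustration, not Facebook's implementation: the record layout, the `SUPERBLOCK` magic, and the `Volume` class are all assumptions, and an in-memory buffer stands in for the volume file.

```python
import io
import struct

SUPERBLOCK = b"HAYSTACK"         # hypothetical magic; keeps offset 0 free
HEADER = struct.Struct("<QIQI")  # assumed layout: key, alt key, cookie, size


class Volume:
    """Toy append-only volume with an in-memory index."""

    def __init__(self):
        self.data = io.BytesIO(SUPERBLOCK)  # stands in for a preallocated file
        self.index = {}                     # (key, alt_key) -> (offset, size)

    def write(self, key, alt_key, cookie, payload):
        offset = self.data.seek(0, io.SEEK_END)  # always append
        self.data.write(HEADER.pack(key, alt_key, cookie, len(payload)))
        self.data.write(payload)
        # A modification is just another append; the index now points
        # at the newest copy and the old bytes become garbage.
        self.index[(key, alt_key)] = (offset, len(payload))

    def read(self, key, alt_key, cookie):
        offset, size = self.index.get((key, alt_key), (0, 0))
        if offset == 0:                  # unknown or deleted
            return None
        self.data.seek(offset)
        _, _, stored_cookie, _ = HEADER.unpack(self.data.read(HEADER.size))
        if stored_cookie != cookie:      # wrong cookie: refuse to serve
            return None
        return self.data.read(size)

    def delete(self, key, alt_key):
        if (key, alt_key) in self.index:
            self.index[(key, alt_key)] = (0, 0)  # offset 0 marks deletion

    def compact(self):
        """Copy live needles into a fresh volume to reclaim space."""
        fresh = Volume()
        for (key, alt_key), (offset, size) in self.index.items():
            if offset == 0:
                continue
            self.data.seek(offset)
            _, _, cookie, _ = HEADER.unpack(self.data.read(HEADER.size))
            fresh.write(key, alt_key, cookie, self.data.read(size))
        return fresh
```

A read costs one dictionary lookup plus one seek into the data file, which is the point of keeping the whole index in memory. Deleted and overwritten needles keep occupying space until `compact()` copies the live ones elsewhere.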
Batch Upload
Similar to
Bitcask & CDB
Uses XFS
Volumes
preallocated
Fault Tolerance
Pitchfork:
generates artificial
load
Checksum verified
on compaction
Directory marks
volumes offline
Recovery: Rsync*
* With QoS
Restore:
Multiple Replicas
The Hardware
OCP:
Open Compute
Project
Open Vault: KNOX
12x3TB SATA in
RAID6
RAID Controller
with NVRAM
Only Writes
Cached
Good at reads xor
writes, not both
Latencies Table
Workload           Read Throughput   Avg. Read Latency   Write Throughput   Avg. Write Latency
Only Reads         770.6             33.2                -                  -
Only Writes        -                 -                   6099.4             4.9
Multiwrite (x16)   -                 -                   10843.8            43.9
Reads And Writes   718.1             41.6                232.0              11.9
Haystress
“Known Unknowns”
and
“Unknown Unknowns”
Haystack Store
• Responsibilities:
• Read needles
• Write needles
• Append-only
• O(1) read cost*
*Usually
Cache
Concerns:
Caching
Organized as
DHT
Not just an LRU
Two caching rules
Request isn’t from
CDN
Request is to write-
enabled store
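The two caching rules read naturally as a single predicate: a photo is cached only when both hold. The `PhotoRequest` type and its field names below are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoRequest:
    from_cdn: bool             # True when the CDN, not a browser, is asking
    store_write_enabled: bool  # True when the serving Store machine still accepts writes


def should_cache(request: PhotoRequest) -> bool:
    """Cache only direct user requests served by a write-enabled store."""
    return (not request.from_cdn) and request.store_write_enabled
```

One reading of the rules: a request that already missed in the CDN is unlikely to hit in an internal cache, and shielding write-enabled machines from reads fits the observation that the hardware is good at reads xor writes.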
Haystack Cache
•Simple cache
•Optimizations, given
access patterns
Directory
The Rug
Concerns:
Mapping, Load
Balancing, CDN
Management,
Directing
Maps logical
volumes to
physical machines
Mapping based on
business rules
Load balances
reads
Directs writes to
relevant logical
volume
Directs reads
away from CDN
Directory
• Manages capacity
• Manages volume mapping
• Manages image mapping
• Manages CDN
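A minimal sketch of the Directory's mapping role, assuming each logical volume is replicated on several physical machines. The class, method names, and random selection policy are assumptions; business rules and CDN steering are left out.

```python
import random


class Directory:
    """Toy mapping from logical volumes to physical machines."""

    def __init__(self):
        self.replicas = {}          # logical volume id -> list of machine names
        self.write_enabled = set()  # logical volumes still accepting writes

    def add_volume(self, logical_id, machines, writable=True):
        self.replicas[logical_id] = list(machines)
        if writable:
            self.write_enabled.add(logical_id)

    def mark_read_only(self, logical_id):
        # E.g. the volume is full or an operator took it out of rotation.
        self.write_enabled.discard(logical_id)

    def pick_for_write(self):
        """Choose a write-enabled logical volume; the write goes to all its replicas."""
        if not self.write_enabled:
            raise RuntimeError("no write-enabled volumes")
        logical_id = random.choice(sorted(self.write_enabled))
        return logical_id, self.replicas[logical_id]

    def pick_for_read(self, logical_id):
        """Spread reads across the replicas of one logical volume."""
        return random.choice(self.replicas[logical_id])
```

Usage: register volumes with `add_volume`, call `pick_for_write()` when a photo is uploaded, and call `pick_for_read(volume_id)` when building the URL for a read.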
Tying it
together
Writes
Write-path
• Involves
• Store
• Directory
• Smart client
Reads
Detour: URLs
Directory uses
URLs for directing
http://⟨CDN⟩/⟨Cache⟩/
⟨Machine id⟩/⟨Logical
volume, Photo⟩
URL Makeup
• CDN
• Cache Node
• Machine ID
• Logical Volume ID
• Photo ID & Alt ID
• Cookie
Strips URL
left-to-right
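Left-to-right stripping can be pictured as each hop removing its own leading component and forwarding the rest. The helper name and the example host and path values are hypothetical; only the component order comes from the URL layout above.

```python
from typing import Tuple


def strip_leading_hop(url: str) -> Tuple[str, str]:
    """Split 'http://a/b/c' into ('a', 'http://b/c')."""
    scheme, rest = url.split("://", 1)
    hop, _, remainder = rest.partition("/")
    return hop, f"{scheme}://{remainder}"


if __name__ == "__main__":
    # Hypothetical URL following the <CDN>/<Cache>/<Machine id>/<Logical volume, Photo> shape.
    url = "http://cdn.example/cache-7/machine-12/vol-3,photo-99"
    for _ in range(3):
        hop, url = strip_leading_hop(url)
        print(f"{hop} forwards {url}")
```

Because every hop can find the next one from the URL alone, a CDN or Cache miss needs no extra lookup before the request moves on.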
Read-path
• Involves:
• Directory
• Cache
• CDN
Insights
Narrow Scope
“That simplicity let us build and
deploy a working system in a few
months instead of a few years.”
Sometimes you’re
solving the wrong
problem
Smart Clients &
Ecosystem Control
Simple
Optimizations
Open Source
Implementation:
WeedFS
Thanks & Qs

Papers We Love Too, June 2015: Haystack