Scaling Instagram
Apr. 12, 2012
Instagram scalability practices
iammutex
AirBnB Tech Talk 2012 Mike Krieger Instagram
me
- Co-founder, Instagram
- Previously: UX & Front-end @ Meebo
- Stanford HCI BS/MS
- @mikeyk on everything
communicating and sharing in the real world
30+ million users in less than 2 years
the story of how we scaled it
a brief tangent
the beginning
2 product guys
no real back-end experience
analytics & python @ meebo
CouchDB
CrimeDesk SF
let’s get hacking
good components in place early on
...but were hosted on a single machine somewhere in LA
less powerful than my MacBook Pro
okay, we launched. now what?
25k signups in the first day
everything is on fire!
best & worst day of our lives so far
load was through the roof
first culprit?
favicon.ico
404-ing on Django, causing tons of errors
lesson #1: don’t forget your favicon
real lesson #1: most of your initial scaling problems won’t be glamorous
favicon
ulimit -n
memcached -t 4
prefork/postfork
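The `ulimit -n` item above refers to the per-process limit on open file descriptors, which a busy web server can exhaust. The slides do not say which value was used or how it was changed, so the following is only a minimal sketch, assuming a Unix host, of inspecting and raising that limit from inside a Python process. The function names are hypothetical.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

An unprivileged process can only move the soft limit up to the hard limit; raising the hard limit itself is a system configuration task outside this sketch.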
friday rolls around
not slowing down
let’s move to EC2.
scaling = replacing all components of a car while driving it at 100mph
since...
“canonical [architecture] of an early stage startup in this era.” (HighScalability.com)
Nginx & Redis & Postgres & Django.
Nginx & HAProxy & Redis & Memcached & Postgres & Gearman & Django.
24h Ops
our philosophy
1 simplicity
2 optimize for minimal operational burden
3 instrument everything
walkthrough:
1 scaling the database
2 choosing technology
3 staying nimble
4 scaling for android
1 scaling the db
early days
django ORM, postgresql
why pg? postgis.
moved db to its own machine
but photos kept growing and growing...
...and only 68GB of RAM on biggest machine in EC2
so what now?
vertical partitioning
django db routers make it pretty easy
def db_for_read(self, model, **hints):
    # route reads for models in the photos app to the photo database
    if model._meta.app_label == 'photos':
        return 'photodb'
...once you untangle all your foreign key relationships
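To make the vertical partitioning step concrete, here is a minimal sketch of a fuller router in the shape Django expects. A Django router is a plain class, so nothing needs to be imported from Django for the class itself. The class name `PhotoRouter`, the helper `fake_model`, and the choice to refuse cross-database relations are assumptions for illustration; only the `'photos'` app label and the `'photodb'` alias come from the slides.

```python
from types import SimpleNamespace


class PhotoRouter:
    """Send everything in the photos app to its own database alias."""

    photo_apps = {'photos'}   # app labels that live on the photo database
    photo_alias = 'photodb'   # database alias taken from the slide

    def _alias_for(self, model):
        # Returning None tells Django to fall through to the default database.
        if model._meta.app_label in self.photo_apps:
            return self.photo_alias
        return None

    def db_for_read(self, model, **hints):
        return self._alias_for(model)

    def db_for_write(self, model, **hints):
        return self._alias_for(model)

    def allow_relation(self, obj1, obj2, **hints):
        # Foreign keys cannot span databases, so only allow relations
        # between objects that sit on the same side of the split.
        side1 = obj1._meta.app_label in self.photo_apps
        side2 = obj2._meta.app_label in self.photo_apps
        return side1 == side2


def fake_model(app_label):
    """Stand-in for a Django model class, used only to exercise the router."""
    return SimpleNamespace(_meta=SimpleNamespace(app_label=app_label))


router = PhotoRouter()
print(router.db_for_read(fake_model('photos')))   # photodb
print(router.db_for_read(fake_model('users')))    # None
```

In a real project the router would be listed in the `DATABASE_ROUTERS` setting. The `allow_relation` check is where the "untangle all your foreign key relationships" work shows up: any relation it rejects has to be replaced by a plain ID column and an application-level lookup.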
a few months later...
photosdb > 60GB
what now?
horizontal partitioning!
aka: sharding
“surely we’ll have hired someone experienced before we actually need to shard”
you don’t get to choose when scaling challenges come up
evaluated solutions
at the time, none were up to the task of being our primary DB
did it in Postgres itself
what’s painful about sharding?
1 data retrieval
hard to know what your primary access patterns will be w/out any usage
in most cases, user ID
2 what happens if one of your shards gets too big?
in range-based schemes (like MongoDB), you split
A-H: shard0
I-Z: shard1
A-D: shard0
E-H: shard2
I-P: shard1
Q-Z: shard3
downsides (especially on EC2): disk IO
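The range-based scheme above can be sketched as a sorted list of range starts with a binary search on the first letter of the key. This is a simplified illustration, not how MongoDB implements it. The class name, the method names, and the shard names used for the new ranges are assumptions.

```python
from bisect import bisect_right


class RangeShardMap:
    """Map keys to shards by the first letter of the key."""

    def __init__(self, ranges):
        # ranges: list of (first_letter, shard_name), sorted by first_letter.
        self.starts = [start for start, _ in ranges]
        self.shards = [name for _, name in ranges]

    def shard_for(self, key):
        # Find the last range whose start is <= the key's first letter.
        i = bisect_right(self.starts, key[0].upper()) - 1
        if i < 0:
            raise KeyError(key)
        return self.shards[i]

    def split(self, start, new_start, new_shard):
        """Split the range beginning at `start`; keys from `new_start` up to
        the next boundary are reassigned to `new_shard`."""
        i = self.starts.index(start)
        is_last = i + 1 == len(self.starts)
        if not (start < new_start and (is_last or new_start < self.starts[i + 1])):
            raise ValueError("new_start must fall inside the range being split")
        self.starts.insert(i + 1, new_start)
        self.shards.insert(i + 1, new_shard)


shards = RangeShardMap([('A', 'shard0'), ('I', 'shard1')])
shards.split('A', 'E', 'shard2')
shards.split('I', 'Q', 'shard3')
print(shards.shard_for('eve'), shards.shard_for('zed'))  # shard2 shard3
```

Updating the map is the cheap part. The cost the slide points at is that every row in the reassigned range has to be physically copied to the new shard while the old one is already overloaded.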
instead, we pre-split
many many many (thousands) of logical shards
that map to fewer physical ones
// 8 logical shards on 2 machines
user_id % 8 = logical shard
logical shards -> physical shard map
{ 0: A, 1: A, 2: A, 3: A, 4: B, 5: B, 6: B, 7: B }
// 8 logical shards on 4 machines
user_id % 8 = logical shard
logical shards -> physical shard map
{ 0: A, 1: A, 2: C, 3: C, 4: B, 5: B, 6: D, 7: D }
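The two maps above translate directly into a lookup. This is a minimal sketch using the slide's own numbers (8 logical shards, machines A to D); the function names are hypothetical.

```python
LOGICAL_SHARDS = 8

# Logical shard -> physical machine, before and after adding machines C and D.
TWO_MACHINES = {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B'}
FOUR_MACHINES = {0: 'A', 1: 'A', 2: 'C', 3: 'C', 4: 'B', 5: 'B', 6: 'D', 7: 'D'}


def logical_shard(user_id):
    """The logical shard never changes for a given user."""
    return user_id % LOGICAL_SHARDS


def physical_shard(user_id, shard_map):
    """Resolve a user to a machine through the current shard map."""
    return shard_map[logical_shard(user_id)]


def moved_shards(old_map, new_map):
    """Logical shards that must be copied to a different machine."""
    return sorted(s for s in old_map if old_map[s] != new_map[s])


print(physical_shard(10, TWO_MACHINES))           # A
print(physical_shard(10, FOUR_MACHINES))          # C
print(moved_shards(TWO_MACHINES, FOUR_MACHINES))  # [2, 3, 6, 7]
```

Because `user_id % 8` is fixed, growing from 2 to 4 machines only changes the map: whole logical shards move, and no individual row has to be re-hashed.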
little known but awesome PG feature: schemas
not “columns” schema
- database:
  - schema:
    - table:
      - columns
machineA:
  shard0
    photos_by_user
  shard1
    photos_by_user
  shard2
    photos_by_user
  shard3
    photos_by_user
machineA:              machineA’:
  shard0                 shard0
    photos_by_user         photos_by_user
  shard1                 shard1
    photos_by_user         photos_by_user
  shard2                 shard2
    photos_by_user         photos_by_user
  shard3                 shard3
    photos_by_user         photos_by_user
machineA:              machineC:
  shard0                 shard0
    photos_by_user         photos_by_user
  shard1                 shard1
    photos_by_user         photos_by_user
  shard2                 shard2
    photos_by_user         photos_by_user
  shard3                 shard3
    photos_by_user         photos_by_user
can do this as long as you have more logical shards than physical ones
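One way to picture the schema-per-logical-shard layout is as a naming rule: each logical shard is a Postgres schema, and the same table name exists inside every schema. The sketch below only builds names and SQL strings and does not talk to a database. The two columns in the `CREATE TABLE` statement and the default of 4 logical shards are assumptions for illustration; the slides give only the schema names (`shard0` to `shard3`) and the table name (`photos_by_user`).

```python
def schema_for(logical_shard):
    """Name of the Postgres schema that holds one logical shard."""
    return f"shard{logical_shard}"


def table_for(user_id, n_logical=4, table="photos_by_user"):
    """Schema-qualified table that stores this user's rows."""
    return f"{schema_for(user_id % n_logical)}.{table}"


def create_statements(n_logical, table="photos_by_user"):
    """DDL that creates one schema and one identical table per logical shard."""
    statements = []
    for shard in range(n_logical):
        schema = schema_for(shard)
        statements.append(f"CREATE SCHEMA {schema};")
        # Column list is hypothetical; the slides do not show the table layout.
        statements.append(
            f"CREATE TABLE {schema}.{table} (user_id bigint, photo_id bigint);"
        )
    return statements


print(table_for(6))  # shard2.photos_by_user
```

Since a schema is a self-contained namespace, moving a logical shard to a new machine means moving one schema, and queries keep using the same qualified table name afterwards.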
lesson: take tech/tools you know and try first to adapt them into a simple solution
2 which tools where?
where to cache / otherwise denormalize data
we <3 redis
what happens when a user posts a photo?
1 user uploads photo with (optional) caption and location
2 synchronous write to the media database for that user
3 queues!
3a if geotagged, async worker POSTs to Solr
3b follower delivery
can’t have every user who loads her timeline look up everyone she follows and then their photos
instead, everyone gets their own list in Redis
media ID is pushed onto a list for every person who’s following this user
Redis is awesome for this; rapid insert, rapid subsets
when time to render a feed, we take small # of IDs, go look up info in memcached
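The follower-delivery flow above is a fan-out on write. The sketch below uses in-memory stand-ins so it runs without any server: a dict of deques plays the role of the per-user Redis lists, and a plain dict plays the role of memcached. With a real Redis the corresponding commands would be `LPUSH`, `LTRIM` and `LRANGE`. The class and function names and the cap of 300 entries per feed are assumptions.

```python
from collections import defaultdict, deque

FEED_MAX = 300  # hypothetical cap that keeps each per-user list bounded


class FeedStore:
    """In-memory stand-in for one Redis list per user."""

    def __init__(self):
        # A deque with maxlen drops the oldest entry when a new one is
        # pushed on the left, similar to LPUSH followed by LTRIM.
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_MAX))

    def push(self, user_id, media_id):
        self.feeds[user_id].appendleft(media_id)

    def head(self, user_id, count):
        # Newest IDs first, similar to LRANGE 0 count-1.
        return list(self.feeds[user_id])[:count]


def deliver(store, followers_of, author_id, media_id):
    """Push a new media ID onto the feed of every follower of the author."""
    for follower_id in followers_of.get(author_id, ()):
        store.push(follower_id, media_id)


def render_feed(store, media_cache, user_id, count=3):
    """Take a small number of IDs, then look each one up in the cache."""
    return [media_cache[m] for m in store.head(user_id, count) if m in media_cache]


store = FeedStore()
followers_of = {1: [2, 3]}                    # users 2 and 3 follow user 1
media_cache = {101: 'sunset', 102: 'coffee'}  # stand-in for memcached
deliver(store, followers_of, 1, 101)
deliver(store, followers_of, 1, 102)
print(render_feed(store, media_cache, 2))     # ['coffee', 'sunset']
```

The write does work proportional to the number of followers, but it happens once in a queue worker; every timeline read then becomes a short list slice plus a few cache lookups.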
Redis is great for...
data structures that are relatively bounded
(don’t tie yourself to a solution where your in-memory DB is your main data store)
caching complex objects where you want to do more than GET
ex: counting, sub-ranges, testing membership
especially when Taylor Swift posts live from the CMAs
follow graph
v1: simple DB table (source_id, target_id, status)
who do I follow? who follows me? do I follow X? does X follow me?
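The v1 follow table and its four questions can be sketched with the standard-library `sqlite3` module standing in for Postgres. The column names come from the slide; the table name `follows`, the index, and the single status value `'active'` are assumptions.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the v1 follow table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE follows ("
        " source_id INTEGER, target_id INTEGER, status TEXT,"
        " PRIMARY KEY (source_id, target_id))"
    )
    # The primary key serves "who do I follow?"; this index serves
    # "who follows me?".
    db.execute("CREATE INDEX follows_by_target ON follows (target_id)")
    return db


def follow(db, source_id, target_id):
    db.execute(
        "INSERT OR REPLACE INTO follows VALUES (?, ?, 'active')",
        (source_id, target_id),
    )


def following(db, user_id):
    """who do I follow?"""
    rows = db.execute(
        "SELECT target_id FROM follows"
        " WHERE source_id = ? AND status = 'active' ORDER BY target_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers(db, user_id):
    """who follows me?"""
    rows = db.execute(
        "SELECT source_id FROM follows"
        " WHERE target_id = ? AND status = 'active' ORDER BY source_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def does_follow(db, source_id, target_id):
    """do I follow X? (swap the arguments for: does X follow me?)"""
    row = db.execute(
        "SELECT 1 FROM follows"
        " WHERE source_id = ? AND target_id = ? AND status = 'active'",
        (source_id, target_id),
    ).fetchone()
    return row is not None
```

All four questions are answered by one table with two access paths, which is why a single well-indexed relational table is a reasonable starting point for a follow graph.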
DB was busy, so we started storing parallel version in Redis
follow_all(300 item list)
inconsistency
extra logic
so much extra logic
exposing your support team to the idea of cache invalidation
redesign took a page from twitter’s book
PG can handle tens of thousands of requests, very light memcached caching
two takeaways
1 have a versatile complement to your core data storage (like Redis)
2 try not to have two tools trying to do the same job
3 staying nimble
2010: 2 engineers
2011: 3 engineers
2012: 5 engineers
scarcity -> focus
engineer solutions that you’re not constantly returning to because they broke
1 extensive unit-tests and functional tests
2 keep it DRY
3 loose coupling using notifications / signals
4 do most of our work in Python, drop to C when necessary
5 frequent code reviews, pull requests to keep things in the ‘shared brain’
6 extensive monitoring
munin
statsd
“how is the system right now?”
“how does this compare to historical trends?”
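For the statsd item above, a minimal sketch of what instrumenting a code path involves: statsd accepts short plain-text datagrams over UDP, conventionally on port 8125, in the form `name:value|type`. The metric names and the helper names below are hypothetical, and a real project would more likely use an existing statsd client library.

```python
import socket


def format_counter(name, value=1):
    """Counter datagram, e.g. 'photos.uploaded:1|c'."""
    return f"{name}:{value}|c"


def format_timer(name, milliseconds):
    """Timer datagram, e.g. 'feed.render:42|ms'."""
    return f"{name}:{milliseconds}|ms"


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget a single metric datagram over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()


print(format_counter("photos.uploaded"))  # photos.uploaded:1|c
print(format_timer("feed.render", 42))    # feed.render:42|ms
```

Because the transport is UDP, a slow or missing metrics server does not block the request being measured, which is what makes it cheap to "instrument everything".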
scaling for android
1 million new users in 12 hours
great tools that enable easy read scalability
redis: slaveof <host> <port>
our Redis framework assumes 0+ read slaves
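The slides do not show the Redis framework itself, so the following is only a guess at what "assumes 0+ read slaves" might look like in code: writes always go to the master, reads go to a randomly chosen read slave, and with zero slaves reads fall back to the master. Plain dicts stand in for Redis connections, and the class name is hypothetical.

```python
import random


class ReplicatedClient:
    """Route writes to the master and reads to any available read slave."""

    def __init__(self, master, read_slaves=()):
        self.master = master
        self.read_slaves = list(read_slaves)  # may be empty

    def _reader(self):
        # With no read slaves configured, reads are served by the master.
        if self.read_slaves:
            return random.choice(self.read_slaves)
        return self.master

    def get(self, key):
        return self._reader().get(key)

    def set(self, key, value):
        self.master[key] = value


solo = ReplicatedClient(master={})
solo.set('likes:1', 5)
print(solo.get('likes:1'))  # 5
```

Since callers never address a specific server, adding read capacity is a configuration change: start a new instance with `slaveof <host> <port>` and add it to the slave list.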
tight iteration loops
statsd & pgfouine
know where you can shed load if needed
(e.g. shorter feeds)
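As an illustration of the "shorter feeds" idea, a load-shedding switch can be as small as one function that picks the feed length from a load signal. The thresholds, the lengths, and the notion of a normalized load value between 0 and 1 are all assumptions; the slides give no numbers.

```python
NORMAL_FEED_LENGTH = 30   # hypothetical number of items in a normal feed
REDUCED_FEED_LENGTH = 10  # hypothetical number of items when shedding load
LOAD_THRESHOLD = 0.8      # hypothetical load fraction that triggers shedding


def feed_length(load):
    """Pick how many items to return based on current load (0.0 to 1.0)."""
    if load >= LOAD_THRESHOLD:
        return REDUCED_FEED_LENGTH
    return NORMAL_FEED_LENGTH


def trim_feed(media_ids, load):
    """Return only as many IDs as the current load allows."""
    return media_ids[:feed_length(load)]


print(len(trim_feed(list(range(100)), load=0.5)))  # 30
print(len(trim_feed(list(range(100)), load=0.9)))  # 10
```

The point of deciding this in advance is that the degraded mode is a tested code path rather than something improvised during an outage.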
if you’re tempted to reinvent the wheel...
don’t.
“our app servers sometimes kernel panic under load”
...
“what if we write a monitoring daemon...”
wait! this is exactly what HAProxy is great at
surround yourself with awesome advisors
culture of openness around engineering
give back; e.g. node2dm
focus on making what you have better
“fast, beautiful photo sharing”
“can we make all of our requests take 50% of the time?”
staying nimble = remind yourself of what’s important
your users around the world don’t care that you wrote your own DB
wrapping up
unprecedented times
2 backend engineers can scale a system to 30+ million users
key word = simplicity
cleanest solution with the fewest moving parts possible
don’t over-optimize or expect to know ahead of time how site will scale
don’t think “someone else will join & take care of this”
will happen sooner than you think; surround yourself with great advisors
when adding software to stack: only if you have to, optimizing for operational simplicity
few, if any, unsolvable scaling challenges for a social startup
have fun