Home
Explore
Submit Search
Upload
Login
Signup
Advertisement
Check these out next
Chapter1(hci)
Latesh Malik
A project report on chat application
Kumar Gaurav
Human computer interaction
National Institute of Technology Durgapur
Project report on blogs
Kritika Chauhan
User Interface Analysis and Design
Saqib Raza
Web Development on Web Project Report
Milind Gokhale
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
DataWorks Summit/Hadoop Summit
ChatGPT.pptx
nitindhiman53
1
of
185
Top clipped slide
Scaling Instagram
Apr. 12, 2012
•
0 likes
318 likes
×
Be the first to like this
Show More
•
189,242 views
views
×
Total views
0
On Slideshare
0
From embeds
0
Number of embeds
0
Download Now
Download to read offline
Report
Technology
Instagram 扩展性实践
iammutex
Follow
Advertisement
Advertisement
Advertisement
Recommended
The Future of Human Technology Interaction
Susan Weinschenk
3.1K views
•
81 slides
Quantum Computing
Rajasekhar Manda
41.1K views
•
25 slides
Labels and buttons on forms
Caroline Jarrett
14.1K views
•
61 slides
Pair Programming
Naresh Jain
12.5K views
•
73 slides
Etsy Activity Feeds Architecture
Dan McKinley
110.8K views
•
45 slides
Introduction to Redis
Dvir Volk
118.3K views
•
24 slides
More Related Content
Slideshows for you
(20)
Chapter1(hci)
Latesh Malik
•
17.2K views
A project report on chat application
Kumar Gaurav
•
184.9K views
Human computer interaction
National Institute of Technology Durgapur
•
5.4K views
Project report on blogs
Kritika Chauhan
•
33.3K views
User Interface Analysis and Design
Saqib Raza
•
20.2K views
Web Development on Web Project Report
Milind Gokhale
•
208.7K views
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
DataWorks Summit/Hadoop Summit
•
10.8K views
ChatGPT.pptx
nitindhiman53
•
492 views
Lecture 1: Human-Computer Interaction Introduction (2014)
Lora Aroyo
•
16.6K views
SRS FOR CHAT APPLICATION
Atul Kushwaha
•
72.1K views
Introduction to Quantum Computer
Sarun Sumriddetchkajorn
•
6.1K views
Web Development project
Manohar Prasad, PgMP®, PMP®, PMI-ACP®, CAL®, ACC®, CSP®
•
28.8K views
Prototyping
Eman Abed AlWahhab
•
8.2K views
Harsh Mathur Final Year Project Report on Restaurant Billing System
Harsh Mathur
•
16.3K views
What is ChatGPT
jeetendra mandal
•
27.9K views
HCI
Meet Shah
•
2.7K views
Topic 3 Human Computer Interaction
nur ezzaty
•
289 views
The Future of Quantum Computing
Bernard Marr
•
8.5K views
Public news droid.pptx
Shivam54500
•
636 views
Instagram human computer interaction project
Lahore Garrison University
•
1.5K views
Similar to Scaling Instagram
(20)
89025069 mike-krieger-instagram-at-the-airbnb-tech-talk-on-scaling-instagram
Mohit Jain
•
990 views
How a Small Team Scales Instagram
C4Media
•
4.5K views
Mike Krieger - A Brief, Rapid History of Scaling Instagram (with a tiny team)
Jean-Luc David
•
1K views
OrientDB for real & Web App development
Luca Garulli
•
8.4K views
Intro to Spark development
Spark Summit
•
10K views
What is Distributed Computing, Why we use Apache Spark
Andy Petrella
•
6.4K views
Критика "библиотечного" подхода в разработке под Android. UA Mobile 2016.
UA Mobile
•
358 views
Introduction to Spark Training
Spark Summit
•
1.4K views
Architecture by Accident
Gleicon Moraes
•
12K views
How Apache Spark fits in the Big Data landscape
Paco Nathan
•
6.9K views
Mobile Library Development - stuck between a pod and a jar file - Zan Markan ...
Codemotion
•
757 views
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
StampedeCon
•
1.3K views
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Spark Summit
•
1.6K views
What's new with Apache Spark?
Paco Nathan
•
6.9K views
SQL to NoSQL: Top 6 Questions
Mike Broberg
•
1.2K views
The Future of Computing is Distributed
Alluxio, Inc.
•
561 views
Scaling PHP apps
Matteo Moretti
•
3K views
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
•
7.6K views
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Maarten Balliauw
•
2.8K views
How Apache Spark fits into the Big Data landscape
Paco Nathan
•
7.6K views
Advertisement
More from iammutex
(20)
Redis深入浅出
iammutex
•
10.8K views
深入了解Redis
iammutex
•
9.1K views
NoSQL误用和常见陷阱分析
iammutex
•
2K views
MongoDB 在盛大大数据量下的应用
iammutex
•
4.2K views
8 minute MongoDB tutorial slide
iammutex
•
4.5K views
skip list
iammutex
•
9.3K views
Thoughts on Transaction and Consistency Models
iammutex
•
2K views
Rethink db&tokudb调研测试报告
iammutex
•
2.3K views
redis 适用场景与实现
iammutex
•
27K views
Introduction to couchdb
iammutex
•
1.6K views
What every data programmer needs to know about disks
iammutex
•
11.3K views
Ooredis
iammutex
•
1.2K views
Ooredis
iammutex
•
978 views
redis运维之道
iammutex
•
2.1K views
Realtime hadoopsigmod2011
iammutex
•
1.8K views
[译]No sql生态系统
iammutex
•
1.2K views
Couchdb + Membase = Couchbase
iammutex
•
6.2K views
Redis cluster
iammutex
•
1.3K views
Redis cluster
iammutex
•
6.9K views
Hadoop introduction berlin buzzwords 2011
iammutex
•
1.3K views
Recently uploaded
(20)
future-of-interoperability.pdf
FredReynolds2
•
0 views
2021 ECSE Certificate Design-Sella Septiana.pdf
Sella Serafina
•
0 views
QAM Microsoft PowerPoint جديد.pptx
ssuserd6ee01
•
0 views
Komunikasi Maritime.pdf
ChaeriahWael2
•
0 views
Lecture-7-Binary-Trees-and-Algorithms-11052023-054009pm.pptx
HamzaUsman48
•
0 views
DS Fusion CE - External Transactions.pptx
VatsalaC1
•
0 views
P4+ONOS SRv6 tutorial.pptx
tampham61268
•
0 views
Integration architectures based on Microservices, APIs and events
Sven Bernhardt
•
0 views
Brand Boost.docx
Mayur Agrawal
•
0 views
Internet Things presentation.pptx
SAMREENMUSTAFA5
•
0 views
finalppt-150606051347-lva1-app6892.pptx
AJAYVISHALRP
•
0 views
Architecting for Success: Designing Secure GCP Landing Zone for Enterprises
Bhuvaneswari Subramani
•
0 views
Carbon tech : transitional path way towards tomorrow ?碳技術:通往明天的過渡道路
Kevin Lognoné
•
0 views
Process 84% more MySQL database activity with the latest-gen Dell PowerEdge R...
Principled Technologies
•
0 views
StructuralUncertainty_example.pptx
ssuseref75f1
•
0 views
How to build an AI-powered chatbot.pdf
AnastasiaSteele10
•
0 views
How volunteering can benefit you or your organisation, with Capgemini
AbilityNet
•
0 views
Pile&Wellfoundation_ManualUpdated as on 20.5.16.pdf
DharmPalJangra1
•
0 views
FFT Scan tutorial.pptx
ssuser8f883c
•
0 views
Human robot
Butterfly education
•
0 views
Advertisement
Scaling Instagram
Scaling Instagram
AirBnB Tech Talk 2012 Mike Krieger Instagram
me -
Co-founder, Instagram - Previously: UX & Front-end @ Meebo - Stanford HCI BS/MS - @mikeyk on everything
communicating and sharing in
the real world
30+ million users
in less than 2 years
the story of
how we scaled it
a brief tangent
the beginning
Text
2 product guys
no real back-end
experience
analytics & python
@ meebo
CouchDB
CrimeDesk SF
let’s get hacking
good components in
place early on
...but were hosted
on a single machine somewhere in LA
less powerful than
my MacBook Pro
okay, we launched.
now what?
25k signups in
the first day
everything is on
fire!
best & worst
day of our lives so far
load was through
the roof
first culprit?
favicon.ico
404-ing on Django, causing
tons of errors
lesson #1: don’t
forget your favicon
real lesson #1:
most of your initial scaling problems won’t be glamorous
favicon
ulimit -n
memcached -t 4
prefork/postfork
friday rolls around
not slowing down
let’s move to
EC2.
scaling = replacing
all components of a car while driving it at 100mph
since...
“"canonical [architecture] of an
early stage startup in this era." (HighScalability.com)
Nginx & Redis & Postgres
& Django.
Nginx & HAProxy
& Redis & Memcached & Postgres & Gearman & Django.
24h Ops
our philosophy
1 simplicity
2 optimize for minimal
operational burden
3 instrument everything
walkthrough: 1 scaling the
database 2 choosing technology 3 staying nimble 4 scaling for android
1 scaling the
db
early days
django ORM, postgresql
why pg? postgis.
moved db to
its own machine
but photos kept
growing and growing...
...and only 68GB
of RAM on biggest machine in EC2
so what now?
vertical partitioning
django db routers
make it pretty easy
def db_for_read(self, model):
if app_label == 'photos': return 'photodb'
...once you untangle
all your foreign key relationships
a few months
later...
photosdb > 60GB
what now?
horizontal partitioning!
aka: sharding
“surely we’ll have
hired someone experienced before we actually need to shard”
you don’t get
to choose when scaling challenges come up
evaluated solutions
at the time,
none were up to task of being our primary DB
did in Postgres
itself
what’s painful about
sharding?
1 data retrieval
hard to know
what your primary access patterns will be w/out any usage
in most cases,
user ID
2 what happens
if one of your shards gets too big?
in range-based schemes
(like MongoDB), you split
A-H: shard0 I-Z: shard1
A-D:
shard0 E-H: shard2 I-P: shard1 Q-Z: shard2
downsides (especially on
EC2): disk IO
instead, we pre-split
many many many (thousands)
of logical shards
that map to
fewer physical ones
// 8 logical
shards on 2 machines user_id % 8 = logical shard logical shards -> physical shard map { 0: A, 1: A, 2: A, 3: A, 4: B, 5: B, 6: B, 7: B }
// 8 logical
shards on 2 4 machines user_id % 8 = logical shard logical shards -> physical shard map { 0: A, 1: A, 2: C, 3: C, 4: B, 5: B, 6: D, 7: D }
little known but
awesome PG feature: schemas
not “columns” schema
- database:
- schema: - table: - columns
machineA: shard0
photos_by_user shard1 photos_by_user shard2 photos_by_user shard3 photos_by_user
machineA:
machineA’: shard0 shard0 photos_by_user photos_by_user shard1 shard1 photos_by_user photos_by_user shard2 shard2 photos_by_user photos_by_user shard3 shard3 photos_by_user photos_by_user
machineA:
machineC: shard0 shard0 photos_by_user photos_by_user shard1 shard1 photos_by_user photos_by_user shard2 shard2 photos_by_user photos_by_user shard3 shard3 photos_by_user photos_by_user
can do this
as long as you have more logical shards than physical ones
lesson: take tech/tools you
know and try first to adapt them into a simple solution
2 which tools
where?
where to cache
/ otherwise denormalize data
we <3 redis
what happens when
a user posts a photo?
1 user uploads
photo with (optional) caption and location
2 synchronous write
to the media database for that user
3 queues!
3a if geotagged,
async worker POSTs to Solr
3b follower delivery
can’t have every
user who loads her timeline look up all their followers and then their photos
instead, everyone gets
their own list in Redis
media ID is
pushed onto a list for every person who’s following this user
Redis is awesome
for this; rapid insert, rapid subsets
when time to
render a feed, we take small # of IDs, go look up info in memcached
Redis is great
for...
data structures that
are relatively bounded
(don’t tie yourself
to a solution where your in- memory DB is your main data store)
caching complex objects where
you want to more than GET
ex: counting, sub-
ranges, testing membership
especially when Taylor Swift
posts live from the CMAs
follow graph
v1: simple DB
table (source_id, target_id, status)
who do I
follow? who follows me? do I follow X? does X follow me?
DB was busy,
so we started storing parallel version in Redis
follow_all(300 item list)
inconsistency
extra logic
so much extra
logic
exposing your support
team to the idea of cache invalidation
redesign took a
page from twitter’s book
PG can handle
tens of thousands of requests, very light memcached caching
two takeaways
1 have a
versatile complement to your core data storage (like Redis)
2 try not
to have two tools trying to do the same job
3 staying nimble
2010: 2 engineers
2011: 3 engineers
2012: 5 engineers
scarcity -> focus
engineer solutions that
you’re not constantly returning to because they broke
1 extensive unit-tests
and functional tests
2 keep it
DRY
3 loose coupling
using notifications / signals
4 do most
of our work in Python, drop to C when necessary
5 frequent code
reviews, pull requests to keep things in the ‘shared brain’
6 extensive monitoring
munin
statsd
“how is the
system right now?”
“how does this
compare to historical trends?”
scaling for android
1 million new
users in 12 hours
great tools that
enable easy read scalability
redis: slaveof <host>
<port>
our Redis framework assumes
0+ readslaves
tight iteration loops
statsd & pgfouine
know where you
can shed load if needed
(e.g. shorter feeds)
if you’re tempted
to reinvent the wheel...
don’t.
“our app servers sometimes
kernel panic under load”
...
“what if we
write a monitoring daemon...”
wait! this is
exactly what HAProxy is great at
surround yourself with
awesome advisors
culture of openness around
engineering
give back; e.g.
node2dm
focus on making
what you have better
“fast, beautiful photo
sharing”
“can we make
all of our requests 50% the time?”
staying nimble =
remind yourself of what’s important
your users around
the world don’t care that you wrote your own DB
wrapping up
unprecedented times
2 backend engineers can
scale a system to 30+ million users
key word =
simplicity
cleanest solution with
the fewest moving parts as possible
don’t over-optimize or expect
to know ahead of time how site will scale
don’t think “someone else
will join & take care of this”
will happen sooner
than you think; surround yourself with great advisors
when adding software
to stack: only if you have to, optimizing for operational simplicity
few, if any,
unsolvable scaling challenges for a social startup
have fun
Advertisement