Scaling Instagram
Instagram scalability in practice

Scaling Instagram Presentation Transcript

  • 1. Scaling Instagram. AirBnB Tech Talk 2012. Mike Krieger, Instagram
  • 2. me: Co-founder, Instagram. Previously: UX & Front-end @ Meebo. Stanford HCI BS/MS. @mikeyk on everything
  • 3. communicating and sharing in the real world
  • 4. 30+ million users in less than 2 years
  • 5. the story of how we scaled it
  • 6. a brief tangent
  • 7. the beginning
  • 8. (image slide)
  • 9. 2 product guys
  • 10. no real back-end experience
  • 11. analytics & python @ meebo
  • 12. CouchDB
  • 13. CrimeDesk SF
  • 14. let’s get hacking
  • 15. good components in place early on
  • 16. ...but were hosted on a single machine somewhere in LA
  • 17. less powerful than my MacBook Pro
  • 18. okay, we launched. now what?
  • 19. 25k signups in the first day
  • 20. everything is on fire!
  • 21. best & worst day of our lives so far
  • 22. load was through the roof
  • 23. first culprit?
  • 24. favicon.ico
  • 25. 404-ing on Django, causing tons of errors
  • 26. lesson #1: don’t forget your favicon
  • 27. real lesson #1: most of your initial scaling problems won’t be glamorous
  • 28. favicon
  • 29. ulimit -n
  • 30. memcached -t 4
  • 31. prefork/postfork
  • 32. friday rolls around
  • 33. not slowing down
  • 34. let’s move to EC2.
  • 35. scaling = replacing all components of a car while driving it at 100mph
  • 36. since...
  • 37. “canonical [architecture] of an early stage startup in this era” (HighScalability.com)
  • 38. Nginx & Redis & Postgres & Django.
  • 39. Nginx & HAProxy & Redis & Memcached & Postgres & Gearman & Django.
  • 40. 24h Ops
  • 41. our philosophy
  • 42. 1 simplicity
  • 43. 2 optimize for minimal operational burden
  • 44. 3 instrument everything
  • 45. walkthrough: 1 scaling the database, 2 choosing technology, 3 staying nimble, 4 scaling for android
  • 46. 1 scaling the db
  • 47. early days
  • 48. django ORM, postgresql
  • 49. why pg? postgis.
  • 50. moved db to its own machine
  • 51. but photos kept growing and growing...
  • 52. ...and only 68GB of RAM on biggest machine in EC2
  • 53. so what now?
  • 54. vertical partitioning
  • 55. django db routers make it pretty easy
  • 56. def db_for_read(self, model): if app_label == 'photos': return 'photodb'
  • 57. ...once you untangle all your foreign key relationships
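The slide's snippet is shorthand for a Django database router. A fuller sketch of the vertical-partitioning idea follows; the names (PhotoRouter, 'photodb') are illustrative, not Instagram's actual configuration:

```python
# Route everything in the 'photos' app to its own database machine.
class PhotoRouter(object):
    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
        return None  # fall through to the 'default' database

    def db_for_write(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
        return None

# settings.py (illustrative):
# DATABASES = {'default': {...}, 'photodb': {...}}
# DATABASE_ROUTERS = ['myapp.routers.PhotoRouter']
```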
  • 58. a few months later...
  • 59. photosdb > 60GB
  • 60. what now?
  • 61. horizontal partitioning!
  • 62. aka: sharding
  • 63. “surely we’ll have hired someone experienced before we actually need to shard”
  • 64. you don’t get to choose when scaling challenges come up
  • 65. evaluated solutions
  • 66. at the time, none were up to the task of being our primary DB
  • 67. did it in Postgres itself
  • 68. what’s painful about sharding?
  • 69. 1 data retrieval
  • 70. hard to know what your primary access patterns will be w/out any usage
  • 71. in most cases, user ID
  • 72. 2 what happens if one of your shards gets too big?
  • 73. in range-based schemes (like MongoDB), you split
  • 74. A-H: shard0, I-Z: shard1
  • 75. A-D: shard0, E-H: shard2, I-P: shard1, Q-Z: shard3
  • 76. downsides (especially on EC2): disk IO
  • 77. instead, we pre-split
  • 78. many many many (thousands) of logical shards
  • 79. that map to fewer physical ones
  • 80. // 8 logical shards on 2 machines: user_id % 8 = logical shard; logical shards -> physical shard map { 0: A, 1: A, 2: A, 3: A, 4: B, 5: B, 6: B, 7: B }
  • 81. // 8 logical shards on 4 machines (up from 2): user_id % 8 = logical shard; logical shards -> physical shard map { 0: A, 1: A, 2: C, 3: C, 4: B, 5: B, 6: D, 7: D }
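The two slides above, as code. The mapping is the slides' own example; the helper names are illustrative:

```python
LOGICAL_SHARDS = 8

# 8 logical shards on 4 machines (after splitting C off A and D off B)
SHARD_TO_MACHINE = {0: 'A', 1: 'A', 2: 'C', 3: 'C',
                    4: 'B', 5: 'B', 6: 'D', 7: 'D'}

def machine_for_user(user_id):
    logical_shard = user_id % LOGICAL_SHARDS
    return SHARD_TO_MACHINE[logical_shard]

print(machine_for_user(10))  # 10 % 8 = 2 -> machine 'C'
```

Moving a logical shard to a new machine is then a data copy plus one map update; user IDs never need re-hashing.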
  • 82. little known but awesome PG feature: schemas
  • 83. not “schema” in the columns sense
  • 84. database > schema > table > columns
  • 85. machineA: shard0.photos_by_user, shard1.photos_by_user, shard2.photos_by_user, shard3.photos_by_user
  • 86. machineA: shard0-shard3 (photos_by_user each); machineA’ (replica): shard0-shard3 (photos_by_user each)
  • 87. machineA: shard0-shard3 (photos_by_user each); machineC (machineA’ promoted): shard0-shard3 (photos_by_user each)
  • 88. can do this as long as you have more logical shards than physical ones
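A minimal sketch of schemas-as-logical-shards, assuming each shard schema holds a photos_by_user table (table and column names are illustrative, e.g. `CREATE SCHEMA shard0; CREATE TABLE shard0.photos_by_user (...)`), with the connection already routed to the right physical machine by the map above:

```python
import psycopg2

LOGICAL_SHARDS = 8

def photos_for_user(conn, user_id):
    shard = user_id % LOGICAL_SHARDS
    # schema-qualified table name, e.g. shard3.photos_by_user
    sql = 'SELECT media_id FROM shard%d.photos_by_user WHERE user_id = %%s' % shard
    with conn.cursor() as cur:
        cur.execute(sql, (user_id,))
        return [row[0] for row in cur.fetchall()]
```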
  • 89. lesson: take tech/tools you know and try first to adapt them into a simple solution
  • 90. 2 which tools where?
  • 91. where to cache /otherwise denormalize data
  • 92. we <3 redis
  • 93. what happens when a user posts a photo?
  • 94. 1 user uploads photo with (optional) caption and location
  • 95. 2 synchronous write to the media database for that user
  • 96. 3 queues!
  • 97. 3a if geotagged, async worker POSTs to Solr
  • 98. 3b follower delivery
  • 99. can’t have every user who loads her timeline look up everyone she follows and then their photos
  • 100. instead, everyone gets their own list in Redis
  • 101. media ID is pushed onto a list for every person who’s following this user
  • 102. Redis is awesome for this; rapid insert, rapid subsets
  • 103. when it’s time to render a feed, we take a small # of IDs and go look up info in memcached
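A sketch of this fan-out-on-write feed; the key names, the list cap, and the client choices are assumptions for illustration, not Instagram's code:

```python
import memcache  # python-memcached
import redis

r = redis.Redis(host='localhost', port=6379)
mc = memcache.Client(['127.0.0.1:11211'])

FEED_LENGTH = 300  # keep each per-user list bounded (assumed cap)

def deliver_media(media_id, follower_ids):
    """On upload: push the media ID onto every follower's feed list."""
    for follower_id in follower_ids:
        key = 'feed:%d' % follower_id
        r.lpush(key, media_id)
        r.ltrim(key, 0, FEED_LENGTH - 1)  # drop the oldest entries

def render_feed(user_id, count=20):
    """On read: take a small number of IDs, hydrate them from memcached."""
    ids = r.lrange('feed:%d' % user_id, 0, count - 1)
    keys = ['media:%s' % i.decode() for i in ids]
    return mc.get_multi(keys)  # misses would fall back to the media DB
```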
  • 104. Redis is great for...
  • 105. data structures that are relatively bounded
  • 106. (don’t tie yourself to a solution where your in-memory DB is your main data store)
  • 107. caching complex objects where you want to do more than GET
  • 108. ex: counting, sub-ranges, testing membership
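The slide's three examples as Redis commands (redis-py syntax; key names are illustrative):

```python
import redis

r = redis.Redis()

r.incr('media:1234:like_count')    # counting
r.lrange('feed:42', 0, 9)          # sub-ranges of a list
r.sismember('followers:42', '17')  # testing set membership
```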
  • 109. especially when Taylor Swift posts live from the CMAs
  • 110. follow graph
  • 111. v1: simple DB table (source_id, target_id, status)
  • 112. who do I follow? who follows me? do I follow X? does X follow me?
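A hypothetical Django model matching the v1 table, with the slide's four questions as ORM queries (field and status names are assumptions):

```python
from django.db import models

class Follow(models.Model):
    source_id = models.IntegerField(db_index=True)  # follower
    target_id = models.IntegerField(db_index=True)  # followee
    status = models.CharField(max_length=16)        # e.g. 'active'

me, x = 1, 2  # placeholder user IDs

following = Follow.objects.filter(source_id=me, status='active')          # who do I follow?
followers = Follow.objects.filter(target_id=me, status='active')          # who follows me?
i_follow_x = Follow.objects.filter(source_id=me, target_id=x).exists()    # do I follow X?
x_follows_me = Follow.objects.filter(source_id=x, target_id=me).exists()  # does X follow me?
```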
  • 113. DB was busy, so we started storing a parallel version in Redis
  • 114. follow_all(300-item list)
  • 115. inconsistency
  • 116. extra logic
  • 117. so much extra logic
  • 118. exposing your support team to the idea of cache invalidation
  • 119. redesign took a page from twitter’s book
  • 120. PG can handle tens of thousands of requests, very light memcached caching
  • 121. two takeaways
  • 122. 1 have a versatile complement to your core data storage (like Redis)
  • 123. 2 try not to have two tools trying to do the same job
  • 124. 3 staying nimble
  • 125. 2010: 2 engineers
  • 126. 2011: 3 engineers
  • 127. 2012: 5 engineers
  • 128. scarcity -> focus
  • 129. engineer solutions that you’re not constantly returning to because they broke
  • 130. 1 extensive unit-tests and functional tests
  • 131. 2 keep it DRY
  • 132. 3 loose coupling using notifications / signals
  • 133. 4 do most of our work in Python, drop to C when necessary
  • 134. 5 frequent code reviews, pull requests to keep things in the ‘shared brain’
  • 135. 6 extensive monitoring
  • 136. munin
  • 137. statsd
  • 138. “how is the system right now?”
  • 139. “how does this compare to historical trends?”
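A sketch of the statsd-style instrumentation mentioned a few slides up, using the common Python `statsd` client; metric names are illustrative:

```python
import statsd

stats = statsd.StatsClient('localhost', 8125)

stats.incr('photos.uploaded')       # counter: what's happening right now?
stats.timing('feed.render_ms', 87)  # timer: how long did this take?

with stats.timer('db.photos.query'):
    pass  # the work being measured goes here
```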
  • 140. scaling for android
  • 141. 1 million new users in 12 hours
  • 142. great tools that enable easy read scalability
  • 143. redis: slaveof <host> <port>
  • 144. our Redis framework assumes 0+ read slaves
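A sketch of the "0+ read slaves" convention: writes always hit the master, reads are spread over whatever replicas exist, possibly none. Hostnames are illustrative:

```python
import random
import redis

master = redis.Redis(host='redis-master')
read_slaves = [redis.Redis(host='redis-slave-1'),
               redis.Redis(host='redis-slave-2')]  # may be empty

def read_conn():
    # with zero slaves configured, fall back to the master
    return random.choice(read_slaves) if read_slaves else master

master.lpush('feed:42', 12345)        # writes -> master
read_conn().lrange('feed:42', 0, 19)  # reads -> any slave (or master)
```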
  • 145. tight iteration loops
  • 146. statsd & pgfouine
  • 147. know where you can shed load if needed
  • 148. (e.g. shorter feeds)
  • 149. if you’re tempted to reinvent the wheel...
  • 150. don’t.
  • 151. “our app servers sometimes kernel panic under load”
  • 152. ...
  • 153. “what if we write a monitoring daemon...”
  • 154. wait! this is exactly what HAProxy is great at
  • 155. surround yourself with awesome advisors
  • 156. culture of openness around engineering
  • 157. give back; e.g. node2dm
  • 158. focus on making what you have better
  • 159. “fast, beautiful photo sharing”
  • 160. “can we make all of our requests take 50% of the time?”
  • 161. staying nimble = remind yourself of what’s important
  • 162. your users around the world don’t care that you wrote your own DB
  • 163. wrapping up
  • 164. unprecedented times
  • 165. 2 backend engineers can scale a system to 30+ million users
  • 166. key word = simplicity
  • 167. cleanest solution with as few moving parts as possible
  • 168. don’t over-optimize or expect to know ahead of time how the site will scale
  • 169. don’t think “someone else will join & take care of this”
  • 170. will happen sooner than you think; surround yourself with great advisors
  • 171. when adding software to the stack: only if you have to, optimizing for operational simplicity
  • 172. few, if any, unsolvable scaling challenges for a social startup
  • 173. have fun