Scaling Instagram

Instagram scalability practices

Transcript

  • 1. Scaling Instagram AirBnB Tech Talk 2012 Mike Krieger Instagram
  • 2. me: Co-founder, Instagram; previously UX & Front-end @ Meebo; Stanford HCI BS/MS; @mikeyk on everything
  • 3. communicating and sharing in the real world
  • 4. 30+ million users in less than 2 years
  • 5. the story of how we scaled it
  • 6. a brief tangent
  • 7. the beginning
  • 8. Text
  • 9. 2 product guys
  • 10. no real back-end experience
  • 11. analytics & python @ meebo
  • 12. CouchDB
  • 13. CrimeDesk SF
  • 14. let’s get hacking
  • 15. good components in place early on
  • 16. ...but were hosted on a single machine somewhere in LA
  • 17. less powerful than my MacBook Pro
  • 18. okay, we launched. now what?
  • 19. 25k signups in the first day
  • 20. everything is on fire!
  • 21. best & worst day of our lives so far
  • 22. load was through the roof
  • 23. first culprit?
  • 24. favicon.ico
  • 25. 404-ing on Django, causing tons of errors
  • 26. lesson #1: don’t forget your favicon
  • 27. real lesson #1: most of your initial scaling problems won’t be glamorous
  • 28. favicon
  • 29. ulimit -n
  • 30. memcached -t 4
  • 31. prefork/postfork
  • 32. friday rolls around
  • 33. not slowing down
  • 34. let’s move to EC2.
  • 35. scaling = replacing all components of a car while driving it at 100mph
  • 36. since...
  • 37. “"canonical [architecture]of an early stage startup in this era." (HighScalability.com)
  • 38. Nginx & Redis & Postgres & Django.
  • 39. Nginx & HAProxy & Redis & Memcached & Postgres & Gearman & Django.
  • 40. 24h Ops
  • 41. our philosophy
  • 42. 1 simplicity
  • 43. 2 optimize for minimal operational burden
  • 44. 3 instrument everything
  • 45. walkthrough: 1 scaling the database, 2 choosing technology, 3 staying nimble, 4 scaling for android
  • 46. 1 scaling the db
  • 47. early days
  • 48. django ORM, postgresql
  • 49. why pg? postgis.
  • 50. moved db to its own machine
  • 51. but photos kept growing and growing...
  • 52. ...and only 68GB of RAM on biggest machine in EC2
  • 53. so what now?
  • 54. vertical partitioning
  • 55. django db routers make it pretty easy
  • 56. def db_for_read(self, model): if model._meta.app_label == 'photos': return 'photodb' (full router sketch after the transcript)
  • 57. ...once you untangle all your foreign key relationships
  • 58. a few months later...
  • 59. photosdb > 60GB
  • 60. what now?
  • 61. horizontal partitioning!
  • 62. aka: sharding
  • 63. “surely we’ll have hired someone experienced before we actually need to shard”
  • 64. you don’t get to choose when scaling challenges come up
  • 65. evaluated solutions
  • 66. at the time, none were up to the task of being our primary DB
  • 67. did it in Postgres itself
  • 68. what’s painful about sharding?
  • 69. 1 data retrieval
  • 70. hard to know what your primary access patterns will be w/out any usage
  • 71. in most cases, user ID
  • 72. 2 what happens if one of your shards gets too big?
  • 73. in range-based schemes (like MongoDB), you split
  • 74. A-H: shard0, I-Z: shard1
  • 75. A-D: shard0, E-H: shard2, I-P: shard1, Q-Z: shard3
  • 76. downsides (especially on EC2): disk IO
  • 77. instead, we pre-split
  • 78. many many many (thousands) of logical shards
  • 79. that map to fewer physical ones
  • 80. // 8 logical shards on 2 machines: user_id % 8 = logical shard; logical shard -> physical shard map: { 0: A, 1: A, 2: A, 3: A, 4: B, 5: B, 6: B, 7: B }
  • 81. // 8 logical shards on 4 machines (up from 2): user_id % 8 = logical shard; logical shard -> physical shard map: { 0: A, 1: A, 2: C, 3: C, 4: B, 5: B, 6: D, 7: D } (see the sharding sketch after the transcript)
  • 82. little known but awesome PG feature: schemas
  • 83. not “columns” schema
  • 84. database -> schema -> table -> columns
  • 85. machineA: shard0.photos_by_user, shard1.photos_by_user, shard2.photos_by_user, shard3.photos_by_user
  • 86. machineA (shard0-shard3, photos_by_user in each), replicated to machineA’ (same shards)
  • 87. machineA (shard0-shard3) and machineC (shard0-shard3) (schema layout sketch after the transcript)
  • 88. can do this as long as you have more logical shards than physical ones
  • 89. lesson: take tech/tools you know and try first to adapt them into a simple solution
  • 90. 2 which tools where?
  • 91. where to cache / otherwise denormalize data
  • 92. we <3 redis
  • 93. what happens when a user posts a photo?
  • 94. 1 user uploads photo with (optional) caption and location
  • 95. 2 synchronous write to the media database for that user
  • 96. 3 queues!
  • 97. 3a if geotagged, async worker POSTs to Solr
  • 98. 3b follower delivery
  • 99. can’t have every user who loads her timeline look up all their followers and then their photos
  • 100. instead, everyone gets their own list in Redis
  • 101. media ID is pushed onto a list for every person who’s following this user
  • 102. Redis is awesome forthis; rapid insert, rapid subsets
  • 103. when time to render a feed, we take small # of IDs, go look up info in memcached (feed delivery sketch after the transcript)
  • 104. Redis is great for...
  • 105. data structures that are relatively bounded
  • 106. (don’t tie yourself to a solution where your in-memory DB is your main data store)
  • 107. caching complex objects where you want to do more than GET
  • 108. ex: counting, sub-ranges, testing membership
  • 109. especially when Taylor Swift posts live from the CMAs
  • 110. follow graph
  • 111. v1: simple DB table (source_id, target_id, status)
  • 112. who do I follow? who follows me? do I follow X? does X follow me? (query sketch after the transcript)
  • 113. DB was busy, so we started storing a parallel version in Redis
  • 114. follow_all(300-item list)
  • 115. inconsistency
  • 116. extra logic
  • 117. so much extra logic
  • 118. exposing your support team to the idea of cache invalidation
  • 119. redesign took a page from twitter’s book
  • 120. PG can handle tens of thousands of requests, very light memcached caching
  • 121. two takeaways
  • 122. 1 have a versatile complement to your core data storage (like Redis)
  • 123. 2 try not to have two tools trying to do the same job
  • 124. 3 staying nimble
  • 125. 2010: 2 engineers
  • 126. 2011: 3 engineers
  • 127. 2012: 5 engineers
  • 128. scarcity -> focus
  • 129. engineer solutions that you’re not constantly returning to because they broke
  • 130. 1 extensive unit-tests and functional tests
  • 131. 2 keep it DRY
  • 132. 3 loose coupling using notifications / signals
  • 133. 4 do most of our work in Python, drop to C when necessary
  • 134. 5 frequent code reviews, pull requests to keep things in the ‘shared brain’
  • 135. 6 extensive monitoring
  • 136. munin
  • 137. statsd (instrumentation sketch after the transcript)
  • 138. “how is the system right now?”
  • 139. “how does this compare to historical trends?”
  • 140. 4 scaling for android
  • 141. 1 million new users in 12 hours
  • 142. great tools that enable easy read scalability
  • 143. redis: slaveof <host> <port>
  • 144. our Redis framework assumes 0+ read slaves (read-slave sketch after the transcript)
  • 145. tight iteration loops
  • 146. statsd & pgfouine
  • 147. know where you can shed load if needed
  • 148. (e.g. shorter feeds)
  • 149. if you’re tempted to reinvent the wheel...
  • 150. don’t.
  • 151. “our app servers sometimes kernel panic under load”
  • 152. ...
  • 153. “what if we write a monitoring daemon...”
  • 154. wait! this is exactly what HAProxy is great at
  • 155. surround yourself with awesome advisors
  • 156. culture of openness around engineering
  • 157. give back; e.g. node2dm
  • 158. focus on making what you have better
  • 159. “fast, beautiful photo sharing”
  • 160. “can we make all of our requests take 50% of the time?”
  • 161. staying nimble = remind yourself of what’s important
  • 162. your users around the world don’t care that you wrote your own DB
  • 163. wrapping up
  • 164. unprecedented times
  • 165. 2 backend engineers can scale a system to 30+ million users
  • 166. key word = simplicity
  • 167. cleanest solution with the fewest moving parts possible
  • 168. don’t over-optimize or expect to know ahead of time how the site will scale
  • 169. don’t think “someone else will join & take care of this”
  • 170. will happen sooner than you think; surround yourself with great advisors
  • 171. when adding software to stack: only if you have to, optimizing for operational simplicity
  • 172. few, if any, unsolvable scaling challenges for a social startup
  • 173. have fun
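
Sketches

Slide 56’s router line is shorthand; here is a fuller sketch of a Django database router for the vertical partitioning on slides 54-57. The 'photos' app label and 'photodb' alias come from the slide; the class name and everything else are illustrative.

    class PhotoRouter:
        """Route the photos app to its own database (vertical partitioning)."""

        def db_for_read(self, model, **hints):
            if model._meta.app_label == 'photos':
                return 'photodb'
            return None  # fall through to the default database

        def db_for_write(self, model, **hints):
            if model._meta.app_label == 'photos':
                return 'photodb'
            return None

        def allow_relation(self, obj1, obj2, **hints):
            # The painful part from slide 57: foreign keys can't cross
            # databases, so only allow relations within a single database.
            return obj1._state.db == obj2._state.db

Django applies a router like this once it is listed in the DATABASE_ROUTERS setting.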
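
The pre-split scheme on slides 77-81, as a small runnable sketch. The two maps are the data from the slides; the helper name is mine, and the 8-shard count mirrors the example (production used thousands of logical shards, per slide 78).

    LOGICAL_SHARDS = 8  # thousands in production; 8 to match the slides

    # 8 logical shards on 2 machines (slide 80)...
    SHARD_MAP = {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B'}

    # ...and on 4 machines (slide 81): only the map changes, no data is rehashed.
    SHARD_MAP_4 = {0: 'A', 1: 'A', 2: 'C', 3: 'C', 4: 'B', 5: 'B', 6: 'D', 7: 'D'}

    def shard_for_user(user_id, shard_map=SHARD_MAP):
        logical = user_id % LOGICAL_SHARDS
        return logical, shard_map[logical]

    print(shard_for_user(1234))               # (2, 'A') before the move
    print(shard_for_user(1234, SHARD_MAP_4))  # (2, 'C') after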
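
One way the Postgres-schemas layout on slides 84-87 could be created, via psycopg2. The schema and table names follow the slides; the DSN and the column definitions are placeholder assumptions.

    import psycopg2

    conn = psycopg2.connect("dbname=photodb")  # placeholder DSN
    with conn, conn.cursor() as cur:
        for n in range(4):  # shard0..shard3 live on this machine
            cur.execute(f"CREATE SCHEMA IF NOT EXISTS shard{n}")
            cur.execute(
                f"CREATE TABLE IF NOT EXISTS shard{n}.photos_by_user ("
                " user_id bigint NOT NULL,"
                " photo_id bigint NOT NULL,"
                " PRIMARY KEY (user_id, photo_id))"
            )

Because each logical shard is just a schema, moving one to another machine (slides 86-87) amounts to copying its schema and flipping its entry in the shard map, with no per-row rehashing.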
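
Follower delivery and feed rendering (slides 100-103) as a redis-py sketch; the key naming, the 300-entry cap, and the function names are assumptions, not Instagram's actual schema.

    import redis

    r = redis.Redis()  # placeholder connection
    FEED_LEN = 300     # keep each per-user list bounded (assumed cap)

    def deliver(media_id, follower_ids):
        """3b follower delivery: push the media ID to every follower's list."""
        for fid in follower_ids:
            key = f"feed:{fid}"
            r.lpush(key, media_id)
            r.ltrim(key, 0, FEED_LEN - 1)

    def feed_ids(user_id, count=30):
        """Rendering takes a small # of IDs; the full objects then come
        from memcached (slide 103)."""
        return [int(m) for m in r.lrange(f"feed:{user_id}", 0, count - 1)]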
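
The four follow-graph questions from slide 112 against the v1 table on slide 111; the table name 'follows' and the status value are assumed.

    # v1 follow graph: one table (source_id, target_id, status), slide 111.
    WHO_DO_I_FOLLOW = (
        "SELECT target_id FROM follows"
        " WHERE source_id = %s AND status = 'active'")
    WHO_FOLLOWS_ME = (
        "SELECT source_id FROM follows"
        " WHERE target_id = %s AND status = 'active'")
    # "do I follow X?" and "does X follow me?" are the same existence
    # probe with the parameters swapped:
    FOLLOW_EXISTS = (
        "SELECT 1 FROM follows"
        " WHERE source_id = %s AND target_id = %s AND status = 'active'")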
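
What “instrument everything” (slide 44) can look like with the statsd Python client (slide 137); the metric names and the fetch_feed stand-in are made up.

    import statsd

    stats = statsd.StatsClient('localhost', 8125)  # default statsd port

    def fetch_feed(user_id):
        """Stand-in for real work worth timing."""
        return []

    # Counters answer "how is the system right now?" (slide 138)
    stats.incr('photos.uploaded')

    # Timers accumulate the data for "how does this compare to
    # historical trends?" (slide 139)
    with stats.timer('feed.render_time'):
        fetch_feed(1234)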
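
A sketch of a Redis framework that “assumes 0+ read slaves” (slides 143-144): writes always hit the master, reads round-robin across whatever slaves exist, and with zero slaves reads simply fall back to the master. Hostnames are placeholders.

    import itertools
    import redis

    master = redis.Redis(host='redis-master')  # placeholder hosts
    slaves = [redis.Redis(host=h) for h in ('redis-slave-1', 'redis-slave-2')]

    # With an empty slave list this cycles over just the master, so
    # callers never need to know how many slaves exist.
    _readers = itertools.cycle(slaves or [master])

    def write_conn():
        return master

    def read_conn():
        return next(_readers)

    write_conn().lpush('feed:1234', 42)
    read_conn().lrange('feed:1234', 0, 29)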