Steve Huffman - Lessons learned while at reddit


  • Began as a way to share links. Now we have thousands of communities: some for news, some behave like forums. The premier way to waste time at work.
  • Originally, we didn’t detect crashes very well. I would wake up every couple of hours and check if things were working; friends would have to call me. I dreaded the sound of my phone ringing, and ruined many a dinner trying to fix reddit. Once I had to run across the street to an Apple store to use a terminal to fix things. Bringing supervise into the mix fixed many of our woes: if the app died, supervise automatically restarted it. We also wrote scripts to detect weird states. Running out of connections? Let the app server crash, and all of a sudden you’ll have plenty of connections. Same for a memory leak or a deadlocked thread.
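The "when in doubt, let it die" idea above can be sketched in a few lines. This is a hedged illustration only: daemontools' supervise watches a real OS process, whereas here a Python callable stands in for the app, and all names (`supervise`, `flaky_app`) are invented for the sketch.

```python
# Minimal sketch of the supervise pattern: run the app, and whenever
# it dies, just restart it. A callable stands in for the real process.

def supervise(run, max_restarts=3):
    """Call `run`; if it crashes (raises) or exits, restart it,
    up to max_restarts times. Returns how many times it ran."""
    runs = 0
    while runs < max_restarts:
        try:
            run()                    # the app's main loop
        except Exception:
            pass                     # crashed: in real life, read the logs!
        runs += 1                    # then let supervise bring it back
    return runs

# Demo: an app server that crashes every time it starts.
attempts = []
def flaky_app():
    attempts.append(1)
    raise RuntimeError("out of connections")

supervise(flaky_app, max_restarts=3)   # restarted after every crash
```

The point of the pattern is that recovery logic lives outside the app: the app is free to crash on weird states, and the supervisor's only job is to bring it back.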
  • One machine ran the web server, app, and database. Things were slow, but it wasn’t clear why: CPU wasn’t too high, memory was under control. Context switching was killing us. Adding that second machine was a huge breath of fresh air. We learned this lesson multiple times over the years: separate not just services, but also data. Breaking apart links and comments gave a large performance increase. PostgreSQL is great, but it doesn’t like to share.
  • In the early days our schema looked like this. It’s fairly straightforward: normalized, lots of foreign keys, complex many-to-many relationships. A table for links, accounts, and comments, with columns for each attribute.
  • Every type of data is divided into two parts: things and data. A thing table stores properties common to all types: every type has ups, downs, and a creation date. The data table is just a list of key-value pairs. To run queries against the data table, we keep a specific index for each key type.
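The thing/data split described above can be modeled in a short sketch. This is an in-memory illustration, not reddit's actual code: the class name `ThingDB` and its methods are invented here, and plain dicts stand in for the thing table, the data table, and the per-key indices.

```python
class ThingDB:
    """Sketch of the thing/data split: a 'thing' row holds properties
    common to every type (ups, downs, type); everything else lives in
    a key/value 'data' table, with one index per key for lookups."""

    def __init__(self):
        self.things = {}   # id -> {"ups": .., "downs": .., "type": ..}
        self.data = []     # (thing_id, key, value) rows
        self.index = {}    # key -> {value -> [thing_ids]}

    def create(self, thing_id, type_, ups=0, downs=0, **props):
        self.things[thing_id] = {"ups": ups, "downs": downs, "type": type_}
        for key, value in props.items():
            self.data.append((thing_id, key, value))
            self.index.setdefault(key, {}).setdefault(value, []).append(thing_id)

    def by_key(self, key, value):
        # Served by the per-key index, in the spirit of reddit's by_url
        return self.index.get(key, {}).get(value, [])

db = ThingDB()
db.create(12345, "link", ups=120, downs=34,
          title="Boffins Create Zombie Dog!")
db.create(12346, "link", ups=3, downs=24, url="self")
db.by_key("url", "self")    # -> [12346]
```

Adding a new attribute to a type is just a new key in the data table: no schema update, no migration, which is exactly the payoff the notes describe.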
  • We started with only one application server and developed all sorts of bad habits. The app server was an always-running Lisp process, and we stored all sorts of state per user. We didn’t have memcached; we just stored data in an in-memory hashtable. When we switched to Python, we preserved this. When we added multiple app servers, we were in a bad way: every app had to share this cache, so we were duplicating the entire cache on each app server. We couldn’t use memcached right away because we had too many keys.
  • All queries are generated by the same piece of code, which makes general caching simple. What limited state we have gets put in memcached: password resets and captchas, for example. Every element of every page is cached; we group small elements into bigger pieces and cache blobs. Slow function? Memoize it: for example, the normalized hot page, or more complex database lookups. Memcachedb holds listings, comment trees, and slow queries (by_url).
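The "slow function? memoize it" advice above is easy to sketch. In this illustration a plain dict stands in for the memcached client, and the function and subreddit names are made up; the shape of the decorator is the point.

```python
import functools

cache = {}   # a plain dict standing in for a memcached client

def memoize(fn):
    """Cache a slow function's results keyed by its arguments,
    in the spirit of memoizing the normalized hot page or a
    complex database lookup."""
    @functools.wraps(fn)
    def wrapper(*args):
        key = (fn.__name__,) + args
        if key not in cache:
            cache[key] = fn(*args)   # compute once, serve from cache after
        return cache[key]
    return wrapper

db_hits = []

@memoize
def hot_page(subreddit):
    db_hits.append(subreddit)        # the expensive query in real life
    return [3, 2, 1]

hot_page("programming")
hot_page("programming")              # second call never touches the database
```

With a real memcached client behind the decorator, the memoized result is shared across all app servers rather than duplicated per process, which connects this lesson to "keep it stateless".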
  • When we first began, we had a nice, consistent, normalized database. This meant we had to do a lot of work to get all of the right data together to render a page. That can be mitigated with caching, but caching isn’t a cure-all. What we do now is more like pre-emptive caching: we store complete listings and complete comment trees. A link might appear in a front-page listing or on a user’s profile page; a comment might appear with a link, or in a user’s inbox. Each of these is stored pre-computed and ready to go. For some listings we store as many as 15 versions, with different sorts and different time periods.
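The "15 versions" remark above falls out of storing one pre-computed listing per sort and time period. A hedged sketch, assuming three sorts and five periods; the sort keys and the omission of per-period age filtering are simplifications, not reddit's actual ranking code.

```python
# Pre-emptive caching sketch: one ready-to-serve listing per
# (sort, period) pair, so rendering a page is just a cache fetch.

SORT_KEYS = {
    "hot": lambda link: link["score"],
    "new": lambda link: link["created"],
    "top": lambda link: link["ups"],
}
PERIODS = ("hour", "day", "week", "month", "year")

def precompute(links):
    listings = {}
    for sort, key in SORT_KEYS.items():
        ranked = [l["id"] for l in sorted(links, key=key, reverse=True)]
        for period in PERIODS:   # real code would filter by age per period
            listings[(sort, period)] = ranked
    return listings

links = [
    {"id": 1, "score": 0.5, "created": 300, "ups": 120},
    {"id": 2, "score": 0.9, "created": 100, "ups": 3},
]
listings = precompute(links)     # 3 sorts x 5 periods = 15 versions
```

The trade is deliberate: recompute and store everything a page might need at write time, so reads never have to reassemble normalized data.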
  • Why we work offline
  • What we do offline

    1. LESSONS LEARNED AT REDDIT (Steve Huffman, FOWA 2010)
    2. (image-only slide)
    3. A brief history of reddit
       • Founded in June 2005
       • Acquired by Condé Nast in October 2007
       • 7.5 million users / month
       • 270 million page views / month
       • Many mistakes along the way
    4. Lesson 1: Crash!
       • … and restart.
       • Daemontools (supervise)
       • Single greatest improvement to uptime we ever made.
       • When in doubt, let it die.
       • Don’t forget to read the logs!
    5. Lesson 2: Separation of services
       • Often, going from one machine to two more than doubles performance.
       • Group similar processes together.
       • Group similar types of data together.
       • Better caching.
       • Less contention for CPU.
       • Avoid threads. Processes are easier to separate later.
    6. Lesson 3: Open Schema

       ID     UPS  DOWNS  TITLE                                   URL
       12345  120  34     Boffins Create Zombie Dog!
       12346  3    24     Check out my new blog!
       12347  509  167    Pee in a sink if you’ve ever voted up.  self
    7. Lesson 3: Open Schema
       In the early days:
       • Too much time spent thinking about the database.
       • Every feature required a schema update.
       • Schema updates became more painful as we grew.
       • Maintaining replication was difficult.
       • Deployment was complex.
    8. Lesson 3: Open Schema

       Thing:
       ID     UPS  DOWNS  TYPE
       12345  120  34     Link
       12346  3    24     Link

       Data:
       THING_ID  KEY    VALUE
       12345     Title  Boffins Create Zombie Dog!
       12345     URL
       12346     Title  Pee in a sink if you’ve ever voted up.
       12346     URL    self
    9. Lesson 3: Open Schema
       With an open schema:
       • Faster development
       • Easier deployment
       • Maintainable database replication
       • No joins = easy to distribute
       • Must be careful to maintain consistency
    10. Lesson 4: Keep it stateless
       • Goal: any app server can handle any request.
       • App server failure/restart is no big deal.
       • Scaling is straightforward.
       • Caching must be independent of any specific app server.
    11. Lesson 5: Memcache everything
       • Database data
       • Session data
       • Rendered pages
       • Memoizing internal functions
       • Rate-limiting (user actions, crawlers)
       • Storing pre-computed listings/pages
       • Global locking
       • Memcachedb for persistence
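The rate-limiting bullet above works because the counter lives in the shared cache rather than in any one app server's memory. A minimal sketch, assuming a dict in place of memcached and a made-up `rate_limited` helper; a real implementation would use the client's atomic increment.

```python
import time

cache = {}   # a dict standing in for memcached in this sketch

def rate_limited(user, action, limit, window=60, now=None):
    """Return True once `user` exceeds `limit` actions of a kind in
    the current time window. Because the counter is in the shared
    cache, any app server can enforce the limit (keeps it stateless)."""
    now = time.time() if now is None else now
    key = (user, action, int(now) // window)   # bucket by time window
    cache[key] = cache.get(key, 0) + 1         # memcached would use incr
    return cache[key] > limit
```

Expired windows simply stop being read; with real memcached you would set a TTL of about one window so old counters evict themselves.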
    12. Lesson 6: Store redundant data
       • Recipe for slow: keep data normalized until you need it.
       • If data has multiple presentations, store it multiple times in multiple formats.
       • Disk and memory are less costly than making your users wait.
    13. Lesson 7: Work offline
       • Do the minimum amount of work to end the request.
       • Everything else can be done offline.
       • An architecture of queues is simple and easy to scale.
       • AMQP/RabbitMQ.
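The request/queue split above can be sketched with the standard library. This is an illustration only: `queue.Queue` stands in for RabbitMQ, a dict stands in for the links database, and the job names and helper functions are invented for the sketch.

```python
import queue

links = {}               # stands in for the links database
jobs = queue.Queue()     # stands in for RabbitMQ/AMQP

def save_link(submission):
    link_id = len(links) + 1
    links[link_id] = submission
    return link_id

def handle_request(submission):
    # Do the minimum needed to end the request: persist the link...
    link_id = save_link(submission)
    # ...then queue everything else for offline workers.
    for job in ("fetch_thumbnail", "check_spam", "precompute_listings"):
        jobs.put((job, link_id))
    return link_id

handle_request({"url": "http://example.com"})   # fast response; 3 jobs queued
```

Offline workers then consume the queue at their own pace, so a slow thumbnailer or spam check never delays the user's response, and scaling is just adding more consumers.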
    14. Lesson 7: Work offline
       • Pre-computing listings
       • Fetching thumbnails
       • Detecting cheating
       • Removing spam
       • Computing awards
       • Updating the “search” index
    15. Lesson 7: Work offline
       (Architecture diagram with: Master Databases, Worker Databases, App Servers, Cache, Request Queue, Precomputer, Thumbnailer, Spam.)
    16. THANKS! QUESTIONS?