Delivering Data - Social Networking Personal


Published in: Technology

  • Looked at the top 10 sites on the web and found 7 with social networking aspects. Other: Google 1, Baidu 9.
  • Decided to look at the traffic and found some very interesting stats. Facebook: 200 million active users and 50 billion page views per month. YouTube: over 1 billion views per day. Basecamp: 2 million active accounts and 1.3 million projects managed. Twitter: 1 million+ users and 3 million tweets per day.
  • It should be noted that neither PHP nor Ruby is the most efficient language, as they are not compiled (both are interpreted: they are executed by an interpreter rather than directly by the CPU). Sites like Twitter are written using Ruby on Rails. Ta-da List: so much is built into the framework that a full production app can be developed with very little code.
  • Some treat language as a religion. It's OK to try something different; it doesn't define you as a person.
  • Duplicating the cache is a waste of memory. No group invalidation means you either need to notify all of your servers that they need to refresh their cache or rely solely on cache timeouts. Memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is used by: Facebook, YouTube, Wikipedia, LiveJournal, Digg, Twitter, SourceForge. Most site founders said that the biggest gain was from implementing a caching layer.
  • There is a significant penalty in going to disk for every read as opposed to reading from the cache. Implementing a cache is extremely easy, as shown by the code on the slide. Great for reading data, but you still have to write data.
  • All about responsiveness. Users won't tolerate long waits on social networks. They are now expecting this behaviour from all software.
  • To prevent anomalies we don't duplicate data. We split everything up so it is stored once. The price of normalization is that when we want a person's address we have to go find the person and their address and bring the data together again. This is called a join. Joins are relatively slow, especially over very large data sets, and not just for reads (caching takes care of those) but also for creates, updates and deletes (CUD). Flickr decided to denormalize because it took 13 Selects for each Insert, Delete or Update.
  • eBay do not use transactions; they have so much data that distributed transactions would harm responsiveness. Referential integrity and sorting are done in application code. Atomicity: all parts of a transaction succeed or none of them succeed. Consistency: the database will be in a consistent state when the transaction begins and ends. Isolation: the transaction will behave as if it is the only operation being performed upon the database. Durability: upon completion of the transaction, the operation will not be reversed. Facebook has 4500 database servers.
  • All solutions are slightly different. The same challenge in 5 years may have a totally different solution (hardware/software changes).
  • Need fresh ideas, otherwise we'll copy the mistakes of others.
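
The note on duplicated local caches and missing group invalidation can be illustrated with a small sketch. Everything here is an assumption made for illustration: plain dicts stand in for the database and for the cache, and the `Server` class, `get_user` and `update_user` are hypothetical names, not part of memcached or of any site mentioned in the deck.

```python
# Hypothetical stand-in for the real database: userid -> name.
database = {42: "alice"}


class Server:
    """A web server that reads through a cache and deletes on update."""

    def __init__(self, cache):
        # A dict standing in for a cache client (local or shared).
        self.cache = cache

    def get_user(self, userid):
        key = f"userrow:{userid}"
        if key not in self.cache:
            # Cache miss: go to the database and remember the result.
            self.cache[key] = database[userid]
        return self.cache[key]

    def update_user(self, userid, name):
        database[userid] = name
        # Delete from this server's cache; the next read goes to the database.
        self.cache.pop(f"userrow:{userid}", None)


# Local caching: each server holds its own private copy of the data.
a, b = Server({}), Server({})
a.get_user(42)
b.get_user(42)
a.update_user(42, "alicia")
# Server b was never told about the update, so it still serves its old copy.
local_result = b.get_user(42)

# Shared cache: both servers talk to the same cache, as with a distributed cache.
database[42] = "alice"
shared = {}
c, d = Server(shared), Server(shared)
c.get_user(42)
d.get_user(42)
c.update_user(42, "alicia")
# The delete performed by c is visible to d, so d re-reads the database.
shared_result = d.get_user(42)
```

With private caches, `local_result` is the stale value until something notifies server `b` or its entry times out; with the shared cache a single delete is enough, and the data is held in memory once rather than once per server.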

    1. Social networking architectures: what can we learn?
    2. An enthusiast's view
    3. How do sites with a social networking angle figure globally?
        As ranked by Alexa:
        Site            Global ranking
        Facebook        2
        YouTube         3
        Yahoo           4
        Windows Live    5
        Blogger         7
        Wikipedia       8
        Twitter         10
    4. 3 Principles
        3 common principles:
        • Fast feature delivery is key
        • Cache everything everywhere
        • Relational data is dead
    5. Interesting stats
        • Facebook: serves 120 million queries per second without a single join
        • 37signals: developed a production application serving over 4 million items using only 579 lines of code
        • Flickr: 2 billion photos served without using relational databases
    6. How did they do it?
        • Nobody thought this was possible
        • Unencumbered by history or restrictive rules
        • Had to be creative in solving problems that nobody had experienced, using very little capital outlay
    7. 3 Principles
        • Fast feature delivery is key
        • Cache everything everywhere
        • Relational data is dead
    8. Fast feature delivery is key
        • Choose an appropriate language
        • Speed of development is more important than speed of execution
        • Languages like PHP and Ruby are commonly used for rapid development and deployment
    9. Language is not religion
    10. 3 Principles
        • Fast feature delivery is key
        • Cache everything everywhere
        • Relational data is dead
    11. Cache everything everywhere
        • You need a really good reason not to cache data for reading
        • Local caching is a good start, but:
            • more than one server means duplicating the cache
            • there is no group invalidation
            • memory is limited to the spare RAM on the server
        • Most social networks use a distributed cache like memcached
    12. Cache everything everywhere
        • Check if the information is in the cache. If so, use it.
        • If not, query the database and put the result in the cache.
        • On update, delete from the cache. The next user goes to the database.

            function get_foo(int userid) {
                /* Try the cache first. */
                result = memcached_fetch("userrow:" + userid);
                if (!result) {
                    /* Cache miss: read from the database and cache the row. */
                    result = db_select("SELECT * FROM users WHERE userid = ?", userid);
                    memcached_add("userrow:" + userid, result);
                }
                return result;
            }

    13. Responsiveness is key
    14. 3 Principles
        • Fast feature delivery is key
        • Cache everything everywhere
        • Relational data is dead
    15. Everybody wants to use a database
    16. Relational issue No 1 - Normalisation
        • Relational databases do not scale well because of normalisation
        • Why normalise?
            • reduce storage space
            • reduce anomalies
        • Today
            • storage is cheap
            • as data gets larger, joins are expensive
    17. Relational issue No 2 - Transactions
        • ACID principles govern transactions
        • Relational databases do not scale well because of transactions
    18. After relational
        • Use BASE (basically available, soft state, eventually consistent)
        • Shard data
        • Favour name-value pair stores over relational databases
    19. Lessons for enterprise
        • The answer to a software design question should always be "it depends"
        • Test your most basic assumptions
        • Dynamic languages and frameworks may be suitable to deliver a feature quickly
        • You don't need an RDBMS for everything, especially if you need huge scale
        • You should always cache data for read (unless you shouldn’t)
    20. Fresh ideas always welcome
    21. Find me here
    22. Or find me here
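
Slide 18's advice to shard data and favour name-value pair stores can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any of the sites in the deck implement it: the `ShardedStore` class and its methods are hypothetical, each shard is an in-memory dict standing in for a separate server, and shards are picked with a simple hash-modulo rule.

```python
import zlib


class ShardedStore:
    """A name-value store whose keys are spread over several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate database server (assumption).
        self.shards = [{} for _ in range(num_shards)]

    def shard_index(self, key):
        # crc32 gives the same result on every run, so a key always maps
        # to the same shard for a fixed number of shards.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_index(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.shard_index(key)].get(key, default)


store = ShardedStore(num_shards=4)

# Denormalised values: the address is stored with the user, so reading a
# profile is a single key lookup on one shard rather than a join.
store.put("user:42", {"name": "alice", "address": "1 Example Street"})
store.put("user:43", {"name": "bob", "address": "2 Example Street"})

profile = store.get("user:42")
missing = store.get("user:99")
```

The trade-off is the one the slides describe: the data is duplicated and consistency becomes the application's job, in exchange for reads and writes that touch only one shard. Note that with a plain modulo rule, changing the number of shards remaps most keys.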