Talk I gave at http://www.kingsofcode.nl about what we learned over the last year about making http://www.netlog.com scalable and delivering high performance to our users.
6. What: itʼs personal
‣ You rule: itʼs yours
[Diagram: YOU and ANOTHER member, linked through Music, Photos, Games, Videos, People, Blogs and Relations]
7. Friend Activity
‣ Share & discover friendsʼ activity
[Screenshot: friend activity feed, for example “Jaak Noukens and Jo are now friends”, “Stijn Symons uploads a new photo”, “Kenny Gryp signs the guestbook of Lorenz Bogaert”]
14. Applications
‣ OpenSocial
• sandbox: http://nl.netlog.com/go/developer/opensocial/sandbox=1
‣ Officially announced tomorrow at Google I/O
• Stay tuned!
‣ Public launch planned for June
16. Itʼs going pretty good
‣ More than 35,000,000 unique members
‣ More than 4,000,000,000 pageviews/month
‣ 19 languages and more coming up
‣ More than 20 countries
‣ Current Alexa Top-100 ranking
(most visited web sites in the world)
‣ Current ComScore Europe Top-10 ranking
17. Itʼs going pretty good
[Chart: Monthly Visits, January 2007 to April 2008 (axis 0 to 200,000,000)]
[Chart: Monthly Unique Visitors, January 2007 to April 2008 (axis 0 to 40,000,000)]
[Chart: Monthly Page Requests, January 2007 to April 2008 (axis 0 to 5,000,000,000)]
[Chart: traffic share by region: Western Europe, Southern Europe, Northern Europe, Eastern Europe, Western Asia, Americas (shares shown: 46%, 22%, 16%, 10%, 3%, 3%)]
20. 19 languages and a lot more coming!
Slovenčina
Español Català
Svenska
suomi česky
slovenščina Deutsch Magyar
Nederlands
français
Русский Italiano Afrikaans
English
Dansk Türkçe
Polski Hrvatski
Lietuvių kalba
Eesti Latviešu valoda
Português
Română български
Norsk (bokmål)
37. Database Pools
‣ Different data on different database pools:
• messaging
• friendships
• blogs
• music
• videos
• ...
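A minimal sketch of routing by data type. The pool names come from the list above; the host names and the `pool_for` helper are hypothetical.

```python
# Hypothetical mapping from data type to database pool (host names are made up).
POOLS = {
    "messaging": "db-messaging.example.internal",
    "friendships": "db-friendships.example.internal",
    "blogs": "db-blogs.example.internal",
    "music": "db-music.example.internal",
    "videos": "db-videos.example.internal",
}


def pool_for(data_type: str) -> str:
    """Return the host of the pool that stores this kind of data."""
    try:
        return POOLS[data_type]
    except KeyError:
        raise ValueError(f"no database pool configured for {data_type!r}")
```

Code asks for a pool by data type and never hardcodes a host, so a pool can move to other hardware without touching callers.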
38. Replication
‣ write to one master
‣ read from multiple slaves (and master)
‣ pros
• easy to implement
• read intensive applications scale very well
‣ cons
• write intensive applications donʼt scale
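A rough sketch of read/write splitting, assuming connections are identified by plain names. `ReplicatedPool` and `connection_for` are hypothetical, not part of any MySQL library.

```python
import random


class ReplicatedPool:
    """Routes writes to the master and reads to a slave or the master."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def connection_for(self, sql: str) -> str:
        # Treat anything that starts with SELECT as a read (simplification).
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read:
            # Reads may go to any slave, or to the master itself.
            return random.choice(self.slaves + [self.master])
        # Every write goes to the single master.
        return self.master
```

Adding slaves spreads the reads, but every write still lands on the one master, which is why write-intensive applications do not scale this way.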
39. Partitioning (sharding)
‣ Divide data on primary key:
• all user data for users with id 1 - 10 in database1
• all user data for users with id 11 - 20 in database2
• ...
‣ Best scaling possible
‣ How?
• managed in code
• MySQL partitioning (available from version 5.1)
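The "managed in code" option can be sketched as a lookup from user id to database. The range size of 10 mirrors the example above; `shard_for_user` is a hypothetical helper.

```python
USERS_PER_SHARD = 10  # matches the example ranges above (1-10, 11-20, ...)


def shard_for_user(user_id: int) -> str:
    """Return the name of the database that holds all data for this user."""
    if user_id < 1:
        raise ValueError("user ids start at 1")
    shard_number = (user_id - 1) // USERS_PER_SHARD + 1
    return f"database{shard_number}"
```

For example, `shard_for_user(10)` gives `database1` and `shard_for_user(11)` gives `database2`.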
40. Analyse, analyse, analyse!
‣ Tag your queries
• SELECT * FROM USER WHERE userid = 123 /*User::getUser():11 */
‣ Analyse mysql slow logs
‣ Analyse process lists
‣ Analyse based on tags
• 1023 User::getUser():230
• 512 User::isOnline():124
• 10 Activities::getActivity():320
‣ Cron job runs every minute and checks for “too many connections”
• if “too many connections”, log process list
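A small sketch of tagging queries and counting by tag, assuming the queries have already been collected as strings (from a slow log or a process list). `tag_query` and `count_tags` are hypothetical helpers.

```python
import re
from collections import Counter

# Matches a trailing SQL comment such as /* User::getUser():11 */
TAG_PATTERN = re.compile(r"/\*\s*(.+?)\s*\*/\s*$")


def tag_query(sql: str, caller: str, line: int) -> str:
    """Append a comment naming the code location that issued the query."""
    return f"{sql} /* {caller}():{line} */"


def count_tags(queries):
    """Count how often each tag appears in a list of logged queries."""
    counts = Counter()
    for query in queries:
        match = TAG_PATTERN.search(query)
        counts[match.group(1) if match else "(untagged)"] += 1
    return counts
```

`count_tags(queries).most_common()` then lists the busiest code locations first.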
42. Introduction to memcached
‣ Developed by Danga Interactive:
• http://www.danga.com/
‣ Initially developed for LiveJournal:
• http://www.livejournal.com/
‣ Open source
43. Introduction to memcached
‣ Least Recently Used
‣ Fast!
‣ Distributed
‣ Automatic failover
‣ Big Hash table: set/add/get/delete
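To make the "big hash table" with least-recently-used eviction concrete, here is a toy in-process model. It is not memcached and not a client for it; `TinyLRUCache` is a hypothetical name and it only mimics the four operations listed above.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy hash table with set/add/get/delete and LRU eviction."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.items = OrderedDict()  # oldest entry first, newest last

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.max_items:
            self.items.popitem(last=False)  # evict the least recently used

    def add(self, key, value):
        # Store only if the key does not exist yet.
        if key in self.items:
            return False
        self.set(key, value)
        return True

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a read counts as a use
        return self.items[key]

    def delete(self, key):
        return self.items.pop(key, None) is not None
```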
44. What to cache?
‣ sessions
‣ query caching
‣ processed data
‣ generated html
45. Session Cache
‣ 99% hit ratio
‣ Time to live is 20 minutes
‣ Faster than session database
46. Query Cache
‣ Why memcache and not MySQL query cache?
• MySQL invalidates all cached queries on a table on every update to that table
• each replicated database keeps its own separate query cache
‣ Add to generic database classes
• Cache key is query
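A sketch of a query cache inside a generic database class, with a dict standing in for memcached. The key is derived from the query by hashing it, since memcached keys are limited to 250 bytes and may not contain whitespace; the hashing step and the `QueryCache` class are assumptions, not something the slide states.

```python
import hashlib


class QueryCache:
    """Wraps a query function and caches results under a key built from the SQL."""

    def __init__(self, run_query, cache=None):
        self.run_query = run_query                    # callable that hits the database
        self.cache = {} if cache is None else cache   # stand-in for memcached

    @staticmethod
    def key_for(sql):
        # Hash the query text so the key stays short and has no whitespace.
        return "query:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def fetch(self, sql):
        key = self.key_for(sql)
        if key in self.cache:
            return self.cache[key]
        rows = self.run_query(sql)
        self.cache[key] = rows
        return rows
```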
49. HTML Caching
‣ Profile blocks are fully cached
‣ Data needed to generate html is also cached
‣ When data changes, html is invalidated and cached data is updated
‣ High cache hit rate on profile pages
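A sketch of the two cache layers for a profile block, with a dict standing in for memcached. The key names, the `render_profile_block` markup and the `load_from_db` callback are all hypothetical.

```python
from html import escape

cache = {}  # stand-in for memcached


def render_profile_block(data):
    return f"<div class='profile'>{escape(data['name'])}</div>"


def get_profile_html(user_id, load_from_db):
    html_key = f"profile_html:{user_id}"
    data_key = f"profile_data:{user_id}"
    if html_key in cache:
        return cache[html_key]        # fully cached block
    data = cache.get(data_key)
    if data is None:
        data = load_from_db(user_id)  # only when both layers miss
        cache[data_key] = data
    markup = render_profile_block(data)
    cache[html_key] = markup
    return markup


def on_profile_changed(user_id, new_data):
    cache[f"profile_data:{user_id}"] = new_data  # update the cached data
    cache.pop(f"profile_html:{user_id}", None)   # invalidate the html
```

After a change, the next page view re-renders the block from the cached data without going to the database.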
50. 3 ways of caching
‣ Cache with TTL
‣ Cache forever with invalidate
‣ Cache forever with update
51. Cache with TTL
‣ The good:
• Quickly achieve better performance on existing code
‣ The bad:
• Users see outdated information
• TTL cannot be high
• Caching efficiency is minimal
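A minimal TTL cache sketch, to show where the stale data comes from. `TTLCache` is a hypothetical in-process stand-in; the injectable `clock` is only there to make it easy to test.

```python
import time


class TTLCache:
    """Values are served until they expire, even if the source data changed."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.items = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self.items[key] = (self.clock() + ttl_seconds, value)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.items[key]  # expired: the caller has to reload
            return None
        return value
```

Nothing removes an entry when the underlying data changes, so a user can see outdated information for up to `ttl_seconds`. Lowering the TTL reduces that, at the cost of more misses.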
60. Global Locking: Chat Example
‣ Example: add new message to cached shared chat thread
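One common way to build such a lock on memcached is its `add` operation, which only succeeds when the key does not exist yet. The slide does not say this is the exact approach used, so treat the sketch below as an assumption. `FakeMemcache` is an in-memory stand-in and `append_chat_message` is a hypothetical helper.

```python
import time


class FakeMemcache:
    """In-memory stand-in with memcached-like add/get/set/delete (no expiry)."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # Store only if the key does not exist yet.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)


def append_chat_message(cache, thread_id, message, retries=50, wait=0.01):
    lock_key = f"lock:chat:{thread_id}"
    thread_key = f"chat:{thread_id}"
    for _ in range(retries):
        if cache.add(lock_key, 1):  # only one client can create the lock key
            try:
                messages = cache.get(thread_key) or []
                cache.set(thread_key, messages + [message])
                return True
            finally:
                cache.delete(lock_key)  # release the lock
        time.sleep(wait)  # someone else holds the lock, try again
    return False
```

Against a real memcached server the lock key should also get a short expiry, so a crashed client cannot hold the lock forever.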
61. Flooding detection
‣ User can only redo action A after a timeout
• a guestbook message can only be posted once every 2 minutes
‣ User cannot do action A more than X times in T minutes
• only 12 failed login attempts per hour are allowed
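Both rules can be sketched with small counters per user and action. `FloodGuard` is a hypothetical in-process version; the second rule uses a simple fixed window, which is an assumption. In production the state would live in a shared store such as memcached.

```python
import time


class FloodGuard:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_done = {}  # (user, action) -> time of last allowed action
        self.windows = {}    # (user, action) -> (window_start, count)

    def allow_after_timeout(self, user, action, timeout_seconds):
        """Rule 1: the action can only be redone after a timeout."""
        now = self.clock()
        key = (user, action)
        last = self.last_done.get(key)
        if last is not None and now - last < timeout_seconds:
            return False
        self.last_done[key] = now
        return True

    def allow_within_limit(self, user, action, max_times, window_seconds):
        """Rule 2: at most max_times actions per window."""
        now = self.clock()
        key = (user, action)
        start, count = self.windows.get(key, (now, 0))
        if now - start >= window_seconds:
            start, count = now, 0  # window is over, start a new one
        if count >= max_times:
            self.windows[key] = (start, count)
            return False
        self.windows[key] = (start, count + 1)
        return True
```

The guestbook example is `allow_after_timeout(user, "guestbook", 120)`. The login example is `allow_within_limit(user, "login_failed", 12, 3600)`.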
65. MySQL full-text search
‣ Initially used for our search
• can be very slow
• extra load on most of our databases, since most
content is searchable
‣ Better search engine needed
• Sphinx!
• Open source search engine developed by Andrew Aksyonoff (http://sphinxsearch.com/)
66. Sphinx Features
‣ very fast indexing
‣ very fast searching
• 0.04 seconds average
• 5 million searches / day
• 60 searches / second
‣ distributed
‣ document fields
‣ stopwords
‣ api available in many languages
• PHP, Java, Python, Ruby, Perl, C++, ...
67. Sphinx Indexer
‣ Index is read-only (except for attributes)
‣ Build new index while searching old one
‣ How we index:
• rebuild full index from data once in a while (daily, weekly)
• generate delta indexes often (every minute, 5 minutes), containing the changes to the search index since the last full index merge
• full index merge of previous index and delta (every hour)
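The main-plus-delta scheme can be illustrated with a toy in-memory inverted index. This is not Sphinx and does not use its API; `MainPlusDeltaIndex` is hypothetical and only handles newly added documents, not updates or deletes.

```python
class MainPlusDeltaIndex:
    def __init__(self):
        self.main = {}   # word -> set of document ids
        self.delta = {}  # same shape, only changes since the last merge

    @staticmethod
    def _build(documents):
        index = {}
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index.setdefault(word, set()).add(doc_id)
        return index

    def rebuild_full(self, all_documents):
        """Slow job (daily, weekly): rebuild everything from the data."""
        self.main = self._build(all_documents)
        self.delta = {}

    def rebuild_delta(self, changed_documents):
        """Fast job (every few minutes): index only what changed."""
        self.delta = self._build(changed_documents)

    def merge(self):
        """Hourly job: fold the delta into the main index."""
        for word, ids in self.delta.items():
            self.main.setdefault(word, set()).update(ids)
        self.delta = {}

    def search(self, word):
        word = word.lower()
        return sorted(self.main.get(word, set()) | self.delta.get(word, set()))
```

Searches consult both indexes, so new content shows up within minutes while the large index is rebuilt only rarely.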
68. Sphinx Search
‣ Search query returns list of ids
‣ For every result page shown, we fetch data associated with ids
• data is cached with memcache for every id
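A sketch of turning the ids of one result page into data, with a dict standing in for memcached. `fetch_results`, the `item:` key prefix and the `load_many_from_db` callback are hypothetical.

```python
def fetch_results(ids, cache, load_many_from_db):
    """Return the data for each id, in search order, using the cache first."""
    found = {i: cache[f"item:{i}"] for i in ids if f"item:{i}" in cache}
    missing = [i for i in ids if i not in found]
    if missing:
        # One database round trip for everything the cache did not have.
        for item_id, row in load_many_from_db(missing).items():
            cache[f"item:{item_id}"] = row
            found[item_id] = row
    # Keep the ranking order returned by the search engine.
    return [found[i] for i in ids if i in found]
```

Ids that no longer exist in the database are simply skipped.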