Talk I gave at http://www.kingsofcode.nl about the things we learned over the last year about making http://www.netlog.com scalable and delivering high performance to our users.
6. What: it's personal
‣ You rule: it's yours
[Diagram labels: YOU, ANOTHER, Music, Photos, Games, Videos, People, Blogs, Relations]
7. Friend Activity
‣ Share & discover friends' activity
[Screenshot: friend activity feed in Dutch, with entries such as "Stijn Symons uploads a new photo", "Kenny Gryp signs the guestbook of Lorenz Bogaert" and "Jaak Noukens and Jo are now friends"]
14. Applications
‣ OpenSocial
• sandbox: http://nl.netlog.com/go/developer/opensocial/sandbox=1
‣ Officially announced tomorrow @ Google I/O
• Stay tuned!
‣ Public launch planned for June
16. It's going pretty good
‣ More than 35,000,000 unique members
‣ More than 4,000,000,000 pageviews/month
‣ 19 languages and more coming up
‣ More than 20 countries
‣ Current Alexa Top-100 ranking (most visited web sites in the world)
‣ Current ComScore Europe Top-10 ranking
17. It's going pretty good
[Chart: Monthly Visits, January 2007 to April 2008, axis from 0 to 200,000,000]
[Pie chart by region, segments of 46%, 22%, 16%, 10%, 3% and 3%: Western Europe, Southern Europe, Northern Europe, Eastern Europe, Western Asia, Americas (3%)]
[Chart: Monthly Unique Visitors, January 2007 to April 2008, axis from 0 to 40,000,000]
[Chart: Monthly Page Requests, January 2007 to April 2008, axis from 0 to 5,000,000,000]
20. 19 languages and a lot more coming!
Slovenčina
Español Català
Svenska
suomi Česky
slovenščina Deutsch Magyar
Nederlands
français
Русский Italiano Afrikaans
English
Dansk Türkçe
Polski Hrvatski
Lietuvių kalba
Eesti Latviešu valoda
Português
Română български
Norsk (bokmål)
37. Database Pools
‣ Different data on different database pools:
• messaging
• friendships
• blogs
• music
• videos
• ...
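As a rough illustration of the pool idea, the sketch below maps each kind of data to its own pool. The host names and the `pool_for` helper are hypothetical; the slide only names the data domains.

```python
# Hypothetical host names; only the data domains come from the slide.
POOLS = {
    "messaging": "db-messaging.example.internal",
    "friendships": "db-friendships.example.internal",
    "blogs": "db-blogs.example.internal",
    "music": "db-music.example.internal",
    "videos": "db-videos.example.internal",
}


def pool_for(domain: str) -> str:
    """Return the database pool that owns a given kind of data."""
    try:
        return POOLS[domain]
    except KeyError:
        raise ValueError(f"no database pool configured for {domain!r}") from None
```

With a lookup like this, load on the messaging tables never touches the pool that serves blogs, and each pool can be sized on its own.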
38. Replication
‣ write to one master
‣ read from multiple slaves (and master)
‣ pros
• easy to implement
• read-intensive applications scale very well
‣ cons
• write-intensive applications don't scale
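A minimal sketch of read/write splitting, assuming connections are represented by plain names and that any statement not starting with SELECT is a write. The `ReplicatedPool` class is hypothetical, and replication lag is ignored here.

```python
import itertools


class ReplicatedPool:
    """Route writes to the master and spread reads over slaves and master."""

    def __init__(self, master, slaves):
        self.master = master
        # Reads rotate over all slaves and also the master, as on the slide.
        self._readers = itertools.cycle(list(slaves) + [master])

    def connection_for(self, sql: str):
        # Naive classification: anything that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master
```

Adding slaves adds read capacity, but every write still lands on the single master, which is why write-heavy workloads do not scale this way.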
39. Partitioning (sharding)
‣ Divide data on primary key:
• all user data for users with id 1 - 10 in database1
• all user data for users with id 11 - 20 in database2
• ...
‣ Best scaling possible
‣ How?
• managed in code
• MySQL partitioning (available from version 5.1)
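The "managed in code" option could look roughly like this. The range size of 10 mirrors the slide's example, and the `shard_for` helper and its fixed-size ranges are assumptions for illustration.

```python
USERS_PER_SHARD = 10  # matches the slide's example; real ranges would be far larger


def shard_for(user_id: int) -> str:
    """Map a user id to the database holding all of that user's data."""
    if user_id < 1:
        raise ValueError("user ids start at 1")
    return f"database{(user_id - 1) // USERS_PER_SHARD + 1}"
```

Because every query for one user goes to one database, both reads and writes spread across shards as the user base grows.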
40. Analyse, analyse, analyse!
‣ Tag your queries
• SELECT * FROM USER WHERE userid = 123 /* User::getUser():11 */
‣ Analyse MySQL slow logs
‣ Analyse process lists
‣ Analyse based on tags
• 1023 User::getUser():230
• 512 User::isOnline():124
• 10 Activities::getActivity():320
‣ minutely cron that checks for "too many connections"
• if "too many connections", log process list
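A small sketch of the tagging idea: append the calling location as a SQL comment, then count tags in a logged process list. The helper names `tag_query` and `count_tags` are hypothetical, and the input is assumed to be a plain list of query strings.

```python
import re
from collections import Counter

# Matches a trailing /* ... */ comment and captures its content.
TAG_PATTERN = re.compile(r"/\*\s*(.+?)\s*\*/\s*$")


def tag_query(sql: str, tag: str) -> str:
    """Append the calling location to the query as a SQL comment."""
    return f"{sql} /* {tag} */"


def count_tags(queries):
    """Count how often each tag occurs, most frequent first."""
    counts = Counter()
    for query in queries:
        match = TAG_PATTERN.search(query)
        counts[match.group(1) if match else "(untagged)"] += 1
    return counts.most_common()
```

Running `count_tags` over a process list captured during a connection spike points straight at the code location that issues the most queries.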
42. Introduction to memcached
‣ Developed by Danga Interactive:
• http://www.danga.com/
‣ Initially developed for LiveJournal:
• http://www.livejournal.com/
‣ Open source
43. Introduction to memcached
‣ Least Recently Used
‣ Fast!
‣ Distributed
‣ Automatic failover
‣ Big hash table: set/add/get/delete
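To make the "big hash table" and "least recently used" points concrete, here is a toy single-process model of the four operations. It is an illustrative stand-in, not memcached itself, and the `TinyLRUCache` class is hypothetical.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy in-process model of set/add/get/delete with LRU eviction."""

    def __init__(self, max_items: int):
        self.max_items = max_items
        self._data = OrderedDict()  # oldest entry first

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the least recently used

    def add(self, key, value) -> bool:
        """Store only if the key is absent; report whether it was stored."""
        if key in self._data:
            return False
        self.set(key, value)
        return True

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def delete(self, key) -> bool:
        if key in self._data:
            del self._data[key]
            return True
        return False
```

The real server spreads keys over many machines by hashing the key on the client side; this sketch leaves that part out.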
44. What to cache?
‣ sessions
‣ query caching
‣ processed data
‣ generated html
45. Session Cache
‣ 99% hit ratio
‣ Time to live is 20 minutes
‣ Faster than session database
46. Query Cache
‣ Why memcache and not MySQL query cache?
• MySQL invalidates all cached queries on a table on every update to that table
• each replicated database keeps its own separate query cache
‣ Add to generic database classes
• Cache key is query
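A sketch of a query cache inside a generic database class, assuming a dict-like object stands in for the memcached client and `run_query` is any callable that executes SQL. The `CachedDatabase` name is hypothetical. The query text is hashed because memcached keys are limited to 250 bytes and may not contain whitespace.

```python
import hashlib


class CachedDatabase:
    """Wrap a query function so repeated queries are served from cache."""

    def __init__(self, run_query, cache):
        self._run_query = run_query  # callable that executes SQL and returns rows
        self._cache = cache          # dict-like stand-in for a memcached client

    def query(self, sql: str):
        # The cache key is derived from the query text itself.
        key = "query:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()
        rows = self._cache.get(key)
        if rows is None:
            rows = self._run_query(sql)
            self._cache[key] = rows
        return rows
```

Since the cache is shared, a result fetched through one replica is reused by requests that would otherwise have hit another replica.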
49. HTML Caching
‣ Profile blocks are fully cached
‣ Data needed to generate html is also cached
‣ When data changes, html is invalidated, cached data updated
‣ High cache hit rate on profile pages
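The two-level scheme above can be sketched as follows, with a plain dict standing in for memcached. The key names, the `load_profile` callback and the markup are all hypothetical.

```python
from html import escape

cache = {}  # dict stand-in for memcached


def get_profile_block(user_id, load_profile):
    """Return cached html, rebuilding it from cached data when needed."""
    block = cache.get(f"html:{user_id}")
    if block is None:
        profile = cache.get(f"data:{user_id}")
        if profile is None:
            profile = load_profile(user_id)  # only now is the database hit
            cache[f"data:{user_id}"] = profile
        block = f"<div class='profile'>{escape(profile['name'])}</div>"
        cache[f"html:{user_id}"] = block
    return block


def on_profile_changed(user_id, new_profile):
    """On a write: update the cached data, invalidate the cached html."""
    cache[f"data:{user_id}"] = new_profile
    cache.pop(f"html:{user_id}", None)
```

After a change, the next page view rebuilds the html from the already updated cached data, so the database is not queried again.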
50. 3 ways of caching
‣ Cache with TTL
‣ Cache forever with invalidate
‣ Cache forever with update
51. Cache with TTL
‣ The good:
• Quickly achieve better performance on existing code
‣ The bad:
• Users see outdated information
• TTL cannot be high
• Caching efficiency is minimal
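A minimal TTL cache sketch, assuming an injectable clock so the expiry behaviour can be exercised without waiting. The `TTLCache` class is hypothetical; memcached handles expiry on the server.

```python
import time


class TTLCache:
    """Cache whose entries silently expire after a time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value
```

Nothing tells the cache when the underlying data changes, so a reader can see a stale value for up to the full TTL. That is the trade-off listed under "the bad".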
60. Global Locking: Chat Example
‣ Example: add new message to cached shared chat thread
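One common way to build such a lock is the `add` operation, which only succeeds for the first caller. The sketch below assumes that approach; the slide does not spell out the mechanism. `FakeMemcache` is an in-memory stand-in and all key names are hypothetical.

```python
import time


class FakeMemcache:
    """In-memory stand-in exposing memcached-style add/get/set/delete."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def append_chat_message(mc, thread_id, message, retries=50, wait=0.01):
    """Append to a cached chat thread while holding a cache-based lock."""
    lock_key = f"lock:chat:{thread_id}"
    for _ in range(retries):
        if mc.add(lock_key, 1):  # only one caller wins the add
            try:
                thread = mc.get(f"chat:{thread_id}") or []
                mc.set(f"chat:{thread_id}", thread + [message])
                return True
            finally:
                mc.delete(lock_key)  # always release the lock
        time.sleep(wait)
    return False  # gave up: the lock stayed taken
```

Without the lock, two concurrent read-modify-write cycles could each overwrite the other's message. With a real server the lock key should also carry a short expiry so a crashed holder cannot block the thread forever.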
61. Flooding detection
‣ User can only redo action A after a timeout
• a guestbook message can only be posted once every 2 minutes
‣ User cannot do action A more than X times in T minutes
• only 12 failed login attempts per hour are allowed
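Both rules can be sketched with small per-key records. This version keeps them in process memory with an injectable clock and uses a fixed window for the second rule. The `FloodDetector` class is hypothetical; in a shared cache the same state would live under keys with an expiry.

```python
import time


class FloodDetector:
    """Two simple flood checks: a cool-down and a count per time window."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last_action = {}  # key -> time of the last allowed action
        self._windows = {}      # key -> (window start, count in window)

    def allow_after_timeout(self, key, timeout_seconds):
        """Rule 1: the action may only be repeated after a timeout."""
        now = self._clock()
        last = self._last_action.get(key)
        if last is not None and now - last < timeout_seconds:
            return False
        self._last_action[key] = now
        return True

    def allow_within_limit(self, key, max_times, window_seconds):
        """Rule 2: at most max_times actions per window."""
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= window_seconds:
            start, count = now, 0  # window expired, start a new one
        if count >= max_times:
            return False
        self._windows[key] = (start, count + 1)
        return True
```

For the slide's examples, a guestbook post would call `allow_after_timeout(key, 120)` and a failed login would call `allow_within_limit(key, 12, 3600)`.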
65. MySQL full-text search
‣ Initially used for our search
• can be very slow
• extra load on most of our databases, since most content is searchable
‣ Better search engine needed
• Sphinx!
• Open source search engine developed by Andrew Aksyonoff (http://sphinxsearch.com/)
66. Sphinx Features
‣ very fast indexing
‣ very fast searching
• 0.04 seconds average
• 5 million searches / day
• 60 searches / second
‣ distributed
‣ document fields
‣ stopwords
‣ API available in many languages
• PHP, Java, Python, Ruby, Perl, C++, ...
67. Sphinx Indexer
‣ Index is read-only (except for attributes)
‣ Build new index while searching old one
‣ How we index:
• rebuild full index from data once in a while (daily, weekly)
• generate delta indexes often (every minute, 5 minutes)
• contains changes for search index since last full index merge
• full index merge of previous index and delta (every hour)
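The three indexing jobs could be driven by a small helper that builds the `indexer` command line for each schedule. The index names `main` and `delta` and the helper itself are hypothetical. The `--rotate` and `--merge` options are taken from the Sphinx `indexer` tool as documented and should be checked against the installed version.

```python
def indexer_command(job, main="main", delta="delta"):
    """Build the argv for one of the three indexing jobs."""
    if job == "full":    # daily or weekly: rebuild the full index from data
        return ["indexer", "--rotate", main]
    if job == "delta":   # every minute or so: index only the recent changes
        return ["indexer", "--rotate", delta]
    if job == "merge":   # hourly: fold the delta into the full index
        return ["indexer", "--merge", main, delta, "--rotate"]
    raise ValueError(f"unknown indexing job: {job!r}")
```

Each list can be passed to `subprocess.run` from a cron-driven script. Rotation lets the search daemon keep serving the old index until the new one is ready.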
68. Sphinx Search
‣ Search query returns list of ids
‣ For every result page shown, we fetch data associated with ids
• data is cached with memcache for every id
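Turning a page of ids into displayable rows might look like this, with a dict standing in for memcached. The `load_from_db` callback and the `item:` key prefix are hypothetical.

```python
def fetch_results(ids, cache, load_from_db):
    """Resolve search result ids to rows, using the cache first."""
    found = {}
    for item_id in ids:
        key = f"item:{item_id}"
        if key in cache:
            found[item_id] = cache[key]
    missing = [item_id for item_id in ids if item_id not in found]
    if missing:
        # One database round trip for all ids that were not cached.
        for item_id, row in load_from_db(missing).items():
            cache[f"item:{item_id}"] = row
            found[item_id] = row
    # Keep the ranking order returned by the search engine.
    return [found[item_id] for item_id in ids if item_id in found]
```

Only the ids on the page being shown are resolved, so the cost of a search stays independent of the total number of matches.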