mixi.jp
    mixi.jp Presentation Transcript

    • mixi.jp: scaling out with open source •Batara Kesuma, mixi, Inc. •bkesuma@mixi.co.jp
    • Introduction •Batara Kesuma •CTO of mixi, Inc.
    • What is mixi? •Social networking service • Diary, community, message, review, photo album, etc. • Invitation only •Largest and fastest growing SNS in Japan
    • mixi home page (screenshot labels) •Latest information: friends' new diaries, comment history, community topics, friends' new reviews, friends' new albums •Friends •My latest diaries and reviews •Community listing •User testimonials
    • History of mixi •Development started in December 2003 • Only 1 engineer (me) • 4 months of coding •Opened in February 2004
    • Two months later •10,000 users •600,000 PV/day
    • The “Oh crap!” factor •This model works •But how do we scale out?
    • The first year •The online population of mixi grew significantly •600 users to 210,000 users
    • The second year •210,000 users to 2 million users
    • And now?
    • More than 3.7 million users •15,000 new users/day •Population of Japan: 127 million •Internet users: 86.7 million (source: CIA Factbook)
    • 70% of users are active (last login less than 72 hours ago)
    • Average user spends 3 hours 20 minutes on mixi per week
    • Ranked 35th on Alexa worldwide, and 3rd in Japan
    • PV growth in 2 years (chart comparing Google Japan, mixi, and Amazon Japan)
    • User growth in 2 years (chart: number of users, 0 to 3,500,000, from 04/03 to 06/03)
    • Our technology solutions
    • The technology behind •Linux 2.6 •Apache 2.0 •MySQL •Perl 5.8 •memcached •Squid
    • Architecture (diagram): REQUEST to mod_proxy, REQUEST to mod_perl; images; memcached for hot objects; diary cluster, message cluster, other cluster
    • MySQL •More than 100 MySQL servers •Adding more than 10 servers/month •Non-persistent connections •Mostly InnoDB •Relies heavily on DB partitioning (our own solution)
    • DB replication •MySQL server load gets heavy •Add more slaves (Diagram: mod_perl sends write queries to one DB, which replicates to the other DBs; read queries go to the replicated DBs)
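The read/write split on this slide can be sketched as follows. This is a minimal illustration in Python rather than the talk's Perl; the host names and the `pick_server` helper are hypothetical, and classifying a statement by its leading SQL keyword is a simplifying assumption.

```python
import itertools

MASTER = "db-master"                      # hypothetical host names
SLAVES = ["db-slave1", "db-slave2"]

_slave_cycle = itertools.cycle(SLAVES)    # simple round-robin over the slaves

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


def pick_server(sql):
    """Return the server for a statement: the master for writes, a slave for reads."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in WRITE_VERBS:
        return MASTER
    return next(_slave_cycle)
```

Adding a slave here only means appending a host to `SLAVES`, which is why replication is the easy first step for a read-heavy workload.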
    • DB replication •Classic problem with DB replication: every slave has to apply all of the master's writes, so adding slaves only spreads the reads (Diagram: with 100 reads/s and 50 writes/s in total, a MASTER and one SLAVE each handle 50 writes/s and 50 reads/s; with a MASTER and three SLAVES, each still handles 50 writes/s but 25 reads/s)
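The arithmetic behind this slide can be worked through in a few lines. The sketch below assumes, as the slide's figures suggest, that reads are spread evenly over all servers (master included) and that every server must apply every write; the function name is made up for illustration.

```python
def per_server_load(total_reads, total_writes, servers):
    """Return (reads/s, writes/s) handled by each server in one replication group.

    Reads are divided among the servers; writes are not, because each
    replica has to apply every write.
    """
    if servers < 1:
        raise ValueError("need at least one server")
    return total_reads / servers, total_writes


for n in (2, 4, 8):
    reads, writes = per_server_load(100, 50, n)
    print(f"{n} servers: {reads:g} reads/s + {writes:g} writes/s each")
```

However many servers are added, the write column never shrinks, which is the limit that pushes the design toward partitioning.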
    • Some statistics •Diary related tables • Read 85% • Write 15% •Message related tables • Read 75% • Write 25%
    • DB partitioning •Replication couldn’t keep up anymore •Try to split the DB
    • How to split? •Splitting vertically by users (user A, user B, user C), or •Splitting horizontally by table types (message tables, diary tables, other tables)
    • Vertical partition (Diagram: the single DB, holding message, diary, and other tables for user A, user B, and user C, is split by user into DB 1 and DB 2)
    • Vertical partition •Too many tables to deal with at one time •The transition in splitting gets complex and difficult
    • Horizontal partition •Also called level 1 partitioning within mixi •OLD DB (originally message, diary, and other tables): $dbh = $db->load_dbh(); •NEW DB (message tables): $dbh = $db->load_dbh(type => "message"); •NEW DB (diary tables): $dbh = $db->load_dbh(type => "diary");
    • Partition map for level 1 •Small and static •Just put it in a configuration file •For example: $DB_DIARY = 'DBI:mysql:host=db1;database=diary'; $DB_MESSAGE = 'DBI:mysql:host=db2;database=message'; ...
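A level 1 lookup of this kind might look like the sketch below, written in Python rather than the talk's Perl. The two DSNs are the ones from the slide; `DEFAULT_DSN` (the old DB) and the `dsn_for` helper are hypothetical names added for illustration.

```python
# Level 1 partition map: one DSN per table type, small enough to live in config.
PARTITION_MAP = {
    "diary": "DBI:mysql:host=db1;database=diary",
    "message": "DBI:mysql:host=db2;database=message",
}

# Hypothetical DSN for the old DB that still holds every unsplit table.
DEFAULT_DSN = "DBI:mysql:host=db0;database=mixi"


def dsn_for(table_type=None):
    """Return the DSN for a table type; unsplit tables fall back to the old DB."""
    return PARTITION_MAP.get(table_type, DEFAULT_DSN)
```

Because the map is static, moving another table type to its own DB is a one-line configuration change plus the data transition.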
    • Easy transition •1 Writes to both DBs (mod_perl writes to OLD DB and NEW DB) •2 Copies in background (SELECT from OLD DB, INSERT IGNORE into NEW DB) •3 Shifts reads (from OLD DB to NEW DB)
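The three transition steps can be tried out end to end with two in-memory SQLite databases standing in for the old and new MySQL servers. This is only a sketch under stated assumptions: the `message` schema and function names are invented, and SQLite's `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")
    return db


old_db, new_db = make_db(), make_db()
old_db.executemany("INSERT INTO message VALUES (?, ?)", [(1, "hello"), (2, "world")])


def write_message(msg_id, body):
    """Step 1: during the transition, every write goes to both DBs."""
    for db in (old_db, new_db):
        db.execute("INSERT OR REPLACE INTO message VALUES (?, ?)", (msg_id, body))


def copy_in_background():
    """Step 2: copy old rows, skipping any row the dual writes already created."""
    rows = old_db.execute("SELECT id, body FROM message").fetchall()
    new_db.executemany("INSERT OR IGNORE INTO message VALUES (?, ?)", rows)


def read_message(msg_id, use_new_db):
    """Step 3: once the copy has finished, flip use_new_db to shift reads."""
    db = new_db if use_new_db else old_db
    row = db.execute("SELECT body FROM message WHERE id = ?", (msg_id,)).fetchone()
    return row[0] if row else None


write_message(2, "world (edited)")   # dual write that lands before the copy
write_message(3, "new message")
copy_in_background()
```

The ignore-on-conflict copy matters because a row written during the transition is at least as fresh as the copy's snapshot, so the background job must never overwrite it.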
    • Problems with level 1 •Cannot use JOIN anymore • Use FEDERATED tables from MySQL 5 • Or do SELECT twice, which is faster than using FEDERATED tables • If a table is small, just duplicate it
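The "SELECT twice" workaround amounts to joining in the application. The sketch below uses two in-memory SQLite databases as stand-ins for two separate MySQL servers; the `message` and `member` schemas are hypothetical.

```python
import sqlite3

message_db = sqlite3.connect(":memory:")   # stand-in for the message DB
message_db.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER, body TEXT)")
message_db.executemany("INSERT INTO message VALUES (?, ?, ?)",
                       [(1, 10, "hi"), (2, 11, "hello"), (3, 10, "bye")])

member_db = sqlite3.connect(":memory:")    # stand-in for a different DB server
member_db.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, nickname TEXT)")
member_db.executemany("INSERT INTO member VALUES (?, ?)", [(10, "alice"), (11, "bob")])


def messages_with_senders():
    """Replace a cross-database JOIN with two SELECTs merged in the application."""
    # First SELECT: the rows we want, including the foreign key.
    messages = message_db.execute(
        "SELECT id, sender_id, body FROM message ORDER BY id").fetchall()
    sender_ids = sorted({sender_id for _, sender_id, _ in messages})
    if not sender_ids:
        return []
    # Second SELECT: fetch all referenced members in one query on the other DB.
    placeholders = ",".join("?" * len(sender_ids))
    rows = member_db.execute(
        f"SELECT id, nickname FROM member WHERE id IN ({placeholders})",
        sender_ids).fetchall()
    nicknames = dict(rows)
    return [(msg_id, nicknames.get(sender_id), body)
            for msg_id, sender_id, body in messages]
```

Batching the second lookup into a single `IN (...)` query keeps the cost at two round trips regardless of how many rows the first query returns.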
    • Next step •When the new DB gets overloaded •We split the DB, yet again •Get ready for level 2
    • Partitioning key •user id, message id •Choose wisely! (Diagram: message tables split either by user id, e.g. user A and user B, or by message id)
    • Level 2 partition (Diagram: the message tables in the LEVEL 1 DB are split by user, user A to user D, across NODE 1 and NODE 2, using a NEW DB)
    • Partition map for level 2 •Big and dynamic •Cannot put it all in a configuration file
    • Partition map for level 2 •Manager based • Use another DB to do the partition mapping •Algorithm based • Partition map is computed inside the application • node_id = member_id % TOTAL_NODE
    • Manager based •1 Asks for node_id (mod_perl sends user_id=14 to the MANAGER DB) •2 Returns node_id (node_id=2) •3 Connects to node (NODE 2, out of NODE 1 to NODE 3, each holding message tables)
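A manager-based lookup could be sketched like this, with an in-memory SQLite table standing in for the manager DB. The `partition_map` schema, the node host names, and the `move_user` helper are assumptions for illustration; only the user_id=14 to node_id=2 example comes from the slide.

```python
import sqlite3

manager_db = sqlite3.connect(":memory:")   # stand-in for the manager DB
manager_db.execute(
    "CREATE TABLE partition_map (user_id INTEGER PRIMARY KEY, node_id INTEGER)")
manager_db.executemany("INSERT INTO partition_map VALUES (?, ?)", [(14, 2), (15, 3)])

NODES = {1: "node1", 2: "node2", 3: "node3"}   # hypothetical node host names


def node_for_user(user_id):
    """Steps 1-2: ask the manager for node_id; the caller then connects to that node."""
    row = manager_db.execute(
        "SELECT node_id FROM partition_map WHERE user_id = ?", (user_id,)).fetchone()
    if row is None:
        raise KeyError(f"no partition entry for user {user_id}")
    return NODES[row[0]]


def move_user(user_id, new_node_id):
    """Rebalancing is a map update (after the user's rows have been copied)."""
    manager_db.execute(
        "UPDATE partition_map SET node_id = ? WHERE user_id = ?", (new_node_id, user_id))
```

The extra query per request is the price of being able to place, or move, any user on any node.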
    • Algorithm based •1 Computes node_id: node_id=(user_id%3)+1, number of nodes = 3 (in the example, node_id=3) •2 Connects to node (NODE 3, out of NODE 1 to NODE 3, each holding message tables)
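The algorithm-based variant needs no lookup at all. A minimal Python sketch of the slide's formula follows; the function name is made up, and node ids are 1-based as on the slide.

```python
TOTAL_NODES = 3


def node_id_for(user_id, total_nodes=TOTAL_NODES):
    """Compute the 1-based node id from the partitioning key alone."""
    return (user_id % total_nodes) + 1
```

For instance, `node_id_for(14)` evaluates to 3, since 14 % 3 is 2. Every application server reaches the same answer independently, which is what removes the round trip to a manager.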
    • Manager based •Pros: • Easy to manage • Add a new node, move data between nodes •Cons: • Costs 1 extra query per request for the partition map • Every request must first go to the manager
    • Algorithm based •Pros: • Application servers can compute node id by themselves • Bypass the connection to the manager •Cons: • Difficult to manage • Adding new nodes is tricky
    • Adding nodes is tricky •old_node_id=(member_id%2)+1, number of nodes = 2 •new_node_id=(member_id%4)+1, number of nodes = 4 •1 Adds new application logic •2 Writes to both DBs if node_id is different •3 Copies in background (from NODE 1 and NODE 2 to NODE 3 and NODE 4) •4 Shifts reads
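The transition logic for growing from 2 to 4 nodes can be sketched as below. The two formulas are the slide's; the helper names and the choice of member ids in the usage note are illustrative assumptions.

```python
def node_id_for(member_id, total_nodes):
    """1-based node id, as in the slide's (member_id % N) + 1."""
    return (member_id % total_nodes) + 1


def write_targets(member_id, old_total=2, new_total=4):
    """Step 2: write to both nodes only when old and new node ids differ."""
    old_node = node_id_for(member_id, old_total)
    new_node = node_id_for(member_id, new_total)
    return [old_node] if old_node == new_node else [old_node, new_node]


def members_to_copy(member_ids, old_total=2, new_total=4):
    """Step 3: members whose rows must be copied to a new node in the background."""
    return [m for m in member_ids
            if node_id_for(m, old_total) != node_id_for(m, new_total)]
```

With member ids 0 to 7, `members_to_copy(range(8))` gives `[2, 3, 6, 7]`: doubling the node count relocates half of the members, which is why this step has to be staged carefully.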
    • Problems with level 2 •Too many connections to different DBs •Fortunately, on mixi, the majority are small data sets •Cache them all by using distributed memory caching •We rarely hit the DB •Average page load time is about 0.02 sec* •* depending on data sets average load time may vary (Diagram: member tables on NODE 1 to NODE 3; community tables on NODE 1 and NODE 2)
    • Caching •memcached • Also used by LiveJournal, Slashdot, etc. •memcached server installed on the mod_perl machines •39 machines x 2 GB memory
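The "cache them all, rarely hit the DB" approach is commonly implemented as a cache-aside read path. The sketch below is an assumption about the general pattern, not mixi's code: a plain dict stands in for the memcached client, another dict for a DB node, and the key format is invented.

```python
cache = {}                                   # stand-in for a memcached client
PROFILES = {14: {"nickname": "alice"}}       # stand-in for rows on a DB node
stats = {"db_hits": 0}


def load_profile_from_db(member_id):
    """Simulated DB query; counts how often the DB is actually reached."""
    stats["db_hits"] += 1
    return PROFILES.get(member_id)


def get_profile(member_id):
    """Cache-aside read: try the cache first, fall back to the DB, then fill the cache."""
    key = f"profile:{member_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_db(member_id)
        if profile is not None:
            cache[key] = profile
    return profile
```

After the first call for a member, repeated reads are served from memory, so the number of DB connections per page stays low even when the data is spread over many nodes.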
    • Summary of DB partitioning •Level 1 partition (split by table types) •Level 2 partition (split by partitioning key) • Manager based • Algorithm based
    • Summary of DB partitioning •1 Split by table types: OLD DB (message tables, diary tables, other tables) to LEVEL 1 (message tables) •2 Split by partitioning key: LEVEL 1 (message tables for user A, user B, user C) to LEVEL 2 nodes (message tables)
    • Image Servers
    • Statistics •Total size is more than 8 TB of storage •Growth rate is about 23 GB / day •We use MySQL to store metadata only
    • Two types of images •Frequently accessed images • Number of image files is relatively small (a few million files) • For example, user profile photos, community logos •Rarely accessed images • On the order of a hundred million image files • Diary photos, album photos, etc.
    • Frequently accessed images •A few hundred GB of files •Distributed via FTP and Squid •Third party Content Delivery Network
    • Frequently accessed images •1 Uploads to storage (mod_perl uploads to Storage: sto1.mixi.jp, sto2.mixi.jp) •2 Pull images from storage (Squid and CDN)
    • Rarely accessed images •A few TB of files •Newer files get accessed more often •Cache hit ratio is very poor •Distributed directly from storage
    • Uploading rarely accessed images •1 Assigns an id for an image file (abc.gif) •2 Arranges a pair of area_id (MANAGER DB, e.g. area_id=1,2) •3 Uploads image to storage (mod_perl uploads to the Storage servers, sto1.mixi.jp to sto4.mixi.jp)
    • Viewing rarely accessed images •1 Asks for view_diary.pl (User to mod_perl) •2 Detects abc.gif in view_diary.pl •3 Asks for area_id (mod_perl to MANAGER DB) •4 Returns area_id (abc.gif: area_id=1) •5 Creates image URL •6 Returns view_diary.pl and URL for abc.gif •7 Asks for abc.gif (User to Storage) •8 Returns abc.gif (Storage servers: sto1.mixi.jp to sto4.mixi.jp)
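The upload and view flows for rarely accessed images can be sketched together. Several details here are assumptions and not taken from the slides: that area_id N corresponds to host stoN.mixi.jp, the pairing policy, the URL path layout, and serving from the first area of the pair. A dict stands in for the manager DB.

```python
import itertools

manager = {}                                   # stand-in for the manager DB: image id -> area pair
_next_id = itertools.count(1)
_pairs = itertools.cycle([(1, 2), (3, 4)])     # hypothetical pairing policy


def register_upload():
    """Upload steps 1-2: assign an image id and a pair of area_ids to store it in."""
    image_id = next(_next_id)
    manager[image_id] = next(_pairs)
    return image_id, manager[image_id]


def image_url(image_id, filename):
    """View steps 3-5: look up the area_id and build the image URL."""
    area_id = manager[image_id][0]             # assumption: serve from the first area
    # Assumption: area_id N lives on stoN.mixi.jp; the path layout is hypothetical.
    return f"http://sto{area_id}.mixi.jp/{image_id}/{filename}"
```

Storing each file in a pair of areas gives a second copy to fall back on, while the page itself only ever embeds a direct storage URL, so no cache layer sits in front of these images.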
    • To do •Try MySQL Cluster •Try to implement a better algorithm • Consistent hashing? • Linear hashing? •Level 3 partitioning? • Split again by timestamp?
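As a rough idea of what the consistent hashing item could buy, here is a generic hash-ring sketch. It is not something the talk describes implementing; the class, the use of MD5, and the 100 virtual points per node are all illustrative choices.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas      # virtual points per node, for smoother balance
        self._ring = []               # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the node owning the first ring point at or after the key's position."""
        position = _hash(str(key))
        index = bisect.bisect(self._ring, (position, "")) % len(self._ring)
        return self._ring[index][1]
```

Unlike the modulo scheme, adding a node to such a ring only reassigns keys to the new node; keys never move between the existing nodes, so the background copy is limited to the share that the new node takes over.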
    • Questions?
    • Thank you •Further questions to bkesuma@mixi.co.jp •We are hiring :) •Have a nice day!