Scalability at GROU.PS
How does GROU.PS scale to serve 1PB of assets each month? Topics: memcache, nginx, gearman, tornado, libevent, kqueue, epoll, mysql, sharding, replication, memcached, tokyo cabinet


Published in Technology
Transcript

  • 1. Scalability at GROU.PS
    Emre Sokullu
  • 2. Disclaimer
    We’re not fully there yet
    We’re hiring: jobs@groups-inc.com
  • 3. Challenges @ GROU.PS
    3M unique visitors per month
    120M page views
    1PB assets to be served every month
    Video, Photos, Files
    Support for 5Gbit/s
    Very dynamic pages:
    With social networks, every page depends on the user and the time: p(u,t) = HTML
    On GROU.PS it also depends on the group: p(g,u,t) = HTML -> WHERE group_id = ? AND …
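To put the 1PB and 5Gbit/s figures side by side, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 PB = 10^15 bytes) and a 30-day month, neither of which the slides specify, and it only gives the average rate; real traffic peaks well above the average.

```python
# Back-of-the-envelope: average bandwidth needed to serve 1 PB per month.
# Assumptions (not from the slides): decimal units and a 30-day month.
PETABYTE_BYTES = 10**15
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_gbit_per_s(bytes_per_month: int) -> float:
    """Return the sustained rate in Gbit/s needed to move bytes_per_month."""
    bits = bytes_per_month * 8
    return bits / SECONDS_PER_MONTH / 10**9


avg = average_gbit_per_s(PETABYTE_BYTES)
print(f"average rate: {avg:.2f} Gbit/s")
print(f"headroom at 5 Gbit/s: {5 / avg:.2f}x")
```

Under these assumptions the average comes out at roughly 3.1 Gbit/s, so a 5 Gbit/s capacity leaves only modest room for peaks.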
  • 4. What is GROU.PS ?
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. Distributed Architecture
    25+ servers, S3 cloud, EdgeCast CDN
    4+ cores
    All Linux: mostly Red Hat
    Some Debian, Ubuntu, CentOS
  • 10. Amazon Technologies
    S3
    CloudFront
    EC2 (elastic IP and persistent storage)
    SimpleDB
    Queue technologies, distributed Hadoop and more…
  • 11. Amazon Technologies
    Downside:
    Not so cheap
    Bad database performance
  • 12. Serving Content?
    Use MogileFS
    Distributed file serving
    Use CDN
    Hot content served from local servers
    Sysctl tunings needed!
  • 13. Our typical sysctl additions
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_synack_retries = 2
    ## Emre edited
    # http://www.oracle-base.com/articles/11g/OracleDB11gR1InstallationOnFedora8.php
    kernel.shmall = 2097152
    kernel.shmmax = 2147483648
    kernel.shmmni = 4096
    # semaphores: semmsl, semmns, semopm, semmni
    kernel.sem = 250 32000 100 128
    net.ipv4.ip_local_port_range = 1024 65000
    net.core.rmem_default=4194304
    #net.core.rmem_max=4194304
    net.core.wmem_default=262144
    #net.core.wmem_max=262144
    fs.file-max=5049800
    vm.swappiness=10
    ## Emre edited
    # from http://forums.softlayer.com/showthread.php?t=3252
    net.ipv4.tcp_rmem = 4096 87380 8388608
    net.ipv4.tcp_wmem = 4096 87380 8388608
    net.core.rmem_max = 8388608
    net.core.wmem_max = 8388608
    net.core.netdev_max_backlog = 5000
    net.ipv4.tcp_window_scaling = 1
    net.ipv4.ip_nonlocal_bind=1
    # http://rackerhacker.com/2007/08/24/apache-no-space-left-on-device-couldnt-create-accept-lock/
    kernel.msgmni = 1024
    kernel.sem = 250 256000 32 1024
    net.ipv4.ip_conntrack_max = 524288
    net.ipv4.netfilter.ip_conntrack_max = 524288
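The listing above sets kernel.sem twice; when such a file is applied top to bottom, the later assignment is the one that sticks. A small checker like the following can flag such repeats before a file is deployed. This is a sketch only: `parse_sysctl` and `SAMPLE` are hypothetical names, and the sample is an excerpt of the slide, not the full file.

```python
def parse_sysctl(text):
    """Parse a sysctl.conf-style fragment.

    Returns (settings, duplicates): settings maps each key to its final
    value, duplicates maps each repeated key to every value it was given,
    in file order.
    """
    settings = {}
    duplicates = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines ("#" or ";") carry no setting.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key = key.strip()
        value = " ".join(value.split())  # normalise inner whitespace
        if key in settings:
            duplicates.setdefault(key, [settings[key]]).append(value)
        settings[key] = value  # the later assignment wins
    return settings, duplicates


# Excerpt from the slide, used here only as sample input.
SAMPLE = """\
kernel.sem = 250 32000 100 128
net.core.rmem_default=4194304
#net.core.rmem_max=4194304
net.core.rmem_max = 8388608
kernel.sem = 250 256000 32 1024
"""

settings, duplicates = parse_sysctl(SAMPLE)
for key, values in duplicates.items():
    print(f"{key} is set {len(values)} times; effective value: {settings[key]}")
```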
  • 14. MySQL
    Load off via memcache
    $memcache->set("group_by_name.jtpd", 1122, false, 0);   // expire 0: never expires
    $memcache->set("home_module_html.1122", …, true, 30);   // expires after 30 seconds
    function getGroupID($group_name) {
        global $memcache;
        if (!isset($memcache)
            || ($res = $memcache->get("group_by_name.{$group_name}")) === false) {
            // get it from mysql and memcache
        } else {
            return $res; // serve from memcache
        }
    }
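The lookup above follows the cache-aside pattern: try the cache, fall back to the database on a miss, then populate the cache. A minimal self-contained sketch of that flow is below. `FakeCache` and `FakeDatabase` are hypothetical in-memory stand-ins for memcached and MySQL, not real client libraries; only the key format `group_by_name.<name>` and the sample values come from the slide.

```python
import time


class FakeCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl=0):
        # ttl=0 means "never expires", mirroring the slide's expire argument.
        expires_at = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires_at)


class FakeDatabase:
    """Stand-in for MySQL that counts how often it is queried."""

    def __init__(self, groups):
        self._groups = groups
        self.queries = 0

    def group_id_by_name(self, name):
        self.queries += 1
        return self._groups.get(name)


def get_group_id(cache, db, group_name):
    """Cache-aside lookup: cache first, database on a miss."""
    key = f"group_by_name.{group_name}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # served from cache
    group_id = db.group_id_by_name(group_name)
    if group_id is not None:
        cache.set(key, group_id)  # populate the cache for the next caller
    return group_id


cache = FakeCache()
db = FakeDatabase({"jtpd": 1122})
first = get_group_id(cache, db, "jtpd")   # miss: goes to the database
second = get_group_id(cache, db, "jtpd")  # hit: database is not touched
print(first, second, db.queries)
```

Unknown names are deliberately not cached here, so each lookup for a missing group still reaches the database; whether to cache negative results is a separate design choice.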
  • 15. MySQL
    Replication is easy
    Split reads across replicas
    What about writes?
    That’s where sharding comes into play
    Vertical sharding
    Horizontal sharding
    MMM (Multi-Master Replication Manager for MySQL)
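As a rough illustration of how read splitting and horizontal sharding combine, the sketch below routes each query to a server by group ID. Everything in it is an assumption for illustration: the `ShardRouter` class, the host names, and the modulo placement rule are not taken from the slides. Vertical sharding (moving whole tables or features to their own servers) is not shown.

```python
import itertools


class ShardRouter:
    """Pick a database host for a query.

    Writes go to the master of the shard that owns the group; reads are
    spread round-robin over that shard's replicas.
    """

    def __init__(self, shards):
        # shards: list of {"master": str, "replicas": [str, ...]}
        self._shards = shards
        # Fall back to the master when a shard has no replicas.
        self._read_cycles = [
            itertools.cycle(shard["replicas"] or [shard["master"]])
            for shard in shards
        ]

    def shard_for(self, group_id):
        # Horizontal sharding: rows are placed by a function of the key.
        return group_id % len(self._shards)

    def host_for(self, group_id, is_write):
        index = self.shard_for(group_id)
        if is_write:
            return self._shards[index]["master"]
        return next(self._read_cycles[index])


# Hypothetical topology: two shards, each with one master and two replicas.
router = ShardRouter([
    {"master": "db0-master", "replicas": ["db0-replica-a", "db0-replica-b"]},
    {"master": "db1-master", "replicas": ["db1-replica-a", "db1-replica-b"]},
])

write_host = router.host_for(1122, is_write=True)
read_hosts = [router.host_for(1122, is_write=False) for _ in range(3)]
print(write_host, read_hosts)
```

Modulo placement is the simplest rule but makes adding shards painful, since most keys move; a lookup table or consistent hashing is a common alternative.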
  • 16. MySQL
    Scales poorly across multiple cores
    query_cache_size = 0 # on master
    query_cache_type = 0 # on master
    thread_concurrency = 8 # total cores
    max_connections = 750 # shouldn’t exceed that
    innodb_buffer_pool_size = 10G # a little less than total RAM
  • 17. MySQL Query Optimization
    INDEX (group, user)
    WHERE group = ? AND user = ? can use it (in either order of the AND terms)
    Not WHERE user = ? alone: the leftmost column must be constrained
    B-tree
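The leftmost-prefix behaviour of a composite B-tree index can be checked with a query planner. The sketch below uses SQLite from Python's standard library as a stand-in, since it needs no server; MySQL's EXPLAIN output looks different, and the table and column names here (`membership`, `group_id`, `user_id`, `role`) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE membership (group_id INTEGER, user_id INTEGER, role TEXT)"
)
# Composite index with group_id as the leading column.
conn.execute("CREATE INDEX idx_group_user ON membership (group_id, user_id)")


def query_plan(sql, params):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


both = query_plan(
    "SELECT role FROM membership WHERE group_id = ? AND user_id = ?", (1, 2)
)
swapped = query_plan(
    "SELECT role FROM membership WHERE user_id = ? AND group_id = ?", (2, 1)
)
user_only = query_plan(
    "SELECT role FROM membership WHERE user_id = ?", (2,)
)

for label, plan in [("both", both), ("swapped", swapped), ("user only", user_only)]:
    print(f"{label}: {plan}")
```

In this setup the first two queries are answered through `idx_group_user` regardless of how the AND terms are ordered, while the query on `user_id` alone falls back to a table scan because the index's leading column is not constrained.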
  • 18. MySQL Query Optimization
    SHOW PROCESSLIST
    Maatkit, mk-query-digest
    Percona builds
  • 19. NoSQL
    Voldemort (LinkedIn)
    Cassandra (Facebook)
    Tokyo Cabinet (mixi)
  • 20. Logging
    Database logging is not the solution
    File system is expensive too
    A legal necessity
  • 21. Logging
    Solution:
    Scribe & Thrift
    By Facebook
    Eventually consistent
  • 22. Nginx & libevent
  • 23. Nginx & libevent
    Handles 10,000 connections
    5 Gbit/s
    Used by Rambler, WordPress, GROU.PS
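Nginx and libevent reach these connection counts by using readiness notification (epoll on Linux, kqueue on BSD) instead of one thread or process per connection. The sketch below shows the same pattern with Python's standard `selectors` module. It is not how nginx is implemented, only a minimal illustration; the `echo_upper` handler and the socket pair standing in for a network client are assumptions made for the demo.

```python
import selectors
import socket


def echo_upper(conn):
    """Handler invoked when conn is readable: reply with upper-cased data."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data.upper())


def run_once(selector, timeout=1.0):
    """Run one event-loop iteration; return how many events were handled."""
    events = selector.select(timeout)
    for key, _mask in events:
        handler = key.data  # the callback stored at registration time
        handler(key.fileobj)
    return len(events)


# DefaultSelector picks the best mechanism available on the platform.
selector = selectors.DefaultSelector()

# A connected socket pair stands in for a real client connection.
client, server_side = socket.socketpair()
server_side.setblocking(False)
selector.register(server_side, selectors.EVENT_READ, echo_upper)

client.sendall(b"ping")
handled = run_once(selector)
reply = client.recv(4096)
print(handled, reply)

selector.unregister(server_side)
selector.close()
client.close()
server_side.close()
```

A single loop like this can watch thousands of registered sockets, because the kernel reports only those that are ready and idle connections cost almost nothing.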
  • 24. Postfix
    Run multiple instances
    Spam Clusters
  • 25. Monitoring
    Munin + monit
    Other alternatives:
    Cacti
    Nagios
    Hyperic (VMware)
  • 26. PHP
  • 27. More to come on my blog
    http://emresokullu.com
    More fine tuning tips
    Become a member of my community
    Love grou.ps ;)
    Convert to PHP
    We’re hiring: jobs@groups-inc.com