SlideShare a Scribd company logo
1 of 70
Download to read offline
DATABASE
SHARDING
@NETLOG
Tags
performance,
partitioning, federation,
database, php, mySQL,
memcached, sphinx
jurriaanpersyn.com
lead web dev at Netlog
php + mysql + frontend
since 3 years
A Pan European Social Network


                   Over 40 million active members
                   Over 50 million unique visitors per
                   month
                   Over 5 billion page views per month
                   Over 26 active languages and 30+
                   countries
                   6 billion online minutes per month
                   Top 5 most active countries: Italy,
                   Belgium, Turkey, Switzerland and
                   Germany
Our reach beyond Europe
Visitor statistics
Database statistics



• huge amounts of data (eg. 100+ million
  friendships on nl.netlog.com)

• write-heavy app (1.4/1 read-write ratio)
• typical db up to 3000+ queries/sec (15h-22h)
Performance problems



• most technologies in the web stack are
  stateless

• the only layer not being stateless is the data
  itself

• hardest (backend) performance problems were
  mostly database related
We build Netlog on open source software



• php                    • squid
• mysql                  • and many more ...
• apache
• debian
• memcached
• sphinx
• lighttpd
Topics
- Netlog history of
  scaling
- Sharding architecture
- Implications
- Our approach
- Tackling the problems
Master
 r/w
Master
          w




Slave            Slave
  r                r
Top
               Master
                 w

Messages                    Friends
  r/w                         r/w
            Top      Top
           Slave    Slave
             r        r
Top
                     Master
                       w

   Messages                           Friends
      w                                  w
                  Top      Top
                 Slave    Slave
                   r        r
                                  Frnds    Frnds
Msgs     Msgs
                                  Slave    Slave
Slave    Slave
                                    r        r
  r        r
1040:
                             Top
               Too many
              connections   Master
                              w

   Messages                                  Friends
      w                                         w
                     Top          Top
                    Slave        Slave
                      r            r
                                         Frnds    Frnds
Msgs     Msgs
                                         Slave    Slave
Slave    Slave
                                           r        r
  r        r
1040:
                                Top
                  Too many
                 connections   Master      1040:
                                 w       Too many
    1040:                               connections
  Too many
    Messages                                          Friends
 connections

        w                                                w
                        Top          Top
                       Slave        Slave
                         r            r
                                             Frnds         Frnds
Msgs        Msgs1040:
              Too many
                                             Slave         Slave
Slave       Slave
             connections
                                               r             r
  r           r
?
More vertical
partitioning?
Master-to-master
  replication?
Caching?
Sharding!
pieces
  fragments
horizontal part.
   federation
Photos
                     %10 = 2
          Photos                  Photos
          %10 = 1                 %10 = 3
Photos                                      Photos
%10 = 0                                     %10 = 4
                    Aggregation
Photos                                      Photos
%10 = 9                                     %10 = 5
          Photos                  Photos
          %10 = 8                 %10 = 6
                     Photos
                     %10 = 7
More data?
More shards!
A simple example



   • A blog
   • Split users across 2 databases
   • Users with even ID go to database 1
   • Users with uneven ID go to database 2
   • Query: Give me the blog messages from
     author with id 26
What was ...



$db = DB::getInstance();

$db->prepare(quot;SELECT title, message
	 	 	 	 FROM BLOG_MESSAGES
	 	 	 	 WHERE userid = {userID}quot;);

$db->assignInt('userID', $userID);

$db->execute();
Becomes ...



$db = DB::getInstance($userID);

$db->prepare(quot;SELECT title, message
	 	 	 	 FROM BLOG_MESSAGES
	 	 	 	 WHERE userid = {userID}quot;);

$db->assignInt('userID', $userID);

$db->execute();
What to partition on?



 • “sharding key”
 • eg. for a blog
  • partition on publication date?
  • partition on author?
    • author name?
    • author id?
  • partition on category?
How to partition that key?



 • Partitioning schemes
 • eg.
  • Vertical
  • Range based
  • Key or hash based
  • Directory based
The result



 • Several smaller tables means:
 • Your users love you again
  • You’re actually online!
 • Your DBA loves you again
  • Less DB load/machine, less crashing
      machines

   • Even maintainance queries go faster again!
Some implications ...



 • Problem: No cross-shard SQL queries
  •            between shards becomes
       (LEFT) JOIN

      impossibly complicated

 • Solutions:
  • It’s possible to design (parts of) the
      application so there’s no need for cross-
      shard queries
Some implications ...



 • Solutions:
  • Parallel querying of shards
  • Double systems
    • guestbook messages:
      • shard on id of guestbook owner
      • shard on id of writer
  • Denormalizing data
Some implications ...



 • Data consistency / referential integrity?
  • enforce integrity on application level
  • “cross server transactions”
      1. start transactions on both servers

      2. commit transactions on both servers

   • check/fix routines can become substantial
      part of development cost
Some implications ...



 • Balancing (eg. balanced on a user’s ID)
  • What about “power users”?
  • Differences in hardware performance?
  • Adding servers if you grow?
  • Partitioning scheme choice is important
    • directory based++, but: SPOF? overhead?
Some implications ...



 • Your web servers talk to more / several
    databases

   • network implications
   • keep connections open? close after every
      query? pools?
Existing / related solutions?



 •   MySQL Cluster (high availability, performance,
     not distribution of writes)

 •   MySQL Partitioning (not feature complete
     when we needed it, Netlog not 5.1 ready/
     compatible at that time, not directory based
     (?))

 •   HiveDB (mySQL sharding framework in Java,
     requires JVM, php interface in infancy state)
Existing / related solutions?



 •   MySQL Proxy (used by following two options,
     introduces LUA)

 •   HSCALE (not feature complete, partitions files,
     not yet cross-server, builds on MySQL Proxy)

 •   Spock Proxy (fork of MySQL Proxy, atm only
     range based partitioning)

 •   HyperTable (HQL)

 •   HBase / BigTable
Existing / related solutions?



 •   Hibernate Shards
     (whole new DB layer for us)

 •   SQLAlchemy
     (for Python)

 •   memcached from Mysql
     (SQL-functions or storage engine)

 •   Oracle RAC
     (well, not MySQL)
Goals



 • flexible for your hardware department
 • no massive rewrite
 • incremental implementation
 • support for multiple sharding schemes
 • easy to understand
 • well balanced
Our solution



 • in-house
 • 100% php
 • middleware between application logic and
    class DB


 • complete caching layer built in
 • most sharded data carved by  $userID
sharddbhost001

    sharddb001            sharddb002

shard0001 shard0002   shard0005 shard0006



shard0003 shard0004   shard0007 shard0008
Overview of our Sharding implementation



 • Sharding Management
  • Lookup System (directory based)
  • Balancer / Manager
 • Sharded Tables API
  • Database Access Layer
  • Caching Layer
Sharding Management Directory



 • Lookup system translates              to the right
                               $userID

   db connection details

   •           to $shardID
       $userID
       (via SQL/memcache - combination not
       fixed!)

   •           to $hostname & $databasename
       $shardID
       (generated configuration files, with flags
       about shard being available for read/write)
Sharded Tables API



 • All sharded records have a
  • “shard key” ($userID)
  • “$itemID” (identification of the record)
    • related to  $userID


    • mostly: combined auto_increment
        column
Sharded Tables API



 • Example API:
  • An object per    $tableName/$userID-combination


    • implementation of a class providing basic
        CRUD functions

     • typically a class for accessing database
        records with “a user’s items”
How we query the “simple” example ...



  Query: Give me the blog messages from author
  with id 26. 3 queries:

   1. where is user 26? > shard 5

   2. on shard 5 > give me all “blogIDs” (itemIDs)
      of user 26 > blogID 10, 12 & 30

   3. on shard 5 > fetch details WHERE blogID
      IN(10,12,30) > array of title + message
Sharding Management: Balancing Shards



 • Define ‘load’ percentage for shards (#users),
   databases (#users, #filesize), hosts (#sql
   reads, #sql writes, #cpu load, #users, ...)

 • Balance loads and start move operations
  • We can do this on userID level
  • completely in PHP / transparant / no user
     downtime
Some implications ...



 •   Downtime of a single databasehost affects
     only users on shards on that DB

     •   a few profiles not available

     •   communication with that profile not
         available
Tackling the problems



 • memcached
 • parallel processing
 • sphinx
Memcached



 • Makes it faster
  • much faster
  • much much faster
 • Makes some “cross shard” things possible
Using memcached

  function isObamaPresident()
{
	 $memcache = new Memcache();
	 $result = $memcache->get('isobamapresident'); // fetch
	 if ($result === false)
	{
	 	 // do some database heavy stuff
	 	 $db = DB::getInstance();
	 	 $votes = $db->prepare(quot;SELECT COUNT(*) FROM VOTES WHERE
vote = 'OBAMA'quot;)->execute();
	 	 $result = ($votes > (USA_CITIZEN_COUNT / 2)) ? 'Sure
is!' : 'Nope.'; // well, ideally
	 	 $memcache->set('isobamapresident', $result, 0);
	}
	 return $result;
}
Typical usage for Netlog Sharding API



• “Shard key” to “shard id” is cached
• Each sharded record is cached as array
  (key: table/userID/itemID)

• Caches with lists, and caches with counts
  (key: where/order/...-clauses)

• “Revision number” per table/shard key-
  combination
How we cache the “simple” example ...



  Query: Give me the blog messages from author
  with id 26. 3 memcache requests:
1. where is user 26? > +/- 100% cache hit ratio

2. on shard 5 > give me all “blogIDs” (itemIDs) of
   user 26 > list query (cache result of query)

3. on shard 5 > fetch details WHERE blogID
   IN(10,12,30) > details of each item +/- 100%
   cache hit ratio (multi get on memcache)
CacheRevisionNumbers


 • What? Cached version number to use in other
     cache-keys

 • Why? Caching of counts / lists
 • Example: cache key for list of users latest
     photos (simplified):   ”USER_PHOTOS” . $userID .
     $cacheRevisionNumber . ”ORDERBYDATEADDDESCLIMIT10”;


 •                       is number, bumped on
     $cacheRevisionNumber

     every CUD-action, clears caches of all counts
     +lists, else unlimited ttl.

 • “number” is current/cached timestamp
Some implications ...



 • Several caching modes:
     •   READ_INSERT_MODE


     •   READ_UPDATE_INSERT_MODE


 •   More PHP processing

     •   Needs memory

     •   PHP-webservers scale more easily
Parallel processing



 • Split single HTTP-request into several
    requests and batch process (eg. Friends Of
    Friends feature)

   • Fetching friends of friends requires looping
      over friends (shard) (memcache makes this
      possible)

   • But: people with 1000+ friends > Process in
      batches of 500?

   • Processing 1000+ takes longer then 2*500+
Sphinx Search



 • Problem:
    How do you give an overview of eg. latest
    photos from different users? (on different
    shards)

 • Solution:
    Check Jayme’s presentation “Sphinx search
    optimization”, distributed full text search.
    (Use it for more than searching!)
Tips & thoughts



 • First and foremost
  • Don’t do if it, if you don’t need to!
      (37signals.com)

   • Shard early and often!
      (startuplessonslearned.blogspot.com)
Tips & thoughts



 • “Don’t do if it, if you don’t need to!”?
  • Sharding isn’t that easy
  • There is no out of the box solution that
      works for every set-up/technology/...

   • Maintainance does get a bit tougher
   • Complicates your set-up
   • There might be cheaper and better
      hardware available tomorrow.
Tips & thoughts



 • “Shard early and often!”?
  • What is the most relevant key to shard on
      for your app? (eg. userID)

   • Do you know that key on every query you
      send to a database? (function calls, objects)

   • Design your application / database schemas
      so you know that key everywhere you need
      it. (Migrating schemas is also hard.)
Tips & thoughts



 • Don’t forget server tuning / hardware
    upgrade / sql optimization

   • can be easier that (re-)sharding
 • Only scale those parts of your application that
    require high performance

   • (some clutter can remain on your db's, but
      are they causing harm?)
Tips & thoughts



• Migration to sharding system
   • Each shard reads from the master and
      copies data (can run in parallel, eg. from
      different slaves, to minimize downtime)

   • Usage of Sharded Tables API is possible
      without tables actually been sharded
      (transition phase).
Tips & thoughts



• We don’t need super high end hardware for our
  database systems anymore. Saves $.

• Backups of your data will be different.
• You can run alters/backups/maintainance
  queries on (temp) slaves of a shard and switch
  master/slave to minimize downtime.

• If read-write performance isn’t an issue to
  shard your data, maybe the downtime it takes
  for an ALTER query on a big table can be.
Tips & thoughts



• We didn’t have to balance user by user yet
   • power users balanced inactive users
   • “virtual shards” allowed us to split eg. 1
      host with 12 db’s into 2 hosts with 6 db’s

• Have a plan for scaling up.
   • When average load reaches x how will you
      add?

• Goes together w/ replication+clustering
don’t do it if you don’t need to,
   but prepare for when you do


     the last thing you want is
to be unreachable due to popularity
netlog.com/go/developer
     jurriaan@netlog.com
Resources
• the great development and it services team at Netlog
• www.netlog.com/go/developer
• www.37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-
  sharding
• www.addsimplicity.com/adding_simplicity_an_engi/2008/08/shard-
  lessons.html
• www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-
  Billions-of-Queries-Per-Day
• startuplessonslearned.blogspot.com/2009/01/sharding-for-startups.html
• www.codefutures.com/weblog/database-sharding
• www.25hoursaday.com/weblog/2009/01/16/
  BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSch
  emes.aspx
• highscalability.com
• dev.mysql.com/doc/refman/5.1/en/partitioning.html
• www.hibernate.org/414.html
• en.wikipedia.org/wiki/SQLAlchemy
Resources
• spockproxy.sourceforge.net
• www.scribd.com/doc/3865300/Scaling-Web-Sites-by-Sharding-and-
  Replication
• oracle2mysql.wordpress.com/2007/08/23/scale-out-notes-on-sharding-
  unique-keys-foreign-keys
• www.flickr.com/photos/kt

  Notes to the slides will become available soon at the Netlog Developer
  Blog and my personal blog (www.jurriaanpersyn.com)

More Related Content

Similar to Database Sharding At Netlog

Compass, Sass, and the Enlightened CSS Developer
Compass, Sass, and the Enlightened CSS DeveloperCompass, Sass, and the Enlightened CSS Developer
Compass, Sass, and the Enlightened CSS DeveloperWynn Netherland
 
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, DockerUnder the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, DockerDocker, Inc.
 
API's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webAPI's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webDan Delany
 
Clojure at BackType
Clojure at BackTypeClojure at BackType
Clojure at BackTypenathanmarz
 
Flickr Architecture Presentation
Flickr Architecture PresentationFlickr Architecture Presentation
Flickr Architecture Presentationeraz
 
Wichert Akkerman Plone Deployment Practices The Plone.Org Setup
Wichert Akkerman   Plone Deployment Practices   The Plone.Org SetupWichert Akkerman   Plone Deployment Practices   The Plone.Org Setup
Wichert Akkerman Plone Deployment Practices The Plone.Org SetupVincenzo Barone
 
Wichert Akkerman - Plone.Org Infrastructure
Wichert Akkerman - Plone.Org InfrastructureWichert Akkerman - Plone.Org Infrastructure
Wichert Akkerman - Plone.Org InfrastructureVincenzo Barone
 
Ruby on Rails 3.1: Let's bring the fun back into web programing
Ruby on Rails 3.1: Let's bring the fun back into web programingRuby on Rails 3.1: Let's bring the fun back into web programing
Ruby on Rails 3.1: Let's bring the fun back into web programingBozhidar Batsov
 
When To Use Ruby On Rails
When To Use Ruby On RailsWhen To Use Ruby On Rails
When To Use Ruby On Railsdosire
 
Flickr and PHP - Cal Henderson
Flickr and PHP - Cal HendersonFlickr and PHP - Cal Henderson
Flickr and PHP - Cal Hendersonkangaro10a
 
Alfresco Security Best Practices 2012
Alfresco Security Best Practices 2012Alfresco Security Best Practices 2012
Alfresco Security Best Practices 2012Toni de la Fuente
 
Ruby on Rails 101 - Presentation Slides for a Five Day Introductory Course
Ruby on Rails 101 - Presentation Slides for a Five Day Introductory CourseRuby on Rails 101 - Presentation Slides for a Five Day Introductory Course
Ruby on Rails 101 - Presentation Slides for a Five Day Introductory Coursepeter_marklund
 
Dynomite at Erlang Factory
Dynomite at Erlang FactoryDynomite at Erlang Factory
Dynomite at Erlang Factorymoonpolysoft
 
Vim Eye for the Rails Guy - Cheatsheet
Vim Eye for the Rails Guy - CheatsheetVim Eye for the Rails Guy - Cheatsheet
Vim Eye for the Rails Guy - CheatsheetKarmen Blake
 
Perl University: Getting Started with Perl
Perl University: Getting Started with PerlPerl University: Getting Started with Perl
Perl University: Getting Started with Perlbrian d foy
 
DB Replication With Rails
DB Replication With RailsDB Replication With Rails
DB Replication With Railsschoefmax
 
How microservices fail, and what to do about it
How microservices fail, and what to do about itHow microservices fail, and what to do about it
How microservices fail, and what to do about itRichard Rodger
 

Similar to Database Sharding At Netlog (20)

Compass, Sass, and the Enlightened CSS Developer
Compass, Sass, and the Enlightened CSS DeveloperCompass, Sass, and the Enlightened CSS Developer
Compass, Sass, and the Enlightened CSS Developer
 
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, DockerUnder the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
 
API's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webAPI's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic web
 
Clojure at BackType
Clojure at BackTypeClojure at BackType
Clojure at BackType
 
Flickr Architecture Presentation
Flickr Architecture PresentationFlickr Architecture Presentation
Flickr Architecture Presentation
 
Wichert Akkerman Plone Deployment Practices The Plone.Org Setup
Wichert Akkerman   Plone Deployment Practices   The Plone.Org SetupWichert Akkerman   Plone Deployment Practices   The Plone.Org Setup
Wichert Akkerman Plone Deployment Practices The Plone.Org Setup
 
Wichert Akkerman - Plone.Org Infrastructure
Wichert Akkerman - Plone.Org InfrastructureWichert Akkerman - Plone.Org Infrastructure
Wichert Akkerman - Plone.Org Infrastructure
 
Ruby on Rails 3.1: Let's bring the fun back into web programing
Ruby on Rails 3.1: Let's bring the fun back into web programingRuby on Rails 3.1: Let's bring the fun back into web programing
Ruby on Rails 3.1: Let's bring the fun back into web programing
 
When To Use Ruby On Rails
When To Use Ruby On RailsWhen To Use Ruby On Rails
When To Use Ruby On Rails
 
Flickr and PHP - Cal Henderson
Flickr and PHP - Cal HendersonFlickr and PHP - Cal Henderson
Flickr and PHP - Cal Henderson
 
Alfresco Security Best Practices 2012
Alfresco Security Best Practices 2012Alfresco Security Best Practices 2012
Alfresco Security Best Practices 2012
 
Ruby on Rails 101 - Presentation Slides for a Five Day Introductory Course
Ruby on Rails 101 - Presentation Slides for a Five Day Introductory CourseRuby on Rails 101 - Presentation Slides for a Five Day Introductory Course
Ruby on Rails 101 - Presentation Slides for a Five Day Introductory Course
 
Dynomite at Erlang Factory
Dynomite at Erlang FactoryDynomite at Erlang Factory
Dynomite at Erlang Factory
 
Vidoop CouchDB Talk
Vidoop CouchDB TalkVidoop CouchDB Talk
Vidoop CouchDB Talk
 
MySQL Proxy tutorial
MySQL Proxy tutorialMySQL Proxy tutorial
MySQL Proxy tutorial
 
Vim Eye for the Rails Guy - Cheatsheet
Vim Eye for the Rails Guy - CheatsheetVim Eye for the Rails Guy - Cheatsheet
Vim Eye for the Rails Guy - Cheatsheet
 
Perl University: Getting Started with Perl
Perl University: Getting Started with PerlPerl University: Getting Started with Perl
Perl University: Getting Started with Perl
 
DB Replication With Rails
DB Replication With RailsDB Replication With Rails
DB Replication With Rails
 
How microservices fail, and what to do about it
How microservices fail, and what to do about itHow microservices fail, and what to do about it
How microservices fail, and what to do about it
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 

More from Jurriaan Persyn

An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.Jurriaan Persyn
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Developing Social Games in the Cloud
Developing Social Games in the CloudDeveloping Social Games in the Cloud
Developing Social Games in the CloudJurriaan Persyn
 
Meet the OpenSocial Containers: Netlog
Meet the OpenSocial Containers: NetlogMeet the OpenSocial Containers: Netlog
Meet the OpenSocial Containers: NetlogJurriaan Persyn
 
Get Your Frontend Sorted (Barcamp Gent 2008)
Get Your Frontend Sorted (Barcamp Gent 2008)Get Your Frontend Sorted (Barcamp Gent 2008)
Get Your Frontend Sorted (Barcamp Gent 2008)Jurriaan Persyn
 

More from Jurriaan Persyn (6)

An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Engagor Walkthrough
Engagor WalkthroughEngagor Walkthrough
Engagor Walkthrough
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Developing Social Games in the Cloud
Developing Social Games in the CloudDeveloping Social Games in the Cloud
Developing Social Games in the Cloud
 
Meet the OpenSocial Containers: Netlog
Meet the OpenSocial Containers: NetlogMeet the OpenSocial Containers: Netlog
Meet the OpenSocial Containers: Netlog
 
Get Your Frontend Sorted (Barcamp Gent 2008)
Get Your Frontend Sorted (Barcamp Gent 2008)Get Your Frontend Sorted (Barcamp Gent 2008)
Get Your Frontend Sorted (Barcamp Gent 2008)
 

Recently uploaded

Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 

Recently uploaded (20)

Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 

Database Sharding At Netlog

  • 3. jurriaanpersyn.com lead web dev at Netlog php + mysql + frontend since 3 years
  • 4. A Pan European Social Network Over 40 million active members Over 50 million unique visitors per month Over 5 billion page views per month Over 26 active languages and 30+ countries 6 billion online minutes per month Top 5 most active countries: Italy, Belgium, Turkey, Switzerland and Germany
  • 7. Database statistics • huge amounts of data (eg. 100+ million friendships on nl.netlog.com) • write-heavy app (1.4/1 read-write ratio) • typical db up to 3000+ queries/sec (15h-22h)
  • 8. Performance problems • most technologies in the web stack are stateless • the only layer not being stateless is the data itself • hardest (backend) performance problems were mostly database related
  • 9. We build Netlog on open source software • php • squid • mysql • and many more ... • apache • debian • memcached • sphinx • lighttpd
  • 10. Topics - Netlog history of scaling - Sharding architecture - Implications - Our approach - Tackling the problems
  • 12. Master w Slave Slave r r
  • 13. Top Master w Messages Friends r/w r/w Top Top Slave Slave r r
  • 14. Top Master w Messages Friends w w Top Top Slave Slave r r Frnds Frnds Msgs Msgs Slave Slave Slave Slave r r r r
  • 15. 1040: Top Too many connections Master w Messages Friends w w Top Top Slave Slave r r Frnds Frnds Msgs Msgs Slave Slave Slave Slave r r r r
  • 16. 1040: Top Too many connections Master 1040: w Too many 1040: connections Too many Messages Friends connections w w Top Top Slave Slave r r Frnds Frnds Msgs Msgs1040: Too many Slave Slave Slave Slave connections r r r r
  • 17. ?
  • 22. pieces fragments horizontal part. federation
  • 23. Photos %10 = 2 Photos Photos %10 = 1 %10 = 3 Photos Photos %10 = 0 %10 = 4 Aggregation Photos Photos %10 = 9 %10 = 5 Photos Photos %10 = 8 %10 = 6 Photos %10 = 7
  • 25.
  • 26. A simple example • A blog • Split users across 2 databases • Users with even ID go to database 1 • Users with uneven ID go to database 2 • Query: Give me the blog messages from author with id 26
  • 27. What was ... $db = DB::getInstance(); $db->prepare(quot;SELECT title, message FROM BLOG_MESSAGES WHERE userid = {userID}quot;); $db->assignInt('userID', $userID); $db->execute();
  • 28. Becomes ... $db = DB::getInstance($userID); $db->prepare(quot;SELECT title, message FROM BLOG_MESSAGES WHERE userid = {userID}quot;); $db->assignInt('userID', $userID); $db->execute();
  • 29. What to partition on? • “sharding key” • eg. for a blog • partition on publication date? • partition on author? • author name? • author id? • partition on category?
  • 30. How to partition that key? • Partitioning schemes • eg. • Vertical • Range based • Key or hash based • Directory based
  • 31. The result • Several smaller tables means: • Your users love you again • You’re actually online! • Your DBA loves you again • Less DB load/machine, less crashing machines • Even maintainance queries go faster again!
  • 32.
  • 33. Some implications ... • Problem: No cross-shard SQL queries • between shards becomes (LEFT) JOIN impossibly complicated • Solutions: • It’s possible to design (parts of) the application so there’s no need for cross- shard queries
  • 34. Some implications ... • Solutions: • Parallel querying of shards • Double systems • guestbook messages: • shard on id of guestbook owner • shard on id of writer • Denormalizing data
  • 35. Some implications ... • Data consistency / referential integrity? • enforce integrity on application level • “cross server transactions” 1. start transactions on both servers 2. commit transactions on both servers • check/fix routines can become substantial part of development cost
  • 36. Some implications ... • Balancing (eg. balanced on a user’s ID) • What about “power users”? • Differences in hardware performance? • Adding servers if you grow? • Partitioning scheme choice is important • directory based++, but: SPOF? overhead?
  • 37. Some implications ... • Your web servers talk to more / several databases • network implications • keep connections open? close after every query? pools?
  • 38. Existing / related solutions? • MySQL Cluster (high availability, performance, not distribution of writes) • MySQL Partitioning (not feature complete when we needed it, Netlog not 5.1 ready/ compatible at that time, not directory based (?)) • HiveDB (mySQL sharding framework in Java, requires JVM, php interface in infancy state)
  • 39. Existing / related solutions? • MySQL Proxy (used by following two options, introduces LUA) • HSCALE (not feature complete, partitions files, not yet cross-server, builds on MySQL Proxy) • Spock Proxy (fork of MySQL Proxy, atm only range based partitioning) • HyperTable (HQL) • HBase / BigTable
  • 40. Existing / related solutions? • Hibernate Shards (whole new DB layer for us) • SQLAlchemy (for Python) • memcached from Mysql (SQL-functions or storage engine) • Oracle RAC (well, not MySQL)
  • 41. Goals • flexible for your hardware department • no massive rewrite • incremental implementation • support for multiple sharding schemes • easy to understand • well balanced
  • 42. Our solution • in-house • 100% php • middleware between application logic and class DB • complete caching layer built in • most sharded data carved by $userID
  • 43. sharddbhost001 sharddb001 sharddb002 shard0001 shard0002 shard0005 shard0006 shard0003 shard0004 shard0007 shard0008
  • 44. Overview of our Sharding implementation • Sharding Management • Lookup System (directory based) • Balancer / Manager • Sharded Tables API • Database Access Layer • Caching Layer
  • 45. Sharding Management Directory • Lookup system translates to the right $userID db connection details • to $shardID $userID (via SQL/memcache - combination not fixed!) • to $hostname & $databasename $shardID (generated configuration files, with flags about shard being available for read/write)
  • 46. Sharded Tables API • All sharded records have a • “shard key” ($userID) • “$itemID” (identification of the record) • related to $userID • mostly: combined auto_increment column
  • 47. Sharded Tables API • Example API: • An object per $tableName/$userID-combination • implementation of a class providing basic CRUD functions • typically a class for accessing database records with “a user’s items”
  • 48. How we query the “simple” example ... Query: Give me the blog messages from author with id 26. 3 queries: 1. where is user 26? > shard 5 2. on shard 5 > give me all “blogIDs” (itemIDs) of user 26 > blogID 10, 12 & 30 3. on shard 5 > fetch details WHERE blogID IN(10,12,30) > array of title + message
  • 49. Sharding Management: Balancing Shards • Define ‘load’ percentage for shards (#users), databases (#users, #filesize), hosts (#sql reads, #sql writes, #cpu load, #users, ...) • Balance loads and start move operations • We can do this on userID level • completely in PHP / transparant / no user downtime
  • 50. Some implications ... • Downtime of a single databasehost affects only users on shards on that DB • a few profiles not available • communication with that profile not available
  • 51. Tackling the problems • memcached • parallel processing • sphinx
  • 52. Memcached • Makes it faster • much faster • much much faster • Makes some “cross shard” things possible
  • 53. Using memcached function isObamaPresident() { $memcache = new Memcache(); $result = $memcache->get('isobamapresident'); // fetch if ($result === false) { // do some database heavy stuff $db = DB::getInstance(); $votes = $db->prepare(quot;SELECT COUNT(*) FROM VOTES WHERE vote = 'OBAMA'quot;)->execute(); $result = ($votes > (USA_CITIZEN_COUNT / 2)) ? 'Sure is!' : 'Nope.'; // well, ideally $memcache->set('isobamapresident', $result, 0); } return $result; }
  • 54. Typical usage for Netlog Sharding API • “Shard key” to “shard id” is cached • Each sharded record is cached as array (key: table/userID/itemID) • Caches with lists, and caches with counts (key: where/order/...-clauses) • “Revision number” per table/shard key- combination
  • 55. How we cache the “simple” example ... Query: Give me the blog messages from author with id 26. 3 memcache requests: 1. where is user 26? > +/- 100% cache hit ratio 2. on shard 5 > give me all “blogIDs” (itemIDs) of user 26 > list query (cache result of query) 3. on shard 5 > fetch details WHERE blogID IN(10,12,30) > details of each item +/- 100% cache hit ratio (multi get on memcache)
  • 56. CacheRevisionNumbers • What? Cached version number to use in other cache-keys • Why? Caching of counts / lists • Example: cache key for list of users latest photos (simplified): ”USER_PHOTOS” . $userID . $cacheRevisionNumber . ”ORDERBYDATEADDDESCLIMIT10”; • is number, bumped on $cacheRevisionNumber every CUD-action, clears caches of all counts +lists, else unlimited ttl. • “number” is current/cached timestamp
  • 57. Some implications ... • Several caching modes: • READ_INSERT_MODE • READ_UPDATE_INSERT_MODE • More PHP processing • Needs memory • PHP-webservers scale more easily
  • 58. Parallel processing • Split single HTTP-request into several requests and batch process (eg. Friends Of Friends feature) • Fetching friends of friends requires looping over friends (shard) (memcache makes this possible) • But: people with 1000+ friends > Process in batches of 500? • Processing 1000+ takes longer then 2*500+
  • 59. Sphinx Search • Problem: How do you give an overview of eg. latest photos from different users? (on different shards) • Solution: Check Jayme’s presentation “Sphinx search optimization”, distributed full text search. (Use it for more than searching!)
  • 60. Tips & thoughts • First and foremost • Don’t do if it, if you don’t need to! (37signals.com) • Shard early and often! (startuplessonslearned.blogspot.com)
  • 61. Tips & thoughts • “Don’t do if it, if you don’t need to!”? • Sharding isn’t that easy • There is no out of the box solution that works for every set-up/technology/... • Maintainance does get a bit tougher • Complicates your set-up • There might be cheaper and better hardware available tomorrow.
  • 62. Tips & thoughts • “Shard early and often!”? • What is the most relevant key to shard on for your app? (eg. userID) • Do you know that key on every query you send to a database? (function calls, objects) • Design your application / database schemas so you know that key everywhere you need it. (Migrating schemas is also hard.)
  • 63. Tips & thoughts • Don’t forget server tuning / hardware upgrade / sql optimization • can be easier that (re-)sharding • Only scale those parts of your application that require high performance • (some clutter can remain on your db's, but are they causing harm?)
  • 64. Tips & thoughts • Migration to sharding system • Each shard reads from the master and copies data (can run in parallel, eg. from different slaves, to minimize downtime) • Usage of Sharded Tables API is possible without tables actually been sharded (transition phase).
  • 65. Tips & thoughts • We don’t need super high end hardware for our database systems anymore. Saves $. • Backups of your data will be different. • You can run alters/backups/maintainance queries on (temp) slaves of a shard and switch master/slave to minimize downtime. • If read-write performance isn’t an issue to shard your data, maybe the downtime it takes for an ALTER query on a big table can be.
  • 66. Tips & thoughts • We didn’t have to balance user by user yet • power users balanced inactive users • “virtual shards” allowed us to split eg. 1 host with 12 db’s into 2 hosts with 6 db’s • Have a plan for scaling up. • When average load reaches x how will you add? • Goes together w/ replication+clustering
  • 67. don’t do it if you don’t need to, but prepare for when you do the last thing you want is to be unreachable due to popularity
  • 68. netlog.com/go/developer jurriaan@netlog.com
  • 69. Resources • the great development and it services team at Netlog • www.netlog.com/go/developer • www.37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on- sharding • www.addsimplicity.com/adding_simplicity_an_engi/2008/08/shard- lessons.html • www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing- Billions-of-Queries-Per-Day • startuplessonslearned.blogspot.com/2009/01/sharding-for-startups.html • www.codefutures.com/weblog/database-sharding • www.25hoursaday.com/weblog/2009/01/16/ BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSch emes.aspx • highscalability.com • dev.mysql.com/doc/refman/5.1/en/partitioning.html • www.hibernate.org/414.html • en.wikipedia.org/wiki/SQLAlchemy
  • 70. Resources • spockproxy.sourceforge.net • www.scribd.com/doc/3865300/Scaling-Web-Sites-by-Sharding-and- Replication • oracle2mysql.wordpress.com/2007/08/23/scale-out-notes-on-sharding- unique-keys-foreign-keys • www.flickr.com/photos/kt Notes to the slides will become available soon at the Netlog Developer Blog and my personal blog (www.jurriaanpersyn.com)