Scaling databases on the cloud

Deepak Anupalli
Server Architect

Cloud Computing - Coming of Age

A Treatise on Real-Life Use Cases




Copyright (c) 2009, Pramati Technologies Private Limited. Imaginea is a Pramati business. All
trade names and trade marks are owned by their respective owners
11/4/2009
We are
 •   An emerging leader in product
     development services offering
     specialized services in Product
     Engineering, Interaction design
     and Test engineering.
 •   US Headquarters in Sunnyvale,
     CA; India development centers in
     Hyderabad and Chennai
 •   A 250+ strong and growing team
 •   A business unit of Pramati
     Technologies
 •   Rich experience in SaaS
     engineering, performance
     engineering, cloud computing,
     Web 2.0, sf.com integrations and
     managing Amazon EC2
     deployments
 •   Track record of delivering
     significant customer satisfaction
Initiatives in Cloud
• Dekoh:
  http://www.dekoh.com
• SocialTwist:
  http://www.socialtwist.com
• MyPicks Beijing 2008:
  http://apps.new.facebook.com/mypicksbeijing/Home
• Qontext:
  http://www.qontext.com
Application requirements

• High reliability
• Low Latency
• Dynamic Scalability
   – Millions of Users
   – Volumes of data
• Across the tiers
   – Web
   – Application
   – Data
Our biggest challenge

• DB Perf bound by Disk I/O
• Vertical scaling is an option
   – Ex: PlentyOfFish.com: 512 GB RAM, 32 CPUs
   – Expensive
   – Only possible to an extent on cloud servers
Vertical Scaling: Limitations
  • Not everything will fit in
    memory
  • Lots of reads ~ lots of
    page faults + disk seeks
  • RAID 6 or RAID 10
    disks
  • 200 MBps-1 GBps is the
    max speed

         Think Horizontal!
Replication
 • Master-slave replication (MySQL
   or Oracle RAC)
 • Writes on one Master
 • Reads on many Slaves
 • Application aware
 • Works in read mostly scenario
 • Adds Slave lag

 [Diagram: Writes -> Master -> Writes -> Slave, Slave, Slave -> Reads]
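
A minimal sketch of what "application aware" could look like in practice: the application picks a connection per statement, sending writes to the master and spreading reads over the slaves. The class name, the round-robin policy and the connection labels are assumptions made for illustration, not something shown on the slide.

```python
import itertools


class ReplicatedRouter:
    """Route writes to the master and spread reads across slaves (round-robin)."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, every statement falls back to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def connection_for(self, sql):
        # Simplification: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self.master


# Hypothetical connection labels; a real application would hold DB handles here.
router = ReplicatedRouter("master-db", ["slave-1", "slave-2", "slave-3"])
```

Because of slave lag, a read issued right after a write may not see that write on a slave; reads that must be fresh can be sent to the master instead.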
Sharding
 • Partition data across masters
 • Writes and Reads are distributed
 • Application is modified accordingly
 • Also use replication with fewer slaves
   to minimize slave lag
 • Choose a partitioning strategy that
   uniformly distributes data

 [Diagram: Shard Logic in front of three Masters, each Master with its own Slave]
Sharding Schemes
 • Vertical
    – Profile DB, friend DB
    – Not uniform
 • Range based
    – ID range, Location or Date based
    – Not uniform
 • Key or Hash based
    – ID hash
    – Fixed masters
 • Directory
    – Mapping of ID to Shard
    – Single point of failure

 [Example on slide: shard_id = getShard("profile"); shard_id = getShard(profileID); Select * from Profile where id = ?]
 [Diagram labels: Corporate, Corporate, Tweets, Posts]
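
The four schemes can be sketched as small lookup functions. This is an illustration under assumptions: the shard names, the range boundaries and the function names below are hypothetical, and the slide's own getShard is not reproduced here.

```python
import bisect
import zlib

SHARDS = ["master-0", "master-1", "master-2"]  # hypothetical shard names

# Vertical: each kind of data lives in its own database.
VERTICAL = {"profile": "profile-db", "friend": "friend-db"}


def vertical_shard(entity):
    return VERTICAL[entity]


# Range based: ids below 1,000,000 go to shard 0, below 2,000,000 to shard 1,
# and everything else to shard 2. The boundaries are made up for the example.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000]


def range_shard(profile_id):
    return SHARDS[bisect.bisect_right(RANGE_UPPER_BOUNDS, profile_id)]


# Key or hash based: a stable hash of the id modulo a fixed number of masters.
def hash_shard(profile_id):
    digest = zlib.crc32(str(profile_id).encode("utf-8"))
    return SHARDS[digest % len(SHARDS)]


# Directory: an explicit id -> shard mapping. Flexible, but every lookup
# depends on this one structure, hence the single point of failure.
class DirectoryShards:
    def __init__(self):
        self._directory = {}

    def assign(self, profile_id, shard):
        self._directory[profile_id] = shard

    def lookup(self, profile_id):
        return self._directory[profile_id]
```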
Sharding Complexities
 • No Joins
    – De-normalize the data
 • Data Integrity
    – Application should enforce integrity
 • Re-shard
    – Changing the sharding scheme requires re-partitioning
      the entire data
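
To see why re-sharding is costly, here is a small sketch that counts how many keys change shard when a simple `id % shard_count` scheme grows from 3 to 4 masters. The modulo scheme and the key range are assumptions chosen to keep the arithmetic easy to check.

```python
def moved_fraction(keys, old_count, new_count):
    """Fraction of integer keys whose shard changes under key % shard_count."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_count != k % new_count)
    return moved / len(keys)


# Going from 3 to 4 shards: only keys with k % 12 in {0, 1, 2} stay in place,
# so three quarters of the keys have to be copied to a different master.
fraction = moved_fraction(range(12_000), 3, 4)
```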
De-normalization
 • Recent 10 messages to a recipient
 • Schema
    – Messages Table stores message info
    – Recipients Table stores
    – Requires Join on Messages & Recipients
      tables
 • De-normalize
    – Store timestamp in Recipients table as
      well

 [Diagram: before, only Messages has a timestamp column; after de-normalizing, both Messages and Recipients have one]
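
A sketch of the "recent 10 messages" example using an in-memory SQLite database. The column names and sample rows are assumptions; the slide only names the two tables and the timestamp column. The first query needs the join, the second reads the copied timestamp from the recipients table alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, timestamp INTEGER);
    -- De-normalized: timestamp is copied from messages into recipients.
    CREATE TABLE recipients (message_id INTEGER, recipient_id INTEGER, timestamp INTEGER);
""")

# 15 sample messages, all addressed to a hypothetical recipient with id 7.
for msg_id in range(1, 16):
    ts = 1000 + msg_id
    conn.execute("INSERT INTO messages VALUES (?, ?, ?)", (msg_id, f"message {msg_id}", ts))
    conn.execute("INSERT INTO recipients VALUES (?, ?, ?)", (msg_id, 7, ts))

# Normalized access path: join both tables to sort by the message timestamp.
joined = conn.execute("""
    SELECT m.id FROM messages m
    JOIN recipients r ON r.message_id = m.id
    WHERE r.recipient_id = ?
    ORDER BY m.timestamp DESC LIMIT 10""", (7,)).fetchall()

# De-normalized access path: the recipients table alone answers the query.
denormalized = conn.execute("""
    SELECT message_id FROM recipients
    WHERE recipient_id = ?
    ORDER BY timestamp DESC LIMIT 10""", (7,)).fetchall()
```

The price is a second copy of the timestamp that the application has to keep in sync on every write.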
Relationships

• When data is partitioned into shards,
  foreign keys become obsolete
• De-normalization avoids having
  relationships
• If data can’t be de-normalized further,
  use memcached
• But, this requires change in SQL queries

[Diagram: Application -> MemCached -> Shard 1, Shard 2, Shard 3]
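
One way to read the memcached bullet is the cache-aside pattern: the application looks up related records by key in the cache and only goes to the owning shard on a miss. In this sketch a plain dict stands in for memcached, and the loader function and record shape are hypothetical.

```python
class CacheAside:
    """Check the cache first; on a miss, load from the shard and remember it."""

    def __init__(self, cache, load_from_shard):
        self.cache = cache                      # dict used as a memcached stand-in
        self.load_from_shard = load_from_shard  # callable that hits the real shard

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            value = self.load_from_shard(key)
            self.cache[key] = value
        return value


shard_reads = []  # records which ids actually reached a shard


def load_profile(profile_id):
    # Placeholder for a per-shard query such as the slide's profile lookup.
    shard_reads.append(profile_id)
    return {"id": profile_id, "name": f"user-{profile_id}"}


profiles = CacheAside({}, load_profile)
```

This is where the change in SQL queries comes from: a single join becomes several key lookups that the application stitches together.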
Cloud Databases/Data stores

•   Amazon SimpleDB
•   Google BigTable
•   Apache HBase
•   Facebook/Apache Hive
•   CouchDB
•   Cassandra
•   Many more…
Amazon SimpleDB
•   Schema-less distributed key-value store
•   Highly reliable and scalable
•   Automatic indexing of columns
•   Querying with SQL-like syntax
•   Supports multiple values for key/attribute
•   Value for Money
Problems Addressed
• High Availability
   – multiple nodes forming a ring
• Partitioning
   – Consistent hashing
• Replication
   – Replicated to multiple nodes
• Eventual Consistency
   – Asynchronous replication of data using vector clocks
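
A minimal sketch of consistent hashing on a ring, to illustrate the partitioning bullet. The node names, the number of virtual points per node and the choice of SHA-256 are assumptions for the example; this is not the internal implementation of any particular store.

```python
import bisect
import hashlib


class HashRing:
    """Place nodes on a hash ring; a key belongs to the next node clockwise."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, smooths the load
        self._ring = []           # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around the end.
        idx = bisect.bisect_right(self._ring, (self._point(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
```

Adding a node only takes over the keys that now fall just before its points; every other key stays where it was, which is what makes growing the ring cheap compared with re-hashing everything.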
SimpleDB adoption

•   No Joins
•   No transactional support
•   String is the only data type
•   No aggregate functions
•   No full-text searches
•   Limits enforced on size of results, predicates, data etc.
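
Since string is the only data type, comparisons are lexicographic, so numbers are commonly zero-padded before being stored. The helper below is a hypothetical sketch of that idea; the width of 10 digits is an arbitrary assumption.

```python
def encode_int(value, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("negative numbers need an offset before padding")
    text = str(value).zfill(width)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text


raw = sorted(["9", "10", "100"])                      # plain strings sort as text
padded = sorted(encode_int(n) for n in (9, 10, 100))  # padded strings sort numerically
```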
Google BigTable
•   Distributed Key-value store
•   Runs on top of Google File System (GFS)
•   Timestamp versioned data
•   Automatic indexing of columns
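
"Timestamp versioned data" can be pictured with a toy in-memory model: each (row, column) cell keeps several values, one per timestamp, and a read returns the newest value at or before a requested time. The class and method names are assumptions for illustration and do not correspond to any BigTable API.

```python
class VersionedTable:
    """Toy model of cells that keep one value per timestamp."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, timestamp):
        self._cells.setdefault((row, column), []).append((timestamp, value))

    def get(self, row, column, at=None):
        # Keep only versions written at or before `at` (all versions if None).
        versions = [
            version for version in self._cells.get((row, column), [])
            if at is None or version[0] <= at
        ]
        if not versions:
            return None
        # The newest remaining version wins.
        return max(versions, key=lambda version: version[0])[1]


table = VersionedTable()
table.put("user-1", "status", "offline", timestamp=100)
table.put("user-1", "status", "online", timestamp=200)
```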
BigTable adoption
• Google Search, Maps, Earth, Orkut, YouTube,
  Reader, etc.
• Google App Engine (GAE) uses BigTable as its
  datastore
• DataNucleus supports JPA for BigTable
• Limited transaction support
• Eventual consistency
Hive
 • Hive is a data warehouse
 • Runs on top of Hadoop Distributed
   File System (HDFS)
 • Supports SQL-like syntax
 • User defined types and functions
 • Extensibility with Map-Reduce
Hive adoption
 • Facebook uses Hive to analyze historical
   data of users and content
 • Doesn’t support indexing of columns
 • Brute force mechanism to compute
   analytics
CouchDB
•   CouchDB is a document-oriented datastore
•   Schema-free
•   Accessible through RESTful JSON API
•   Distributed with incremental replication
•   Querying through JavaScript
Is there a solution for all?


• Different data-stores address different problem spaces
• Identify what best suits your app
Thank You
   deepak@pramati.com



http://hysea.in
Copyright © 2009, Imaginea Inc. Not to be distributed or communicated without permission. 11/4/2009
