Your SlideShare is downloading. ×
0
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Scaling Databases On The Cloud
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Scaling Databases On The Cloud

4,089

Published on

The most apt, current database Scaling Techniques for your cloud-app that ensure your application database scales effortlessly without a hitch.

The most apt, current database Scaling Techniques for your cloud-app that ensure your application database scales effortlessly without a hitch.

Published in: Technology
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
4,089
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
1
Comments
0
Likes
4
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Scaling databases on the cloud D e e p a k A n u p a l l i S e r v e r A r c h i t e c t C L O U D C O M P U T I N G - C O M I N G O F A G E A T R E A T I S E O N R E A L - L I F E U S E C A S E S Copyright (c) 2009, Pramati Technologies Private Limited. Imaginea is a Pramati business. All trade names and trade marks are owned by their respective owners 11/4/2009 1
  • 2. We are • An emerging leader in product development services offering specialized services in Product Engineering, Interaction design and Test engineering. • US Headquarters in Sunnyvale, CA; India development centers in Hyderabad and Chennai • A 250+ strong and growing team • A business unit of Pramati technologies • Rich Experience in SaaS Engineering, Performance engineering, Cloud Computing, Web2.0, sf.com integrations and managing Amazon EC2 Deployment • Track record of delivering significant customer satisfaction
  • 3. Initiatives in Cloud • Dekoh: http://www.dekoh.com • SocialTwist: http://www.socialtwist.com • MyPicks Beijing 2008: http://apps.new.facebook.com/mypicksbeijing/Home • Qontext: http://www.qontext.com
  • 4. Application requirements • High reliability • Low Latency • Dynamic Scalability – Millions of Users – Volumes of data • Across the tiers – Web – Application – Data
  • 5. Our biggest challenge • DB Perf bound by Disk I/O • Vertical scaling is an option – Ex: PlentyOfFish.com: 512GB RAM, 32CPUs – Expensive – Only possible to an extent on cloud servers
  • 6. Vertical Scaling: Limitations • Not everything will fit in memory • Lot of reads ~ Lot of page faults + disk seeks • RAID 6 or RAID 10 disks • 200MBps-1GBps is the max speed Think Horizontal !
  • 7. Replication • Master-slave replication (MySQL Writes or Oracle RAC) • Writes on one Master Master • Reads on many Slaves • Application aware • Works in read mostly scenario Writes • Adds Slave lag Slave Slave Slave Reads
  • 8. Sharding • Partition data across masters • Writes and Reads are distributed Shard Logic • Application is modified accordingly • Also use replication with fewer slaves to minimize slave lag Master Master Master • Choose a partitioning strategy that uniformly distributes data Slave Slave Slave
  • 9. Sharding Schemes • Vertical shard_id = getShard(“profile”) • Profile DB, friend DB shard_id = getShard(profileID) • Not uniform Select * from Profile where id = ? • Range based • ID range, Location or Date based • Not uniform Corporate Corporate • Key or Hash based • ID hash • Fixed masters Tweets Posts • Directory • Mapping of ID to Shard • Single point of failure
  • 10. Sharding Complexities • No Joins • De-normalize the data • Data Integrity • Application should enforce integrity • Re-shard • Changing the sharding scheme requires re-partitioning the entire data
  • 11. De-normalization • Recent 10 messages to a recipient • Schema Messages Recipients • Messages Table stores message info timestamp • Recipients Table stores • Requires Join on Messages & Recipients table • De-normalize Messages Recipients • Store timestamp in Recipients table as timestamp timestamp well
  • 12. Relationships • When data is partitioned into shards, foreign keys become obsolete • De-normalization avoids having relationships Application • If data can’t be de-normalized further, use memcached • But, this requires change in SQL queries MemCached Shard Shard Shard 1 2 3
  • 13. Cloud Databases/Data stores • Amazon SimpleDB • Google BigTable • Apache HBase • Facebook/Apache Hive • CouchDB • Cassandra • Many more…
  • 14. Amazon SimpleDB • Schema-less distributed key-value store • Highly reliable and scalable • Automatic indexing of columns • Querying with SQL-like syntax • Supports multiple values for key/attribute • Value for Money
  • 15. Problems Addressed • High Availability – multiple nodes forming a ring • Partitioning – Consistent hashing • Replication – Replicated to multiple nodes • Eventual Consistency – Asynchronous replication of data using vector clocks
  • 16. SimpleDB adoption • No Joins • No transactional support • String is the only data type • No aggregator functions • No full-text searches • Limits enforced on size of results, predicates, data etc.
  • 17. Google BigTable • Distributed Key-value store • Runs on top of Google File System (GFS) • Timestamp versioned data • Automatic indexing of columns
  • 18. BigTable adoption • Google Search, Maps, Earth, Orkut, Youtube, Reader, etc. • Google App Engine(GAE) uses BigTable as its datastore • DataNucleus supports JPA for BigTable • Limited transaction support • Eventual consistency
  • 19. Hive • Hive is a data warehouse • Runs on top of Hadoop Distributed File system (HDFS) • Supports SQL-like syntax • User defined types and functions • Extensibility with Map-Reduce
  • 20. Hive adoption • Facebook uses Hive to analyze historical data of users and content • Doesn’t support indexing of columns • Brute force mechanism to compute analytics
  • 21. CouchDB • CouchDB is a document-oriented datastore • Schema-free • Accessible through RESTful JSON API • Distributed with incremental replication • Querying through Javascript
  • 22. Is there a solution for all? • Different data-stores address different problem spaces • Identify what best suites your app
  • 23. Thank You deepak@pramati.com http://hysea.in
  • 24. C L O U D C O M P U T I N G - C O M I N G O F A G E A T R E A T I S E O N R E A L - L I F E U S E C A S E S Scaling databases on the cloud Copyright © 2009, Imaginea Inc. Not to be distributed or communicated without permission. 11/4/2009 24

×