Cloudcon East Presentation

An introduction to HiveDB originally presented at Cloudcon East.

  1. Scaling with HiveDB
  2. The Problem • Hundreds of millions of products • Millions of new products every week • Accelerating growth
  3. Scaling up is no longer an option.
  4. Requirements ➡ OLTP optimized ➡ Constant response time is more important than low latency ➡ Expand capacity online ➡ Ability to move records ➡ No re-sharding
  5. 2 years ago ➡ Clustered: Oracle RAC, MS SQL Server, MySQL Cluster, Teradata (OLAP) ➡ Non-relational / Unstructured: Hadoop, MogileFS, S3
  6. Today ➡ Clustered: Oracle RAC, MS SQL Server, MySQL Cluster, DB2, Teradata (OLAP), HiveDB ➡ Non-relational: Hypertable, HBase, CouchDB, SimpleDB ➡ Unstructured: Hadoop, MogileFS, S3
  7. Comparison (storage, interface, partitioning, expansion, node types, maturity) ➡ Oracle RAC: shared storage, SQL, transparent partitioning, no-downtime expansion, identical nodes, 7 years ➡ MySQL Cluster: memory/local storage, SQL, transparent partitioning, expansion requires restart, mixed nodes, 3 years ➡ Hypertable: local storage, HQL, transparent partitioning, no-downtime expansion, mixed nodes, ? (released 2/08) ➡ DB2: local storage, SQL, fixed-hash partitioning, expansion degrades performance, identical nodes, 3 years (25 years total) ➡ HiveDB: local storage, SQL, key-based directory partitioning, no-downtime expansion, mixed nodes, 18 months (+13 years!)
  8. Operating a distributed system is hard.
  9. Non-functional concerns quickly dominate functional concerns.
  10. Replication, fail over, monitoring
  11. Our approach: minimize cleverness
  12. Build atop solid, easy to manage technologies.
  13. So we can get back to selling t-shirts.
  14. MySQL is fast.
  15. MySQL is well understood.
  16. MySQL replication works.
  17. MySQL is free.
  18. The same goes for Java.
  19. The simplest thing ➡ HiveDB is a JDBC Gatekeeper for applications. ➡ Lightest integration ➡ Smallest possible implementation
  20. The eBay approach ➡ Pro: Internally consistent data ➡ Con: Re-sharding of data required ➡ Con: Re-sharding is a DBA operation
  21. Partition by key: a simple bucketed partitioning scheme expressed in code (a minimal sketch appears after the slide list).
  22. Partition by key
  23. Directory ➡ No broadcasting ➡ No re-partitioning ➡ Easy to relocate records ➡ Easy to add capacity
  24. Disadvantages ➡ Intelligence moved from the database to the application (queries have to be planned and indexed) ➡ Can’t join across partitions ➡ NO OLAP! (We consider this a benefit.)
  25. Hibernate Shards: partitioned Hibernate from Google
  26. Why did we build this thing again?
  27. Oh wait, you have to tell it where things are (see the shard-selection sketch after the slide list).
  28. Benefits of Shards ➡ Unified data access layer ➡ Result set aggregation across partitions ➡ Everyone in the Java world knows Hibernate.
  29. Working on simplifying it even further.
  30. But does it go?
  31. Case Study: CafePress ➡ Leader in User-generated Commerce ➡ More products than eBay (>250,000,000) ➡ 24/7/365
  32. Case Study: Performance Requirements ➡ Thousands of queries/sec ➡ 10:1 read/write ratio ➡ Geographically distributed
  33. Case Study: Test Environment ➡ Real schema ➡ Production-class hardware ➡ Partial data (~40M records)
  34. [Diagram: CafePress HiveDB 2007 performance test environment. A test controller and measurement workstation (JMeter + client.jar, on a 100MBit command-and-control switch) drive two load-generator boxes (Dell 2950, 2x2 Xeon, 16GB, 6x72GB 15k; JMeter with hundreds of threads each). The generators hit a hardware load balancer in front of three Tomcat 5.5 web-service nodes running HiveDB (Dell 1950, 2x2 Xeon), which connect through a non-blocking gigabit switch (48GB backplane) to the Directory, Partition 0, and Partition 1 MySQL databases (Dell 2950, 2x2 Xeon, 16GB, 6x72GB 15k). Diagram by jmccarthy@cafepress.com, modified April 09 2007.]
  35. Case Study: Performance Goals ➡ Large object reads: 1500/s ➡ Large object writes: 200/s ➡ Response time: 100ms
  36. Case Study: Performance Results ➡ Large object reads: goal 1500/s, actual 2250/s ➡ Large object writes: goal 200/s, actual 300/s ➡ Response time: goal 100ms, actual 8ms
  37. Case Study: Performance Results ➡ Max read throughput (actual result): 4100/s (CPU limited in Java layer; MySQL <25% utilized)
  38. Case Study: Results ➡ Billions of queries served ➡ Highest DB uptime at CafePress ➡ Hundreds of millions of updates performed
  39. Show don’t tell!
  40. High Availability ➡ We don’t specify a fail over strategy. ➡ We don’t specify a backup or replication strategy.
  41. Fail Over Options ➡ Load balancer ➡ MySQL Proxy ➡ Flipper / MMM ➡ HAProxy
  42. Replication ➡ You should use MySQL replication. ➡ It really works. ➡ You can add in LVM snapshots for even more disastrous disaster recovery.
  43. The Bottleneck: There is one problem with HiveDB. The directory is a scaling bottleneck.
  44. There’s no reason it has to be a database.
  45. MemcacheD ➡ Hive directory implemented atop memcached (a sketch of a cache-backed directory lookup appears after the slide list)
  46. Roadmap ➡ MemcacheD directory ➡ Simplified Hibernate support ➡ Management web service ➡ Command-line tools ➡ Framework Adapters
  47. Call for developers! ➡ Grails (GORM) ➡ ActiveRecord (JRuby on Rails) ➡ Ambition (JRuby or web service) ➡ iBatis ➡ JPA ➡ ... Postgres?
  48. Contributing ➡ Post to the mailing list: http://groups.google.com/group/hivedb-dev ➡ Comment on our site: http://www.hivedb.org ➡ Submit a patch / pull request: git clone git://github.com/britt/hivedb.git
  49. Photo Credits ➡ http://www.flickr.com/photos/7362313@N07/1240245941/sizes/o ➡ http://www.flickr.com/photos/99287245@N00/2229322675/sizes/o ➡ http://www.flickr.com/photos/51035555243@N01/26006138/ ➡ http://www.flickr.com/photos/vgm8383/2176897085/sizes/o/ ➡ http://t-shirts.cafepress.com/item/money-organic-cotton-tee/73439294
  50. Blobject ➡ Gets around the problem of ALTER statements; no data set of this size can be transformed synchronously. ➡ The hive can contain multiple versions of a serialized record. ➡ Compression (a sketch of one possible blob layout appears after the slide list)
  51. Hadoop! ➡ Query and transform blobbed data asynchronously.
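
Slides 21-23 describe partitioning by key: a fixed set of buckets computed from a record's primary key, plus a directory that maps each bucket to a MySQL node. Below is a minimal sketch of that idea in Java; the class and method names (KeyDirectory, bucketFor, nodeFor, assignBucket) and the JDBC URLs are invented for illustration and are not the actual HiveDB API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the bucketed "partition by key" directory from slides 21-23.
// Hypothetical names; not HiveDB's real API.
public class KeyDirectory {

    private static final int BUCKETS = 1024;  // fixed bucket count, chosen up front

    // bucket -> JDBC URL of the MySQL node that currently owns it
    private final Map<Integer, String> bucketToNode = new HashMap<Integer, String>();

    // Deterministic key -> bucket mapping; buckets never change, so no re-sharding.
    static int bucketFor(long primaryKey) {
        return (int) (((primaryKey ^ (primaryKey >>> 32)) & 0x7fffffff) % BUCKETS);
    }

    // Directory lookup: which node holds this key's bucket right now.
    public String nodeFor(long primaryKey) {
        return bucketToNode.get(bucketFor(primaryKey));
    }

    // Relocating records or adding capacity is just a directory update:
    // copy the bucket's rows to the new node, then flip the pointer.
    public void assignBucket(int bucket, String jdbcUrl) {
        bucketToNode.put(bucket, jdbcUrl);
    }

    public static void main(String[] args) {
        KeyDirectory dir = new KeyDirectory();
        for (int b = 0; b < BUCKETS; b++) {
            dir.assignBucket(b, b % 2 == 0 ? "jdbc:mysql://partition0/shop"
                                           : "jdbc:mysql://partition1/shop");
        }
        long productId = 73439294L;
        System.out.println("product " + productId + " lives on " + dir.nodeFor(productId));
    }
}
```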
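Slides 25-28 are about Hibernate Shards, where "telling it where things are" means supplying a shard selection strategy. The sketch below implements the ShardSelectionStrategy interface from the Hibernate Shards betas (one method, selectShardIdForNewObject) and answers with whatever shard a directory lookup returns. DirectoryClient and Product are hypothetical stand-ins; this is not code from HiveDB's actual Hibernate integration.

```java
import org.hibernate.shards.ShardId;
import org.hibernate.shards.strategy.selection.ShardSelectionStrategy;

// Route each new entity to a shard by asking the hive directory (slides 25-28).
public class DirectoryShardSelectionStrategy implements ShardSelectionStrategy {

    // Hypothetical directory lookup, e.g. backed by the directory database.
    public interface DirectoryClient {
        int shardIdForKey(long partitionKey);
    }

    // Hypothetical entity whose id is the partition key.
    public static class Product {
        private final long id;
        public Product(long id) { this.id = id; }
        public long getId() { return id; }
    }

    private final DirectoryClient directory;

    public DirectoryShardSelectionStrategy(DirectoryClient directory) {
        this.directory = directory;
    }

    // Hibernate Shards calls this when a new object is saved; we answer with
    // the shard that the directory says owns the object's partition key.
    public ShardId selectShardIdForNewObject(Object obj) {
        if (obj instanceof Product) {
            return new ShardId(directory.shardIdForKey(((Product) obj).getId()));
        }
        throw new IllegalArgumentException("No partitioning rule for " + obj.getClass());
    }
}
```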
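Slide 45 proposes fronting the hive directory with memcached so the directory database stops being the read bottleneck called out in slide 43. Here is a sketch of that idea using the spymemcached client; the cache key scheme and the loadFromDirectoryDb fallback are assumptions for illustration, not HiveDB code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

// Cache directory lookups in memcached; fall back to the directory database on a miss.
public class CachedDirectory {

    private final MemcachedClient cache;

    public CachedDirectory(String memcachedHost) throws IOException {
        this.cache = new MemcachedClient(new InetSocketAddress(memcachedHost, 11211));
    }

    // Resolve a partition key to a node id, hitting MySQL only on a cache miss.
    public int nodeFor(long partitionKey) {
        String key = "hive.directory." + partitionKey;          // assumed key scheme
        Object cached = cache.get(key);
        if (cached != null) {
            return Integer.parseInt((String) cached);
        }
        int node = loadFromDirectoryDb(partitionKey);            // authoritative lookup
        cache.set(key, 3600, Integer.toString(node));            // cache for an hour
        return node;
    }

    // Placeholder for the real query against the directory database.
    private int loadFromDirectoryDb(long partitionKey) {
        return (int) (partitionKey % 2);                         // pretend we have two partitions
    }
}
```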
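Slide 50's Blobject stores records as opaque, versioned, compressed blobs so that several schema versions can coexist in the hive and be migrated lazily, for example by the asynchronous Hadoop jobs in slide 51. The layout below (an integer version tag followed by a gzipped serialized payload) is one possible way to do it, shown only for illustration.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative blob layout: uncompressed schema-version tag + gzipped serialized record.
public final class Blobject {

    public static byte[] write(int schemaVersion, Serializable record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(schemaVersion);     // version tag, readable without decompressing
        try (ObjectOutputStream body =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            body.writeObject(record);                            // compressed payload
        }
        return bytes.toByteArray();
    }

    public static Object read(byte[] blob) throws IOException, ClassNotFoundException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        int schemaVersion = in.readInt();                        // choose a codec/upgrade path per version here
        try (ObjectInputStream body = new ObjectInputStream(new GZIPInputStream(in))) {
            return body.readObject();                            // caller upgrades old versions lazily
        }
    }
}
```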