Cloudcon East Presentation
Upcoming SlideShare
Loading in...5

Cloudcon East Presentation



Introduction to HiveDB presented at Cloudcon East.

Introduction to HiveDB presented at Cloudcon East.



Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

Cloudcon East Presentation Cloudcon East Presentation Presentation Transcript

  • Scaling with HiveDB
  • The Problem • Hundreds of millions of products • Millions of new products every week • Accelerating growth
  • Scaling up is no longer an option.
  • Requirements ➡ OLTP optimized ➡ Constant response time is more important than low-latency ➡ Expand capacity online ➡ Ability to move records ➡ No re-sharding
  • 2 years ago Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hadoop MySQL Cluster MogileFS Teradata (OLAP) S3
  • Today Clustered Non-relational Unstructured Oracle RAC MS SQL Server Hypertable Hadoop MySQL Cluster HBase MogileFS DB2 CouchDB S3 Teradata (OLAP) SimpleDB HiveDB
  • System Storage Interface Partitioning Expansion Node Types Maturity No Oracle RAC Shared SQL Transparent Identical 7 years downtime Memory Requires MySQL Cluster SQL Transparent Mixed 3 years / Local Restart ? No Hypertable Local HQL Transparent Mixed (released downtime 2/08) Degraded 3 years DB2 Local SQL Fixed Hash Performan Identical (25 years ce total) Key-based No 18 months HiveDB Local SQL Mixed Directory downtime (+13 years!)
  • Operating a distributed system is hard.
  • Non-functional concerns quickly dominate functional concerns.
  • Replication Fail over Monitoring
  • Our approach: minimize cleverness
  • Build atop solid, easy to manage technologies.
  • So we can get back to selling t-shirts.
  • MySQL is fast.
  • MySQL is well understood.
  • MySQL replication works.
  • MySQL is free.
  • The same for JAVA
  • The simplest thing ➡ HiveDB is a JDBC Gatekeeper for applications. ➡ Lightest integration ➡ Smallest possible implementation
  • The eBay approach ➡ Pro: Internally consistent data ➡ Con: Re-sharding of data required ➡ Con: Re-sharding is a DBA operation
  • Partition by key A simple bucketed partitioning scheme expressed in code.
  • Partition by key
  • Directory ➡ No broadcasting ➡ No re-partitioning ➡ Easy to relocate records ➡ Easy to add capacity
  • Disadvantages ➡ Intelligence moved from the database to the application (Queries have to be planned and indexed) ➡ Can’t join across partitions ➡ NO OLAP! (We consider this a benefit.)
  • Hibernate Shards Partitioned Hibernate from Google
  • Why did we build this thing again?
  • Oh wait, you have to tell it where things are.
  • Benefits of Shards ➡ Unified data access layer ➡ Result set aggregation across partitions ➡ Everyone in the JAVA-world knows Hibernate.
  • Working on simplifying it even further.
  • But does it go?
  • Case Study: CafePress ➡ Leader in User-generated Commerce ➡ More products than eBay (>250,000,000) ➡ 24/7/365
  • Case Study: Performance Requirements ➡ Thousands of queries/sec ➡ 10 : 1 read/write ratio ➡ Geographically distributed
  • Case Study: Test Environment ➡ Real schema ➡ Production-class hardware ➡ Partial data (~40M records)
  • CafePress HiveDB 2007 Performance Test Environment JMeter (1 thread) JMeter (no threads) command & client.jar client.jar control Measurement Test Controller Workstation Workstation 100MBit switch load generators JMeter (100s of threads) JMeter (100s of threads) client.jar client.jar 48GB backplane non- Dell 2950 / 2x2 Xeon blocking gigabit switch Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k web service Hardware LB (hivedb) Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Dell 1950 / 2x2 Xeon Tomcat 5.5 Tomcat 5.5 Tomcat 5.5 Directory Partition 0 Partition 1 databases (mysql) Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon Dell 2950 / 2x2 Xeon 16GB, 6x72GB 15k 16GB, 6x72GB 15k 16GB, 6x72GB 15k Modified on April 09 2007
  • Case Study: Performance Goals ➡ Large object reads: 1500/s ➡ Large object writes: 200/s ➡ Response time: 100ms
  • Case Study: Performance Results ➡ Large object reads: 1500/s Actual result: 2250/s ➡ Large object writes: 200/s Actual result: 300/s ➡ Response time: 100ms Actual result: 8ms
  • Case Study: Performance Results ➡ Max read throughput Actual result: 4100/s (CPU limited in Java layer; MySQL <25% utilized)
  • Case Study: Results ➡ Billions of queries served ➡ Highest DB uptime at CafePress ➡ Hundreds of millions of updates performed
  • Show don’t tell!
  • High Availability ➡ We don’t specify a fail over strategy. ➡ We don’t specify a backup or replication strategy.
  • Fail Over Options ➡ Load balancer ➡ MySQL Proxy ➡ Flipper / MMM ➡ HAProxy
  • Replication ➡ You should use MySQL replication. ➡ It really works. ➡ You can add in LVM snapshots for even more disastrous disaster recovery.
  • The Bottleneck There is one problem with HiveDB. The directory is a scaling bottleneck.
  • There’s no reason it has to be a database.
  • MemcacheD ➡ Hive directory implemented atop memcached
  • Roadmap ➡ MemcacheD directory ➡ Simplified Hibernate support ➡ Management web service ➡ Command-line tools ➡ Framework Adapters
  • Call for developers! ➡ Grails (GORM) ➡ ActiveRecord (JRuby on Rails) ➡ Ambition (JRuby or web service) ➡ iBatis ➡ JPA ➡ ... Postgres?
  • Contributing ➡ Post to the mailing list ➡ Comment on our site ➡ Submit a patch / pull request git clone git://
  • Photo Credits ➡ sizes/o ➡ 99287245@N00/2229322675/sizes/o ➡ 51035555243@N01/26006138/ ➡ sizes/o/ ➡ tee/73439294
  • Blobject ➡ Gets around the problem of ALTER statements. No data set of this size can be transformed synchronously. ➡ The hive can contain multiple versions of a serialized record. ➡ Compression
  • Hadoop! ➡ Query and transform blobbed data asynchronously.