
Next generation databases, July 2010


Published in: Technology

  1. This is Not Your Father’s Database: Everything You Need to Know Now About Cloud Computing and Emerging Database Technology
     Guy Harrison, Director Research and Development, Melbourne
  2. Introductions
  3.
  4.
  5. After the gold rush
     • Mainframes
     • Minicomputers
     • Client Server
     • Internet/Y2K Boom
  6. Current Day Trends
     • Big Data
     • Cloud computing
     • Solid State Disk
  7. Big Data
     • The Industrial Revolution of data*
     • User generated data: Twitter, Facebook, Amazon
     • Machine generated data: RFID, POS, cell phones, GPS
     • Traditional RDBMS neither economical nor capable
     *
  8. Big Data 1: Google
  9. MapReduce (diagram: Start, many parallel Map tasks, then Reduce)
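
The Map and Reduce stages in the diagram can be sketched in a single process. This is a minimal illustration under stated assumptions, not Google's or Hadoop's implementation: the word-count task, the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are all hypothetical, and a real framework would run the map calls in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for one input split handled by one Map task.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: two splits of text.
splits = ["big data big clusters", "big data fast data"]
counts = word_count(splits)
print(counts)
```

Because each `map_phase` call depends only on its own split, the map work can be spread over as many machines as there are splits, which is the property the slide's wall of Map boxes is illustrating.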
  10. Hadoop: Open source MapReduce
      Yahoo! Hadoop cluster:
      • 4000 nodes
      • 16 PB disk
      • 64 TB of RAM
      • 32,000 cores
      • Very low $/TB
  11. Hive (diagram: SQL, Java, Results)
  12. Big Data 2: Web 2.0
  13. Twitter Growth
  14. The fail whale
  15. (diagram) Web Servers; Memcached Servers; Database Servers; Read Only Slaves; Shard (A-F); Shard (G-O); Shard (P-Z)
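
The sharded architecture in this diagram can be sketched as range-based routing with a cache in front. This is a simplified illustration, not any particular site's implementation: the shard ranges A-F, G-O and P-Z come from the slide, but the `shard_for` function, the `ShardedStore` class and the use of plain dictionaries in place of memcached and database servers are assumptions made for the example. Read-only slaves are left out.

```python
# Shard ranges taken from the slide; the names are hypothetical.
SHARD_RANGES = {
    "shard_a_f": ("A", "F"),
    "shard_g_o": ("G", "O"),
    "shard_p_z": ("P", "Z"),
}


def shard_for(username):
    """Pick a shard from the first letter of the key."""
    first = username[:1].upper()
    for name, (low, high) in SHARD_RANGES.items():
        if low <= first <= high:
            return name
    raise ValueError(f"no shard covers key {username!r}")


class ShardedStore:
    """Cache-aside reads in front of range-sharded storage."""

    def __init__(self):
        # Dictionaries stand in for database servers and for memcached.
        self.shards = {name: {} for name in SHARD_RANGES}
        self.cache = {}
        self.db_reads = 0

    def put(self, username, profile):
        self.shards[shard_for(username)][username] = profile
        self.cache.pop(username, None)  # invalidate any stale cached copy

    def get(self, username):
        if username in self.cache:
            return self.cache[username]
        self.db_reads += 1  # cache miss: go to the owning shard
        profile = self.shards[shard_for(username)].get(username)
        if profile is not None:
            self.cache[username] = profile
        return profile


store = ShardedStore()
store.put("guy", {"city": "Melbourne"})
first_read = store.get("guy")   # miss, served by shard_g_o
second_read = store.get("guy")  # hit, served by the cache
```

The application has to know the routing rule, and rebalancing means changing `SHARD_RANGES` and moving data, which is part of why the operational burden of sharding falls on the application rather than on the database.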
  16. Clouds and Elastic provisioning (chart: Capacity / Demand over Time; labels: Demand, Capacity, Hardware upgrade, Under provisioned, Over provisioned)
  17. CAP Theorem (diagram: Consistency, Availability, Partition Tolerance; labels: RDBMS, NoSQL, NO GO)
  18. In search of the elastic database
      • Big Web sites AND Cloud applications need servers that scale up (and down) on demand
      • Elastic provisioning works fine for web servers, application servers, etc.
      • However, RDBMS does not scale easily:
          • SQL Azure limited to one database <50GB on a single host
          • Oracle’s RAC not supported in cloud environments
          • MySQL sharding “obnoxious”
      • Many are willing to sacrifice relational database features for scalability and operational simplicity
  19. The NoSQL movement
  20. NoSQL (A.K.A. Cloud databases)
      Generally DO NOT support:
      • SQL
      • Transactions
      • Immediate consistency
      Usually DO support:
      • Elasticity (scale out AND in)
      • Eventual consistency
      • Inherent redundancy and fault tolerance
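
The difference between immediate and eventual consistency can be sketched with two in-memory replicas. This is a toy model under stated assumptions: the `Replica` class, the `anti_entropy` function and the last-write-wins rule based on a version number are hypothetical choices for the example, and real systems use a range of other reconciliation strategies.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Background sync: push every replica's entries to all the others."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


a, b = Replica(), Replica()
a.write("status", "v1", version=1)
anti_entropy([a, b])                # both replicas now hold v1

a.write("status", "v2", version=2)  # accepted by replica a only
stale_read = b.read("status")       # b has not seen the update yet
anti_entropy([a, b])
converged_read = b.read("status")   # replicas agree again
```

The write to `a` succeeds without waiting for `b`, which keeps the system available, and the cost is the window in which `b` still returns the old value.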
  21. NoSQL Data Models
  22. (diagram of NoSQL products by data model)
      Category labels: Key Value Stores; Document DB; JSON/XML DB; Graph Databases
      Products: MemcacheDB, Azure Table Services, Redis, Tokyo Cabinet, SimpleDB, Riak, Amazon Dynamo, Voldemort, Google BigTable, Cassandra, HBase, Hypertable, CouchDB, MongoDB, Neo4J, FlockDB
  23. Not so easy to get the data out...
  24. (diagram) Data Hub and SQL access across environments
      Environments: Amazon AWS Cloud; On-Premise (AKA private Cloud); Microsoft Azure Cloud
      Data stores: MySQL, HBase, SimpleDB, SQL Azure, Table Services, SQL Server, Oracle
  25.
  26. Big Data 3: Data Warehousing
  27. Data Warehouse players
  28. DATAllegro architecture
  29. Column Databases (Vertica, Sybase)
      • Data is stored together in columns
      • Very fast answers to analytic aggregate queries
      • Better compression
      • Not write optimized
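
The column-store points above can be illustrated with plain Python lists. This is a sketch of the idea only, not how Vertica or Sybase store data: the sample rows and the helper names `to_columns` and `run_length_encode` are hypothetical, and run-length encoding is just one compression scheme that benefits from columnar layout.

```python
# Hypothetical row-oriented data, already ordered by region.
rows = [
    {"order_id": 1, "region": "APAC", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 50.0},
    {"order_id": 3, "region": "APAC", "amount": 30.0},
    {"order_id": 4, "region": "EMEA", "amount": 80.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)

# An aggregate reads one column and never touches the others.
total_amount = sum(columns["amount"])

# Values of one type sit together, so repeated values compress well.
region_rle = run_length_encode(columns["region"])
```

Inserting a new order in this layout means appending to every column list rather than writing one contiguous row, which is the intuition behind "not write optimized".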
  30. Disk drives and Moore’s law
      • Transistor density doubles every 18 months
      • Exponential growth is observed in most electronic components:
          • CPU clock speeds
          • RAM
          • Hard Disk Drive storage density
      • But not in mechanical components:
          • Service time (seek latency): limited by actuator arm speed and disk circumference
          • Throughput (rotational latency): limited by speed of rotation, circumference and data density
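
The mechanical limits listed above can be put into rough numbers. Average rotational latency is half of one revolution, so it follows directly from spindle speed; the seek times used below are illustrative assumptions, not figures from the slides, and the function names are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one full revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def random_iops(rpm, avg_seek_ms):
    """Rough ceiling on random I/Os per second for a single spindle."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


# Assumed average seek times in ms, chosen only for illustration.
ASSUMED_SEEK_MS = {7200: 8.5, 15000: 3.5}

for rpm, seek_ms in ASSUMED_SEEK_MS.items():
    print(
        f"{rpm} RPM: rotational latency {rotational_latency_ms(rpm):.2f} ms, "
        f"about {random_iops(rpm, seek_ms):.0f} random IOPS"
    )
```

Doubling the spindle speed only halves the rotational term, so random I/O capacity per disk stays in the low hundreds of operations per second under these assumptions, whatever happens to storage density.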
  31. Big Data vs. Fast Data (chart: Disk trends 2001-2009)
  32. SSD to the rescue?
  33. Power consumption
  34. Economics of SSD
  35. Fast reads but slow writes
  36. Hierarchical storage management (chart labels: $/GB, $/IOP)
  37. In Memory Databases: VoltDB & H-Store
      • In Memory Distributed (“Sharded”) Database
      • No transactional IO
      • ACID transactions (k-safety)
      • Single Threaded (no latches or locks)
      • Java Stored Procedure transactions
      • Hierarchical data model
      • Double Shared Nothing (disk OR CPU)
      • Spool out to DW for ad-hoc analysis
      • Very high TPS for suitable applications
  38. Oracle EXADATA
      • RAC clusters provide MPP
      • Dedicated storage servers
      • High-speed InfiniBand channels
      • Smart storage reduces data transfer requirements
      • Hybrid Flash & spinning disk storage system
      • Flash caching in the database systems
  39. The Next Generation?