The World of Structured Storage System


Published in: Technology, Business

  1. 1. The World of Structured Storage System: one size does not fit all. Schubert Zhang, Nov. 2009
  2. 2. Summary <ul><li>Structured storage is distinct from file stores and blob stores. </li></ul><ul><li>Both relational and non-relational structured storage systems are important. </li></ul><ul><li>No single solution is appropriate for all applications. </li></ul><ul><li>Application-Intent Taxonomy * </li></ul><ul><ul><li>Features-first stores </li></ul></ul><ul><ul><li>Scale-first stores </li></ul></ul><ul><ul><li>Simple structured stores </li></ul></ul><ul><ul><li>Batch-analytic stores </li></ul></ul><ul><ul><li>Purpose-optimized stores </li></ul></ul>
  3. 3. Features-first Stores (mainstream: RDBMS) <ul><li>Non-sharded RDBMS </li></ul><ul><li>For feature-rich applications </li></ul><ul><li>Usually OLTP </li></ul><ul><li>Low latency (real time) </li></ul><ul><li>Some in-database calculations, indexing (primary, secondary), strong relational model (1,2,3). </li></ul><ul><li>Examples of common workloads </li></ul><ul><ul><li>enterprise financial systems </li></ul></ul><ul><ul><li>human resources systems </li></ul></ul><ul><ul><li>customer relationship management systems </li></ul></ul><ul><ul><li>etc. </li></ul></ul><ul><li>Examples of products </li></ul><ul><ul><li>Oracle, SQL Server, DB2, MySQL, PostgreSQL, … </li></ul></ul><ul><ul><li>Cloud solution: Amazon Relational Database Service (Amazon RDS) </li></ul></ul>
  4. 4. Scale-first Stores (mainstream: key-value store) <ul><li>Scale is more important than features. </li></ul><ul><li>Must scale without bound and without restriction. </li></ul><ul><li>Impossible to run on a single RDBMS. </li></ul><ul><li>Usually OLTP </li></ul><ul><li>Low latency (real time) </li></ul><ul><li>Fewer in-database calculations, less indexing, fewer relations. </li></ul><ul><li>Two solutions </li></ul><ul><ul><li>Shard the application data over a large number of RDBMS instances. </li></ul></ul><ul><ul><ul><li>The data is sharded over 10s or even 100s of independent database instances. </li></ul></ul></ul><ul><ul><ul><li>Do not expect cross-database joins, aggregations, global secondary indexes, global stored procedures, or the other relational database features that are incredibly hard to scale. </li></ul></ul></ul><ul><ul><ul><li>Example: Windows Live Messenger. </li></ul></ul></ul><ul><ul><ul><li>DB2 Parallel Edition and Oracle RAC support the full relational model, but are still not a good fit at this scale. </li></ul></ul></ul><ul><ul><li>Use a highly scalable key-value store. </li></ul></ul><ul><ul><ul><li>Some key-value store product examples include: Project Voldemort, Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, CouchDB, Cassandra, HBase and Hypertable </li></ul></ul></ul><ul><ul><ul><li>Cloud solution: Amazon SimpleDB. </li></ul></ul></ul><ul><ul><ul><li>Simple primary indexing (distributed B+Tree, hash, partition, etc.). </li></ul></ul></ul><ul><li>Examples of applications </li></ul><ul><ul><li>Very high scale web sites such as Facebook, MySpace, Gmail, Yahoo, and </li></ul></ul><ul><ul><li>Some of these sites actually do make use of relational databases but many do not. </li></ul></ul>
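The sharding approach on this slide can be illustrated with a minimal sketch. It assumes hash-based routing by primary key; the `ShardRouter` class and the in-memory dictionaries standing in for independent database instances are hypothetical and do not come from any product named above.

```python
import hashlib


class ShardRouter:
    """Routes each key to one of N independent stores.

    Each dict is a hypothetical stand-in for one database instance.
    """

    def __init__(self, num_shards: int) -> None:
        if num_shards <= 0:
            raise ValueError("num_shards must be positive")
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # A stable hash (unlike Python's per-process randomized hash())
        # keeps routing repeatable across processes and restarts.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str, default=None):
        # A lookup touches exactly one shard; there is no cross-shard join.
        return self.shards[self.shard_for(key)].get(key, default)


router = ShardRouter(num_shards=4)
router.put("user:1001", {"name": "alice"})
router.put("user:1002", {"name": "bob"})
print(router.get("user:1001"))
```

Every operation is keyed and lands on a single shard, which is why joins, global secondary indexes, and cross-shard aggregations are given up. Note that plain modulo routing remaps most keys when the shard count changes; consistent hashing is one common way to reduce that cost.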
  5. 5. Simple Structured Stores (mainstream: key-value store) <ul><li>Many applications don’t need the features, cost, or complexity of an RDBMS, nor the high scalability. </li></ul><ul><li>Just need a simple key-value store. </li></ul><ul><li>Simple query </li></ul><ul><li>Simple index access </li></ul><ul><li>Simple, cheap, fast, and low operational burden </li></ul><ul><li>Examples </li></ul><ul><ul><li>Low-end: BerkeleyDB </li></ul></ul><ul><ul><li>High-end: Cassandra, Project Voldemort, Dynamo, etc. </li></ul></ul><ul><ul><li>Cloud: Amazon SimpleDB </li></ul></ul>
  6. 6. Batch-analytic stores/Data Warehouses (mainstream: with MapReduce) <ul><li>Traditional Data Warehouse </li></ul><ul><ul><li>Based on RDBMS </li></ul></ul><ul><ul><li>Multidimensional data model, data cubes (star and snowflake schemas) </li></ul></ul><ul><ul><li>OLAP </li></ul></ul><ul><ul><li>Big, but not large enough for modern data scale. Hard to scale. </li></ul></ul><ul><li>Fashionable Data Warehouse </li></ul><ul><ul><li>Sharded/partitioned data storage (by RDBMS, MPP database, or proprietary SQL stores) </li></ul></ul><ul><ul><li>Enhanced by MapReduce for queries </li></ul></ul><ul><ul><li>Large, but not unbounded. Not easy to scale, not truly distributed. </li></ul></ul><ul><ul><li>Examples: </li></ul></ul><ul><ul><ul><li>Greenplum (SQL+MapReduce+PostgreSQL) </li></ul></ul></ul><ul><ul><ul><li>Aster Data (SQL+MapReduce+MPP RDBMS) </li></ul></ul></ul><ul><ul><ul><li>HadoopDB (SQL+Hive+Hadoop MapReduce+RDBMS) </li></ul></ul></ul>
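To make "enhanced by MapReduce for queries" concrete, here is a single-process sketch of the map, shuffle, and reduce steps over partitioned data. The record fields (`country`, `views`) and function names are illustrative assumptions; this is not the API of Hadoop, Greenplum, or Aster Data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit (key, value) pairs: one pair per record, keyed by country.
    yield record["country"], record["views"]


def shuffle(pairs):
    # Group all values by key, as the framework would between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Aggregate the values for one key.
    return key, sum(values)


def run_job(partitions):
    # Each partition stands in for the data held by one storage node.
    mapped = chain.from_iterable(
        map_phase(record) for part in partitions for record in part
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


partitions = [
    [{"country": "CN", "views": 3}, {"country": "US", "views": 5}],
    [{"country": "CN", "views": 4}, {"country": "DE", "views": 1}],
]
totals = run_job(partitions)
print(totals)
```

In a real deployment the map calls run next to each data partition and the shuffle moves data across the network; the sketch only shows the shape of the computation, roughly a `GROUP BY country` with `SUM(views)`.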
  7. 7. Purpose-Optimized Stores <ul><li>Columnar DB/DW </li></ul><ul><ul><li>Vertica (analytic, columnar, aggressive compression, shared nothing, hybrid data store, fast write, fast read) </li></ul></ul><ul><li>DW Appliances (Special hardware and solutions) </li></ul><ul><ul><li>Teradata </li></ul></ul><ul><ul><li>Netezza </li></ul></ul><ul><li>Special DB </li></ul><ul><ul><li>StreamBase, etc. </li></ul></ul><ul><ul><li>… </li></ul></ul>
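Why columnar layout and aggressive compression go together can be shown with a small sketch. It assumes run-length encoding as the compression scheme and uses made-up sample rows; it is not a description of Vertica's actual storage format.

```python
from itertools import groupby

# Row-oriented layout: one tuple per record (illustrative sample data).
rows = [
    ("2009-11-01", "CN", 3),
    ("2009-11-01", "CN", 5),
    ("2009-11-01", "US", 2),
    ("2009-11-02", "US", 7),
]

# Column-oriented layout: one list per attribute.
dates, countries, views = (list(column) for column in zip(*rows))


def rle_encode(column):
    # Collapse each run of equal adjacent values into (value, run_length).
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(encoded):
    # Expand (value, run_length) pairs back into the original column.
    return [value for value, count in encoded for _ in range(count)]


encoded_dates = rle_encode(dates)
encoded_countries = rle_encode(countries)

# An analytic query such as SUM(views) reads only the one column it needs.
total_views = sum(views)
print(encoded_dates, encoded_countries, total_views)
```

Storing each column contiguously puts similar values next to each other, so sorted or low-cardinality columns compress well, and a scan over one attribute never has to read the others.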
  8. 8. Future DBMS/DW? <ul><li>????? Mixed / Hybrid ???? </li></ul><ul><li>Stores raw data + calculated informational data. </li></ul><ul><li>Raw data are collected, historical data, stored in a distributed data storage system </li></ul><ul><ul><li>Shared nothing </li></ul></ul><ul><ul><li>Large scale </li></ul></ul><ul><ul><li>Commodity hardware </li></ul></ul><ul><li>Informational data is stored in an RDBMS (small size) or in distributed key-value stores. </li></ul><ul><li>MapReduce </li></ul><ul><li>Examples: </li></ul><ul><ul><li>Hive (more needs to be completed) </li></ul></ul><ul><ul><li>To be created. </li></ul></ul>
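One way to picture the hybrid idea on this slide is the two-tier sketch below: raw events are kept as an append-only history, a batch job derives the informational data, and lookups read only the derived store. All names (`raw_events`, `batch_summarize`, `lookup`) and the sample data are hypothetical; a plain list and a dict stand in for the distributed storage and the key-value store.

```python
from collections import Counter

# Raw tier: collected, historical events (stand-in for distributed storage).
raw_events = [
    {"user": "alice", "day": "2009-11-01", "action": "login"},
    {"user": "alice", "day": "2009-11-01", "action": "search"},
    {"user": "bob", "day": "2009-11-01", "action": "login"},
    {"user": "alice", "day": "2009-11-02", "action": "login"},
]


def batch_summarize(events):
    # Batch job: scan all raw data and compute per-user, per-day counts.
    counts = Counter((event["user"], event["day"]) for event in events)
    return dict(counts)


# Informational tier: small, precomputed, keyed for fast access
# (stand-in for an RDBMS table or a key-value store).
informational = batch_summarize(raw_events)


def lookup(user, day):
    # Low-latency read path: a single keyed lookup, no scan of raw data.
    return informational.get((user, day), 0)


print(lookup("alice", "2009-11-01"))
```

The batch side is free to be slow and exhaustive, while the serving side stays small and fast; the price is that the informational data is only as fresh as the last batch run.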
  9. 9. Amazon AWS Cloud Structured Storage Solutions <ul><li>For features-first applications </li></ul><ul><ul><li>RDS (cloud-based relational database service, MySQL) </li></ul></ul><ul><ul><li>Amazon EC2 - RDBMS AMI (+ EBS) </li></ul></ul><ul><li>For scale-first applications </li></ul><ul><ul><li>SimpleDB (cloud-based simple key-value store, no relational model) </li></ul></ul><ul><ul><li>Amazon EC2 - KeyValue AMI (+ EBS) </li></ul></ul><ul><li>A guess at Amazon’s future structured storage solution: </li></ul><ul><ul><li>Enhanced SimpleDB (enhanced scalability, etc.; may take some ideas from BigTable, Dynamo, etc.) </li></ul></ul><ul><ul><li>New highly scalable key-value stores, such as HBase (when it becomes stronger). </li></ul></ul>
  10. 10. There is no one-size-fits-all solution! <ul><li>There are too many contradictory requirements in the structured data world. </li></ul><ul><li>The contradiction of data processing: </li></ul><ul><ul><li>Real-time or near-real-time data availability, supporting “up-to-now” data measurement. </li></ul></ul><ul><ul><li>Batch processing for large volumes of data, such as aggregation. </li></ul></ul><ul><li>The contradiction of data access: </li></ul><ul><ul><li>Low-latency, fast query response, such as a lookup. </li></ul></ul><ul><ul><li>High-latency ad-hoc analytic queries over historical data. </li></ul></ul><ul><li>There is no one-size-fits-all answer to the above contradictory requirements. </li></ul><ul><li>“Important not to try to be all things to all people!” – Jeff Dean, Keynote at LADIS’09 </li></ul>
  11. 11. References <ul><li>One Size Fits All: An Idea Whose Time Has Come and Gone: http:// </li></ul><ul><li>James Hamilton's Blog: </li></ul>