Hpts 2011 flexible_oltp



Flexible OLTP data models in the future

There has been a flurry of highly scalable data stores and a dramatic spike in interest in them. The solutions with the most mindshare seem to be inspired either by the eventual consistency model of Amazon's Dynamo or by a data model that promotes nested, self-describing data structures, like Google's BigTable. At the same time, projects within these corporations are evolving toward architectures like MegaStore and Dremel (Google), where features from the column-oriented data model are blended with the relational model.

The shift from purely structured data to unstructured and semi-structured content is evident. New applications are being developed, and existing applications modified, at breakneck speed. Developers want data model evolution to be extremely simple, and they want support for nested structures that map easily to representations like JSON, so that there is little impedance between the application programming model and the database. Next-generation enterprise applications will increasingly work with structured and semi-structured data from a multitude of data sources. A pure relational model is too rigid, while a pure BigTable-like model has too many shortcomings and cannot be integrated with existing relational database systems.

In this talk, I walk through an alternative. We prefer the familiar "row oriented" approach over the "column oriented" one, but still tilt the relational model, mostly the schema definition, to support partitioning and colocation, redundancy levels, and dynamic and nested columns.
Each of these extensions supports a different desired attribute: partitioning and colocation primitives cover horizontal scaling; availability primitives give explicit support for the replication model and placement policies (local vs. across data centers); dynamic columns address flexibility for schema evolution (different rows can have different columns, added with no DDL requirements); and nested columns support organizing data in a hierarchy.

We draw inspiration for the data model from Pat Helland's 'Life beyond distributed transactions' by adopting entity groups as a first-class artifact that designers start with, and by defining relationships between entities within the group (associations based on reference as well as containment). Rationalizing the design around entity groups forces the designer to think about data access patterns and how the data will be colocated in partitions. We then cover why ACID properties and sophisticated querying become significantly less challenging to accomplish. There are many ideas around partitioning policies, and tradeoffs in supporting transactions and joins across entity groups, that are worth discussing.

The idea is to present a model and generate discussion on how to achieve the best of both worlds: flexible schemas without losing referential integrity, support for associations and the po

Published in: Technology


  1. High Performance Transaction Systems, 2011, Asilomar, CA. Flexible OLTP Data models in the future. Jags Ramnarayan. Disclaimer: Any positions expressed here are my own and do not necessarily reflect the positions of my employer VMWare. Confidential
  2. Agenda
     • Perspective on some trends
     • Basic concepts in VMWare GemFire/SQLFire
     • Beyond key based partitioning
     • Beyond the SQL Data model
  3. Trends, Observations
     • High demand for low/predictable latency, handle huge load spikes, in-memory on commodity, big data
     • Input is streaming in nature
         • High, bursty rates ... structured and unstructured
         • continuous correlations and derived events
     • Increasingly data is bi-temporal in nature
         • very high ingest rates that tend to be bursty
         • optimizations for inserts and mass migration of historical data to data warehouse
         • occasional joins across in-memory and data warehouse
  4. Trends, Observations
     • DB schema rapidly evolving
         • Services are added/changed every week... DB model cannot be rigid
         • programmer drives the change
         • DBA only for operational support?
     • DB Instance is ACID but nothing ACID across the enterprise
         • many silos and data duplicated across independent databases
         • Cleansing, de-duplication is fact of life and will never go away
     • So, why is ACID so important for most use cases?
         • Folks want deterministic outcome not ACID
  5. VMWare offering - vFabric GemFire (GA), SQLFire (in beta)
     • GemFire: Distributed, memory oriented, Object (KV) data management
     • SQLFire: Similar but SQL is the interface
     • Target market today
         • OLTP up to few TB range (all in memory)
         • real-time, low latency, very high concurrent load
         • Not focused on “big data” batch analytics
  6. Some random characteristics
  7. What is different?
  8. Beyond Key based Hash Partitioning
     • We all know Hash partitioning provides uniform load balance
     • List, range, or using custom application expression
     • Exploit OLTP characteristics for partitioning
         • Often it is the number of entities that grows over time and not the size of the entity.
             • Customer count perpetually grows, not the size of the customer info
         • Most often access is very restricted to a few entities
             • given a FlightID, fetch flightAvailability records
             • given a customerID, add/remove orders, shipment records
         • Root entity frequently fetched with its immediate children
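The partitioning options named on this slide (hash, range, list, custom expression) can be sketched as simple routing functions. This is only an illustrative sketch: the function names, the use of CRC32 as the hash, and the node-numbering scheme are assumptions made here, not how GemFire or SQLFire route keys.

```python
import bisect
import zlib


def hash_partition(key: str, num_nodes: int) -> int:
    """Hash partitioning: a stable hash of the key, modulo the node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def range_partition(key: int, upper_bounds: list) -> int:
    """Range partitioning over sorted, exclusive upper bounds.

    Node i holds keys k with upper_bounds[i-1] <= k < upper_bounds[i];
    keys at or above the last bound go to one extra, final node.
    """
    return bisect.bisect_right(upper_bounds, key)


def list_partition(key: str, assignments: dict, default_node: int = 0) -> int:
    """List partitioning: an explicit value-to-node table (hypothetical default)."""
    return assignments.get(key, default_node)


def custom_partition(row: dict, expression, num_nodes: int) -> int:
    """Custom expression: the application derives the routing value from the row."""
    return hash_partition(str(expression(row)), num_nodes)
```

For example, `range_partition(150, [100, 200, 300])` routes to node 1, and `custom_partition(row, lambda r: r["customer_id"], 4)` routes every row of one customer to the same node.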
  9. Grouping entities
     • Related entities share an "entity group" key and are colocated
     • Grouping based on foreign key relationships: look for FK in the compound PK
         • advantage here is that not all entities in group have to share the same key
     Entity Groups: FlightID is the entity group Key
         Create Table FlightAvailability(..) partitioned by FlightID colocated with Flights
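A toy model of colocation, assuming a hypothetical in-memory `Cluster` class: rows from different tables are routed by the shared entity group key (`flight_id` here), so a root entity and its children land on the same node. The class, its methods, and the CRC32 routing are illustrative assumptions, not SQLFire internals.

```python
import zlib
from collections import defaultdict


class Cluster:
    """Hypothetical cluster: each node maps table name -> list of rows."""

    def __init__(self, num_nodes: int, group_key_column: str = "flight_id"):
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]
        self.group_key_column = group_key_column

    def node_for(self, group_key: str) -> int:
        # Route on the entity group key only, never on the table name.
        return zlib.crc32(group_key.encode("utf-8")) % len(self.nodes)

    def insert(self, table: str, row: dict) -> int:
        node = self.node_for(row[self.group_key_column])
        self.nodes[node][table].append(row)
        return node

    def fetch_group(self, group_key: str) -> dict:
        """Fetch the root entity and its children from a single node."""
        node = self.nodes[self.node_for(group_key)]
        return {
            table: [r for r in rows if r[self.group_key_column] == group_key]
            for table, rows in node.items()
        }
```

With this layout, `fetch_group("AA100")` reads a flight and all of its availability records without contacting any other node.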
  10. Why does this scale?
      • requests pruned to a single node or subset of cluster
      • Transactional "write set" is mostly confined to a single entity group
      • Unit of serializability now confined to a single "primary" member managing the entity group
      • Common query joins: across tables that belong to the same group
      • If all concurrent access were to be uniformly distributed across the "entity group" set then you can linearly scale with cluster size
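Request pruning can be illustrated with a small routing function: when the predicate pins the entity group key, one node is contacted; otherwise the request fans out to the whole cluster. The function name, the predicate shape (a plain dict of column equality filters), and the CRC32 routing are assumptions for this sketch.

```python
import zlib


def nodes_to_query(predicate: dict, num_nodes: int,
                   group_key_column: str = "flight_id") -> list:
    """Return the node ids a request must visit.

    If the predicate fixes the entity group key, the request is pruned to
    the single node owning that group; otherwise every node is visited.
    """
    if group_key_column in predicate:
        key = str(predicate[group_key_column]).encode("utf-8")
        return [zlib.crc32(key) % num_nodes]
    return list(range(num_nodes))
```

The difference between the two return paths is the scaling argument on the slide: pruned requests cost the same regardless of cluster size, while scatter requests grow with it.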
  11. Invariably, access patterns are more complex
      • Scalable joins when entity grouping is not possible
          • Reference tables
          • M-M relationships
      • Distributed joins impede scaling significantly
          • pipelining intermediate data sets impacts other concurrent activity
      • Answer today:
          • Use replicated tables for reference data
          • one side in the M-M
      • Assumptions
          • update rate on reference data is low
          • one side of the M-M related tables is small and infrequently changing
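The replicated-table answer can be sketched as follows: a small reference table is copied to every node, so each node joins its own partition against its local copy and no rows are shipped between nodes. The table contents and function names are made up for illustration.

```python
from copy import deepcopy


def replicate(reference_table: dict, num_nodes: int) -> list:
    """Give every node its own full copy of a small reference table."""
    return [deepcopy(reference_table) for _ in range(num_nodes)]


def local_join(partition_rows: list, local_reference: dict, fk_column: str) -> list:
    """Join one node's partition against that node's reference copy."""
    joined = []
    for row in partition_rows:
        ref = local_reference.get(row[fk_column])
        if ref is not None:
            joined.append({**row, **ref})
    return joined


# Hypothetical data: flights are partitioned, airports are replicated.
airports = {"SFO": {"city": "San Francisco"}, "JFK": {"city": "New York"}}
partitions = [
    [{"flight_id": "F1", "origin": "SFO"}],
    [{"flight_id": "F2", "origin": "JFK"}],
]
copies = replicate(airports, len(partitions))
results = [local_join(p, c, "origin") for p, c in zip(partitions, copies)]
```

The cost is that every update to the reference table must reach all copies, which is why the slide assumes a low update rate on reference data.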
  12. It doesn’t end here
      • realizing a "partition aware" design is difficult
      • 80-20 rule: 80% of access at a point in time is on 20% of the data
          • lumpy distribution causes hotspots
          • hash partitioning solves this but doesn't help range searches
          • some help: Multi-attribute Grid declustering
          • rebalancing may not help as the entity group (the lump) is a unit of redistribution
      • Static grouping vs dynamic grouping
          • e.g. online gaming: multiple players all have to be grouped together, but the grouping lasts only for a game (http://www.cs.ucsb.edu/~sudipto/papers/socc10-das.pdf)
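Why hash partitioning does not help range searches can be seen by comparing which nodes a range scan has to touch under each scheme. This is a sketch under assumed integer keys and CRC32 hashing; enumerating the keys under hash partitioning is only for illustration, since a real system would simply broadcast the scan.

```python
import bisect
import zlib


def hash_nodes_for_range(lo: int, hi: int, num_nodes: int) -> set:
    """Under hash partitioning, adjacent keys hash to unrelated nodes,
    so the keys in [lo, hi] can be spread over any subset of the cluster."""
    return {zlib.crc32(str(k).encode("utf-8")) % num_nodes
            for k in range(lo, hi + 1)}


def range_nodes_for_range(lo: int, hi: int, upper_bounds: list) -> set:
    """Under range partitioning, only the nodes whose key ranges overlap
    [lo, hi] are touched (sorted, exclusive upper bounds per node)."""
    first = bisect.bisect_right(upper_bounds, lo)
    last = bisect.bisect_right(upper_bounds, hi)
    return set(range(first, last + 1))
```

With bounds `[100, 200, 300]`, a scan of keys 110 to 150 touches only node 1 under range partitioning, whereas the hash-partitioned result depends on the hash and gives no locality to exploit.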
  13. “Good enough” scalable transactions
      • Assumptions
          • Small in duration and “write set”
          • Conflicts are rare
      • Single row operations always atomic and isolated
      • No statement level read consistency for queries
      • Writers almost never block readers
      • Single phase commit protocol
          • Eagerly “write lock(local)” on each cohort.
          • “Fail fast” if lock cannot be acquired
      • Transaction isolation at commit time is guaranteed on "write set" in a single partition
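A minimal sketch of the eager-lock, fail-fast, single-phase idea, reduced to one partition on one member. The `Partition`, `Transaction`, and `ConflictError` names are hypothetical, and the sketch leaves out replication to other cohorts, which the slide's protocol also write-locks.

```python
import threading


class ConflictError(Exception):
    """Raised when a write lock is already held by another transaction."""


class Partition:
    def __init__(self):
        self._committed = {}
        self._write_locks = {}  # key -> id of the transaction holding the lock
        self._mutex = threading.Lock()

    def read(self, key):
        # Readers take no lock: they see the last committed value.
        return self._committed.get(key)

    def try_write_lock(self, key, txn_id):
        with self._mutex:
            owner = self._write_locks.get(key)
            if owner is not None and owner != txn_id:
                # Fail fast instead of waiting for the lock.
                raise ConflictError(f"{key!r} is write-locked by transaction {owner}")
            self._write_locks[key] = txn_id

    def finish(self, txn_id, pending=None):
        """Apply pending writes (if any) and release the transaction's locks."""
        with self._mutex:
            if pending:
                self._committed.update(pending)
            for key in [k for k, o in self._write_locks.items() if o == txn_id]:
                del self._write_locks[key]


class Transaction:
    def __init__(self, txn_id, partition):
        self.txn_id = txn_id
        self.partition = partition
        self.pending = {}

    def write(self, key, value):
        # Lock eagerly at write time, so commit cannot hit a conflict.
        self.partition.try_write_lock(key, self.txn_id)
        self.pending[key] = value

    def commit(self):
        # Single phase: every lock is already held, so just apply and release.
        self.partition.finish(self.txn_id, self.pending)
        self.pending = {}

    def rollback(self):
        self.partition.finish(self.txn_id)
        self.pending = {}
```

Because conflicts surface at write time, a losing transaction aborts immediately and can retry, which fits the stated assumptions that transactions are short and conflicts are rare.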
  14. Rough thoughts on “Schema flexibility”
      • New generation of developers don’t seem to like Schemas
      • Drivers
          • Many sources of data: it is semi-structured and changing rapidly
          • DB model changes are frequent
          • Adding UDTs and altering tables seen as "rigid"
      • E.g.
          • E-commerce app introduces a few products with a stable schema
      Source: http://www.nosqldatabases.com/main/2011/4/11/augmenting-rdbms-with-mongodb-for-ecommerce.html
  15. “Schema free”, “Schema less”, etc
      • Then, keeps adding support for new products
      • Or, keeps removing products
      • XML datatypes or UDTs or organizing tables in a hierarchy is unnatural and complex
      • JSON is considered a fat-free alternative to XML
  16. The “Polyglot” Data store
      • Current thinking
      Single OLTP data store for:
          1. complex, obese, perpetually changing object graphs: session state, workflow state
          2. Highly structured, transactional data sourced from enterprise DBs
          3. semi-structured, self describing, rapidly evolving data: syndicated content, etc
      Distributed data store that supports Objects, SQL and JSON?
  17. Object columns with dynamic attributes
      • Extend SQL with dynamic, self describing attributes contained in Object columns
      • Object columns are containers for self describing K-V pairs (think JSON)
          • values can be objects themselves supporting nesting (composition)
          • Can contain collections
      • Very easy in most object environments
          • Reflection provides dynamic type under the covers
          • And, hence the object fields become queryable. For interoperability, the type system could be JSON
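How self-describing attributes inside an Object column can become queryable may be sketched with nested dictionaries standing in for JSON-like objects. The dotted-path syntax and the `get_path` and `select` helpers are assumptions for illustration only; the slides do not specify a query syntax.

```python
def get_path(obj, path: str):
    """Walk a dotted path through nested K-V objects; None if any step is missing."""
    current = obj
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return None
    return current


def select(rows: list, column: str, path: str, value) -> list:
    """Return the rows whose Object column has `value` at the given path."""
    return [r for r in rows if get_path(r[column], path) == value]


# Hypothetical rows: each 'doc' carries different attributes, with no DDL change.
rows = [
    {"key": "d1", "doc": {"type": "book", "author": {"name": "Ann"}}},
    {"key": "d2", "doc": {"type": "shoe", "size": 9, "colors": ["red", "blue"]}},
]
books_by_ann = select(rows, "doc", "author.name", "Ann")
```

Rows that lack an attribute simply do not match, which is the behaviour one would want when different rows carry different attributes.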
  18. Some Examples with Object columns
      1. Session State - Object tables easily integrate with session state modules in popular app servers
             create table sessionState (key String, value Object) hash partitioned redundancy level 1;
      2. Semi-structured docs
             create table myDocuments (key varchar, documentID varchar, creationTime date, doc Object, tags Object) hash partitioned redundancy level 1;
         - doc could be a JSON object with each row having different attributes in the object
         - tags is a collection of strings
  19. More information at http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire Q&A