SlideShare a Scribd company logo
1 of 19
Download to read offline
High Performance Transaction Systems, 2011, Asilomar, CA



               Flexible OLTP Data models in the
                            future
                                                         Jags Ramnarayan




               Disclaimer:
               Any positions expressed here are my own and do not
               necessarily reflect the positions of my employer VMWare.



Confidential
Agenda

 Perspective on some trends

 Basic concepts in VMWare GemFire/SQLFire

 Beyond key based partitioning

 Beyond the SQL Data model




2                        Confidential
Trends, Observations
 High demand for low/predicable latency, handle huge
  load spikes, in-memory on commodity, big data

 Input is streaming in nature
  • High, bursty rates ... structured and unstructured
  • continuous correlations and derived events

 Increasingly data is bi-temporal in nature
  • very high ingest rates that tend to be bursty
  • optimizations for inserts and mass migration of historical
    data to data warehouse.
  • occasional joins across in-memory and data warehouse


3                               Confidential
Trends, Observations
 DB schema rapidly evolving
  • Services are added/changed every week... DB model cannot be
     rigid
    • programmer drives the change
    • DBA only for operational support?

 DB Instance is ACID but nothing ACID across the
  enterprise
  • many silos and data duplicated across independent databases
  • Cleansing, de-duplication is fact of life and will never go
    away

 So, why is ACID so important for most use cases?
 • Folks want deterministic outcome not ACID

4                             Confidential
VMWare offering - vFabric GemFire (GA), SQLFire (in beta)


 GemFire: Distributed, memory oriented, Object (KV)
  data management

 SQLFire: Similar but SQL is the interface

 Target market today
  • OLTP upto few TB range (all in memory)
  • real-time, low latency, very high concurrent load
  • Not focused on “big data” batch analytics




5                            Confidential
Some random characteristics




66                                 Confidential
What is different?




77                        Confidential
Beyond Key based Hash Partitioning

    • We all know Hash partitioning provides uniform load balance
    • List, range, or using custom application expression

    • Exploit OLTP characteristics for partitioning

      • Often it is the number of entities that grows over time and not the size
       of the entity.
        • Customer count perpetually grows, not the size of the
          customer info

      • Most often access is very restricted to a few entities
         • given a FlightID, fetch flightAvailability records
         • given a customerID, add/remove orders, shipment records

      • Root entity frequently fetched with its immediate children


8                                        Confidential
Grouping entities

     • Related entities share a "entity group" key and are colocated
     • Grouping based on foreign key relationships: look for FK in the
      compound PK
      • advantage here is that not all entities in group have to share the same key



                                         Entity Groups

                                                           FlightID is the
                                                           entity group Key




CreateTable FlightAvailability(..) partitioned by FlightID colocated with Flights

 9                                          Confidential
Why does this scale?

 • requests pruned to a single node or subset of cluster

     • Transactional "write set" is mostly confined to a single entity
      group

     • Unit of serializability now confined to a single "primary" member
      managing the entity group

     • Common query joins: across tables that belong to the same
      group

     • If all concurrent access were to be uniformly distributed across
      the "entity group" set then you can linearly scale with cluster
      size




10                                   Confidential
Invariably, access patterns are more complex

 • Scalable joins when entity grouping is not possible
     • Reference tables
     • M-M relationships
 • Distributed joins impedes scaling significantly
     • pipelining intermediate data sets impacts other concurrent activity


 • Answer today:
     • Use replicated tables for reference data
     • one side in the M-M

     • Assumptions
        • update rate on reference data is low
        • one side of the M-M related tables is small and infrequently
          changing



11                                          Confidential
It doesn’t end here

 • realizing a "partition aware" design is difficult
 • 80-20 rule: 80% of access at a point in time is on 20% of the data

 • lumpy distribution causes hotspots
     • hash partitioning solves this but doesn't help range
      searches
     • some help: Multi-attribute Grid declustering
     • rebalancing may not help as the entity group (the lump) is
      a unit of redistribution

 • Static grouping vs dynamic grouping
     • e.g online gaming: multiple players that all have to be
      grouped together lasts only for a game
      (http://www.cs.ucsb.edu/~sudipto/papers/socc10-das.pdf)




12                                      Confidential
“Good enough” scalable transactions

 • Assumptions
     • Small in duration and “write set”
     • Conflicts are rare

 • Single row operations always atomic and isolated
 • No statement level read consistency for queries
       • Writers almost never block readers

 • Single phase commit protocol
     • Eagerly “write lock(local)” on each cohort.

     • “Fail fast” if lock cannot be acquired

     • Transaction isolation at commit time is guaranteed on "write
      set" in a single partition


13                                  Confidential
Rough thoughts on “Schema flexibility”

 • New generation of developers don’t seem to like Schemas 
 • Drivers
   • Many source of data: it is semi-structured and changing rapidly
   • DB model changes are frequent
   • Adding UDTs and altering tables seen as "rigid“
 • E.g.
     • E-commerce app introduces a few products with a stable schema




                                                       Source:
                                                       http://www.nosqldatabases.com/main/2011/4/11/augmen
                                                       ting-rdbms-with-mongodb-for-ecommerce.html


14                                      Confidential
“Schema free”, “Schema less”, etc

     • Then, keeps adding support for new products
        • Or, keeps removing products




       • XML datatypes or UDTs or organizing tables in a hierarchy is
         unnatural and complex
       • JSON is considered fat free alternative to XML
15                                   Confidential
The “Polyglot” Data store

 • Current thinking

 Single OLTP data store for:
 1. complex, obese, perpetually changing object graphs
         session state, workflow state

 2. Highly structured, transactional data
         sourced from enterprise DBs

 3. semi-structured, self describing, rapidly evolving
       data
         syndicated content, etc


     Distributed data store that supports Objects, SQL and JSON ?


16                                 Confidential
Object columns with dynamic attributes


 • Extend SQL with dynamic, self describing attributes
     contained in Object columns

 • Object columns are containers for self describing K-V
     pairs (think JSON)
     • values can be objects themselves supporting nesting
      (composition)

 • Can contain collections

 • Very easy in most object environments
     • Reflection provides dynamic type under the covers
     • And, hence the object fields become queriable. For
      interoperability, the type system could be JSON


17                              Confidential
Some Examples with Object columns


 1. Session State- Object tables easily integrate with
      session state modules in popular app servers
     create table sessionState (key String, value
     Object) hash partitioned redundancy level 1;



 2. Semi-structured docs
     create table myDocuments (key varchar,
     documentID varchar, creationTime date, doc
     Object, tags Object) hash partitioned redundancy
     level 1;
      - doc could be a JSON object with each row having different
      attributes in the object
      - tags is a collection of strings




18                                  Confidential
More information at
     http://communities.vmware.com/community/vmtn/appplatform/v
     fabric_sqlfire



     Q&A

19                             Confidential

More Related Content

What's hot

Mongo db model relationships with documents
Mongo db model relationships with documentsMongo db model relationships with documents
Mongo db model relationships with documentsDr. Awase Khirni Syed
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Comparison between rdbms and nosql
Comparison between rdbms and nosqlComparison between rdbms and nosql
Comparison between rdbms and nosqlbharati k
 
SeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisSeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisWill Iverson
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
2010 05-21, object-relational mapping using hibernate v2
2010 05-21, object-relational mapping using hibernate v22010 05-21, object-relational mapping using hibernate v2
2010 05-21, object-relational mapping using hibernate v2alvaro alcocer sotil
 
ORM, JPA, & Hibernate Overview
ORM, JPA, & Hibernate OverviewORM, JPA, & Hibernate Overview
ORM, JPA, & Hibernate OverviewBrett Meyer
 
ARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUG
ARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUGARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUG
ARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUGMarkus Eisele
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?CQD
 
Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013EDB
 
Webinar: MongoDB and Polyglot Persistence Architecture
Webinar: MongoDB and Polyglot Persistence ArchitectureWebinar: MongoDB and Polyglot Persistence Architecture
Webinar: MongoDB and Polyglot Persistence ArchitectureMongoDB
 
Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011Gavin Heavyside
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernates4al_com
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using HazelcastTaras Matyashovsky
 
Demystfying nosql databases
Demystfying nosql databasesDemystfying nosql databases
Demystfying nosql databasesMike King
 
NoSQL and MySQL webinar - best of both worlds
NoSQL and MySQL webinar - best of both worldsNoSQL and MySQL webinar - best of both worlds
NoSQL and MySQL webinar - best of both worldsMat Keep
 

What's hot (20)

Mongo db model relationships with documents
Mongo db model relationships with documentsMongo db model relationships with documents
Mongo db model relationships with documents
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Comparison between rdbms and nosql
Comparison between rdbms and nosqlComparison between rdbms and nosql
Comparison between rdbms and nosql
 
Persistence hibernate
Persistence hibernatePersistence hibernate
Persistence hibernate
 
Hibernate
HibernateHibernate
Hibernate
 
SeaJUG May 2012 mybatis
SeaJUG May 2012 mybatisSeaJUG May 2012 mybatis
SeaJUG May 2012 mybatis
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
2010 05-21, object-relational mapping using hibernate v2
2010 05-21, object-relational mapping using hibernate v22010 05-21, object-relational mapping using hibernate v2
2010 05-21, object-relational mapping using hibernate v2
 
ORM, JPA, & Hibernate Overview
ORM, JPA, & Hibernate OverviewORM, JPA, & Hibernate Overview
ORM, JPA, & Hibernate Overview
 
ARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUG
ARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUGARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUG
ARCHITECTING LARGE ENTERPRISE JAVA PROJECTS - vJUG
 
What ya gonna do?
What ya gonna do?What ya gonna do?
What ya gonna do?
 
Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013
 
Webinar: MongoDB and Polyglot Persistence Architecture
Webinar: MongoDB and Polyglot Persistence ArchitectureWebinar: MongoDB and Polyglot Persistence Architecture
Webinar: MongoDB and Polyglot Persistence Architecture
 
iForum 2015: SQL vs. NoSQL
iForum 2015: SQL vs. NoSQLiForum 2015: SQL vs. NoSQL
iForum 2015: SQL vs. NoSQL
 
Nosql intro
Nosql introNosql intro
Nosql intro
 
Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011Non-Relational Databases at ACCU2011
Non-Relational Databases at ACCU2011
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernate
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using Hazelcast
 
Demystfying nosql databases
Demystfying nosql databasesDemystfying nosql databases
Demystfying nosql databases
 
NoSQL and MySQL webinar - best of both worlds
NoSQL and MySQL webinar - best of both worldsNoSQL and MySQL webinar - best of both worlds
NoSQL and MySQL webinar - best of both worlds
 

Viewers also liked

Tuple map reduce: beyond classic mapreduce
Tuple map reduce: beyond classic mapreduceTuple map reduce: beyond classic mapreduce
Tuple map reduce: beyond classic mapreducedatasalt
 
Handling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsHandling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsVineet Gupta
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins Edureka!
 
Introduction to Tokenization
Introduction to TokenizationIntroduction to Tokenization
Introduction to TokenizationNabeel Yoosuf
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReducePietro Michiardi
 
Efficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data SetsEfficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data SetsPradeeban Kathiravelu, Ph.D.
 
What is Payment Tokenization?
What is Payment Tokenization?What is Payment Tokenization?
What is Payment Tokenization?Rambus Inc
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deckMySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deckVladi Vexler
 

Viewers also liked (11)

Tuple map reduce: beyond classic mapreduce
Tuple map reduce: beyond classic mapreduceTuple map reduce: beyond classic mapreduce
Tuple map reduce: beyond classic mapreduce
 
Handling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web SystemsHandling Data in Mega Scale Web Systems
Handling Data in Mega Scale Web Systems
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
Introduction to Tokenization
Introduction to TokenizationIntroduction to Tokenization
Introduction to Tokenization
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReduce
 
Denormalization
DenormalizationDenormalization
Denormalization
 
Efficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data SetsEfficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data Sets
 
What is Payment Tokenization?
What is Payment Tokenization?What is Payment Tokenization?
What is Payment Tokenization?
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deckMySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
MySQL Visual Analysis and Scale-out Strategy definition - Webinar deck
 

Similar to High Performance Transaction Systems Flexible Data Models

SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafkaconfluent
 
Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatternsAnurag S
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupMayaData Inc
 
Silicon valley nosql meetup april 2012
Silicon valley nosql meetup  april 2012Silicon valley nosql meetup  april 2012
Silicon valley nosql meetup april 2012InfiniteGraph
 
Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleMayaData
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureVenu Anuganti
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 

Similar to High Performance Transaction Systems Flexible Data Models (20)

SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafka
 
Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatterns
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris Meetup
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
NOSQL
NOSQLNOSQL
NOSQL
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
 
Silicon valley nosql meetup april 2012
Silicon valley nosql meetup  april 2012Silicon valley nosql meetup  april 2012
Silicon valley nosql meetup april 2012
 
Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps style
 
SQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data ArchitectureSQL, NoSQL, BigData in Data Architecture
SQL, NoSQL, BigData in Data Architecture
 
NOsql Presentation.pdf
NOsql Presentation.pdfNOsql Presentation.pdf
NOsql Presentation.pdf
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 

Recently uploaded

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 

Recently uploaded (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 

High Performance Transaction Systems Flexible Data Models

  • 1. High Performance Transaction Systems, 2011, Asilomar, CA Flexible OLTP Data models in the future Jags Ramnarayan Disclaimer: Any positions expressed here are my own and do not necessarily reflect the positions of my employer VMWare. Confidential
  • 2. Agenda  Perspective on some trends  Basic concepts in VMWare GemFire/SQLFire  Beyond key based partitioning  Beyond the SQL Data model 2 Confidential
  • 3. Trends, Observations  High demand for low/predicable latency, handle huge load spikes, in-memory on commodity, big data  Input is streaming in nature • High, bursty rates ... structured and unstructured • continuous correlations and derived events  Increasingly data is bi-temporal in nature • very high ingest rates that tend to be bursty • optimizations for inserts and mass migration of historical data to data warehouse. • occasional joins across in-memory and data warehouse 3 Confidential
  • 4. Trends, Observations  DB schema rapidly evolving • Services are added/changed every week... DB model cannot be rigid • programmer drives the change • DBA only for operational support?  DB Instance is ACID but nothing ACID across the enterprise • many silos and data duplicated across independent databases • Cleansing, de-duplication is fact of life and will never go away  So, why is ACID so important for most use cases? • Folks want deterministic outcome not ACID 4 Confidential
  • 5. VMWare offering - vFabric GemFire (GA), SQLFire (in beta)  GemFire: Distributed, memory oriented, Object (KV) data management  SQLFire: Similar but SQL is the interface  Target market today • OLTP upto few TB range (all in memory) • real-time, low latency, very high concurrent load • Not focused on “big data” batch analytics 5 Confidential
  • 7. What is different? 77 Confidential
  • 8. Beyond Key based Hash Partitioning • We all know Hash partitioning provides uniform load balance • List, range, or using custom application expression • Exploit OLTP characteristics for partitioning • Often it is the number of entities that grows over time and not the size of the entity. • Customer count perpetually grows, not the size of the customer info • Most often access is very restricted to a few entities • given a FlightID, fetch flightAvailability records • given a customerID, add/remove orders, shipment records • Root entity frequently fetched with its immediate children 8 Confidential
  • 9. Grouping entities • Related entities share a "entity group" key and are colocated • Grouping based on foreign key relationships: look for FK in the compound PK • advantage here is that not all entities in group have to share the same key Entity Groups FlightID is the entity group Key CreateTable FlightAvailability(..) partitioned by FlightID colocated with Flights 9 Confidential
  • 10. Why does this scale? • requests pruned to a single node or subset of cluster • Transactional "write set" is mostly confined to a single entity group • Unit of serializability now confined to a single "primary" member managing the entity group • Common query joins: across tables that belong to the same group • If all concurrent access were to be uniformly distributed across the "entity group" set then you can linearly scale with cluster size 10 Confidential
  • 11. Invariably, access patterns are more complex • Scalable joins when entity grouping is not possible • Reference tables • M-M relationships • Distributed joins impedes scaling significantly • pipelining intermediate data sets impacts other concurrent activity • Answer today: • Use replicated tables for reference data • one side in the M-M • Assumptions • update rate on reference data is low • one side of the M-M related tables is small and infrequently changing 11 Confidential
  • 12. It doesn’t end here • realizing a "partition aware" design is difficult • 80-20 rule: 80% of access at a point in time is on 20% of the data • lumpy distribution causes hotspots • hash partitioning solves this but doesn't help range searches • some help: Multi-attribute Grid declustering • rebalancing may not help as the entity group (the lump) is a unit of redistribution • Static grouping vs dynamic grouping • e.g online gaming: multiple players that all have to be grouped together lasts only for a game (http://www.cs.ucsb.edu/~sudipto/papers/socc10-das.pdf) 12 Confidential
  • 13. “Good enough” scalable transactions • Assumptions • Small in duration and “write set” • Conflicts are rare • Single row operations always atomic and isolated • No statement level read consistency for queries • Writers almost never block readers • Single phase commit protocol • Eagerly “write lock(local)” on each cohort. • “Fail fast” if lock cannot be acquired • Transaction isolation at commit time is guaranteed on "write set" in a single partition 13 Confidential
  • 14. Rough thoughts on “Schema flexibility” • New generation of developers don’t seem to like Schemas  • Drivers • Many source of data: it is semi-structured and changing rapidly • DB model changes are frequent • Adding UDTs and altering tables seen as "rigid“ • E.g. • E-commerce app introduces a few products with a stable schema Source: http://www.nosqldatabases.com/main/2011/4/11/augmen ting-rdbms-with-mongodb-for-ecommerce.html 14 Confidential
  • 15. “Schema free”, “Schema less”, etc • Then, keeps adding support for new products • Or, keeps removing products • XML datatypes or UDTs or organizing tables in a hierarchy is unnatural and complex • JSON is considered fat free alternative to XML 15 Confidential
  • 16. The “Polyglot” Data store • Current thinking Single OLTP data store for: 1. complex, obese, perpetually changing object graphs session state, workflow state 2. Highly structured, transactional data sourced from enterprise DBs 3. semi-structured, self describing, rapidly evolving data syndicated content, etc Distributed data store that supports Objects, SQL and JSON ? 16 Confidential
  • 17. Object columns with dynamic attributes • Extend SQL with dynamic, self describing attributes contained in Object columns • Object columns are containers for self describing K-V pairs (think JSON) • values can be objects themselves supporting nesting (composition) • Can contain collections • Very easy in most object environments • Reflection provides dynamic type under the covers • And, hence the object fields become queriable. For interoperability, the type system could be JSON 17 Confidential
  • 18. Some Examples with Object columns 1. Session State- Object tables easily integrate with session state modules in popular app servers create table sessionState (key String, value Object) hash partitioned redundancy level 1; 2. Semi-structured docs create table myDocuments (key varchar, documentID varchar, creationTime date, doc Object, tags Object) hash partitioned redundancy level 1; - doc could be a JSON object with each row having different attributes in the object - tags is a collection of strings 18 Confidential
  • 19. More information at http://communities.vmware.com/community/vmtn/appplatform/v fabric_sqlfire Q&A 19 Confidential