SQLFire: Scalable SQL instead of NoSQL
Jags Ramnarayan, Chief Architect, GemFire Products

Agenda
  • Various NoSQL attributes and why SQL
  • SQLFire features + Demo
  • Scalability patterns
      - Hash partitioning
      - Entity groups and collocation
      - Scaling behavior using “data aware stored procedures”
  • Consistency model: how we do distributed transactions
  • Shared nothing persistence

We challenge the traditional RDBMS design, NOT SQL
  • First write to LOG
  • Second write to data files
  • Buffers primarily tuned for IO
  • Too much I/O
  • Design roots don’t necessarily apply today
  • Too much focus on ACID
  • Disk synchronization bottlenecks

Common themes in next-gen DB architectures
(NoSQL, Data Grids, Data Fabrics, NewSQL)
  • “Shared nothing” commodity clusters: focus shifts to memory, distributing data and clustering
  • Scale by partitioning the data and move behavior to data nodes
  • HA within cluster and across data centers
  • Add capacity to scale dynamically

What is different?
  • Several data models
      - Key-value
      - Column family (inspired by Google BigTable)
      - Document
      - Graph
  • Most focus on making the model less rigid than SQL
  • Consistency model is not ACID
  [Diagram: consistency spectrum. Strict, full ACID (RDB): low scale -> Tunable consistency: high scale -> Eventual: very high scale]

What is our take with SQLFire?
  • Eventual consistency is too difficult for the average developer
      - Write(A,1) followed by Read(A) may return 2 or (1,2)
  • SQL: flexible, easily understood, strong type system
      - Essential for integrity as well as query engine efficiency

SQLFire
  • Replicated, partitioned tables in memory. Redundancy through memory copies.
  • Data resides on disk when you explicitly say so
  • Powerful SQL engine: standard SQL for select, DML
  • DDL has SQLF extensions
  • Leverages GemFire data grid engine

SQLFire
  • Applications access the distributed DB using JDBC, ADO.NET
  • Consistency model is FIFO, tunable
  • Distributed transactions without global locks

SQLFire
  • Asynchronous replication over WAN
  • Synchronous replication within cluster
  • Clients failover, failback
  • Easily integrate with existing DBs: caching framework to read through, write through or write behind

SQLFire
  • When nodes are added, data and behavior are rebalanced without blocking current clients
  • “Data aware procedures”: standard Java stored procedures with “data aware” and parallelism extensions

Flexible Deployment Topologies
  • A Java application cluster can host an embedded clustered database by just changing the URL:
      jdbc:sqlfire:;mcast-port=33666;host-data=true

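A minimal sketch of how an application might use the embedded URL from this slide. The class and method names (EmbeddedSqlFireUrl, url, connect) are hypothetical; only the URL format and its two properties come from the slide. It assumes a SQLFire JDBC driver is on the classpath; without one, DriverManager throws an SQLException.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class EmbeddedSqlFireUrl {
    // Builds the embedded-mode URL shown on the slide. The property names
    // (mcast-port, host-data) are taken from that URL; nothing else is assumed.
    static String url(int mcastPort, boolean hostData) {
        return "jdbc:sqlfire:;mcast-port=" + mcastPort + ";host-data=" + hostData;
    }

    // Opens a connection through standard JDBC. This fails with SQLException
    // if no driver that accepts the jdbc:sqlfire: prefix is available.
    static Connection connect(int mcastPort, boolean hostData) throws SQLException {
        return DriverManager.getConnection(url(mcastPort, hostData));
    }
}
```

Because the topology is selected by the URL alone, the rest of the application code would stay plain JDBC.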
Flexible Deployment Topologies
  [Diagram: deployment topologies]

Partitioning & Replication
Explore features through example
  • Assume thousands of flight rows, millions of flightavailability records

SQLF Creating Tables

    CREATE TABLE FLIGHTS (
        FLIGHT_ID CHAR(6) NOT NULL PRIMARY KEY,
        SEGMENT_NUMBER INTEGER NOT NULL,
        ORIG_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        …
    );

  • Hash partitioned on PK by default
  [Diagram: the table split into a partitioned table on each of three SQLF members]

SQLF Creating Tables

    CREATE TABLE Airlines (
        AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
        AIRLINE_FULL VARCHAR(24),
        BASIC_RATE DOUBLE PRECISION,
        DISTANCE_DISCOUNT DOUBLE PRECISION,
        …
    )

  [Diagram: one table, three SQLF members]

SQLF Creating Tables

    CREATE TABLE Airlines (
        AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
        AIRLINE_FULL VARCHAR(24),
        BASIC_RATE DOUBLE PRECISION,
        DISTANCE_DISCOUNT DOUBLE PRECISION,
        …
    )
    REPLICATE;

  [Diagram: a replicated table on each of three SQLF members]

SQLF Creating Tables

    CREATE TABLE FLIGHTS (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        ORIG_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        …
    )
    PARTITION BY COLUMN (FLIGHT_ID);

  [Diagram: each of three SQLF members holds a replicated table and a partitioned table]

SQLF Creating Tables

    CREATE TABLE FLIGHTS (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        ORIG_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        …
    )
    PARTITION BY COLUMN (FLIGHT_ID)
    REDUNDANCY 1;

  [Diagram: each member additionally holds a redundant partition]

SQLF Creating Tables

    CREATE TABLE FLIGHTAVAILABILITY (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        FLIGHT_DATE DATE NOT NULL,
        ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
        …
    )
    PARTITION BY COLUMN (FLIGHT_ID)
    COLOCATE WITH (FLIGHTS);

  [Diagram: each member additionally holds a colocated partition]

SQLF Creating Tables
  • By default, only the data dictionary is persisted to disk.
  [Diagram: replicated table, partitioned table, colocated partition and redundant partition on each of three SQLF members]

SQLF Creating Tables

    CREATE TABLE FLIGHTAVAILABILITY (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        FLIGHT_DATE DATE NOT NULL,
        ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
        …
    )
    PARTITION BY COLUMN (FLIGHT_ID)
    COLOCATE WITH (FLIGHTS)
    PERSISTENT;

  [Diagram: replicated table, partitioned table, colocated partition and redundant partition on each of three SQLF members]

Partitioning Options
  • To partition using the primary key, use:

      PARTITION BY PRIMARY KEY

    (The primary key’s Java implementation must hash evenly across its range.)

    CREATE TABLE FLIGHTS (
        FLIGHT_ID CHAR(6) NOT NULL PRIMARY KEY,
        SEGMENT_NUMBER INTEGER NOT NULL,
        ORIG_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        …
    )
    PARTITION BY PRIMARY KEY;

Partitioning Options
  • To partition on a column or columns that are not the primary key, use:

      PARTITION BY COLUMN (column-name [ , column-name ]*)

    CREATE TABLE FLIGHTAVAILABILITY (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        FLIGHT_DATE DATE NOT NULL,
        ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
        …
    )
    PARTITION BY COLUMN (FLIGHT_ID);

Partitioning Options
  • You can partition entries based on a range of values of one of the columns:

      PARTITION BY RANGE (column-name)
      ( VALUES BETWEEN value AND value
        [ , VALUES BETWEEN value AND value ]* )

    CREATE TABLE FLIGHTAVAILABILITY (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        FLIGHT_DATE DATE NOT NULL,
        ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
        …
    )
    PARTITION BY RANGE (ECONOMY_SEATS_TAKEN)
    (
        VALUES BETWEEN 0 AND 50,
        VALUES BETWEEN 50 AND 100,
        VALUES BETWEEN 100 AND 500
    );

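To make the range clause concrete, here is a small Java sketch that maps an ECONOMY_SEATS_TAKEN value to one of the three ranges from the example. It is an illustration, not SQLFire code: the class name is hypothetical, and treating each range as lower-inclusive and upper-exclusive is an assumption made here so that the adjacent ranges (0 to 50, 50 to 100) do not overlap. The slides do not say what happens to values outside every range, so the sketch simply reports -1.

```java
final class RangePartitionSketch {
    // {lower, upper} pairs copied from the PARTITION BY RANGE example.
    static final int[][] RANGES = { {0, 50}, {50, 100}, {100, 500} };

    // Returns the index of the range containing the value, or -1 if none does.
    // Assumption: lower bound inclusive, upper bound exclusive.
    static int rangeIndexFor(int economySeatsTaken) {
        for (int i = 0; i < RANGES.length; i++) {
            if (economySeatsTaken >= RANGES[i][0] && economySeatsTaken < RANGES[i][1]) {
                return i;
            }
        }
        return -1;
    }
}
```

All rows whose value falls in the same range share a partition, so range scans over a single range can be served from one place.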
Partitioning Options
  • You can explicitly partition entries based on a list of potential values of a column:

      PARTITION BY LIST (column-name)
      ( VALUES ( value [ , value ]* )
        [ , VALUES ( value [ , value ]* ) ]* )

    CREATE TABLE Orders (
        OrderId INT NOT NULL,
        ItemId INT,
        NumItems INT,
        CustomerId INT,
        OrderDate DATE,
        Priority INT,
        Status CHAR(10),
        CONSTRAINT Pk_Orders PRIMARY KEY (OrderId),
        CONSTRAINT Fk_Items FOREIGN KEY (ItemId) REFERENCES Items(ItemId)
    )
    PARTITION BY LIST (Status)
    (
        VALUES ( 'pending', 'returned' ),
        VALUES ( 'shipped', 'received' ),
        VALUES ( 'hold' )
    );

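The list clause above amounts to a lookup from a column value to the group that names it. The sketch below shows that lookup for the Status values in the example. The class name is hypothetical, CHAR(10) padding is ignored, and returning -1 for a value that appears in no list is a placeholder, since the slides do not describe that case.

```java
import java.util.Arrays;
import java.util.List;

final class ListPartitionSketch {
    // Value groups copied from the PARTITION BY LIST example.
    static final List<List<String>> LISTS = Arrays.asList(
            Arrays.asList("pending", "returned"),
            Arrays.asList("shipped", "received"),
            Arrays.asList("hold"));

    // Returns the index of the group that contains the status, or -1 if none does.
    static int listIndexFor(String status) {
        for (int i = 0; i < LISTS.size(); i++) {
            if (LISTS.get(i).contains(status)) {
                return i;
            }
        }
        return -1;
    }
}
```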
Default Partitioning
If no PARTITION BY clause is specified, GemFire SQLF will automatically partition and colocate tables based on this algorithm:
  • 1. Is partitioning declared? Yes: use explicit directives. No: go to 2.
  • 2. Are there foreign keys? Yes: is the referenced table partitioned on the foreign key? If so, colocate with the referenced table. Otherwise: go to 3.
  • 3. Is there a primary key? Yes: partition by primary key. No: go to 4.
  • 4. Are there UNIQUE columns? Yes: partition by the first UNIQUE column. No: partition by internally generated row id.
Hashing is performed on the Java implementation of the column’s type.

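The default partitioning flowchart can be sketched as a single decision function. This is one reading of the flowchart, not SQLFire source code: the class, enum and parameter names are hypothetical, and the fall-through from "referenced table is not partitioned on the foreign key" to the primary-key check is an assumption about how the flowchart's arrows connect.

```java
final class DefaultPartitioningSketch {
    enum Choice {
        EXPLICIT_DIRECTIVES,
        COLOCATE_WITH_REFERENCED_TABLE,
        PRIMARY_KEY,
        FIRST_UNIQUE_COLUMN,
        GENERATED_ROW_ID
    }

    // Walks the flowchart top to bottom and returns the first rule that applies.
    static Choice choose(boolean partitioningDeclared,
                         boolean hasForeignKey,
                         boolean referencedTablePartitionedOnForeignKey,
                         boolean hasPrimaryKey,
                         boolean hasUniqueColumn) {
        if (partitioningDeclared) {
            return Choice.EXPLICIT_DIRECTIVES;
        }
        if (hasForeignKey && referencedTablePartitionedOnForeignKey) {
            return Choice.COLOCATE_WITH_REFERENCED_TABLE;
        }
        if (hasPrimaryKey) {
            return Choice.PRIMARY_KEY;
        }
        if (hasUniqueColumn) {
            return Choice.FIRST_UNIQUE_COLUMN;
        }
        return Choice.GENERATED_ROW_ID;
    }
}
```

Under this reading, a child table with a foreign key to a parent partitioned on that key is colocated with the parent even when the child also has its own primary key.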
Demo: default partitioned tables, colocation, persistent tables
Scaling with Partitioned tables
Hash partitioning for linear scaling
  • Key hashing provides single-hop access to its partition
  • But what if the access is not based on the key ... say, joins are involved?

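The single-hop property follows from the fact that any client or node can compute a key's location from the key alone. The sketch below illustrates that idea; it is not SQLFire's actual routing code. The class and method names, the fixed bucket count, and the round-robin assignment of buckets to nodes are all assumptions made for illustration.

```java
final class HashPartitionSketch {
    // Maps a partitioning key to a bucket using the key's Java hashCode().
    // floorMod keeps the result non-negative even for negative hash codes.
    static int bucketFor(Object key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    // Assumption: buckets are assigned to nodes round-robin. With this, the
    // owning node is a pure function of the key, so no lookup hop is needed.
    static int nodeFor(Object key, int numBuckets, int numNodes) {
        return bucketFor(key, numBuckets) % numNodes;
    }
}
```

This is also why the key type's hash must spread evenly across its range: a skewed hashCode() would concentrate rows on a few buckets.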
Hash partitioning only goes so far
  • Consider this query:

      SELECT * FROM flights, flightAvailability
      WHERE <equijoin flights with flightAvailability> AND flightId = 'xxx';

  • If both tables are hash partitioned, the join logic will need execution on all nodes where flightavailability data is stored
  • Distributed joins are expensive and inhibit scaling
      - Joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes
  • EquiJOIN of rows across multiple nodes is not supported in SQLFire 1.0

Partition aware DB design
  • Designer thinks about how data maps to partitions
  • The main idea is to:
      - Minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions
      - Collocate the transaction working set on partitions so complex 2-phase commits/Paxos commit is eliminated or minimized
  • Read Pat Helland’s “Life beyond Distributed Transactions” and the Google MegaStore paper

Partition aware DB design
  • Turns out OLTP systems lend themselves well to this need
      - Typically it is the number of entities that grows over time and not the size of the entity. Customer count perpetually grows, not the size of the customer info
      - Most often access is very restricted and based on select entities
          given a FlightID, fetch flightAvailability records
          given a customerID, add/remove orders, shipment records
  • Identify partition key for “Entity Group”
      - “Entity groups”: set of entities across several related tables that can all share a single identifier
          flightID is shared between the parent and child tables
          CustomerID is shared between customer, order and shipment tables

Partition aware DB design
  • Entity groups are defined in SQLFire using the “colocation” clause
  • An entity group is guaranteed to be collocated in the presence of failures or rebalance
  • Now, complex queries can be executed without requiring excessive distributed data access

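A small in-memory simulation may help show why colocating an entity group keeps a join local. It is a sketch under stated assumptions, not SQLFire internals: the class and method names are hypothetical, each "node" is just a pair of hash maps, and both tables are routed by the same function of FLIGHT_ID, which is the property that COLOCATE WITH is described as guaranteeing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ColocationSketch {
    private final int numNodes;
    // One map per node: FLIGHT_ID -> ORIG_AIRPORT (stands in for FLIGHTS).
    private final List<Map<String, String>> flights = new ArrayList<>();
    // One map per node: FLIGHT_ID -> FLIGHT_DATE values (stands in for FLIGHTAVAILABILITY).
    private final List<Map<String, List<String>>> availability = new ArrayList<>();

    ColocationSketch(int numNodes) {
        this.numNodes = numNodes;
        for (int i = 0; i < numNodes; i++) {
            flights.add(new HashMap<>());
            availability.add(new HashMap<>());
        }
    }

    // Both tables route on FLIGHT_ID with the same function; that is what colocates them.
    int nodeFor(String flightId) {
        return Math.floorMod(flightId.hashCode(), numNodes);
    }

    void insertFlight(String flightId, String origAirport) {
        flights.get(nodeFor(flightId)).put(flightId, origAirport);
    }

    void insertAvailability(String flightId, String flightDate) {
        availability.get(nodeFor(flightId))
                .computeIfAbsent(flightId, k -> new ArrayList<>())
                .add(flightDate);
    }

    // Equijoin on FLIGHT_ID for one flight. Only a single node's maps are read.
    List<String> joinForFlight(String flightId) {
        int node = nodeFor(flightId);
        List<String> rows = new ArrayList<>();
        String origAirport = flights.get(node).get(flightId);
        if (origAirport == null) {
            return rows;
        }
        for (String date : availability.get(node).getOrDefault(flightId, new ArrayList<>())) {
            rows.add(flightId + "," + origAirport + "," + date);
        }
        return rows;
    }
}
```

In this model the join never consults a second node, which is the behavior the slide describes as executing complex queries without excessive distributed data access.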
Partition aware DB design
  • STAR schema design is the norm in OLTP design
  • Fact tables (fast changing) are natural partitioning candidates
      - Partition by: FlightID … Availability, history rows colocated with Flights
  • Dimension tables are natural replicated table candidates
      - Replicate Airlines, Countries, Cities on all nodes
  • Dealing with joins involving M-M relationships
      - Can the one side of the M-M become a replicated table?
      - If not, run the join logic in a parallel stored procedure to minimize distribution
      - Else, split the query into multiple queries in the application

vFabric SQLFire Introduction


Editor's Notes

  • #5  <-- Strict (Full ACID) ---- FIFO (tunable) ---- Eventual --> (inspired by Amazon Dynamo)
        RDBMS is synonymous with ACID.
        Tunable: ACID transactions are a choice; by default it could be FIFO.
        Eventual: all bets are off ... you may write and read back and get a different answer or multiple answers (Netflix example).