
vFabric SQLFire Introduction

VMware vFabric SQLFire - scalable SQL instead of NoSQL

There is quite a bit of buzz these days about "NoSQL" databases. The lack of transactions and of good query (SQL) support has kept many from adopting these solutions. This talk presents VMware SQLFire, a distributed SQL data management solution that melds Apache Derby (borrowing its SQL drivers, parser, and some aspects of the engine) with an object data grid (GemFire) to offer a horizontally scalable, memory-oriented data management system in which developers can continue to use SQL. We focus on new primitives that extend the well-known SQL data definition syntax with data partitioning and replication strategies, while leaving "select" and the data manipulation part of SQL intact, so the impact on your application is minimal.

I gave this presentation at What's next, Paris 2011 (http://www.whatsnextparis.com/abouttheseminar.html).

  • Slide note: <-- Strict (full ACID) ---- FIFO (tunable) ---- Eventual (inspired by Amazon Dynamo) -->. RDBMS is synonymous with ACID. Tunable: ACID transactions are a choice; by default it could be FIFO. Eventual: all bets are off ... you may write, read back, and get a different answer or multiple answers (Netflix example).

    1. 1. SQLFire<br />Scalable SQL instead of NoSQL<br />Jags Ramnarayan<br />Chief Architect, GemFire Products<br />
    2. 2. Agenda<br />Various NoSQL attributes and why SQL<br />SQLFire features + Demo<br />Scalability patterns<br />Hash partitioning<br />Entity groups and collocation<br />Scaling behavior using “data aware stored procedures”<br />Consistency model <br />How we do distributed transactions<br />Shared nothing persistence<br />
    3. 3. We Challenge the traditional RDBMS design NOT SQL<br />First write to LOG<br />Second write to Data files<br />Buffers primarily tuned for IO<br /><ul><li>Too much I/O</li><li>Design roots don’t necessarily apply today</li><li>Too much focus on ACID</li><li>Disk synchronization bottlenecks</li></ul>
    7. 7. Common themes in next-gen DB architectures<br />NoSQL, Data Grids, Data Fabrics, NewSQL<br />“Shared nothing” commodity clusters<br />Focus shifts to memory, distributing data and clustering<br />Scale by partitioning the data and moving behavior to data nodes<br />HA within cluster and across data centers<br />Add capacity to scale dynamically<br />
    8. 8. What is different?<br /><ul><li>Several data models<ul><li>Key-value</li><li>Column family (inspired by Google BigTable)</li><li>Document</li><li>Graph</li></ul></li><li>Most focus on making model less rigid than SQL</li><li>Consistency model is not ACID</li></ul>[Diagram: consistency spectrum, in order: STRICT – Full ACID (RDB), Tunable Consistency, Eventual; scale labels, in order: Low scale, High scale, Very high scale]<br />
    15. 15. What is our take with SQLFire?<br />Eventual consistency is too difficult for the average developer<br />Write(A,1) followed by Read(A) may return 2 or (1,2)<br />SQL: flexible, easily understood, strong type system<br />Essential for integrity as well as query engine efficiency<br />
    16. 16. SQLFire<br />Replicated, partitioned tables in memory. Redundancy through memory copies.<br />Data resides on disk when you explicitly say so<br />Powerful SQL engine: standard SQL for select, DML<br />DDL has SQLF extensions<br />Leverages GemFire data grid engine.<br />
    17. 17. SQLFire<br />Applications access the distributed DB using JDBC, ADO.NET<br />Consistency model is FIFO, Tunable<br />Distributed transactions without global locks<br />
    18. 18. SQLFire<br />Asynchronous replication over WAN<br />Synchronous replication within cluster<br />Clients failover, failback<br />Easily integrate with existing DBs - caching framework to read through, write through or write behind<br />
    19. 19. SQLFire<br />When nodes are added, data and behavior are rebalanced without blocking current clients<br />"Data aware procedures" - standard Java stored procedures with "data aware" and parallelism extensions<br />
    20. 20. Flexible Deployment Topologies<br />Java Application cluster can host an embedded clustered database by just changing the URL<br />jdbc:sqlfire:;mcast-port=33666;host-data=true<br />
    21. 21. Flexible Deployment Topologies<br />
    22. 22. Partitioning & Replication<br />
    23. 23. Explore features through example<br />Assume thousands of flight rows and millions of flightavailability records<br />
    24. 24. SQLF Creating Tables<br />CREATE TABLE FLIGHTS<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL PRIMARY KEY,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> ORIG_AIRPORT CHAR(3),<br /> DEPART_TIME TIME, …<br /> ) ;<br />Hash partitioned on PK by default<br />[Diagram: the table stored as a Partitioned Table on each of three SQLF members]<br />
    26. 26. SQLF Creating Tables<br />CREATE TABLE Airlines<br /> (<br /> AIRLINE CHAR(2) NOT NULL PRIMARY KEY,<br /> AIRLINE_FULL VARCHAR(24),<br /> BASIC_RATE DOUBLE PRECISION,<br /> DISTANCE_DISCOUNT DOUBLE PRECISION, … )<br />[Diagram: a Table and three SQLF members]<br />
    27. 27. SQLF Creating Tables<br />CREATE TABLE Airlines<br /> (<br /> AIRLINE CHAR(2) NOT NULL PRIMARY KEY,<br /> AIRLINE_FULL VARCHAR(24),<br /> BASIC_RATE DOUBLE PRECISION,<br /> DISTANCE_DISCOUNT DOUBLE PRECISION, … )<br />REPLICATE;<br />[Diagram: a Replicated Table on each of three SQLF members]<br />
    28. 28. SQLF Creating Tables<br />CREATE TABLE FLIGHTS<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL ,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> ORIG_AIRPORT CHAR(3),<br /> DEPART_TIME TIME, …)<br />PARTITION BY COLUMN (FLIGHT_ID);<br />[Diagram: each of three SQLF members holds a Replicated Table and a Partitioned Table]<br />
    29. 29. SQLF Creating Tables<br />CREATE TABLE FLIGHTS<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL ,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> ORIG_AIRPORT CHAR(3),<br /> DEPART_TIME TIME, …)<br />PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;<br />[Diagram: each of three SQLF members holds a Replicated Table, a Partitioned Table and a Redundant Partition]<br />
    30. 30. SQLF Creating Tables<br />CREATE TABLE FLIGHTAVAILABILITY<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL ,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> FLIGHT_DATE DATE NOT NULL ,<br /> ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)<br /> PARTITION BY COLUMN (FLIGHT_ID)<br /> COLOCATE WITH (FLIGHTS);<br />[Diagram: each of three SQLF members holds a Replicated Table, a Partitioned Table, a Colocated Partition and a Redundant Partition]<br />
    31. 31. SQLF Creating Tables<br />By default, only the data dictionary is persisted to disk.<br />[Diagram: each of three SQLF members holds a Replicated Table, a Partitioned Table, a Colocated Partition and a Redundant Partition]<br />
    32. 32. SQLF Creating Tables<br />CREATE TABLE FLIGHTAVAILABILITY<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL ,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> FLIGHT_DATE DATE NOT NULL ,<br /> ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)<br /> PARTITION BY COLUMN (FLIGHT_ID)<br /> COLOCATE WITH (FLIGHTS)<br /> PERSISTENT ;<br />[Diagram: each of three SQLF members holds a Replicated Table, a Partitioned Table, a Colocated Partition and a Redundant Partition]<br />
    33. 33. Partitioning Options<br />To partition using the Primary Key, use:<br />PARTITION BY PRIMARY KEY<br />(The Primary Key’s Java implementation must hash evenly across its range)<br />CREATE TABLE FLIGHTS<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL PRIMARY KEY,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> ORIG_AIRPORT CHAR(3),<br /> DEPART_TIME TIME, … )<br />PARTITION BY PRIMARY KEY;<br />
    34. 34. Partitioning Options<br />When you wish to partition on a column or columns that are not the primary key, use:<br />PARTITION BY COLUMN (column-name [ , column-name ]*)<br />CREATE TABLE FLIGHTAVAILABILITY<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL ,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> FLIGHT_DATE DATE NOT NULL ,<br /> ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)<br />PARTITION BY COLUMN (FLIGHT_ID);<br />
    35. 35. Partitioning Options<br />You can partition entries based on a range of values of one of the columns:<br />PARTITION BY RANGE (column-name )<br />( VALUES BETWEEN value AND value<br />[ , VALUES BETWEEN value AND value ]*)<br />CREATE TABLE FLIGHTAVAILABILITY<br /> (<br /> FLIGHT_ID CHAR(6) NOT NULL ,<br /> SEGMENT_NUMBER INTEGER NOT NULL ,<br /> FLIGHT_DATE DATE NOT NULL ,<br /> ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)<br />PARTITION BY RANGE ( economy_seats_taken )<br />( VALUES BETWEEN 0 AND 50,<br /> VALUES BETWEEN 50 AND 100,<br /> VALUES BETWEEN 100 AND 500);<br />
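The range example above can be pictured as a lookup from a column value to a range index. The following is only a minimal sketch, assuming each range includes its lower bound and excludes its upper bound (the slide does not say which range a boundary value such as 50 belongs to); `range_partition` is a hypothetical helper, not a SQLFire API.

```python
# Ranges copied from the slide's PARTITION BY RANGE example.
# Assumption: lower bound inclusive, upper bound exclusive.
RANGES = [(0, 50), (50, 100), (100, 500)]


def range_partition(value, ranges=RANGES):
    """Return the index of the range that contains value, or None if none does."""
    for index, (low, high) in enumerate(ranges):
        if low <= value < high:
            return index
    # The slide does not say where out-of-range rows are stored.
    return None
```

With this reading, a row with 50 economy seats taken falls in the second range, not the first.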
    36. 36. Partitioning Options<br />You can explicitly partition entries based on a list of potential values of a column:<br />PARTITION BY LIST ( column-name ) <br />( VALUES ( value [ , value ]* ) [ , VALUES ( value [ , value ]* ) ]* ) <br />CREATE TABLE Orders<br /> (OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerId INT, OrderDate DATE, Priority INT, Status CHAR(10),<br /> CONSTRAINT Pk_Orders PRIMARY KEY (OrderId),<br /> CONSTRAINT Fk_Items FOREIGN KEY (ItemId) REFERENCES Items(ItemId))<br />PARTITION BY LIST ( Status )<br />( VALUES ( 'pending', 'returned' ),<br /> VALUES ( 'shipped', 'received' ),<br /> VALUES ( 'hold' ));<br />
    37. 37. Default Partitioning<br />If no PARTITION BY clause is specified, GemFire SQLF will automatically partition and colocate tables based on this algorithm:<br />1. Is partitioning declared? Yes: use explicit directives. No: go to 2.<br />2. Are there foreign keys? Yes: go to 3. No: go to 4.<br />3. Is the referenced table partitioned on the foreign key? Yes: colocate with referenced table. (The No branch is unlabeled on the slide; presumably continue with 4.)<br />4. Is there a primary key? Yes: partition by primary key. No: go to 5.<br />5. Are there UNIQUE columns? Yes: partition by the first UNIQUE column. No: partition by internally generated row id.<br />Hashing is performed on the Java implementation of the column’s type.<br />
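The default-partitioning flowchart can be read as a chain of checks. The sketch below is an illustration of that reading, not SQLFire's implementation: the `Table` class and `default_partitioning` function are hypothetical, and the fall-through from an unsuitable foreign key to the primary-key check is an assumption, since the slide does not label that branch.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    explicit_clause: str = ""        # text of a declared PARTITION BY clause, if any
    primary_key: tuple = ()          # primary key column names
    unique_columns: tuple = ()       # UNIQUE column names, in declaration order
    partition_columns: tuple = ()    # columns this table is partitioned on
    # Each foreign key is (local column, referenced Table, referenced column).
    foreign_keys: list = field(default_factory=list)


def default_partitioning(table):
    """Return a (strategy, detail) pair following the flowchart's order of checks."""
    if table.explicit_clause:
        return ("explicit", table.explicit_clause)
    for _column, referenced, referenced_column in table.foreign_keys:
        if referenced.partition_columns == (referenced_column,):
            return ("colocate", referenced.name)
    # Assumption: an unsuitable foreign key falls through to the next check.
    if table.primary_key:
        return ("primary key", table.primary_key)
    if table.unique_columns:
        return ("unique column", table.unique_columns[0])
    return ("generated row id", None)
```

For example, a FLIGHTAVAILABILITY table whose foreign key references a FLIGHTS table partitioned on FLIGHT_ID would be colocated with FLIGHTS under this reading.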
    38. 38. Demo <br />default partitioned tables, colocation, persistent tables<br />
    39. 39. Scaling with Partitioned tables<br />
    40. 40. Hash partitioning for linear scaling<br />Key Hashing provides single hop access to its partition<br />But, what if the access is not based on the key … say, joins are involved<br />
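Single-hop access follows from the fact that any client can compute the owning partition from the key alone. A minimal sketch, assuming a fixed list of partition owners and using `zlib.crc32` as a stand-in for the hash of the column's Java type (the function names and the owner list are hypothetical):

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition number with a deterministic hash."""
    # crc32 is only a stand-in; the slides say hashing uses the Java type's implementation.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def member_for(key, partition_owners):
    """Return the member that owns the key's partition, so the request needs one hop."""
    return partition_owners[partition_for(key, len(partition_owners))]
```

Because the mapping depends only on the key, a lookup by FLIGHT_ID goes straight to one member; a query that does not name the key cannot be routed this way.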
    41. 41. Hash partitioning only goes so far<br />Consider this query :<br />Select * from flights, flightAvailability<br />where <equijoin flights with flightAvailability> <br />and flightId ='xxx';<br />If both tables are hash partitioned the join logic will need execution on all nodes where flightavailability data is stored<br />Distributed joins are expensive and inhibit scaling<br />joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes<br />EquiJOIN of rows across multiple nodes is not supported in SQLFire 1.0<br />
    42. 42. Partition aware DB design<br />Designer thinks about how data maps to partitions<br />The main idea is to:<br />minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions<br />Collocate transaction working set on partitions so complex 2-phase commits/paxos commit is eliminated or minimized.<br />Read Pat Helland’s “Life beyond Distributed Transactions” and the Google MegaStore paper<br />
    43. 43. Partition aware DB design<br />Turns out OLTP systems lend themselves well to this need<br />Typically it is the number of entities that grows over time and not the size of the entity. <br />Customer count perpetually grows, not the size of the customer info<br />Most often access is very restricted and based on select entities<br />given a FlightID, fetch flightAvailability records<br />given a customerID, add/remove orders, shipment records<br />Identify partition key for “Entity Group”<br />"entity groups": set of entities across several related tables that can all share a single identifier<br />flightID is shared between the parent and child tables<br />CustomerID is shared between customer, order and shipment tables<br />
    44. 44. Partition aware DB design<br />Entity groups defined in SQLFire using “colocation” clause<br />Entity group guaranteed to be collocated in presence of failures or rebalance<br />Now, complex queries can be executed without requiring excessive distributed data access<br />
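To illustrate why colocation keeps a join local, the sketch below stores parent and child rows in the same partition by hashing the shared FLIGHT_ID. It is a toy in-memory model under stated assumptions: the `Cluster` class is hypothetical, `zlib.crc32` stands in for the real hash, and failure handling and rebalancing are ignored.

```python
import zlib


class Cluster:
    """Toy model: each partition holds FLIGHTS rows and the matching child rows."""

    def __init__(self, num_partitions):
        self.n = num_partitions
        self.flights = [dict() for _ in range(num_partitions)]
        self.availability = [list() for _ in range(num_partitions)]

    def partition_of(self, flight_id):
        return zlib.crc32(flight_id.encode("utf-8")) % self.n

    def insert_flight(self, flight_id, row):
        self.flights[self.partition_of(flight_id)][flight_id] = row

    def insert_availability(self, flight_id, row):
        # Same hash on the same column, so child rows land next to their parent.
        self.availability[self.partition_of(flight_id)].append((flight_id, row))

    def join(self, flight_id):
        """Equijoin for one flight, touching a single partition only."""
        p = self.partition_of(flight_id)
        flight = self.flights[p].get(flight_id)
        if flight is None:
            return []
        return [(flight, row) for fid, row in self.availability[p] if fid == flight_id]
```

The join never reads another partition, which is the property the "colocation" clause is meant to guarantee for an entity group.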
    45. 45. Partition Aware DB design<br />STAR schema design is the norm in OLTP design<br />Fact tables (fast changing) are natural partitioning candidates<br />Partition by: FlightID … Availability, history rows colocated with Flights<br />Dimension tables are natural replicated table candidates<br />Replicate Airlines, Countries, Cities on all nodes<br />Dealing with Joins involving M-M relationships<br />Can the one side of the M-M become a replicated table?<br />If not, run the Join logic in a parallel stored procedure to minimize distribution<br />Else, split the query into multiple queries in application<br />
    46. 46. Scaling Application logic with Parallel “Data Aware procedures”<br />
    47. 47. Procedures<br />Java Stored Procedures may be created according to the SQL Standard<br />CREATE PROCEDURE getOverBookedFlights<br />(IN argument OBJECT, OUT result OBJECT)<br />LANGUAGE JAVA PARAMETER STYLE JAVA <br />READS SQL DATA DYNAMIC RESULT SETS 1 <br />EXTERNAL NAME com.acme.OverBookedFLights;<br />SQLFabric also supports the JDBC type Types.JAVA_OBJECT. A parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object. <br />In this case, the procedure will be executed on the server to which a client is connected (or locally for Peer Clients)<br />
    48. 48. Data Aware Procedures<br />Parallelize procedure and prune to nodes with required data<br />Extend the procedure call with the following syntax:<br />CALL [PROCEDURE]<br />procedure_name<br />( [ expression [, expression ]* ] )<br />[ WITH RESULT PROCESSOR processor_name ]<br />[ { ON TABLE table_name [ WHERE whereClause ] } |<br /> { ON {ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) }}<br />]<br />Hint the data the procedure depends on:<br />CALL getOverBookedFlights( <bind arguments> )<br />ON TABLE FLIGHTAVAILABILITY<br />WHERE FLIGHT_ID = <SomeFLIGHTID> ;<br />If the table is partitioned by the columns in the WHERE clause, procedure execution is pruned to the nodes with the data (the node with <SomeFLIGHTID> in this case)<br />[Diagram: Client, Fabric Server 1, Fabric Server 2]<br />
    49. 49. Parallelize procedure then aggregate (reduce)<br />CALL [PROCEDURE]<br />procedure_name<br />( [ expression [, expression ]* ] )<br />[ WITH RESULT PROCESSOR processor_name]<br />[ { ON TABLE table_name [ WHERE whereClause ] } |<br /> { ON {ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) }}<br />]<br />Register a Java Result Processor (optional in some cases):<br />CALL SQLF.CreateResultProcessor( processor_name, processor_class_name);<br />[Diagram: Client, Fabric Server 1, Fabric Server 2, Fabric Server 3]<br />
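The two slides above describe pruning a procedure call to the partitions that hold the data and then reducing the partial results. The sketch below models that flow in plain Python under stated assumptions: all names are hypothetical, rows are `(flight_id, seats_taken, capacity)` tuples invented for illustration, and the per-partition calls run sequentially here whereas the slides describe parallel execution.

```python
import zlib


def partition_of(flight_id, num_partitions):
    return zlib.crc32(flight_id.encode("utf-8")) % num_partitions


def target_partitions(num_partitions, where_flight_id=None):
    """All partitions, or only the one that owns the key named in the WHERE clause."""
    if where_flight_id is None:
        return list(range(num_partitions))
    return [partition_of(where_flight_id, num_partitions)]


def flatten(partials):
    """Default result processor: concatenate the per-partition results."""
    return [row for part in partials for row in part]


def call_on_table(partitions, procedure, where_flight_id=None, result_processor=flatten):
    targets = target_partitions(len(partitions), where_flight_id)
    partials = [procedure(partitions[i]) for i in targets]  # the "map" step
    return result_processor(partials)                       # the "reduce" step


def overbooked(rows):
    """Example procedure body: flight ids with more seats taken than capacity."""
    return [fid for fid, taken, capacity in rows if taken > capacity]
```

A custom result processor, such as one that counts rows instead of returning them, plays the role of the registered Java result processor in the slide.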
    50. 50. Consistency model<br />
    51. 51. Consistency Model without Transactions<br />Replication within cluster is always eager and synchronous<br />Row updates are always atomic; No need to use transactions<br />FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued<br />Consistency in Partitioned tables<br />a partitioned table row owned by one member at a point in time<br />all updates are serialized to replicas through owner<br />"Total ordering" at a row level: atomic and isolated<br />Membership changes and consistency<br />Pessimistic concurrency support using ‘Select for update’<br />Support for referential integrity<br />
    52. 52. Distributed Transactions<br />Full support for distributed transactions (Single phase commit)<br />Highly scalable without any centralized coordinator or lock manager<br />We make some important assumptions<br />Most OLTP transactions are small in duration and size<br />W-W conflicts are very rare in practice<br />How does it work?<br />Each data node has a sub-coordinator to track TX state<br />Eagerly acquire local “write” locks on each replica<br />Object owned by a single primary at a point in time<br />Fail fast if lock cannot be obtained<br />Atomic and works with the cluster Failure detection system<br />Isolated until commit<br />Only support local isolation during commit<br />
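The "eagerly acquire write locks, fail fast" idea can be sketched as follows. This is a simplified single-process model, not SQLFire's protocol: the `Replica`, `Transaction` and `WriteConflict` names are hypothetical, and sub-coordinators, primaries and failure detection are left out.

```python
class WriteConflict(Exception):
    """Raised when another transaction already holds the write lock."""


class Replica:
    def __init__(self):
        self.data = {}
        self.locks = {}  # key -> id of the transaction holding the write lock

    def try_lock(self, key, tx_id):
        holder = self.locks.get(key)
        if holder is not None and holder != tx_id:
            return False
        self.locks[key] = tx_id
        return True

    def unlock_all(self, tx_id):
        for key in [k for k, holder in self.locks.items() if holder == tx_id]:
            del self.locks[key]


class Transaction:
    def __init__(self, tx_id, replicas):
        self.tx_id = tx_id
        self.replicas = replicas
        self.pending = {}  # writes stay invisible to others until commit

    def write(self, key, value):
        for replica in self.replicas:
            if not replica.try_lock(key, self.tx_id):
                self.rollback()           # fail fast: no waiting, no global lock manager
                raise WriteConflict(key)
        self.pending[key] = value

    def commit(self):
        for replica in self.replicas:
            replica.data.update(self.pending)
            replica.unlock_all(self.tx_id)
        self.pending = {}

    def rollback(self):
        for replica in self.replicas:
            replica.unlock_all(self.tx_id)
        self.pending = {}
```

The design bets on the slide's assumption that write-write conflicts are rare: the losing transaction fails immediately and can retry, so no transaction ever blocks waiting for a lock.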
    53. 53. Parallel disk persistence<br />
    54. 54. Why is disk latency so high?<br />Challenges<br />Disk seek time is still > 2 ms<br />OLTP transactions are small writes<br />Flushing to disk will result in a seek<br />Best rates are in the 100s per second<br />RDBs and NoSQL try to avoid the problem<br />Append to transaction logs; out-of-band writes to data files<br />But, reads can cause seeks to disk<br />
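As a rough check of the "100s per second" figure, assume every flushed write costs one seek and take the slide's lower bound of 2 ms per seek (both are simplifying assumptions):

```latex
\text{flushes per second per disk} \;\le\; \frac{1}{t_{\text{seek}}} \;<\; \frac{1\,\text{s}}{2\,\text{ms}} \;=\; 500
```

So a single disk that seeks on every flush stays below a few hundred flushed writes per second, whatever the size of each write.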
    55. 55. Disk persistence in SQLF<br />Parallel log structured storage<br />Each partition writes in parallel<br />Backups write to disk also<br />Increase reliability against h/w loss<br /><ul><li>Don’t seek to disk</li><li>Don’t flush all the way to disk</li><li>Use OS scheduler to time write</li><li>Do this on primary + secondary</li><li>Realize very high throughput</li></ul>
    59. 59. Performance benchmark<br />
    60. 60. How does it perform? Scale?<br />Scale from 2 to 10 servers (one per host)<br />Scale from 200 to 1200 simulated clients (10 hosts)<br />Single partitioned table: int PK, 40 fields (20 ints, 20 strings)<br />
    61. 61. How does it perform? Scale?<br />CPU% remained low per server – about 30% indicating many more clients could be handled<br />
    62. 62. Is latency low with scale?<br />Latency decreases with server capacity<br />50-70% take < 1 millisecond<br />About 90% take less than 2 milliseconds<br />Small percentage of outliers<br />
    63. 63. Q & A<br />VMware vFabric SQLFire BETA will be released in early June<br />Check out community.gemstone.com<br />
    64. 64. Built using GemFire object data fabric + Derby<br />
