vFabric SQLFire Introduction

VMware vFabric SQLFire - scalable SQL instead of NoSQL

There is quite a bit of buzz these days about "NoSQL" databases. The lack of transactions and of good query (SQL) support has kept many from adopting these solutions. This talk presents VMware SQLFire, a distributed SQL data management solution that melds Apache Derby (borrowing its SQL drivers, parsing and some aspects of the engine) with an object data grid (GemFire) to offer a horizontally scalable, memory-oriented data management system where developers can continue to use SQL. We focus on new primitives that extend the well-known SQL data definition syntax with data partitioning and replication strategies while leaving the "select" and data manipulation parts of SQL intact, so adoption only minimally impacts your application.

I gave this presentation at What's Next, Paris 2011 (http://www.whatsnextparis.com/abouttheseminar.html).

  • <-- Strict (Full ACID) ---- FIFO (tunable) ---- Eventual ---> (Inspired by Amazon Dynamo) RDBMS is synonymous with ACID. Tunable: ACID transactions are a choice; by default it could be FIFO. Eventual: all bets are off ... you may write and read back and get a different answer or multiple answers (Netflix example)

Transcript

  • 1. SQLFire
    Scalable SQL instead of NoSQL
    Jags Ramnarayan
    Chief Architect, GemFire Products
  • 2. Agenda
    Various NoSQL attributes and why SQL
    SQLFire features + Demo
    Scalability patterns
    Hash partitioning
    Entity groups and collocation
    Scaling behavior using “data aware stored procedures”
    Consistency model
    How we do distributed transactions
    Shared nothing persistence
  • 3. We challenge the traditional RDBMS design, NOT SQL
    First write to LOG
    Second write to data files
    Buffers primarily tuned for IO
    • Too much I/O
    • Design roots don’t necessarily apply today
    • Too much focus on ACID
    • Disk synchronization bottlenecks
  • 7. “Shared nothing” commodity clusters
    focus shifts to memory, distributing data and clustering
    Scale by partitioning the data and move behavior to data nodes
    HA within cluster and across data centers
    Add capacity to scale dynamically
    Common themes in next-gen DB architectures
    NoSQL, Data Grids, Data Fabrics, NewSQL
  • 8. What is different?
    • Several data models
    • Key-value
    • Column family (inspired by Google BigTable)
    • Document
    • Graph
    • Most focus on making model less rigid than SQL
    • Consistency model is not ACID
    [Diagram: consistency spectrum from STRICT – Full ACID (RDB) at low scale, through Tunable Consistency at high scale, to Eventual at very high scale]
  • 15. What is our take with SQLFire?
    Eventual consistency is too difficult for the average developer
    Write(A,1) → Read(A) may return 2 or (1,2)
    SQL : Flexible, easily understood, strong type system
    essential for integrity as well as query engine efficiency
  • 16. SQLFire
    Replicated, partitioned tables in memory. Redundancy through memory copies.
    Data resides on disk when you explicitly say so
    Powerful SQL engine: standard SQL for select, DML
    DDL has SQLF extensions
    Leverages GemFire data grid engine.
  • 17. SQLFire
    Applications access the distributed DB using JDBC, ADO.NET
    Consistency model is FIFO, Tunable
    Distributed transactions without global locks
  • 18. SQLFire
    Asynchronous replication over WAN
    Synchronous replication within cluster
    Clients failover, failback
    Easily integrate with existing DBs - caching framework to read through, write through or write behind
  • 19. SQLFire
    When nodes are added, data and behavior are rebalanced without blocking current clients
    "Data aware procedures" - standard Java stored procedures with "data aware" and parallelism extensions
  • 20. Flexible Deployment Topologies
    Java Application cluster can host an embedded clustered database by just changing the URL
    jdbc:sqlfire:;mcast-port=33666;host-data=true
  • 21. Flexible Deployment Topologies
  • 22. Partitioning & Replication
  • 23. Explore features through example
    Assume thousands of FLIGHTS rows and millions of FLIGHTAVAILABILITY records
  • 24. SQLF Creating Tables
    CREATE TABLE FLIGHTS
    (
    FLIGHT_ID CHAR(6) NOT NULL PRIMARY KEY,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    ORIG_AIRPORT CHAR(3),
    DEPART_TIME TIME, …
    ) ;
    Hash partitioned on PK by default
    [Diagram: the FLIGHTS table hash partitioned across three SQLF servers]
  • 25. SQLF Creating Tables
    CREATE TABLE Airlines
    (
    AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
    AIRLINE_FULL VARCHAR(24),
    BASIC_RATE DOUBLE PRECISION,
    DISTANCE_DISCOUNT DOUBLE PRECISION, … )
    REPLICATE;
    CREATE TABLE FLIGHTS
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    ORIG_AIRPORT CHAR(3),
    DEPART_TIME TIME, …)
    PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;
    CREATE TABLE FLIGHTAVAILABILITY
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    FLIGHT_DATE DATE NOT NULL ,
    ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)
    PARTITION BY COLUMN (FLIGHT_ID)
    COLOCATE WITH (FLIGHTS)
    [Diagram: three SQLF servers, each holding a replicated table, a partitioned table, a redundant partition, and a colocated partition]
  • 26. SQLF Creating Tables
    CREATE TABLE Airlines
    (
    AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
    AIRLINE_FULL VARCHAR(24),
    BASIC_RATE DOUBLE PRECISION,
    DISTANCE_DISCOUNT DOUBLE PRECISION, … )
    [Diagram: the Airlines table and three SQLF servers]
  • 27. SQLF Creating Tables
    CREATE TABLE Airlines
    (
    AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
    AIRLINE_FULL VARCHAR(24),
    BASIC_RATE DOUBLE PRECISION,
    DISTANCE_DISCOUNT DOUBLE PRECISION, … )
    REPLICATE;
    [Diagram: three SQLF servers, each holding a copy of the replicated table]
  • 28. SQLF Creating Tables
    CREATE TABLE FLIGHTS
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    ORIG_AIRPORT CHAR(3),
    DEPART_TIME TIME, …)
    PARTITION BY COLUMN (FLIGHT_ID);
    [Diagram: three SQLF servers, each holding a replicated table and a partitioned table]
  • 29. SQLF Creating Tables
    CREATE TABLE FLIGHTS
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    ORIG_AIRPORT CHAR(3),
    DEPART_TIME TIME, …)
    PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;
    [Diagram: three SQLF servers, each holding a replicated table, a partitioned table, and a redundant partition]
  • 30. SQLF Creating Tables
    CREATE TABLE FLIGHTAVAILABILITY
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    FLIGHT_DATE DATE NOT NULL ,
    ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)
    PARTITION BY COLUMN (FLIGHT_ID)
    COLOCATE WITH (FLIGHTS)
    [Diagram: three SQLF servers, each holding a replicated table, a partitioned table, a colocated partition, and a redundant partition]
  • 31. SQLF Creating Tables
    By default, only the data dictionary is persisted to disk.
    [Diagram: three SQLF servers, each holding a replicated table, a partitioned table, a colocated partition, and a redundant partition]
  • 32. SQLF Creating Tables
    CREATE TABLE FLIGHTAVAILABILITY
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    FLIGHT_DATE DATE NOT NULL ,
    ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)
    PARTITION BY COLUMN (FLIGHT_ID)
    COLOCATE WITH (FLIGHTS)
    PERSISTENT ;
    [Diagram: three SQLF servers, each holding a replicated table, a partitioned table, a redundant partition, and a colocated partition]
  • 33. Partitioning Options
    To partition using the primary key, use:
    PARTITION BY PRIMARY KEY
    (The primary key’s Java implementation must hash evenly across its range)
    CREATE TABLE FLIGHTS
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    ORIG_AIRPORT CHAR(3),
    DEPART_TIME TIME, … )
    PARTITION BY PRIMARY KEY;
  • 34. Partitioning Options
    When you wish to partition on a column or columns that are not the primary key, use:
    PARTITION BY COLUMN (column-name [ , column-name ]*)
    CREATE TABLE FLIGHTAVAILABILITY
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    FLIGHT_DATE DATE NOT NULL ,
    ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)
    PARTITION BY COLUMN (FLIGHT_ID);
  • 35. Partitioning Options
    You can partition entries based on a range of values of one of the columns:
    PARTITION BY RANGE (column-name )
    ( VALUES BETWEEN value AND value
    [ , VALUES BETWEEN value AND value ]*)
    CREATE TABLE FLIGHTAVAILABILITY
    (
    FLIGHT_ID CHAR(6) NOT NULL ,
    SEGMENT_NUMBER INTEGER NOT NULL ,
    FLIGHT_DATE DATE NOT NULL ,
    ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0, …)
    PARTITION BY RANGE ( economy_seats_taken )
    ( VALUES BETWEEN 0 AND 50,
    VALUES BETWEEN 50 AND 100,
    VALUES BETWEEN 100 AND 500);
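The range clause above can be pictured as a sorted-boundary lookup. The following is only a sketch in Python, not SQLFire code: the function name `range_partition` is hypothetical, and treating each range as lower-inclusive and upper-exclusive is an assumption made so that the adjacent ranges on the slide (0 to 50, 50 to 100) do not overlap. What the product does with values outside every range is not stated on the slide, so the sketch simply returns `None`.

```python
import bisect

# Boundaries taken from the slide: [0, 50), [50, 100), [100, 500).
# Lower-inclusive / upper-exclusive is an assumption of this sketch.
BOUNDS = [0, 50, 100, 500]


def range_partition(value, bounds=BOUNDS):
    """Return the index of the range that holds value, or None if no range matches."""
    if value < bounds[0] or value >= bounds[-1]:
        return None
    # bisect_right counts the boundaries <= value; subtract 1 for a 0-based range index.
    return bisect.bisect_right(bounds, value) - 1


if __name__ == "__main__":
    for seats in (0, 49, 50, 120, 500):
        print(seats, "->", range_partition(seats))
```

All rows whose ECONOMY_SEATS_TAKEN falls in the same range get the same index, which is what lets the rows of one range be kept together.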
  • 36. Partitioning Options
    You can explicitly partition entries based on a list of potential values of a column:
    PARTITION BY LIST ( column-name )
    ( VALUES ( value [ , value ]* ) [ , VALUES ( value [ , value ]* ) ]* )
    CREATE TABLE Orders
    (OrderId INT NOT NULL, ItemId INT, NumItems INT, CustomerId INT, OrderDate DATE, Priority INT, Status CHAR(10),
    CONSTRAINT Pk_Orders PRIMARY KEY (OrderId),
    CONSTRAINT Fk_Items FOREIGN KEY (ItemId) REFERENCES Items(ItemId))
    PARTITION BY LIST ( Status )
    ( VALUES ( 'pending', 'returned' ),
    VALUES ( 'shipped', 'received' ),
    VALUES ( 'hold' ));
  • 37. Default Partitioning
    If no PARTITION BY clause is specified, GemFire SQLF will automatically partition and colocate tables based on this algorithm:
    Is partitioning declared? If yes, use explicit directives. If no, continue.
    Are there foreign keys? If yes, and the referenced table is partitioned on the foreign key, colocate with referenced table. Otherwise, continue.
    Is there a primary key? If yes, partition by primary key. If no, continue.
    Are there UNIQUE columns? If yes, partition by the first UNIQUE column. If no, partition by internally generated row id.
    Hashing is performed on the Java implementation of the column’s type.
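The decision flow on this slide can be written as a short chain of checks. This is a sketch in Python of one reading of the flowchart, not the product's implementation: `TableDef`, `ForeignKey` and `default_partitioning` are hypothetical names, and the assumption that a foreign key whose referenced table is not partitioned on that key falls through to the primary-key check is an interpretation of the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ForeignKey:
    column: str
    ref_table: str
    # True when the referenced table is partitioned on the referenced key.
    ref_partitioned_on_key: bool


@dataclass
class TableDef:
    """Hypothetical, simplified view of a CREATE TABLE statement."""
    name: str
    partition_clause: Optional[str] = None
    foreign_keys: List[ForeignKey] = field(default_factory=list)
    primary_key: Optional[str] = None
    unique_columns: List[str] = field(default_factory=list)


def default_partitioning(table: TableDef) -> str:
    """Walk the checks in the order shown on the slide."""
    if table.partition_clause:
        return f"explicit: {table.partition_clause}"
    for fk in table.foreign_keys:
        if fk.ref_partitioned_on_key:
            return f"colocate with {fk.ref_table}"
    if table.primary_key:
        return f"partition by primary key ({table.primary_key})"
    if table.unique_columns:
        return f"partition by unique column ({table.unique_columns[0]})"
    return "partition by generated row id"


if __name__ == "__main__":
    avail = TableDef(
        name="FLIGHTAVAILABILITY",
        foreign_keys=[ForeignKey("FLIGHT_ID", "FLIGHTS", True)],
    )
    print(default_partitioning(avail))
```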
  • 38. Demo
    default partitioned tables, colocation, persistent tables
  • 39. Scaling with Partitioned tables
  • 40. Hash partitioning for linear scaling
    Key Hashing provides single hop access to its partition
    But, what if the access is not based on the key … say, joins are involved
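Single-hop access follows from the fact that any client can compute the owning partition from the key alone. The sketch below illustrates that idea in Python under stated assumptions: the bucket count, the server names and the use of CRC32 are all hypothetical choices for the example, and the slides only say that hashing uses the Java implementation of the column type.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count, not a SQLFire default
NODES = ["server-1", "server-2", "server-3"]  # hypothetical member names


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a partitioning key to a bucket using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def node_for(key: str, nodes=NODES, num_buckets: int = NUM_BUCKETS) -> str:
    """Assign buckets to nodes round-robin so every client computes the same owner."""
    return nodes[bucket_for(key, num_buckets) % len(nodes)]


if __name__ == "__main__":
    for flight_id in ("AA1111", "AA1112", "UA0042"):
        print(flight_id, "->", node_for(flight_id))
```

A lookup by key needs no directory service or broadcast: the client hashes the key and contacts one node. A query that does not carry the key gives the client nothing to hash, which is the problem the next slides address.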
  • 41. Hash partitioning only goes so far
    Consider this query :
    Select * from flights, flightAvailability
    where <equijoin flights with flightAvailability>
    and flightId ='xxx';
    If both tables are hash partitioned, the join logic must execute on all nodes where flightAvailability data is stored
    Distributed joins are expensive and inhibit scaling
    Joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes
    EquiJOIN of rows across multiple nodes is not supported in SQLFire 1.0
  • 42. Partition aware DB design
    Designer thinks about how data maps to partitions
    The main idea is to:
    minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions
    Collocate transaction working set on partitions so complex 2-phase commits/paxos commit is eliminated or minimized.
    Read Pat Helland’s “Life beyond Distributed Transactions” and the Google MegaStore paper
  • 43. Partition aware DB design
    Turns out OLTP systems lend themselves well to this need
    Typically it is the number of entities that grows over time and not the size of the entity.
    Customer count perpetually grows, not the size of the customer info
    Most often access is very restricted and based on select entities
    given a FlightID, fetch flightAvailability records
    given a customerID, add/remove orders, shipment records
    Identify partition key for “Entity Group”
    "entity groups": set of entities across several related tables that can all share a single identifier
    flightID is shared between the parent and child tables
    CustomerID shared between customer, order and shipment tables
  • 44. Partition aware DB design
    Entity groups defined in SQLFire using “colocation” clause
    Entity group guaranteed to be collocated in presence of failures or rebalance
    Now, complex queries can be executed without requiring excessive distributed data access
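To see why colocation removes the distributed join, consider a toy cluster in which both tables are routed by the same key. This is a Python sketch under assumptions: the node names, the CRC32 routing and the in-memory dictionaries are hypothetical stand-ins for SQLFire's partitions, and the row fields are illustrative.

```python
import zlib

NODES = ["server-1", "server-2", "server-3"]  # hypothetical member names


def owner(flight_id: str) -> str:
    """Both tables use the same routing function, which is what colocation means here."""
    return NODES[zlib.crc32(flight_id.encode("utf-8")) % len(NODES)]


def build_cluster(flights, availability):
    """Place each row on the node that owns its flight_id."""
    cluster = {n: {"flights": {}, "availability": {}} for n in NODES}
    for f in flights:
        cluster[owner(f["flight_id"])]["flights"][f["flight_id"]] = f
    for a in availability:
        rows = cluster[owner(a["flight_id"])]["availability"]
        rows.setdefault(a["flight_id"], []).append(a)
    return cluster


def local_join(cluster, flight_id):
    """Equijoin on flight_id that touches exactly one node."""
    node = cluster[owner(flight_id)]
    flight = node["flights"].get(flight_id)
    if flight is None:
        return []
    return [{**flight, **a} for a in node["availability"].get(flight_id, [])]


if __name__ == "__main__":
    flights = [{"flight_id": "AA1111", "orig_airport": "SFO"}]
    availability = [
        {"flight_id": "AA1111", "flight_date": "2011-06-01", "economy_seats_taken": 10},
        {"flight_id": "AA1111", "flight_date": "2011-06-02", "economy_seats_taken": 5},
    ]
    print(local_join(build_cluster(flights, availability), "AA1111"))
```

Because parent and child rows for one flight always land on the same node, the join for a given flight ID never needs data from another member.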
  • 45. Partition Aware DB design
    STAR schema design is the norm in OLTP design
    Fact tables (fast changing) are natural partitioning candidates
    Partition by: FlightID … Availability, history rows colocated with Flights
    Dimension tables are natural replicated table candidates
    Replicate Airlines, Countries, Cities on all nodes
    Dealing with Joins involving M-M relationships
    Can the one side of the M-M become a replicated table?
    If not, run the Join logic in a parallel stored procedure to minimize distribution
    Else, split the query into multiple queries in application
  • 46. Scaling Application logic with Parallel “Data Aware procedures”
  • 47. Procedures
    Java Stored Procedures may be created according to the SQL Standard
    CREATE PROCEDURE getOverBookedFlights
    (IN argument OBJECT, OUT result OBJECT)
    LANGUAGE JAVA PARAMETER STYLE JAVA
    READS SQL DATA DYNAMIC RESULT SETS 1
    EXTERNAL NAME com.acme.OverBookedFLights;
    SQLFabric also supports the JDBC type Types.JAVA_OBJECT. A parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object.
    In this case, the procedure will be executed on the server to which a client is connected (or locally for Peer Clients)
  • 48. Data Aware Procedures
    CALL [PROCEDURE]
    procedure_name
    ( [ expression [, expression ]* ] )
    [ WITH RESULT PROCESSOR processor_name ]
    [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON {ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) }}
    ]
    [Diagram: a client invoking the procedure on Fabric Server 1 and Fabric Server 2]
    Parallelize procedure and prune to nodes with required data
    Extend the procedure call with the following syntax:
    CALL getOverBookedFlights( <bind arguments> )
    ON TABLE FLIGHTAVAILABILITY
    WHERE FLIGHT_ID = <SomeFLIGHTID> ;
    Hint the data the procedure depends on
    If the table is partitioned by the columns in the WHERE clause, procedure execution is pruned to the nodes holding the data (the node with <SomeFLIGHTID> in this case)
  • 49. Parallelize procedure then aggregate (reduce)
    CALL [PROCEDURE]
    procedure_name
    ( [ expression [, expression ]* ] )
    [ WITH RESULT PROCESSOR processor_name]
    [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON {ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) }}
    ]
    Register a Java Result Processor (optional in some cases):
    CALL SQLF.CreateResultProcessor( processor_name, processor_class_name);
    [Diagram: a client invoking the procedure in parallel on Fabric Servers 1, 2 and 3 and aggregating the results]
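The pattern on the last two slides (run the procedure where the data lives, optionally prune to the owning node, then reduce the partial results) can be sketched as a scatter/gather. The Python below is an illustration under assumptions, not the SQLFire API: `call`, `over_booked` and `sorted_union` are hypothetical names, the `capacity` field is invented for the example, and the thread pool stands in for parallel execution on separate servers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SERVER_NAMES = ["server-1", "server-2", "server-3"]  # hypothetical member names


def owner(flight_id: str) -> str:
    return SERVER_NAMES[zlib.crc32(flight_id.encode("utf-8")) % len(SERVER_NAMES)]


def load(rows):
    """Distribute rows to servers by flight_id, mimicking a partitioned table."""
    servers = {name: [] for name in SERVER_NAMES}
    for row in rows:
        servers[owner(row["flight_id"])].append(row)
    return servers


def over_booked(local_rows):
    """The 'procedure': runs against one server's local rows only."""
    return [r["flight_id"] for r in local_rows if r["seats_taken"] > r["capacity"]]


def sorted_union(partials):
    """The 'result processor': reduces per-server results into one answer."""
    return sorted({fid for part in partials for fid in part})


def call(procedure, servers, result_processor, where_flight_id=None):
    """Run on all servers, or only on the owner when a partitioning key is given."""
    if where_flight_id is not None:
        targets = [servers[owner(where_flight_id)]]
    else:
        targets = list(servers.values())
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        partials = list(pool.map(procedure, targets))
    return result_processor(partials)


if __name__ == "__main__":
    rows = [
        {"flight_id": "AA1111", "seats_taken": 120, "capacity": 100},
        {"flight_id": "UA0042", "seats_taken": 80, "capacity": 100},
    ]
    print(call(over_booked, load(rows), sorted_union))
```

Passing `where_flight_id` plays the role of the ON TABLE ... WHERE hint: only the node that owns that key does any work.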
  • 50. Consistency model
  • 51. Consistency Model without Transactions
    Replication within cluster is always eager and synchronous
    Row updates are always atomic; No need to use transactions
    FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued
    Consistency in Partitioned tables
    a partitioned table row owned by one member at a point in time
    all updates are serialized to replicas through owner
    "Total ordering" at a row level: atomic and isolated
    Membership changes and consistency
    Pessimistic concurrency support using ‘Select for update’
    Support for referential integrity
  • 52. Distributed Transactions
    Full support for distributed transactions (Single phase commit)
    Highly scalable without any centralized coordinator or lock manager
    We make some important assumptions
    Most OLTP transactions are small in duration and size
    W-W conflicts are very rare in practice
    How does it work?
    Each data node has a sub-coordinator to track TX state
    Eagerly acquire local “write” locks on each replica
    Object owned by a single primary at a point in time
    Fail fast if lock cannot be obtained
    Atomic and works with the cluster Failure detection system
    Isolated until commit
    Only support local isolation during commit
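The "eagerly acquire local write locks, fail fast on conflict" idea can be illustrated with a toy model. This Python sketch is not SQLFire's transaction protocol: `Replica`, `Transaction` and `WriteConflict` are hypothetical names, everything runs in one process, and failure detection and sub-coordinators are left out.

```python
class WriteConflict(Exception):
    """Raised immediately when another transaction already holds the write lock."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.locks = {}  # key -> id of the transaction holding the write lock

    def try_lock(self, key, tx_id):
        holder = self.locks.get(key)
        if holder is not None and holder != tx_id:
            return False  # no waiting: the caller fails fast
        self.locks[key] = tx_id
        return True

    def release(self, tx_id):
        self.locks = {k: t for k, t in self.locks.items() if t != tx_id}


class Transaction:
    def __init__(self, tx_id, replicas):
        self.tx_id = tx_id
        self.replicas = replicas
        self.pending = {}  # writes stay invisible to others until commit

    def write(self, key, value):
        for replica in self.replicas:
            if not replica.try_lock(key, self.tx_id):
                self.rollback()
                raise WriteConflict(f"{key} is locked on {replica.name}")
        self.pending[key] = value

    def commit(self):
        for replica in self.replicas:
            replica.data.update(self.pending)
            replica.release(self.tx_id)
        self.pending = {}

    def rollback(self):
        for replica in self.replicas:
            replica.release(self.tx_id)
        self.pending = {}


if __name__ == "__main__":
    primary, secondary = Replica("primary"), Replica("secondary")
    tx = Transaction("tx1", [primary, secondary])
    tx.write("AA1111", 42)
    tx.commit()
    print(primary.data, secondary.data)
```

Failing fast instead of waiting avoids the need for a global lock manager or deadlock detection, at the cost of aborting the rare write-write conflict, which matches the assumption on the slide that such conflicts are uncommon.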
  • 53. Parallel disk persistence
  • 54. Why is disk latency so high?
    Challenges
    Disk seek time is still > 2 ms
    OLTP transactions are small writes
    Flushing to disk will result in a seek
    Best rates in 100s per second
    RDBs and NoSQL try to avoid the problem
    Append to transaction logs; out-of-band writes to data files
    But, reads can cause seeks to disk
  • 55. Disk persistence in SQLF
    Parallel log structured storage
    Each partition writes in parallel
    Backups write to disk also
    Increase reliability against h/w loss
    • Don’t seek to disk
    • Don’t flush all the way to disk
    • Use OS scheduler to time write
    • Do this on primary + secondary
    • Realize very high throughput
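The points on this slide (append only, no seeks, no forced sync, reads served from memory) can be sketched as a tiny log-structured store. This Python example is an illustration under assumptions, not SQLFire's disk format: `AppendOnlyStore` is a hypothetical class, records are JSON lines, and one log file stands in for one partition's log.

```python
import json
import os


class AppendOnlyStore:
    """Toy log-structured store: writes append to a file, reads come from memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> latest value, rebuilt from the log on startup
        if os.path.exists(path):
            self._replay()
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        # Later records win, so replaying in order restores the latest state.
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.index[record["k"]] = record["v"]

    def put(self, key, value):
        # Sequential append: no seek to an existing record's position.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        # flush() hands the bytes to the OS; there is deliberately no os.fsync(),
        # so the OS decides when the data actually reaches the disk.
        self.log.flush()
        self.index[key] = value

    def get(self, key):
        return self.index.get(key)

    def close(self):
        self.log.close()


if __name__ == "__main__":
    store = AppendOnlyStore("partition-0.log")
    store.put("AA1111", {"economy_seats_taken": 10})
    print(store.get("AA1111"))
    store.close()
```

Skipping the sync trades single-machine durability for throughput, which is why the slide pairs it with writing on both the primary and the secondary copy.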
  • Performance benchmark
  • 60. How does it perform? Scale?
    Scale from 2 to 10 servers (one per host)
    Scale from 200 to 1200 simulated clients (10 hosts)
    Single partitioned table: int PK, 40 fields (20 ints, 20 strings)
  • 61. How does it perform? Scale?
    CPU% remained low per server – about 30% indicating many more clients could be handled
  • 62. Is latency low with scale?
    Latency decreases with server capacity
    50-70% take < 1 millisecond
    About 90% take less than 2 milliseconds
    Small percentage of outliers
  • 63. Q & A
    VMware vFabric SQLFire BETA will be released in early June
    Check out community.gemstone.com
  • 64. Built using GemFire object data fabric + Derby