SQLFire at VMworld Europe 2011

SQLFire is a high-performance, memory-optimized distributed SQL database.

SQLFire databases run on multiple servers simultaneously, but present a standard SQL interface to client applications, and appear to be just one database. SQLFire also makes it easy to add or remove servers at any time, which makes redundancy and elastic scaling very simple.

This presentation has an overview of SQLFire as well as a walkthrough of the SQL extensions SQLFire uses to create a real distributed SQL database. Importantly, all of the extensions are in the way tables are defined (i.e. the DDL commands) rather than extensions to data inserts or queries, so clients are completely unaware of SQLFire's distributed nature.

Published in: Technology

SQLFire at VMworld Europe 2011

  1. Managing High Performance Data with vFabric SQLFire
     Carter Shanklin, Product Manager for vFabric
     © 2009 VMware Inc. All rights reserved
  2. Agenda
     • What is SQLFire?
     • Why SQL vs. NoSQL
     • Why SQLFire versus other SQL databases
     • SQLFire features + Demo
     • How SQLFire Scales
       • Hash partitioning
       • Entity groups and collocation
       • Data-aware stored procedures
     • Consistency model
     • Shared nothing persistence
  3. What is vFabric SQLFire?
     SQLFire is a memory-optimized distributed SQL database.
     SQLFire attacks scalability challenges in two ways:
     • Relaxes ACID semantics somewhat (in transactions and in replication)
     • Horizontally scalable. Add capacity by adding nodes.
     SQLFire has built-in high availability and native support for replication to multiple datacenters.
     SQLFire provides a real SQL interface. Ships with JDBC and ADO.NET bindings with more to come.
     SQLFire can also be used as a cache in front of other databases.
  4. SQLFire at-a-glance
     [Diagram: Java and C# clients connect over JDBC or ADO.NET to a cluster of nodes backed by shared-nothing disks, a file system and other databases]
     • Many physical machine nodes appear as one logical system
     • Data transparently replicated and/or partitioned; redundant storage can be in memory and/or on disk
     • As data changes, subscribers are pushed notification events
     • Increase/decrease capacity on the fly
     • Shared-nothing disk persistence: each cache instance can optionally persist to disk
     • Synchronous read-through and write-through, or asynchronous write-behind, to other data sources and sinks
  5. The database world is changing.
     Many new data models (NoSQL) are emerging
     • Key-value
     • Column family (inspired by Google BigTable)
     • Document
     • Graph
     Most focus on making the model less rigid than SQL
     Consistency model is not ACID
     [Diagram: consistency spectrum. Low scale: strict, full ACID (RDB). High scale: tunable consistency. Very high scale: eventual consistency.]
     Different tradeoffs for different goals
  6. SQLFire Versus NoSQL
     Attribute           NoSQL                                      SQLFire
     DB Interface        Idiosyncratic (i.e. each is custom).       Standard SQL.
     Querying            Idiosyncratic or not present.              SQL queries.
     Data Consistency    Tunable, most favor eventual consistency.  Tunable, favors high consistency.
     Transactions        Weak or not present.                       Linearly scalable transaction model.
     Interface Design    Designed for simplicity.                   Designed for compatibility.
     Data Model          Wide variety of different models.          Relational model.
     Schema Flexibility  Focus on extreme flexibility, dynamism.    SQL model, requires DB migrations, etc.
  7. SQLFire Versus Other SQL Databases
     Attribute           SQLFire                                                     Other SQL DBs
     DB Interface        Standard SQL.                                               Standard SQL.
     Data Consistency    Tunable. Mix of eventual consistency and high consistency.  High consistency.
     Transactions        Supported.                                                  Very strong support.
     Scaling Model       Scale out, commodity servers.                               Scale up.
  8. SQLFire challenges traditional DB design, not SQL
     [Diagram: a traditional database. Buffers primarily tuned for IO; first write goes to the LOG, second write goes to the data files.]
     • Too much I/O
     • Design roots don't necessarily apply today
       • Too much focus on ACID
       • Disk synchronization bottlenecks
  9. SQLFire 1.0 Notable Features
     Horizontally scalable with Partitioning and Replication
     Multiple Topologies
     • Client/Server, Asynchronous replication over WAN
     Queries
     • Distributed and memory-optimized
     Procedures and Functions
     • Standard Java stored procedures with “data awareness”
     Caching
     • Loader, writers, Eviction, Overflow and Expiration
     Event framework
     • Listeners, triggers, Asynchronous write behind
     Command line tools
     Manageability, Security
  10. Scaling SQLFire: Partitioning & Replication
  11. How SQLFire scales a common DB schema.

      FLIGHTS
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        ORIG_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        …..
        PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)

      FLIGHTAVAILABILITY (1–M from FLIGHTS)
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        FLIGHT_DATE DATE NOT NULL,
        ECONOMY_SEATS_TAKEN INTEGER,
        …..
        PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE)
        FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER)

      FLIGHTHISTORY (1–1 from FLIGHTS)
        FLIGHT_ID CHAR(6),
        SEGMENT_NUMBER INTEGER,
        ORIG_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        DEST_AIRPORT CHAR(3),
        …..

      SEVERAL CODE/DIMENSION TABLES
        AIRLINES: AIRLINE INFORMATION (VERY STATIC)
        COUNTRIES: LIST OF COUNTRIES SERVED BY FLIGHTS
        CITIES:
        MAPS: PHOTOS OF REGIONS SERVED

      Assume thousands of flight rows, millions of flightavailability records
  12. Creating Tables

      CREATE TABLE AIRLINES (
        AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
        AIRLINE_FULL VARCHAR(24),
        BASIC_RATE DOUBLE PRECISION,
        DISTANCE_DISCOUNT DOUBLE PRECISION,
        ….
      );

      [Diagram: the table shown against three SQLF servers]
  13. Replicated Tables

      CREATE TABLE AIRLINES (
        AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
        AIRLINE_FULL VARCHAR(24),
        BASIC_RATE DOUBLE PRECISION,
        DISTANCE_DISCOUNT DOUBLE PRECISION,
        ….
      ) REPLICATE;

      [Diagram: each of three SQLF servers holds a copy of the replicated table]
  14. Partitioned Tables

      CREATE TABLE FLIGHTS (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        ORIG_AIRPORT CHAR(3),
        DEST_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        FLIGHT_MILES INTEGER NOT NULL
      ) PARTITION BY COLUMN (FLIGHT_ID);

      [Diagram: each of three SQLF servers holds the replicated table plus one partition of the partitioned table]
  15. Partition Redundancy

      CREATE TABLE FLIGHTS (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        ORIG_AIRPORT CHAR(3),
        DEST_AIRPORT CHAR(3),
        DEPART_TIME TIME,
        FLIGHT_MILES INTEGER NOT NULL
      ) PARTITION BY COLUMN (FLIGHT_ID) REDUNDANCY 1;

      [Diagram: as before, and each of the three SQLF servers also holds a redundant partition]
  16. Partition Colocation

      CREATE TABLE FLIGHTAVAILABILITY (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        FLIGHT_DATE DATE NOT NULL,
        ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
        …
      ) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS);

      [Diagram: each of three SQLF servers holds the replicated table, a partition, its colocated partition and a redundant partition]
  17. Persistent Tables

      CREATE TABLE FLIGHTAVAILABILITY (
        FLIGHT_ID CHAR(6) NOT NULL,
        SEGMENT_NUMBER INTEGER NOT NULL,
        FLIGHT_DATE DATE NOT NULL,
        ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
        …
      ) PARTITION BY COLUMN (FLIGHT_ID) COLOCATE WITH (FLIGHTS)
        PERSISTENT persistentStore ASYNCHRONOUS;

      Data dictionary is always persisted in each server

      sqlf backup /export/fileServerDirectory/sqlfireBackupLocation

      [Diagram: each of three SQLF servers holds the replicated table, a partition, its colocated partition and a redundant partition]
  18. Demo: Scaling with partitioned tables.
      [Diagram: the same FLIGHTS, FLIGHTAVAILABILITY, FLIGHTHISTORY and code/dimension table schema shown on slide 11]
  19. Hash partitioning for linear scaling
      Key hashing provides single-hop access to its partition.
      But what if the access is not based on the key … say, joins are involved?
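To make the single-hop claim on slide 19 concrete, here is a minimal Java sketch of key hashing. The class `HashPartitioner`, the bucket count and the server names are all hypothetical and chosen for illustration; the slides do not describe how SQLFire actually assigns buckets to members.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: maps a partitioning key to a bucket, and a bucket to a server.
// This is an illustration of hash partitioning in general, not SQLFire's internals.
class HashPartitioner {
    private final int bucketCount;
    private final List<String> servers;

    HashPartitioner(int bucketCount, List<String> servers) {
        this.bucketCount = bucketCount;
        this.servers = new ArrayList<>(servers);
    }

    // Math.floorMod keeps the bucket non-negative even for negative hash codes.
    int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    // Any client can compute the owner locally from the key alone,
    // so a key-based read or write needs only one network hop.
    String serverFor(String key) {
        return servers.get(bucketFor(key) % servers.size());
    }
}
```

Because the owner is a pure function of the key, the scheme helps only when the key is known up front, which is the limitation the next slides address.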
  20. Pure hash-based partitioning will only get you so far.
      Consider this query:

      select * from flights, flightAvailability
      where flights.id = flightAvailability.flightid
        and flights.fromAirport = 'CPH';

      If both tables are simply hash partitioned, the join logic will need to execute on all nodes where flightavailability data is stored.
      This will not scale.
      • Joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes
  21. To scale we need partition-aware DB design.
      The DB architect must think about how data maps to partitions.
      The main idea is to:
      • minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions.
      Read Pat Helland's “Life beyond Distributed Transactions” and the Google MegaStore paper.
  22. SQLFire allows partition-aware design with the “colocated” keyword.
      Entity Groups
      • FlightID is the entity group key
      • Table FlightAvailability is partitioned by FlightID and colocated with Flights
  23. Solving this scalability problem with SQLFire.
      Create flightAvailability as follows:

      CREATE TABLE flightAvailability … PARTITION BY COLUMN (flightid) COLOCATE WITH (flights);

      Re-run the query:

      select * from flights, flightAvailability
      where flights.id = flightAvailability.flightid
        and flights.fromAirport = 'CPH';

      The query is restricted to nodes containing flights with CPH as the fromAirport.
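The effect of colocation on the join can be sketched in a few lines of Java. This is an illustration under stated assumptions, not SQLFire's query engine: the class `ColocatedJoin`, the array-based row layout and the `id@date` result format are hypothetical. The point it shows is that when both tables are partitioned by the same flight id with the same function, every matching pair of rows already sits in the same partition, so each partition can join its own rows without moving data between nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a colocated join; not SQLFire's query engine.
class ColocatedJoin {
    static int partitionOf(String flightId, int partitions) {
        return Math.floorMod(flightId.hashCode(), partitions);
    }

    // Groups rows by partition; element 0 of each row is the flight id.
    static Map<Integer, List<String[]>> partition(List<String[]> rows, int partitions) {
        Map<Integer, List<String[]>> byPartition = new HashMap<>();
        for (String[] row : rows) {
            byPartition.computeIfAbsent(partitionOf(row[0], partitions),
                    k -> new ArrayList<>()).add(row);
        }
        return byPartition;
    }

    // flights rows: {id, fromAirport}; availability rows: {flightId, flightDate}.
    // Each partition joins only its own rows; no row crosses a partition boundary.
    static List<String> join(List<String[]> flights, List<String[]> availability,
                             String fromAirport, int partitions) {
        Map<Integer, List<String[]>> f = partition(flights, partitions);
        Map<Integer, List<String[]>> a = partition(availability, partitions);
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String[]>> e : f.entrySet()) {
            List<String[]> local = a.getOrDefault(e.getKey(), new ArrayList<>());
            for (String[] flight : e.getValue()) {
                if (!flight[1].equals(fromAirport)) continue;
                for (String[] avail : local) {
                    if (avail[0].equals(flight[0])) {
                        result.add(flight[0] + "@" + avail[1]);
                    }
                }
            }
        }
        return result;
    }
}
```

Partitions that hold no flight departing from the requested airport contribute nothing, which mirrors the slide's observation that the query is restricted to the nodes that contain matching flights.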
  24. More about partition-aware database design.
      OLTP systems tend to be partitionable.
      • Typically it is the number of entities that grows over time and not the size of the entity.
        • Customer count perpetually grows, not the size of the customer info
      • Most often access is very restricted and based on select entities
        • given a FlightID, fetch flightAvailability records
        • given a customerID, add/remove orders, shipment records
      Identify partition key for “Entity Group”
      • "entity groups": set of entities across several related tables that can all share a single identifier
        • flightID is shared between the parent and child tables
        • CustomerID shared between customer, order and shipment tables
  25. Scaling Application logic with Parallel “Data Aware procedures”
  26. Stored Procedures in SQLFire.
      SQLFire stored procedures:
      • Written in pure Java rather than proprietary extensions.
      • Created and defined based on SQL standards.
      • Support “data awareness” and run only on nodes where applicable data resides.
      • Support a map/reduce-like execution style.
      Benefits:
      • Write procedures in pure Java or take advantage of existing Java libraries.
      • Easily take advantage of SQLFire as a highly scalable distributed system.
  27. Procedures
      Java Stored Procedures may be created according to the SQL Standard

      CREATE PROCEDURE getOverBookedFlights (IN argument OBJECT, OUT result OBJECT)
        LANGUAGE JAVA
        PARAMETER STYLE JAVA
        READS SQL DATA
        DYNAMIC RESULT SETS 1
        EXTERNAL NAME com.acme.OverBookedFLights;

      SQLFire also supports the JDBC type Types.JAVA_OBJECT. A parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object.
      In this case, the procedure will be executed on the server to which a client is connected (or locally for Peer Clients)
  28. Data Aware Procedures
      Parallelize the procedure and prune to nodes with the required data.
      Extend the procedure call with the following syntax:

      CALL [PROCEDURE] procedure_name ( [ expression [, expression ]* ] )
        [ WITH RESULT PROCESSOR processor_name ]
        [ { ON TABLE table_name [ WHERE whereClause ] }
          | { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

      Example, hinting at the data the procedure depends on:

      CALL getOverBookedFlights(<bind arguments>)
        ON TABLE FLIGHTAVAILABILITY
        WHERE FLIGHTID = <SomeFLIGHTID>;

      If the table is partitioned by columns in the where clause, the procedure execution is pruned to nodes with the data (the node with <SomeFLIGHTID> in this case).
      [Diagram: Client, Fabric Server 1, Fabric Server 2]
  29. Parallelize procedure then aggregate (reduce)

      CALL [PROCEDURE] procedure_name ( [ expression [, expression ]* ] )
        [ WITH RESULT PROCESSOR processor_name ]
        [ { ON TABLE table_name [ WHERE whereClause ] }
          | { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

      Register a Java Result Processor (optional in some cases):

      CALL SQLF.CreateResultProcessor(processor_name, processor_class_name);

      [Diagram: Client, Fabric Server 1, Fabric Server 2, Fabric Server 3]
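The "parallelize, then aggregate" style from slides 26 to 29 can be illustrated with plain Java concurrency. This is a sketch of the general pattern only: `ParallelProcedure`, the `{seatsTaken, capacity}` row layout and the summing step are hypothetical stand-ins, and the code does not use or imitate SQLFire's actual procedure or result processor interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "execute on every partition, then reduce"; not SQLFire's API.
class ParallelProcedure {
    // "Map" step: runs against one server's local rows only.
    // Each int[] is {seatsTaken, capacity}.
    static int countOverbooked(List<int[]> localRows) {
        int count = 0;
        for (int[] row : localRows) {
            if (row[0] > row[1]) count++;
        }
        return count;
    }

    // Runs the procedure on all partitions in parallel, then sums the partial
    // counts, which is the role a result processor plays on the calling side.
    static int execute(List<List<int[]>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<int[]> p : partitions) {
                partials.add(pool.submit(() -> countOverbooked(p)));
            }
            int total = 0;
            for (Future<Integer> f : partials) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("procedure execution failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each worker touches only its own partition, so adding servers adds workers; only the small partial results travel back to the caller.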
  30. Consistency model
  31. Consistency Model without Transactions
      Replication within cluster is always eager and synchronous
      Row updates are always atomic; no need to use transactions
      FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued
      Consistency in partitioned tables
      • a partitioned table row is owned by one member at a point in time
      • all updates are serialized to replicas through the owner
      • "Total ordering" at a row level: atomic and isolated
      Membership changes and consistency: need another hour
      Pessimistic concurrency support using 'Select for update'
      Support for referential integrity
  32. SQLFire Transactions
      Highly scalable without any centralized coordinator or lock manager
      We make some important assumptions
      • Most OLTP transactions are small in duration and size
      • Write-write conflicts are very rare in practice
      How does it work?
      • Each data node has a sub-coordinator to track TX state
      • Eagerly acquire local “write” locks on each replica
        • Object owned by a single primary at a point in time
      • Fail fast if lock cannot be obtained
      Atomic and works with the cluster failure detection system
      Isolated until commit
      • Only support local isolation during commit
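The "eagerly acquire write locks, fail fast on conflict" step from slide 32 can be sketched for a single data node as follows. Everything here is an assumption made for illustration: the class `FailFastLocks`, the use of string row keys and the numeric transaction ids are hypothetical, and the slides do not show SQLFire's real lock manager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of fail-fast write locking on one data node; not SQLFire's implementation.
class FailFastLocks {
    // row key -> id of the transaction currently holding the write lock
    private final ConcurrentHashMap<String, Long> owners = new ConcurrentHashMap<>();

    // Tries to lock every row for txId. On the first conflict it releases what
    // it took in this call and returns false immediately instead of waiting.
    boolean acquireAll(long txId, List<String> rowKeys) {
        List<String> taken = new ArrayList<>();
        for (String key : rowKeys) {
            Long holder = owners.putIfAbsent(key, txId);
            if (holder != null && holder != txId) {
                for (String t : taken) {
                    owners.remove(t, txId);
                }
                return false;
            }
            if (holder == null) {
                taken.add(key);
            }
        }
        return true;
    }

    // Called at commit or rollback: drops only the locks this transaction owns.
    void releaseAll(long txId) {
        owners.values().removeIf(owner -> owner == txId);
    }
}
```

Never blocking on a held lock rules out deadlock without a central lock manager, and it is cheap as long as write-write conflicts stay as rare as the slide assumes.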
  33. Scaling disk access with shared nothing disk files and a “journaling” store design
  34. Disk persistence in SQLF
      [Diagram: two nodes, each with in-memory tables, a LOG compressor, OS buffers and append-only operation logs holding Record1, Record2, Record3]
      Parallel log structured storage
      Each partition writes in parallel
      Backups write to disk also
      • Increase reliability against h/w loss
      • Don't seek to disk
      • Don't flush all the way to disk
        – Use OS scheduler to time write
      • Do this on primary + secondary
      • Realize very high throughput
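A minimal Java sketch of the append-only operation log idea on slide 34 follows. The class `OperationLog` and its `key=value` line format are hypothetical and far simpler than any real store; the sketch only shows the two properties the slide names, namely that writes are appended rather than seeked, and that no explicit sync is requested so the OS schedules the physical write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an append-only operation log; not SQLFire's on-disk format.
class OperationLog {
    private final Path file;

    OperationLog(Path file) {
        this.file = file;
    }

    // Every update is appended to the end of the file; nothing is rewritten in place.
    // No explicit sync is requested, so the OS decides when the bytes reach the disk.
    void append(String key, String value) {
        String record = key + "=" + value + "\n";
        try {
            Files.write(file, record.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recovery replays the log in order; the last record for a key wins.
    Map<String, String> replay() {
        Map<String, String> table = new LinkedHashMap<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int sep = line.indexOf('=');
                if (sep > 0) {
                    table.put(line.substring(0, sep), line.substring(sep + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }
}
```

Old records for a key are never overwritten, so a log like this grows until something compacts it, which is presumably the job of the compressor shown in the slide's diagram.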
  35. Performance benchmark
  36. How does it perform? Scale?
      • Scale from 2 to 10 servers (one per host)
      • Scale from 200 to 1200 simulated clients (10 hosts)
      • Single partitioned table: int PK, 40 fields (20 ints, 20 strings)
  37. How does it perform? Scale?
      CPU% remained low per server (about 30%), indicating many more clients could be handled
  38. Is latency low with scale?
      • Latency decreases with server capacity
      • 50-70% take < 1 millisecond
      • About 90% take less than 2 milliseconds
  39. SQLFire beta available now
      http://vmware.com/go/sqlfire
      Q&A
