SQLFire at Strata 2012



SQLFire is VMware's in-memory distributed NewSQL database.

I delivered this preso with Jags, the product architect, and we covered the design choices SQLFire makes to achieve extreme scalability, as well as the connection between big data and fast data.

The deck looks a little different in presenter mode, so for best results, download it and enjoy.

  • Let's turn now to a hands-on look at some SQLFire features. On the left is the SQL code you can use in SQLFire, and on the right we'll talk about what the code actually does. For starters, we'll create a very simple table, just as you would in other databases. By default, tables in SQLFire are replicated across all nodes in the SQLFire cluster. That means, among other things, that if a server crashes, all the data in that table is still available. This approach is best for small datasets and for data that is frequently accessed or used in joins.
  • Partitioning data is more sophisticated and more interesting. The PARTITION BY keyword tells SQLFire to split the table's data across all available nodes. This approach is a must for large datasets.
  • There are many ways to partition data in SQLFire. By default, SQLFire hashes the partitioning key to spread data evenly, effectively at random, across all servers. If that's not good enough, you can exert a lot of control over how data is divided and distributed using list, range, or expression-based partitioning.
  • Partitioning creates a challenge: by default, each row lives on only one node, and if you lose that node, its data is offline. We can solve that with the REDUNDANCY keyword, which causes SQLFire to keep multiple copies of the data on different servers so that if you lose a node, all the data is still available. Redundancy is usually a good idea, and you can even keep data on 3 or 4 different servers at once. Most typically, you'll want a redundancy of 1, meaning one extra copy of each row on a second server.
  • Co-location is a key feature that allows SQLFire to be a real SQL database and horizontally scalable at the same time. When I talk to people who know distributed databases, they usually ask, "How do you do distributed joins?" The answer is, we don't. Instead, we let related data be grouped together on the same physical node. This is done with the COLOCATE WITH keyword, which associates tables based on a foreign key and keeps related rows on the same server. In this example, customer 1 and customer 2 are stored on different nodes. COLOCATE WITH ensures that sales records for customer 1 end up on node 1 and sales records for customer 2 end up on node 2.
  • Map-reduce is great when you have to sequentially apply an operation to every record, for instance text tokenization or indexing. SQLFire DAP (data-aware procedures), by contrast, is a generic distributed RPC mechanism that brings the power of SQL queries to each partition node. Examples include data mining and scoring, where tasks continuously look for data of interest using queries. Because each node returns results from its own in-process memory and the work is parallelized across any number of processors, this is a highly efficient way to process data in parallel.

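The notes above describe hash partitioning, redundant copies, and colocation in prose. The Python sketch that follows simulates those three ideas on a toy two-node cluster; it is a conceptual model under stated assumptions, not SQLFire's implementation. The bucket count, the crc32-based hash, the bucket-to-node assignment, and every function name (bucket_for, owner_nodes, insert, local_join, surviving_customer_ids) are hypothetical.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 2    # hypothetical: matches the two-node diagrams in the deck
NUM_BUCKETS = 8  # hypothetical bucket count; real clusters use many more

def bucket_for(key):
    """Hash the partitioning key to a bucket: deterministic, evenly spread."""
    return crc32(str(key).encode()) % NUM_BUCKETS

def owner_nodes(bucket, redundancy=1):
    """Primary node first, then `redundancy` backups, each on a different node."""
    copies = min(redundancy + 1, NUM_NODES)
    primary = bucket % NUM_NODES
    return [(primary + i) % NUM_NODES for i in range(copies)]

# store[node][table][bucket] -> rows that node holds (as primary or backup)
store = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def insert(table, row, partition_key, redundancy=1):
    """Write the row to its primary node and to every backup node."""
    bucket = bucket_for(partition_key)
    for node in owner_nodes(bucket, redundancy):
        store[node][table][bucket].append(row)

def local_join(node):
    """Join customers to sales using only buckets this node is primary for."""
    results = []
    for bucket, customers in store[node]["customers"].items():
        if owner_nodes(bucket)[0] != node:
            continue  # skip backup copies so each row is joined exactly once
        # Colocation: matching sales rows share the bucket, so no network hop.
        sales = store[node]["sales"].get(bucket, [])
        for c in customers:
            for s in sales:
                if s["customer_id"] == c["customer_id"]:
                    results.append((c["name"], s["price"]))
    return results

def surviving_customer_ids(failed_node):
    """Customer ids still readable if `failed_node` goes down."""
    ids = set()
    for node, tables in store.items():
        if node != failed_node:
            for rows in tables["customers"].values():
                ids.update(r["customer_id"] for r in rows)
    return ids

# Both tables are partitioned on customer_id, so related rows land together.
for c in [{"customer_id": 1, "name": "C1"}, {"customer_id": 2, "name": "C2"}]:
    insert("customers", c, c["customer_id"])
for s in [{"customer_id": 1, "price": 10.0},
          {"customer_id": 2, "price": 25.0},
          {"customer_id": 1, "price": 5.0}]:
    insert("sales", s, s["customer_id"])

print(sorted(local_join(0) + local_join(1)))  # [('C1', 5.0), ('C1', 10.0), ('C2', 25.0)]
print(surviving_customer_ids(failed_node=0))  # {1, 2}
```

Because each sales row is partitioned on the same key as its customer, every join in this model runs on a single node, and with one redundant copy every customer survives the loss of either node.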
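The note on SQLFire DAP (data-aware procedures) describes a scatter-gather pattern: a procedure runs against each node's local data, and a result processor merges the partial results, much like the maxSales / maxSalesReducer pair in the deck. A minimal Python sketch of that pattern, assuming three in-process "nodes", made-up sales rows, and hypothetical function names; unlike SQLFire's data-aware routing, it runs on every partition instead of pruning partitions by key.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local partitions: each "node" sees only its own sales rows.
PARTITIONS = {
    "node1": [{"location": "CA", "amount": 120.0}, {"location": "NY", "amount": 900.0}],
    "node2": [{"location": "WA", "amount": 310.0}, {"location": "OR", "amount": 75.0}],
    "node3": [{"location": "TX", "amount": 500.0}, {"location": "CA", "amount": 220.0}],
}

def max_sales_on_local_data(rows, locations):
    """Map step: runs next to the data and returns one partial result (or None)."""
    amounts = [r["amount"] for r in rows if r["location"] in locations]
    return max(amounts) if amounts else None

def max_sales_reducer(partials):
    """Result-processor step: merges the per-node partial results."""
    values = [p for p in partials if p is not None]
    return max(values) if values else None

def call_max_sales(locations):
    """Scatter the procedure to every partition in parallel, then gather."""
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = list(pool.map(
            lambda rows: max_sales_on_local_data(rows, locations),
            PARTITIONS.values()))
    return max_sales_reducer(partials)

print(call_max_sales({"CA", "WA", "OR"}))  # 310.0
```

Each node ships back a single number rather than its rows, which is why this style scales better than pulling all matching rows to one place.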
    1. SQLFire: Jags Ramnarayan, Chief Architect, SQLFire; Carter Shanklin, Product Manager, SQLFire
    2. Sponsor Sessions Suck
    3. Speed Matters: Users demand fast applications and fast websites. The database is the hardest thing to scale.
    4. SQLFire: Speed, Scale, SQL
       Speed: In-memory for maximum speed and minimum latency.
       Scale: Horizontally scalable. Add or remove nodes at any time for more capacity or availability.
       SQL: Familiar SQL interface. SQL-92 compliant. JDBC and ADO.NET interfaces.
    5. How does SQLFire get scale and speed?
    6. Diverging needs for online and analytics
    7. SQLFire: What does it really look like?
    8. SQLFire Tables Are Replicated By Default.
       CREATE TABLE sales
         (product_id int, store_id int,
          price float);
       Diagram: SQLFire Node 1 and SQLFire Node 2 each hold a full replica of the sales table.
       Best for small and frequently accessed data.
    9. Partitioned Tables Are Split Among Members.
       CREATE TABLE sales
         (product_id int, store_id int,
          price float)
         PARTITION BY COLUMN (product_id);
       Diagram: SQLFire Node 1 holds a replica and sales Partition 1; SQLFire Node 2 holds a replica and sales Partition 2.
       Best for large data sets.
    10. Types Of Partitioning In SQLFire.
        Hash (default): a built-in hashing algorithm splits data pseudo-randomly across available servers.
          PARTITION BY COLUMN (customer_id);
        List: manually divide data across servers based on discrete criteria.
          PARTITION BY LIST (home_state) (VALUES ('CA', 'WA'), VALUES ('TX', 'OK'));
        Range: manually divide data across servers based on continuous criteria.
          PARTITION BY RANGE (date) (VALUES BETWEEN '2008-01-01' AND '2008-12-31', VALUES BETWEEN '2009-01-01' AND '2009-12-30');
        Expression: fully dynamic division of data based on function execution. Can use UDFs.
          PARTITION BY (MONTH(date));
    11. How does it scale for queries?
        Chart: PK queries per second (thousands) vs. number of servers N = 2, 4, 6, 8, 10, with 2*N clients; partitioned table, 1 KB rows. Data labels: 200k, 420k, 604k, 790k, 1M.
    12. How does it scale for updates?
        Chart: updates per second (thousands, 3 columns) vs. number of servers N = 2, 4, 6, 8, 10, with 2*N clients; partitioned table. Data labels: 220k, 490k, 750k, 950k, 1.3M. 85% of updates complete in under 1 ms.
    13. Redundancy Increases Availability.
        CREATE TABLE sales
          (product_id int, store_id int,
           price float)
          PARTITION BY COLUMN (product_id)
          REDUNDANCY 1;
        Diagram: SQLFire Node 1 holds a replica, Partition 1, and a redundant copy of Partition 2 (Partition 2*); SQLFire Node 2 holds a replica, Partition 2, and a redundant copy of Partition 1 (Partition 1*).
        All data is available if Node 1 fails.
    14. Partitioning and redundancy
        Replication is synchronous but done in parallel.
        Replication can be "rack aware".
        Single owner for any row at a point in time.
        Redundancy = 2 (but tunable).
    15. SQLFire: Derp-Proof Database
        Was that cord supposed to be in the wall?
    16. Linearly scaling joins
    17. Partition Aware DB Design
    18. Colocate Data For Fast Joins.
        CREATE TABLE sales
          (product_id int, store_id int, customer_id int,
           price float)
          PARTITION BY COLUMN (customer_id)
          COLOCATE WITH (customers);
        Diagram: SQLFire Node 1 holds a replica, Customer 1 (C1), and Customer 1 Sales; SQLFire Node 2 holds a replica, Customer 2 (C2), and Customer 2 Sales.
        Related data is placed on the same node, so SQLFire can join tables without network hops.
    19. Colocate Data For Fast Joins.
        Diagram: SQLFire Node 1 holds a replica, Customer 1 (C1), and Customer 1 Sales; SQLFire Node 2 holds a replica, Customer 2 (C2), and Customer 2 Sales.
        Related data is placed on the same node, so SQLFire can join tables without network hops.
    20. Colocate Data For Fast Joins.
        Diagram: Customer 1 and Customer 1 Sales on SQLFire Node 1; Customer 2 and Customer 2 Sales on SQLFire Node 2.
        Parallel scatter-gather: in parallel, each node does the hash join and aggregation locally.
    21. Dynamic Data Colocation
        Dynamic entity group formation based on foreign key relationships.
        Single master for any entity group.
        Redundancy = 2.
    22. Data-Aware Stored Procs: Like Map/Reduce, But Different
    23. Scaling Stored Procedures
        CALL maxSales(arguments)
          WITH RESULT PROCESSOR maxSalesReducer
          ON TABLE sales
          WHERE (Location IN ('CA', 'WA', 'OR'));
        Diagram: maxSales runs on local data on each node, and maxSalesReducer combines the results.
        SQLFire uses data-aware routing to route processing to the data. Result processors give map/reduce functionality.
    24. Scalability: Consistency
        Assumes most transactions are small in space and time, and write-write conflicts are rare.
    25. Scalability: High performance persistence
        Diagram: on each of two nodes, memory tables, a log compressor, and OS buffers feed records (Record1, Record2, Record3) into append-only operation logs.
    26. Demos!
    27. Demo: Distributed Procedures
    28. Demo: Caching
    29. :sigh:
        Download: Just Google it.
        Try SQLFire Today! Free for developer use on up to 3 nodes.
        Forum: Got questions? Get answers.
        Twitter: I need more followers to get a promotion.
    30. Demo Details
    31. Scaling Stored Procs (1): Ubuntu (database); Insert Timeseries
    32. Scaling Stored Procs (2): Ubuntu (database); Insert Timeseries, Compute Autocorrelations, Complete
    33. Scaling Stored Procs (3): three Ubuntu (database) nodes; Insert Timeseries, Rebalance, then each node runs Compute Autocorrelations and Complete, all using standard SQL APIs.
    34. Caching Analytics (1): Continuous Batch Processing
    35. Caching Analytics (2): Ubuntu (database); low-latency in-memory caching; JDBC row loader; Continuous Batch Processing
    36. Caching Analytics (3): Ubuntu (database); low-latency in-memory caching; scalable and tunable cache policies; Continuous Batch Processing
    37. Caching Policies
        LRU Count: overflow to disk or destroy.
        Time To Live: the counter starts ticking as soon as the row is loaded.
        Idle Time: destroy rows when they have not been accessed for a while.
        Specified in CREATE TABLE syntax.
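The caching-policy slide names three rules: an LRU count limit that either overflows rows to disk or destroys them, a time-to-live measured from load time, and an idle timeout measured from last access. A hedged Python sketch of how those rules interact, using a hypothetical RowCache class and an injectable clock so the behavior is deterministic; SQLFire configures these policies declaratively in CREATE TABLE, not through an API like this one.

```python
import time
from collections import OrderedDict

class RowCache:
    """Hypothetical in-memory cache applying LRU-count, TTL, and idle-time rules."""

    def __init__(self, max_rows=None, overflow=False, ttl=None, idle=None, clock=time.monotonic):
        self.max_rows = max_rows    # LRU count limit (None = unlimited)
        self.overflow = overflow    # True: evicted rows go to "disk"; False: destroyed
        self.ttl = ttl              # seconds measured from when the row was loaded
        self.idle = idle            # seconds measured from the row's last access
        self.clock = clock
        self.rows = OrderedDict()   # key -> (value, loaded_at, last_access), oldest first
        self.disk = {}              # stand-in for overflow storage

    def put(self, key, value):
        now = self.clock()
        self.disk.pop(key, None)    # a fresh load replaces any overflowed copy
        self.rows[key] = (value, now, now)
        self.rows.move_to_end(key)
        if self.max_rows is not None and len(self.rows) > self.max_rows:
            old_key, (old_value, _, _) = self.rows.popitem(last=False)  # least recently used
            if self.overflow:
                self.disk[old_key] = old_value

    def get(self, key):
        now = self.clock()
        if key not in self.rows:
            return self.disk.get(key)  # overflowed rows stay readable; destroyed ones are gone
        value, loaded_at, last_access = self.rows[key]
        ttl_expired = self.ttl is not None and now - loaded_at >= self.ttl
        idle_expired = self.idle is not None and now - last_access >= self.idle
        if ttl_expired or idle_expired:
            del self.rows[key]         # expiry destroys the row
            return None
        self.rows[key] = (value, loaded_at, now)
        self.rows.move_to_end(key)     # mark as most recently used
        return value

class FakeClock:
    """Deterministic clock so the example does not depend on real time."""
    def __init__(self):
        self.t = 0.0
    def __call__(self):
        return self.t

clock = FakeClock()

lru = RowCache(max_rows=2, overflow=True, clock=clock)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")                       # "a" becomes most recently used
lru.put("c", 3)                    # over the limit: "b" overflows to disk
print(sorted(lru.rows), lru.disk)  # ['a', 'c'] {'b': 2}

ttl_cache = RowCache(ttl=10, clock=clock)
ttl_cache.put("x", "row-x")        # loaded at t=0
clock.t = 5.0
ttl_hit = ttl_cache.get("x")       # 'row-x'
clock.t = 10.0
ttl_miss = ttl_cache.get("x")      # None: TTL counts from load time; reads don't extend it

clock.t = 0.0
idle_cache = RowCache(idle=10, clock=clock)
idle_cache.put("y", "row-y")
clock.t = 8.0
idle_cache.get("y")                # access resets the idle timer
clock.t = 17.0
idle_hit = idle_cache.get("y")     # 'row-y': idle for only 9 seconds
clock.t = 27.0
idle_miss = idle_cache.get("y")    # None: idle for 10 seconds
```

The difference between the last two policies shows up at t=10 versus t=17: a read keeps a row alive under the idle rule but not under the time-to-live rule.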