Managing High Performance Data with vFabric SQLFire
Agenda
- Various NoSQL attributes and why SQL
- SQLFire features + demo
- Scalability patterns: hash partitioning; entity groups and colocation; scaling behavior using "data aware stored procedures"
- Consistency model: how we do distributed transactions
- Shared nothing persistence
We challenge the traditional RDBMS design, NOT SQL
- Too much I/O: first write to the log, second write to the data files
- Design roots don't necessarily apply today
- Too much focus on ACID
- Disk synchronization bottlenecks
- Buffers primarily tuned for I/O
Common themes in next-gen DB architectures (NoSQL, Data Grids, Data Fabrics, NewSQL)
- "Shared nothing" commodity clusters; focus shifts to memory, distributing data, and clustering
- Scale by partitioning the data and moving behavior to the data nodes
- HA within the cluster and across data centers
- Add capacity to scale dynamically
What is different?
- Several data models: key-value, column family (inspired by Google BigTable), document, graph
- Most focus on making the model less rigid than SQL
- Consistency model is not ACID; it spans a spectrum from strict full ACID (RDBMS, low scale) through tunable consistency (high scale) to eventual consistency (very high scale)
What is our take with SQLFire?
So, what is vFabric SQLFire?
- A distributed, main-memory-oriented SQL data management platform
- NoSQL characteristics of scalability, performance, and availability, but retains support for distributed transactions and SQL querying
- Also designed so you can use it as an operational layer in front of your legacy databases through a caching framework
Show me a picture
Comparing with NoSQL and object data grids: the good
- Little that is vendor specific: use SQL for application DML and queries; vendor-specific extensions live in the DDL
- Better query engine: cost-based optimizer, skip-list indexing, parallel queries
- No deserialization headaches
- Maintains referential integrity
- Easier to integrate with existing relational DBs and other products (plug-n-play is a myth)
Comparing with NoSQL and object data grids: the not so good
- Not as efficient for simple key access
- You can only manage scalar types; nested graphs are painful
- Complex data relationships that could be represented as a single object and fetched using a key now may require a join
- Join processing is computationally expensive
- OR mapping can add latency
Features in 1.0
- Partitioning and replication
- Multiple topologies: peer-to-peer, client-server, WAN
- Events framework: listeners, triggers, asynchronous write-behind
- Queries: distributed, optimized for main memory
- Procedures and functions: standard Java stored procedures and parallel "data aware" procedures
- Caching: loaders, writers, eviction, overflow, and expiration
- Command line tool
- Manageability, security
Flexible Deployment Topologies
A Java application cluster can host an embedded clustered database just by changing the URL:
jdbc:sqlfire:;mcast-port=33666;host-data=true
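A minimal sketch of what an embedded peer might look like, using the URL from this slide. The class name and the PING table are illustrative, and it assumes the SQLFire JDBC driver jar is on the classpath and registers itself via JDBC 4, so no explicit Class.forName call is needed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EmbeddedPeerExample {
    public static void main(String[] args) throws Exception {
        // Embedded peer URL from the slide: this JVM joins the cluster keyed by
        // mcast-port and hosts data itself (host-data=true).
        String url = "jdbc:sqlfire:;mcast-port=33666;host-data=true";

        // With a JDBC 4 driver jar on the classpath, DriverManager locates the
        // SQLFire embedded driver automatically.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // From here on it is ordinary JDBC against the embedded cluster.
            stmt.execute("CREATE TABLE PING (ID INT NOT NULL PRIMARY KEY)");
        }
    }
}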
Flexible Deployment Topologies (continued)
Partitioning & Replication
Explore features through an example
Assume thousands of FLIGHTS rows and millions of FLIGHTAVAILABILITY records.
Creating Tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION,
  ...
);

[Diagram: the table created on a cluster of three SQLF members]
Replicated Tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION,
  ...
) REPLICATE;

[Diagram: the table replicated on each of the three SQLF members]
Partitioned Tables

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL
) PARTITION BY COLUMN (FLIGHT_ID);

[Diagram: FLIGHTS partitioned across the three SQLF members; the replicated AIRLINES table remains on each]
Partition Redundancy

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL
) PARTITION BY COLUMN (FLIGHT_ID)
  REDUNDANCY 1;

[Diagram: each member holds its primary partition plus a redundant copy of another member's partition; the replicated table remains on each member]
Partition Colocation

CREATE TABLE FLIGHTAVAILABILITY (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  FLIGHT_DATE DATE NOT NULL,
  ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
  ...
) PARTITION BY COLUMN (FLIGHT_ID)
  COLOCATE WITH (FLIGHTS);

[Diagram: FLIGHTAVAILABILITY partitions and their redundant copies are colocated with the matching FLIGHTS partitions on each member]
Persistent Tables

CREATE TABLE FLIGHTAVAILABILITY (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  FLIGHT_DATE DATE NOT NULL,
  ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
  ...
) PARTITION BY COLUMN (FLIGHT_ID)
  COLOCATE WITH (FLIGHTS)
  PERSISTENT persistentStore ASYNCHRONOUS;

- Back up a running system with: sqlf backup /export/fileServerDirectory/sqlfireBackupLocation
- The data dictionary is always persisted on each server

[Diagram: the same colocated, redundant layout as before, now persisted to disk]
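As a rough illustration of using the schema above from an application, the sketch below inserts a parent FLIGHTS row and a child FLIGHTAVAILABILITY row that share a FLIGHT_ID; with COLOCATE WITH (FLIGHTS), both rows land on the same member. The class name and sample values are illustrative, and it assumes no additional NOT NULL columns hide behind the "..." in the DDL.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ColocatedInsertExample {
    public static void main(String[] args) throws Exception {
        // Embedded peer URL from the earlier slide; a client URL would work too.
        String url = "jdbc:sqlfire:;mcast-port=33666;host-data=true";
        try (Connection conn = DriverManager.getConnection(url)) {
            // Parent row in the partitioned FLIGHTS table.
            try (PreparedStatement flight = conn.prepareStatement(
                    "INSERT INTO FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER, "
                  + "ORIG_AIRPORT, DEST_AIRPORT, FLIGHT_MILES) VALUES (?, ?, ?, ?, ?)")) {
                flight.setString(1, "AA1116");
                flight.setInt(2, 1);
                flight.setString(3, "SFO");
                flight.setString(4, "JFK");
                flight.setInt(5, 2586);
                flight.executeUpdate();
            }
            // Child row with the same FLIGHT_ID: COLOCATE WITH (FLIGHTS) keeps it
            // on the same partition/member as its parent flight.
            try (PreparedStatement avail = conn.prepareStatement(
                    "INSERT INTO FLIGHTAVAILABILITY (FLIGHT_ID, SEGMENT_NUMBER, "
                  + "FLIGHT_DATE, ECONOMY_SEATS_TAKEN) VALUES (?, ?, CURRENT_DATE, ?)")) {
                avail.setString(1, "AA1116");
                avail.setInt(2, 1);
                avail.setInt(3, 0);
                avail.executeUpdate();
            }
        }
    }
}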
Demo: default partitioned tables, colocation, persistent tables
Scaling data with Partitioned tables
Hash partitioning for linear scaling
- Hashing the key provides single-hop access to its partition
- But what if the access is not based on the key? Say, joins are involved
Hash partitioning only goes so far
Consider this query:
  SELECT * FROM flights, flightAvailability
  WHERE <equijoin flights with flightAvailability> AND flightId = 'xxx';
- If both tables are hash partitioned, the join logic needs to execute on all nodes where FLIGHTAVAILABILITY data is stored
- Distributed joins are expensive and inhibit scaling: joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes
- EquiJOIN of rows across multiple nodes is not supported in SQLFire 1.0
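Assuming FLIGHTS and FLIGHTAVAILABILITY are partitioned on FLIGHT_ID and colocated as in the earlier DDL, the equijoin for a single flight can be satisfied on the member(s) that own that flight. A sketch of how the query from this slide might be issued over JDBC (column list and helper method are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ColocatedJoinQuery {
    // Joins FLIGHTS to FLIGHTAVAILABILITY for a single flight. Because both
    // tables are partitioned and colocated on FLIGHT_ID, filtering on the
    // partitioning key lets the join run where that flight's data lives.
    public static void printAvailability(Connection conn, String flightId)
            throws SQLException {
        String sql =
            "SELECT f.FLIGHT_ID, f.DEST_AIRPORT, fa.FLIGHT_DATE, fa.ECONOMY_SEATS_TAKEN "
          + "FROM FLIGHTS f, FLIGHTAVAILABILITY fa "
          + "WHERE f.FLIGHT_ID = fa.FLIGHT_ID "
          + "AND f.SEGMENT_NUMBER = fa.SEGMENT_NUMBER "
          + "AND f.FLIGHT_ID = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, flightId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getString(2)
                        + " on " + rs.getDate(3) + ": "
                        + rs.getInt(4) + " economy seats taken");
                }
            }
        }
    }
}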
Partition-aware DB design
- The designer thinks about how data maps to partitions
- The main idea: minimize excessive data distribution by keeping the most frequently accessed and joined data colocated on partitions
- Read Pat Helland's "Life beyond Distributed Transactions" and the Google Megastore paper
Partition-aware DB design
- It turns out OLTP systems lend themselves well to this approach
- Typically it is the number of entities that grows over time, not the size of each entity: customer count perpetually grows, not the size of the customer info
- Most often access is very restricted and based on selected entities: given a FlightID, fetch FLIGHTAVAILABILITY records; given a CustomerID, add/remove order and shipment records
- Identify a partition key for the "entity group"
- "Entity group": a set of entities across several related tables that all share a single identifier; FlightID is shared between the parent and child tables, CustomerID is shared between customer, order, and shipment tables
Partition-aware DB design: Entity Groups
- FLIGHTAVAILABILITY is partitioned by FlightID and colocated with FLIGHTS
- FlightID is the entity group key
Partition-aware DB design
- STAR schema design is the norm in OLTP design
- Fact tables (fast changing) are natural partitioning candidates: partition by FlightID; availability and history rows are colocated with FLIGHTS
- Dimension tables are natural replicated-table candidates: replicate AIRLINES, COUNTRIES, CITIES on all nodes
- Dealing with joins involving M-M relationships: can one side of the M-M become a replicated table? If not, run the join logic in a parallel stored procedure to minimize distribution, or else split the query into multiple queries in the application
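To make the CustomerID example from these slides concrete, here is a hypothetical entity group sketched with the same DDL pattern used for FLIGHTS and FLIGHTAVAILABILITY. The table and column names are invented for illustration.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class EntityGroupDdl {
    // Hypothetical customer/order entity group: both tables share CUSTOMER_ID,
    // are partitioned on it, and ORDERS is colocated with CUSTOMERS so each
    // customer's rows live together on the same member.
    public static void createEntityGroup(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(
                "CREATE TABLE CUSTOMERS ("
              + "  CUSTOMER_ID INT NOT NULL PRIMARY KEY,"
              + "  NAME VARCHAR(64)"
              + ") PARTITION BY COLUMN (CUSTOMER_ID)");
            stmt.execute(
                "CREATE TABLE ORDERS ("
              + "  ORDER_ID INT NOT NULL PRIMARY KEY,"
              + "  CUSTOMER_ID INT NOT NULL,"
              + "  TOTAL DECIMAL(10,2)"
              + ") PARTITION BY COLUMN (CUSTOMER_ID)"
              + "  COLOCATE WITH (CUSTOMERS)");
        }
    }
}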
Scaling application logic with parallel "data aware procedures"
Procedures
- Java stored procedures may be created according to the SQL standard
- SQLFire also supports the JDBC type Types.JAVA_OBJECT; a parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object
- In this case the procedure is executed on the server to which the client is connected (or locally for peer clients)

CREATE PROCEDURE getOverBookedFlights
  (IN argument OBJECT, OUT result OBJECT)
  LANGUAGE JAVA PARAMETER STYLE JAVA
  READS SQL DATA DYNAMIC RESULT SETS 1
  EXTERNAL NAME com.acme.OverBookedFLights;
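A plausible sketch of the Java class behind the CREATE PROCEDURE statement above, following the Derby-style convention SQLFire inherits: a public static method whose parameters mirror the declared IN/OUT arguments (OUT parameters arrive as single-element arrays), plus one ResultSet[] per dynamic result set. The threshold logic, column choices, and the exact method the EXTERNAL NAME resolves to are assumptions, not taken from the slide.

package com.acme;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OverBookedFlights {
    // Parameters mirror (IN argument OBJECT, OUT result OBJECT) plus one
    // ResultSet[] for DYNAMIC RESULT SETS 1.
    public static void getOverBookedFlights(Object argument, Object[] result,
                                            ResultSet[] overbooked) throws SQLException {
        // The procedure runs inside the data store, so it reuses the invoking
        // connection via the standard Derby-style nested-connection URL.
        Connection conn = DriverManager.getConnection("jdbc:default:connection");
        PreparedStatement ps = conn.prepareStatement(
            "SELECT FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE "
          + "FROM FLIGHTAVAILABILITY WHERE ECONOMY_SEATS_TAKEN > ?");
        ps.setInt(1, ((Number) argument).intValue());    // IN argument used as a threshold
        overbooked[0] = ps.executeQuery();               // handed back as the dynamic result set
        result[0] = "flights with more than " + argument + " economy seats taken";
    }
}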
Data-Aware Procedures
- Parallelize the procedure and prune execution to the nodes with the required data
- Extend the procedure call with syntax that hints at the data the procedure depends on:

CALL getOverBookedFlights(<bind arguments>)
  ON TABLE FLIGHTAVAILABILITY WHERE FLIGHTID = <SomeFLIGHTID>;

- If the table is partitioned by columns in the WHERE clause, the procedure execution is pruned to the nodes with the data (the node hosting <SomeFLIGHTID> in this case)

Full syntax:
CALL [PROCEDURE] procedure_name ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] }
    | { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

[Diagram: Client, Fabric Server 1, Fabric Server 2]
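A sketch of invoking the data-aware call from a JDBC client. The flight ID is inlined as a literal, mirroring the <SomeFLIGHTID> placeholder on the slide rather than assuming the ON TABLE clause accepts bind parameters; the class name and printed columns are illustrative.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

public class DataAwareCall {
    // The ON TABLE ... WHERE clause rides along with the CALL so the engine can
    // prune execution to the member(s) owning that flight's partition.
    public static void callOverBooked(Connection conn, int threshold, String flightId)
            throws SQLException {
        String call = "CALL getOverBookedFlights(?, ?) "
                    + "ON TABLE FLIGHTAVAILABILITY WHERE FLIGHT_ID = '" + flightId + "'";
        try (CallableStatement cs = conn.prepareCall(call)) {
            cs.setObject(1, Integer.valueOf(threshold));    // IN argument OBJECT
            cs.registerOutParameter(2, Types.JAVA_OBJECT);  // OUT result OBJECT
            boolean more = cs.execute();
            while (more) {
                try (ResultSet rs = cs.getResultSet()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("FLIGHT_ID") + " segment "
                            + rs.getInt("SEGMENT_NUMBER") + " on " + rs.getDate("FLIGHT_DATE"));
                    }
                }
                more = cs.getMoreResults();
            }
        }
    }
}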
Parallelize the procedure, then aggregate (reduce)
- Register a Java result processor (optional in some cases):

CALL SQLF.CreateResultProcessor(processor_name, processor_class_name);

CALL [PROCEDURE] procedure_name ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] }
    | { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

[Diagram: Client, Fabric Servers 1-3]
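Using only the two statements shown on this slide, the sketch below registers a custom reduce step and then fans the call out to all members. The processor name OverBookedMerger and its class are hypothetical, the string quoting of the registration arguments is an assumption, and the class would additionally have to implement SQLFire's result-processor interface, which is not shown here.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public class ResultProcessorSetup {
    public static void registerAndCall(Connection conn) throws SQLException {
        // One-time registration mapping the processor name to its implementing class.
        try (CallableStatement register = conn.prepareCall(
                "CALL SQLF.CreateResultProcessor('OverBookedMerger', 'com.acme.OverBookedMerger')")) {
            register.execute();
        }
        // Fan the procedure out to every member; the named processor merges
        // (reduces) the per-member OUT values and result sets for the caller.
        try (CallableStatement cs = conn.prepareCall(
                "CALL getOverBookedFlights(?, ?) "
              + "WITH RESULT PROCESSOR OverBookedMerger "
              + "ON ALL")) {
            cs.setObject(1, Integer.valueOf(300));
            cs.registerOutParameter(2, Types.JAVA_OBJECT);
            cs.execute();
        }
    }
}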
Consistency model
Consistency Model without Transactions
- Replication within the cluster is always eager and synchronous
- Row updates are always atomic; no need to use transactions
- FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued
- Consistency in partitioned tables: a partitioned-table row is owned by one member at a point in time; all updates are serialized to replicas through the owner; "total ordering" at the row level: atomic and isolated
- Membership changes and consistency: that would need another hour
- Pessimistic concurrency support using SELECT ... FOR UPDATE
- Support for referential integrity
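As a sketch of the pessimistic option mentioned above, the snippet below locks one FLIGHTAVAILABILITY row with SELECT ... FOR UPDATE and bumps its seat count. It assumes a transactional isolation level is in effect (anticipating the next slide); the class name and seat-count logic are illustrative.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PessimisticUpdate {
    public static void takeSeat(Connection conn, String flightId, int segment)
            throws SQLException {
        conn.setAutoCommit(false);
        // Assumes READ_COMMITTED transactions are enabled on this connection.
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        try (PreparedStatement lock = conn.prepareStatement(
                 "SELECT ECONOMY_SEATS_TAKEN FROM FLIGHTAVAILABILITY "
               + "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = ? FOR UPDATE");
             PreparedStatement update = conn.prepareStatement(
                 "UPDATE FLIGHTAVAILABILITY SET ECONOMY_SEATS_TAKEN = ? "
               + "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = ?")) {
            lock.setString(1, flightId);
            lock.setInt(2, segment);
            try (ResultSet rs = lock.executeQuery()) {
                if (rs.next()) {
                    // Row is locked; apply the increment and commit.
                    update.setInt(1, rs.getInt(1) + 1);
                    update.setString(2, flightId);
                    update.setInt(3, segment);
                    update.executeUpdate();
                }
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}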
Distributed Transactions
- Full support for distributed transactions (single-phase commit)
- Highly scalable, without any centralized coordinator or lock manager
- We make some important assumptions: most OLTP transactions are small in duration and size; W-W conflicts are very rare in practice
- How does it work?
  - Each data node has a sub-coordinator to track transaction state
  - Eagerly acquire local "write" locks on each replica; an object is owned by a single primary at a point in time; fail fast if the lock cannot be obtained
  - Atomic, and works with the cluster's failure detection system
  - Isolated until commit; only local isolation is supported during commit
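From the application's point of view a distributed transaction is ordinary JDBC, as sketched below: turn off auto-commit, update rows that may live on different members, and commit once; a write-write conflict surfaces as an SQLException, matching the fail-fast behavior described above. The two flight IDs and the "move a booking" scenario are illustrative.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MoveBookingTx {
    public static void moveBooking(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        // Assumes READ_COMMITTED transactions are enabled on this connection.
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        try (PreparedStatement release = conn.prepareStatement(
                 "UPDATE FLIGHTAVAILABILITY SET ECONOMY_SEATS_TAKEN = ECONOMY_SEATS_TAKEN - 1 "
               + "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = 1");
             PreparedStatement take = conn.prepareStatement(
                 "UPDATE FLIGHTAVAILABILITY SET ECONOMY_SEATS_TAKEN = ECONOMY_SEATS_TAKEN + 1 "
               + "WHERE FLIGHT_ID = ? AND SEGMENT_NUMBER = 1")) {
            // The two rows may be owned by different members of the cluster.
            release.setString(1, "AA1116");
            release.executeUpdate();
            take.setString(1, "US1500");
            take.executeUpdate();
            conn.commit();                       // single-phase commit across members
        } catch (SQLException conflictOrFailure) {
            conn.rollback();                     // fail fast on write-write conflict
            throw conflictOrFailure;
        }
    }
}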
Scaling disk access with shared-nothing disk files and a "journaling" store design
Disk persistence in SQLF
- Parallel log-structured storage: each partition writes in parallel; backups write to disk also, increasing reliability against hardware loss
- Don't seek to disk; don't flush all the way to disk
- Use the OS scheduler to time writes
- Do this on primary + secondary
- Realize very high throughput
Performance benchmark
How does it perform? Does it scale?
- Scale from 2 to 10 servers (one per host)
- Scale from 200 to 1,200 simulated clients (10 hosts)
- Single partitioned table: int PK, 40 fields (20 ints, 20 strings)
How does it perform? Does it scale?
- CPU utilization remained low per server (about 30%), indicating many more clients could be handled
Is latency low with scale?
- Latency decreases with added server capacity
- 50-70% of operations take < 1 millisecond
- About 90% take less than 2 milliseconds
Q & A
- The VMware vFabric SQLFire beta is available now
- Check out http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire
Built using GemFire object data fabric + Derby


Editor's Notes

  • #4 The original design is rooted in good principles of data independence, durability, and consistency of data. Designers naturally focused on disk I/O performance and on maintaining strict data consistency through many locking/latching techniques. Buffer management is all about using memory for caching contiguous disk blocks.
  • #5 Driven by the desire to scale, be HA, and offer lower latencies: clusters of commodity servers; focus shifts to distributing data and clustering; shared nothing, including the disk; avoid disk seeks as much as possible; memory is cheap and reliable; pool memory across the cluster and use highly optimized concurrent data structures; partitioning with consistent hashing; dynamically increase cluster capacity; move and parallelize behavior to the data (MapReduce); high availability within the cluster and across data centers.
  • #29 Entity groups are defined in SQLFire using the "colocation" clause. An entity group is guaranteed to stay collocated in the presence of failures or rebalancing. Now complex queries can be executed without requiring excessive distributed data access.