Your SlideShare is downloading. ×
0
SQL, NoSQL, NewSQL?Whats a developer to do?   Chris Richardson   Author of POJOs in Action   Founder of CloudFoundry.com  ...
Overall presentation goal The joy and pain of    building Java applications that use NoSQL and NewSQL       2
About Chris        3
(About Chris)        4
About Chris()        5
About Chris        6
About Chris        http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/        7
About Chris   Developer Advocate for     CloudFoundry.com Signup at CloudFoundry.com   using promo code JFokus            ...
Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries     2/14/12   Copyright (c) 2011 Chris Richard...
Food to Goo  Take-out food delivery    serviceo  “Launched” in 2006o  Used a relational    database (naturally)        ...
Success è Growth challengeso  Increasing traffico  Increasing data volumeo  Distribute across a few data centerso  In...
Limitations of relational databaseso  Scalingo  Distributiono  Updating schemao  O/R impedance mismatcho  Handling se...
Solution: Spend Money                                                                          o    Buy SSD and RAM      ...
Solution: Use NoSQL Benefits                 Higher               Limited                 performance          transaction...
MongoDBo  Document-oriented database  n  JSON-style documents: Lists, Maps, primitives  n  Schema-lesso  Transaction =...
Apache Cassandrao  Column-oriented database/Extensible row store   n    Think Row ~= java.util.SortedMapo  Transaction ...
Other NoSQL databasesType                                   ExamplesExtensible columns/Column-             Hbaseoriented  ...
Solution: Use NewSQLo  Relational databases with SQL and    ACID transactions                                      ANDo ...
Stonebraker’s motivations                   “…Current databases are designed for                           1970s hardware ...
About VoltDBo  Open-sourceo  In-memory relational databaseo  Durability thru replication; snapshots    and loggingo  T...
The future is polyglot persistence                                                                           e.g. Netflix ...
Spring Data is here to help                                  For                    NoSQL databaseshttp://www.springsource...
Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries      2/14/12   Copyright (c) 2011 Chris Richar...
Food to Go – Place Order use case1.  Customer enters delivery address and    delivery time2.  System displays available re...
Food to Go – Domain model (partial)class Restaurant {               class TimeRange {  long id;                         lo...
Database schemaID                   Name                  …                                                         RESTAU...
How to implement the repository?      public interface AvailableRestaurantRepository {          void add(Restaurant restau...
MongoDB: persisting restaurants is easy                                                        Server                     ...
Spring Data for Mongo code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl          implements AvailableRe...
Spring Configuration@Configurationpublic class MongoConfig extends AbstractDatabaseConfig { @Value("#{mongoDbProperties.da...
Cassandra data model           Column           Column   Row      Name             Value   Key                            ...
Cassandra– inserting/updating data                                                                   Column Family        ...
Cassandra– retrieving data                                                              Column Family    K1   N1   V1   TS...
Option #1: Use a column per attribute       Column Name = path/expression to access property value                        ...
Option #2: Use a single column      Column value = serialized object graph, e.g. JSON                                     ...
Cassandra codepublic class AvailableRestaurantRepositoryCassandraKeyImpl          implements AvailableRestaurantRepository...
Using VoltDBo  Use the original schemao  Standard SQL statements                   BUT YOU MUSTo  Write stored procedur...
About VoltDB stored procedureso  Key part of VoltDBo  Replication = executing stored    procedure on replicao  Logging ...
About partitioning Partition column                                                     RESTAURANT table ID               ...
Example cluster Partition 1a                            Partition 2a                                       Partition 3a   ...
Single partition procedure: FAST   SELECT * FROM RESTAURANT WHERE ID = 1 High-performance lock free code  ID   Name     … ...
Multi-partition procedure: SLOWER   SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’                    Communication/Coordi...
Chosen partitioning scheme<partitions>   <partition   table="restaurant" column="id"/>   <partition   table="service_area"...
Stored procedure – AddRestaurant@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRest...
VoltDb repository – add()@Repositorypublic class AvailableRestaurantRepositoryVoltdbImpl           implements AvailableRes...
VoltDbTemplate wrapper classpublic class VoltDbTemplate {   private Client client;         VoltDB client API   public Volt...
VoltDb server configuration<?xml version="1.0"?>                      <deployment><project>  <info>                       ...
Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries     2/14/12   Copyright (c) 2011 Chris Richard...
Finding available restaurantsAvailable restaurants =      Serve the zip code of the delivery addressAND      Are open at t...
Finding available restaurants on Monday,6.15pm for 94619 zipselect r.*             Straightforwardfrom restaurant r      t...
MongoDB = easy to query{    serviceArea:"94619",                        Find a    openingHours: {      $elemMatch : {     ...
MongoTemplate-based code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl                               imp...
BUT how to do this with Cassandra??!o  How can Cassandra support a query that has                           ?    n  A 3-...
Simplification #1: DenormalizationRestaurant_id   Day_of_week   Open_time   Close_time   Zip_code1               Monday   ...
Simplification #2: Application filteringSELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Mond...
Simplification #3: Eliminate multiple =’s withconcatenation  Restaurant_id    Zip_dow        Open_time   Close_time  1    ...
Column family with composite column   names as an index      Restaurant_id     Zip_dow              Open_time            C...
Querying with a slice                                                         Column Family: AvailableRestaurants         ...
Needs a few pages of code         private void insertAvailability(Restaurant restaurant) {                for (String zipC...
What did I just do to query the data?       60
Mongo vs. Cassandra            DC1                                                            DC2               Shard A Ma...
VoltDB - attempt #1@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }E...
VoltDB - attempt #2@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extend...
VoltDB - attempt #3<partitions>   ...   <partition table="available_time_range" column="zip_code"/></partitions>@ProcInfo(...
Summary…o  Relational databases are great BUT there    are limitationso  Each NoSQL database solves some    problems BUT...
… Summaryo  Very carefully pick the NewSQL/    NoSQL DB for your applicationo  Consider a polyglot persistence    archit...
Thank you! Signup at CloudFoundry.com   using promo code JFokus           My contact info:           chris.richardson@spri...
Upcoming SlideShare
Loading in...5
×

SQL, NoSQL, NewSQL? What's a developer to do?

1,681

Published on

The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offering significantly better scalability and performance. But these databases have a very different and unfamiliar data model and APIs as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which by using a radically different architecture offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great but unlike the traditional relational database you can't use JDBC and must partition your data.

In this presentation you will learn about popular NoSQL databases - MongoDB, and Cassandra - as well at VoltDB. We will compare and contrast each database's data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.

Published in: Technology

Transcript of "SQL, NoSQL, NewSQL? What's a developer to do?"

  1. 1. SQL, NoSQL, NewSQL?Whats a developer to do? Chris Richardson Author of POJOs in Action Founder of CloudFoundry.com chris.richardson@springsource.com @crichardson
  2. 2. Overall presentation goal The joy and pain of building Java applications that use NoSQL and NewSQL 2
  3. 3. About Chris 3
  4. 4. (About Chris) 4
  5. 5. About Chris() 5
  6. 6. About Chris 6
  7. 7. About Chris http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/ 7
  8. 8. About Chris Developer Advocate for CloudFoundry.com Signup at CloudFoundry.com using promo code JFokus 8
  9. 9. Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 9
  10. 10. Food to Goo  Take-out food delivery serviceo  “Launched” in 2006o  Used a relational database (naturally) 10
  11. 11. Success è Growth challengeso  Increasing traffico  Increasing data volumeo  Distribute across a few data centerso  Increasing domain model complexity
  12. 12. Limitations of relational databaseso  Scalingo  Distributiono  Updating schemao  O/R impedance mismatcho  Handling semi-structured data 12
  13. 13. Solution: Spend Money o  Buy SSD and RAM o  Buy Oracle o  Buy high-end servers o  … ORhttp://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPGo  Hire more DevOpso  Use application-level shardingo  Build your own middlewareo  … http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/# 13
  14. 14. Solution: Use NoSQL Benefits Higher Limited performance transactions Higher scalability Relaxed Richer data- consistency model Unconstrained Schema-less data Drawbacks 14
  15. 15. MongoDBo  Document-oriented database n  JSON-style documents: Lists, Maps, primitives n  Schema-lesso  Transaction = update of a single documento  Rich query language for dynamic querieso  Tunable writes: speed ó reliabilityo  Highly scalable and availableo  Use cases n  High volume writes n  Complex data n  Semi-structured data 15
  16. 16. Apache Cassandrao  Column-oriented database/Extensible row store n  Think Row ~= java.util.SortedMapo  Transaction = update of a rowo  Fast writes = append to a logo  Tunable reads/writes: consistency ó latency/ availabilityo  Extremely scalable n  Transparent and dynamic clustering n  Rack and datacenter aware data replicationo  CQL = “SQL”-like DDL and DMLo  Use cases n  Big data n  Multiple Data Center distributed database n  (Write intensive) Logging n  High-availability (writes) 16
  17. 17. Other NoSQL databasesType ExamplesExtensible columns/Column- Hbaseoriented SimpleDB DynamoDBGraph Neo4jKey-value Redis MembaseDocument CouchDb http://nosql-database.org/ lists 122+ NoSQL databases 17
  18. 18. Solution: Use NewSQLo  Relational databases with SQL and ACID transactions ANDo  New and improved architectureo  Radically better scalability and performanceo  NewSQL vendors: ScaleDB, NimbusDB, …, VoltDB 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 18
  19. 19. Stonebraker’s motivations “…Current databases are designed for 1970s hardware …” Stonebraker: http://www.slideshare.net/VoltDB/sql-myths-webinar Significant overhead in “…logging, latching, locking, B-tree, and buffer management operations…” SIGMOD 08: Though the looking glass: http://dl.acm.org/citation.cfm?id=1376713 19
  20. 20. About VoltDBo  Open-sourceo  In-memory relational databaseo  Durability thru replication; snapshots and loggingo  Transparent partitioningo  Fast and scalable …VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores… http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/ 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 20
  21. 21. The future is polyglot persistence e.g. Netflix •  RDBMS •  SimpleDB •  Cassandra •  Hadoop/Hbase IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg 21
  22. 22. Spring Data is here to help For NoSQL databaseshttp://www.springsource.org/spring-data 22
  23. 23. Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 23
  24. 24. Food to Go – Place Order use case1.  Customer enters delivery address and delivery time2.  System displays available restaurants3.  Customer picks restaurant4.  System displays menu5.  Customer selects menu items6.  Customer places order 24
  25. 25. Food to Go – Domain model (partial)class Restaurant { class TimeRange { long id; long id; String name; int dayOfWeek; Set<String> serviceArea; int openingTime; Set<TimeRange> openingHours; int closingTime; List<MenuItem> menuItems; }} class MenuItem { String name; double price; } 25
  26. 26. Database schemaID Name … RESTAURANT1 Ajanta table2 Montclair EggshopRestaurant_id zipcode RESTAURANT_ZIPCODE1 94707 table1 946192 946112 94619 RESTAURANT_TIME_RANGE tableRestaurant_id dayOfWeek openTime closeTime1 Monday 1130 14301 Monday 1730 21302 Tuesday 1130 … 26
  27. 27. How to implement the repository? public interface AvailableRestaurantRepository { void add(Restaurant restaurant); Restaurant findDetailsById(int id); … } Restaurant ó ? TimeRange MenuItemRestaurant aggregate 27
  28. 28. MongoDB: persisting restaurants is easy Server Database: Food To Go Collection: Restaurants { "_id" : ObjectId("4bddc2f49d1505567c6220a0") "name": "Ajanta", "serviceArea": ["94619", "99999"], BSON = "openingHours": [ { binary "dayOfWeek": 1, JSON "open": 1130, "close": 1430 }, { Sequence "dayOfWeek": 2, "open": 1130, of bytes on "close": 1430 disk è fast }, … ] i/o } 28
  29. 29. Spring Data for Mongo code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository { public static String AVAILABLE_RESTAURANTS_COLLECTION = "availableRestaurants"; @Autowired private MongoTemplate mongoTemplate; @Override public void add(Restaurant restaurant) { mongoTemplate.insert(restaurant, AVAILABLE_RESTAURANTS_COLLECTION); } @Override public Restaurant findDetailsById(int id) { return mongoTemplate.findOne(new Query(where("_id").is(id)), Restaurant.class, AVAILABLE_RESTAURANTS_COLLECTION); }} 29
  30. 30. Spring Configuration@Configurationpublic class MongoConfig extends AbstractDatabaseConfig { @Value("#{mongoDbProperties.databaseName}") private String mongoDbDatabase;@Beanpublic Mongo mongo() throws UnknownHostException, MongoException { return new Mongo(databaseHostName);} @Bean public MongoTemplate mongoTemplate(Mongo mongo) throws Exception { MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase); mongoTemplate.setWriteConcern(WriteConcern.SAFE); mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION); return mongoTemplate; }} 30
  31. 31. Cassandra data model Column Column Row Name Value Key Timestamp Keyspace Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 K2 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3Column name/value: number, string, Boolean, timestamp, and composite 31
  32. 32. Cassandra– inserting/updating data Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 …Idempotent= transaction CF.insert(key=K1, (N4, V4, TS4), …) Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 … 32
  33. 33. Cassandra– retrieving data Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4… CF.slice(key=K1, startColumn=N2, endColumn=N4) K1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 Cassandra has secondary indexes but they aren’t helpful for these use cases 33
  34. 34. Option #1: Use a column per attribute Column Name = path/expression to access property value Column Family: RestaurantDetails openingHours[0].dayOfWeek Monday name Ajanta serviceArea[0] 946191 openingHours[0].open 1130 type indian serviceArea[1] 94707 openingHours[0].close 1430 Egg openingHours[0].dayOfWeek Monday name serviceArea[0] 94611 shop2 Break openingHours[0].open 0830 type serviceArea[1] 94619 Fast openingHours[0].close 1430
  35. 35. Option #2: Use a single column Column value = serialized object graph, e.g. JSON Column Family: RestaurantDetails 2 attributes: { name: “Montclair Eggshop”, … } 1 attributes { name: “Ajanta”, …} 2 attributes { name: “Eggshop”, …} ✔ 35
  36. 36. Cassandra codepublic class AvailableRestaurantRepositoryCassandraKeyImpl implements AvailableRestaurantRepository {@Autowired Home grownprivate final CassandraTemplate cassandraTemplate; wrapper classpublic void add(Restaurant restaurant) { cassandraTemplate.insertEntity(keyspace, RESTAURANT_DETAILS_CF, restaurant);}public Restaurant findDetailsById(int id) { String key = Integer.toString(id); return cassandraTemplate.findEntity(Restaurant.class, keyspace, key, RESTAURANT_DETAILS_CF); …}… http://en.wikipedia.org/wiki/Hector 36
  37. 37. Using VoltDBo  Use the original schemao  Standard SQL statements BUT YOU MUSTo  Write stored procedures and invoke them using proprietary interfaceo  Partition your data 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 37
  38. 38. About VoltDB stored procedureso  Key part of VoltDBo  Replication = executing stored procedure on replicao  Logging = log stored procedure invocationo  Stored procedure invocation = transaction 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 38
  39. 39. About partitioning Partition column RESTAURANT table ID Name … 1 Ajanta 2 Eggshop … 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 39
  40. 40. Example cluster Partition 1a Partition 2a Partition 3a ID Name … ID Name … ID Name … 1 Ajanta 2 Eggshop … .. … … … Partition 3b Partition 1b Partition 2b ID Name … ID Name … ID Name … … .. 1 Ajanta 2 Eggshop … … …VoltDB Server 1 VoltDB Server 2 VoltDB Server 3 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 40
  41. 41. Single partition procedure: FAST SELECT * FROM RESTAURANT WHERE ID = 1 High-performance lock free code ID Name … ID Name … ID Name … 1 Ajanta 1 Eggshop … .. … … … … … …VoltDB Server 1 VoltDB Server 2 VoltDB Server 3 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 41
  42. 42. Multi-partition procedure: SLOWER SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’ Communication/Coordination overhead ID Name … ID Name … ID Name … 1 Ajanta 1 Eggshop … .. … … … … … …VoltDB Server 1 VoltDB Server 2 VoltDB Server 3 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 42
  43. 43. Chosen partitioning scheme<partitions> <partition table="restaurant" column="id"/> <partition table="service_area" column="restaurant_id"/> <partition table="menu_item" column="restaurant_id"/> <partition table="time_range" column="restaurant_id"/> <partition table="available_time_range" column="restaurant_id"/></partitions> Performance is excellent: much faster than MySQL 43
  44. 44. Stored procedure – AddRestaurant@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extends VoltProcedure { public final SQLStmt insertRestaurant = new SQLStmt("INSERT INTO Restaurant VALUES (?,?);"); public final SQLStmt insertServiceArea = new SQLStmt("INSERT INTO service_area VALUES (?,?);"); public final SQLStmt insertOpeningTimes = new SQLStmt("INSERT INTO time_range VALUES (?,?,?,?);"); public final SQLStmt insertMenuItem = new SQLStmt("INSERT INTO menu_item VALUES (?,?,?);");public long run(int id, String name, String[] serviceArea, long[] daysOfWeek, long[] openingTimes, long[] closingTimes, String[] names, double[] prices) { voltQueueSQL(insertRestaurant, id, name); for (String zipCode : serviceArea) voltQueueSQL(insertServiceArea, id, zipCode); for (int i = 0; i < daysOfWeek.length ; i++) voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]); for (int i = 0; i < names.length ; i++) voltQueueSQL(insertMenuItem, id, names[i], prices[i]); voltExecuteSQL(true); return 0; }} 44
  45. 45. VoltDb repository – add()@Repositorypublic class AvailableRestaurantRepositoryVoltdbImpl implements AvailableRestaurantRepository { @Autowired private VoltDbTemplate voltDbTemplate; @Override public void add(Restaurant restaurant) { invokeRestaurantProcedure("AddRestaurant", restaurant); } private void invokeRestaurantProcedure(String procedureName, Restaurant restaurant) { Object[] serviceArea = restaurant.getServiceArea().toArray(); long[][] openingHours = toArray(restaurant.getOpeningHours()); Flatten Object[][] menuItems = toArray(restaurant.getMenuItems()); Restaurant voltDbTemplate.update(procedureName, restaurant.getId(), restaurant.getName(), serviceArea, openingHours[0], openingHours[1], openingHours[2], menuItems[0], menuItems[1]);} 45
  46. 46. VoltDbTemplate wrapper classpublic class VoltDbTemplate { private Client client; VoltDB client API public VoltDbTemplate(Client client) { this.client = client; } public void update(String procedureName, Object... params) { try { ClientResponse x = client.callProcedure(procedureName, params); … } catch (Exception e) { throw new RuntimeException(e); } } 46
  47. 47. VoltDb server configuration<?xml version="1.0"?> <deployment><project> <info> <cluster hostcount="1" <name>Food To Go</name> sitesperhost="5" kfactor="0" /> ... </info> </deployment> <database> <schemas> <schema path=schema.sql /> </schemas> <partitions> <partition table="restaurant" column="id"/> ... </partitions> <procedures> <procedure class=net.chrisrichardson.foodToGo.newsql.voltdb.procs.AddRestaurant /> ... </procedures> </database></project>voltcompiler target/classes src/main/resources/sql/voltdb-project.xml foodtogo.jarbin/voltdb leader localhost catalog foodtogo.jar deployment deployment.xml 47
  48. 48. Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 48
  49. 49. Finding available restaurantsAvailable restaurants = Serve the zip code of the delivery addressAND Are open at the delivery timepublic interface AvailableRestaurantRepository { List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime); …} 49
  50. 50. Finding available restaurants on Monday,6.15pm for 94619 zipselect r.* Straightforwardfrom restaurant r three-way join inner join restaurant_time_range tr on r.id =tr.restaurant_id inner join restaurant_zipcode sa on r.id = sa.restaurant_idWhere ’94619’ = sa.zip_codeand tr.day_of_week=’monday’and tr.openingtime <= 1815and 1815 <= tr.closingtime 50
  51. 51. MongoDB = easy to query{ serviceArea:"94619", Find a openingHours: { $elemMatch : { restaurant "dayOfWeek" : "Monday", "open": {$lte: 1815}, that serves } "close": {$gte: 1815} the 94619 zip} } code and is open atDBCursor cursor = collection.find(qbeObject);while (cursor.hasNext()) { 6.15pm on a DBObject o = cursor.next(); … Monday }db.availableRestaurants.ensureIndex({serviceArea: 1}) 51
  52. 52. MongoTemplate-based code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {@Autowired private final MongoTemplate mongoTemplate;@Overridepublic List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); Query query = new Query(where("serviceArea").is(deliveryAddress.getZip()) .and("openingHours”).elemMatch(where("dayOfWeek").is(dayOfWeek) .and("openingTime").lte(timeOfDay) .and("closingTime").gte(timeOfDay))); return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);} mongoTemplate.ensureIndex(“availableRestaurants”, new Index().on("serviceArea", Order.ASCENDING)); 52
  53. 53. BUT how to do this with Cassandra??!o  How can Cassandra support a query that has ? n  A 3-way join n  Multiple = n  > and <è We need to implement an index Queries instead of data model drives NoSQL database design 53
  54. 54. Simplification #1: DenormalizationRestaurant_id Day_of_week Open_time Close_time Zip_code1 Monday 1130 1430 947071 Monday 1130 1430 946191 Monday 1730 2130 947071 Monday 1730 2130 946192 Monday 0700 1430 94619… SELECT restaurant_id FROM time_range_zip_code WHERE day_of_week = ‘Monday’ Simpler query: AND zip_code = 94619 §  No joins §  Two = and two < AND 1815 < close_time AND open_time < 1815 54
  55. 55. Simplification #2: Application filteringSELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Monday’ Even simpler query AND zip_code = 94619 •  No joins AND 1815 < close_time •  Two = and one < AND open_time < 1815 55
  56. 56. Simplification #3: Eliminate multiple =’s withconcatenation Restaurant_id Zip_dow Open_time Close_time 1 94707:Monday 1130 1430 1 94619:Monday 1130 1430 1 94707:Monday 1730 2130 1 94619:Monday 1730 2130 2 94619:Monday 0700 1430 … SELECT restaurant_id, open_time FROM time_range_zip_code WHERE zip_code_day_of_week = ‘94619:Monday’ AND 1815 < close_time key range 56
  57. 57. Column family with composite column names as an index Restaurant_id Zip_dow Open_time Close_time 1 94707:Monday 1130 1430 1 94619:Monday 1130 1430 1 94707:Monday 1730 2130 1 94619:Monday 1730 2130 2 94619:Monday 0700 1430 … Column Family: AvailableRestaurants JSON FOR JSON FOR (1430,0700,2) (2130,1730,1)94619:Monday EGG AJANTA JSON FOR (1430,1130,1) AJANTA
  58. 58. Querying with a slice Column Family: AvailableRestaurants JSON FOR JSON FOR (1430,0700,2) (2130,1730,1) EGG AJANTA94619:Monday JSON FOR (1430,1130,1) AJANTA slice(key= 94619:Monday, sliceStart = (1815, *, *), sliceEnd = (2359, *, *)) JSON FOR (2130,1730,1)94619:Monday AJANTA 18:15 is after 17:30 è {Ajanta} 58
  59. 59. Needs a few pages of code private void insertAvailability(Restaurant restaurant) { for (String zipCode : (Set<String>) restaurant.getServiceArea()) {@Override for (TimeRange tr : (Set<TimeRange>) restaurant.getOpeningHours()) { public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { String dayOfWeek = format2(tr.getDayOfWeek()); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); String openingTime = format4(tr.getOpeningTime()); int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); String closingTime = format4(tr.getClosingTime()); String zipCode = deliveryAddress.getZip(); String key = formatKey(zipCode, format2(dayOfWeek)); String restaurantId = format8(restaurant.getId()); HSlicePredicate<Composite> predicate = new HSlicePredicate<Composite>(new CompositeSerializer()); String key = formatKey(zipCode, dayOfWeek); Composite start = new Composite(); String columnValue = toJson(restaurant); Composite finish = new Composite(); start.addComponent(0, format4(timeOfDay), ComponentEquality.GREATER_THAN_EQUAL); finish.addComponent(0, format4(2359), ComponentEquality.GREATER_THAN_EQUAL); Composite columnName = new Composite(); predicate.setRange(start, finish, false, 100); columnName.add(0, closingTime); final List<AvailableRestaurantIndexEntry> closingAfter = new ArrayList<AvailableRestaurantIndexEntry>(); columnName.add(1, openingTime); columnName.add(2, restaurantId); ColumnFamilyRowMapper<String, Composite, Object> mapper = new ColumnFamilyRowMapper<String, Composite, Object>() { @Override ColumnFamilyUpdater<String, Composite> updater public Object mapRow(ColumnFamilyResult<String, Composite> results) { = compositeCloseTemplate.createUpdater(key); for (Composite columnName : results.getColumnNames()) { String openTime = columnName.get(1, new StringSerializer()); updater.setString(columnName, columnValue); String restaurantId = columnName.get(2, new StringSerializer()); closingAfter.add(new AvailableRestaurantIndexEntry(openTime, restaurantId, results.getString(columnName))); } return null; } }; compositeCloseTemplate.update(updater); } compositeCloseTemplate.queryColumns(key, predicate, mapper); } List<AvailableRestaurant> result = new LinkedList<AvailableRestaurant>(); } for (AvailableRestaurantIndexEntry trIdAndAvailableRestaurant : closingAfter) { if (trIdAndAvailableRestaurant.isOpenBefore(timeOfDay)) result.add(trIdAndAvailableRestaurant.getAvailableRestaurant()); } return result; } 59
  60. 60. What did I just do to query the data? 60
  61. 61. Mongo vs. Cassandra DC1 DC2 Shard A Master Shard B MasterMongoDB Remote DC1 Client DC2 Client DC1 DC2 Async Cassandra Or CassandraCassandra Sync DC1 Client DC2 Client 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 61
  62. 62. VoltDB - attempt #1@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }ERROR 10:12:03,251 [main] COMPILER: Failed to plan for statementtype(findAvailableRestaurants_with_join) select r.* from restaurantr,time_range tr, service_area sa Where ? = sa.zip_code and r.id=tr.restaurant_id and r.id = sa.restaurant_id and tr.day_of_week=?and tr.open_time <= ? and ? <= tr.close_time Error: "Unable to planfor statement. Likely statement is joining two partitioned tables in amulti-partition statement. This is not supported at this time."ERROR 10:12:03,251 [main] COMPILER: Catalog compilation failed. Bummer! 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 62
  63. 63. VoltDB - attempt #2@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extends VoltProcedure { public final SQLStmt insertAvailable= new SQLStmt("INSERT INTO available_time_range VALUES (?,?,?, ?, ?, ?);"); public long run(....) { ... for (int i = 0; i < daysOfWeek.length ; i++) { voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]); for (String zipCode : serviceArea) { voltQueueSQL(insertAvailable, id, daysOfWeek[i], openingTimes[i], closingTimes[i], zipCode, name); } } ... public final SQLStmt findAvailableRestaurants_denorm = new SQLStmt( voltExecuteSQL(true); "select restaurant_id, name from available_time_range tr " + return 0; "where ? = tr.zip_code " + } "and tr.day_of_week=? " +} "and tr.open_time <= ? " + " and ? <= tr.close_time "); Works but queries are only slightly faster than MySQL! 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 63
  64. 64. VoltDB - attempt #3<partitions> ... <partition table="available_time_range" column="zip_code"/></partitions>@ProcInfo( singlePartition = false, ...)public class AddRestaurant extends VoltProcedure { ... }@ProcInfo( singlePartition = true, partitionInfo = "available_time_range.zip_code: 0")public class FindAvailableRestaurants extends VoltProcedure { ... }Queries are really fast but inserts are not LPartitioning scheme – optimal for some usecases but not others 64
  65. 65. Summary…o  Relational databases are great BUT there are limitationso  Each NoSQL database solves some problems BUT n  Limited transactions: NoSQL = NoACID n  One day needing ACID è major rewrite n  Query-driven, denormalized database design n  …o  NewSQL databases such as VoltDB provides SQL, ACID transactions and incredible performance BUT n  Not all operations are fast n  Non-JDBC API 65
  66. 66. … Summaryo  Very carefully pick the NewSQL/ NoSQL DB for your applicationo  Consider a polyglot persistence architectureo  Encapsulate your data access code so you can switcho  Startups = avoid NewSQL/NoSQL for shorter time to market? 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 66
  67. 67. Thank you! Signup at CloudFoundry.com using promo code JFokus My contact info: chris.richardson@springsource.com @crichardson 67
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×