SQL, NoSQL, NewSQL?Whats a developer to do?Chris RichardsonAuthor of POJOs in ActionFounder of the original CloudFoundry.c...
Overall presentation goal The joy and pain of    building Java applications that use NoSQL and NewSQL       2
About Chris        3
(About Chris)        4
About Chris()        5
About Chris        6
About Chris        http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/        7
About Chris   Developer Advocate for     CloudFoundry.com                            8
Agenda  Why NoSQL? NewSQL?  Persisting entities  Implementing queries                          Slide 9
Food to Go  Take-out food delivery   service  “Launched” in 2006  Used a relational   database (naturally)         10
Success  Growth challenges  Increasing traffic  Increasing data volume  Distribute across a few data centers  Increas...
Limitations of relational databases  Scaling  Distribution  Updating schema  O/R impedance mismatch  Handling semi-st...
Solution: Spend Money                                                                              Buy SSD and RAM       ...
Solution: Use NoSQL Benefits                 Higher               Limited                 performance          transaction...
MongoDB          Slide 15
MongoDB is a document oriented DB                                                       Server                            ...
MongoDB – writes are fast  Client                                   MongoDB           Insert(collection, documents)       ...
MongoDB is scalable                         Shard 1                             Shard 2          Mongod                   ...
MongoDB use cases  Use cases    High volume writes    Complex data    Semi-structured data  Used by: Shutterfly, Four...
Apache Cassandra                   20
Cassandra is a column-oriented DB           Column           Column   Row                       Value            Name     ...
Cassandra– inserting/updating data                                                                   Column Family        ...
Cassandra– retrieving data                                                              Column Family    K1   N1   V1   TS...
Tunable reads and writes                           Higher availability                           Lower response time      ...
Cassandra is scalable           Application                    Application             Node 1                         Node...
Cassandra use cases  Use cases    Big data    Multiple Data Center distributed     database    (Write intensive) Loggi...
Other NoSQL databasesType                                   ExamplesExtensible columns/Column-             Hbaseoriented  ...
Solution: Use NewSQL  Relational databases with SQL and   ACID transactions                   AND  New and improved arch...
Stonebraker’s motivations                   “…Current databases are designed for                           1970s hardware ...
About VoltDB  Open-source  In-memory relational database  Durability thru replication; snapshots   and logging  Transp...
The future is polyglot persistence                                                                           e.g. Netflix ...
Spring Data is here to help                                  For                     NoSQL databaseshttp://www.springsourc...
Agenda  Why NoSQL? NewSQL?  Persisting entities  Implementing queries                          Slide 33
Food to Go – Place Order use case1.  Customer enters delivery address and    delivery time2.  System displays available re...
Food to Go – Domain model (partial)class Restaurant {               class TimeRange {  long id;                         lo...
Database schemaID                   Name                  …                                                         RESTAU...
Pick your JDBC-based framework        interface AvailableRestaurantRepository {            void add(Restaurant restaurant)...
But with NoSQL?                                 interface AvailableRestaurantRepository {          Restaurant             ...
MongoDB          Slide 39
MongoDB: persisting restaurants is easy      {          "_id" : ObjectId("4bddc2f49d1505567c6220a0")          "name": "Aja...
Spring Data for Mongo code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl          implements AvailableRe...
Spring Configuration@Configurationpublic class MongoConfig extends AbstractDatabaseConfig { @Value("#{mongoDbProperties.da...
Apache Cassandra                   43
Option #1: Use a column per attribute       Column Name = path/expression to access property value                        ...
Option #2: Use a single column         Column value = serialized object graph, e.g. JSON                                  ...
Cassandra code: wrapper around Hectorpublic class AvailableRestaurantRepositoryCassandraKeyImpl          implements Availa...
Slide 47
Using VoltDB  Use the original schema  Standard SQL statements             BUT YOU MUST  Write stored procedures and in...
About VoltDB stored procedures  Key part of VoltDB  Replication = executing stored   procedure on replica  Logging = lo...
About partitioning Partition column                                           RESTAURANT table ID                         ...
Example VoltDB cluster    (3 servers) x (2 partitions per server)                       ÷                                 ...
Single partition procedure: FAST     SELECT * FROM RESTAURANT WHERE ID = 1High-performance lock free code                 ...
Multi-partition procedure: SLOWER   SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’                    Communication/Coordi...
Chosen partitioning scheme<partitions>   <partition   table="restaurant" column="id"/>   <partition   table="service_area"...
Stored procedure – AddRestaurant@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRest...
VoltDb repository – add()@Repositorypublic class AvailableRestaurantRepositoryVoltdbImpl           implements AvailableRes...
VoltDbTemplate wrapper classpublic class VoltDbTemplate {   private Client client;         VoltDB client API   public Volt...
VoltDb server configuration<?xml version="1.0"?>                      <deployment><project>                               ...
Agenda  Why NoSQL? NewSQL?  Persisting entities  Implementing queries                          Slide 59
Finding available restaurantsAvailable restaurants =      Serve the zip code of the delivery addressAND      Are open at t...
Finding available restaurants on Monday,6.15pm for 94619 zipselect r.*             Straightforwardfrom restaurant r      t...
MongoDB          Slide 62
MongoDB = easy to query{    serviceArea:"94619",                        Find a    openingHours: {      $elemMatch : {     ...
MongoTemplate-based code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl                               imp...
But how do this withApache Cassandra??!                       65
Slice is equivalent to a simple SELECTkeyVal   colName       colValue             select key, colName, colValueX        Y ...
We need to implement an indexthat can be queried using a slice       Queries instead of data       model drives NoSQL     ...
Simplification #1: DenormalizationRestaurant_id   Day_of_week   Open_time   Close_time   Zip_code1               Monday   ...
Simplification #2: Application filteringSELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Mond...
Simplification #3: Eliminate multiple =’s withconcatenation  Restaurant_id   Zip_dow        Open_time   Close_time  1     ...
Column family as an index      Restaurant_id     Zip_dow              Open_time            Close_time      1              ...
Querying with a slice                                                         Column Family: AvailableRestaurants         ...
Needs a few pages of code         private void insertAvailability(Restaurant restaurant) {                for (String zipC...
What did I just do to query the data?  Wrote code to maintain an index  Reduced performance due to extra   writes       ...
But what would you rather implement?      “Complex” query logic               OR     Multi-datacenter, multi-       master...
Slide 76
VoltDB - attempt #0@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }S...
VoltDB - attempt #1@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }c...
VoltDB - attempt #2@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extend...
VoltDB - attempt #3<partitions>   ...   <partition table="available_time_range" column="zip_code"/></partitions>@ProcInfo(...
VoltDB – key lessonA given partitioning scheme can be good for some use  cases but bad for others                        S...
Summary  Different databases           =  Different tradeoffs                        82
… Summary…         Benefits        DrawbacksRDBMS    • SQL           • Lack of performance and scalability         • ACID ...
… Summary  Very carefully pick the NewSQL/   NoSQL DB for your application  Consider a polyglot persistence   architectu...
Thank you!         My contact info:         chris.richardson@springsource.com             @crichardson         Blog: http:...
Upcoming SlideShare
Loading in...5
×

SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

1,342

Published on

The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offering significantly better scalability and performance. But these databases have a very different and unfamiliar data model and APIs as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which by using a radically different architecture offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great but unlike the traditional relational database you can’t use JDBC and must partition your data.
In this presentation you will learn about popular NoSQL databases – MongoDB, and Cassandra – as well at VoltDB. We will compare and contrast each database’s data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,342
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
52
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012

  1. 1. SQL, NoSQL, NewSQL?Whats a developer to do?Chris RichardsonAuthor of POJOs in ActionFounder of the original CloudFoundry.comchris.richardson@springsource.com @crichardsonBlog: http://plainoldobjects.com Copyright (c) 2012 Chris Richardson. All rights reserved.
  2. 2. Overall presentation goal The joy and pain of building Java applications that use NoSQL and NewSQL 2
  3. 3. About Chris 3
  4. 4. (About Chris) 4
  5. 5. About Chris() 5
  6. 6. About Chris 6
  7. 7. About Chris http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/ 7
  8. 8. About Chris Developer Advocate for CloudFoundry.com 8
  9. 9. Agenda  Why NoSQL? NewSQL?  Persisting entities  Implementing queries Slide 9
  10. 10. Food to Go  Take-out food delivery service  “Launched” in 2006  Used a relational database (naturally) 10
  11. 11. Success  Growth challenges  Increasing traffic  Increasing data volume  Distribute across a few data centers  Increasing domain model complexity
  12. 12. Limitations of relational databases  Scaling  Distribution  Updating schema  O/R impedance mismatch  Handling semi-structured data 12
  13. 13. Solution: Spend Money   Buy SSD and RAM   Buy Oracle   Buy high-end servers   … ORhttp://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG  Hire more DevOps  Use application-level sharding  Build your own middleware  … http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/# 13
  14. 14. Solution: Use NoSQL Benefits Higher Limited performance transactions Higher scalability Relaxed Richer data- consistency model Unconstrained data Drawbacks Schema-less 14
  15. 15. MongoDB Slide 15
  16. 16. MongoDB is a document oriented DB Server Database: Food To Go Collection: Restaurants { "_id" : ObjectId("4bddc2f49d1505567c6220a0") "name": "Ajanta", "serviceArea": ["94619", "99999"], BSON = "openingHours": [ { binary "dayOfWeek": 1, JSON "open": 1130, "close": 1430 }, { Sequence "dayOfWeek": 2, "open": 1130, of bytes on "close": 1430 disk  fast }, … ] i/o } 16
  17. 17. MongoDB – writes are fast Client MongoDB Insert(collection, documents) Insert(collection, documents) Async Insert(collection, documents) getLastError() Blocks until writes succeed Slide 17
  18. 18. MongoDB is scalable Shard 1 Shard 2 Mongod Mongod (replica) (replica) Mongod Mongod (master) Mongod (master) Mongod (replica) (replica)ConfigServermongod A shard consists of a mongos replica set = generalization of master slavemongodmongod Collections spread over multiple client shards 4/11/12 Slide 18
  19. 19. MongoDB use cases  Use cases   High volume writes   Complex data   Semi-structured data  Used by: Shutterfly, Foursquare, Bit.ly, … Slide 19
  20. 20. Apache Cassandra 20
  21. 21. Cassandra is a column-oriented DB Column Column Row Value Name Timestamp Key Keyspace Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 K2 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3Column name/value: number, string, Boolean, timestamp, and composite 21
  22. 22. Cassandra– inserting/updating data Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 …Idempotent= transaction CF.insert(key=K1, (N4, V4, TS4), …) Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 … 22
  23. 23. Cassandra– retrieving data Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4… CF.slice(key=K1, startColumn=N2, endColumn=N4) K1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 23
  24. 24. Tunable reads and writes Higher availability Lower response time Less consistency  Any node (for writes)  One replica  Quorum of replicas  Local quorum  Each quorum  All replicas Lower availability Higher response time More consistency Slide 24
  25. 25. Cassandra is scalable Application Application Node 1 Node 1 Node 4 Node 2 Node 4 Node 2 Node 3 Node 3 Cassandra cluster Cassandra clusterDatacenter 1 Datacenter 2 Slide 25
  26. 26. Cassandra use cases  Use cases   Big data   Multiple Data Center distributed database   (Write intensive) Logging   High-availability (writes)  Used by: Netflix, Facebook, Digg, etc. Slide 26
  27. 27. Other NoSQL databasesType ExamplesExtensible columns/Column- Hbaseoriented SimpleDB DynamoDBGraph Neo4jKey-value Redis MembaseDocument CouchDb http://nosql-database.org/ lists 122+ NoSQL databases 27
  28. 28. Solution: Use NewSQL  Relational databases with SQL and ACID transactions AND  New and improved architecture  Radically better scalability and performance  NewSQL vendors: ScaleDB, NimbusDB, …, VoltDB Slide 28
  29. 29. Stonebraker’s motivations “…Current databases are designed for 1970s hardware …” Stonebraker: http://www.slideshare.net/VoltDB/sql-myths-webinar Significant overhead in “…logging, latching, locking, B-tree, and buffer management operations…” SIGMOD 08: Though the looking glass: http://dl.acm.org/citation.cfm?id=1376713 29
  30. 30. About VoltDB  Open-source  In-memory relational database  Durability thru replication; snapshots and logging  Transparent partitioning  Fast and scalable …VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores… http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/ Slide 30
  31. 31. The future is polyglot persistence e.g. Netflix •  RDBMS •  SimpleDB •  Cassandra •  Hadoop/Hbase IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg 31
  32. 32. Spring Data is here to help For NoSQL databaseshttp://www.springsource.org/spring-data 32
  33. 33. Agenda  Why NoSQL? NewSQL?  Persisting entities  Implementing queries Slide 33
  34. 34. Food to Go – Place Order use case1.  Customer enters delivery address and delivery time2.  System displays available restaurants3.  Customer picks restaurant4.  System displays menu5.  Customer selects menu items6.  Customer places order 34
  35. 35. Food to Go – Domain model (partial)class Restaurant { class TimeRange { long id; long id; String name; int dayOfWeek; Set<String> serviceArea; int openingTime; Set<TimeRange> openingHours; int closingTime; List<MenuItem> menuItems; }} class MenuItem { String name; double price; } 35
  36. 36. Database schemaID Name … RESTAURANT1 Ajanta table2 Montclair EggshopRestaurant_id zipcode RESTAURANT_ZIPCODE1 94707 table1 946192 946112 94619 RESTAURANT_TIME_RANGE tableRestaurant_id dayOfWeek openTime closeTime1 Monday 1130 14301 Monday 1730 21302 Tuesday 1130 … 36
  37. 37. Pick your JDBC-based framework interface AvailableRestaurantRepository { void add(Restaurant restaurant); Restaurant findDetailsById(int id); … } Spring JDBC Hibernate/JPA … JDBC 37
  38. 38. But with NoSQL? interface AvailableRestaurantRepository { Restaurant void add(Restaurant restaurant); Restaurant findDetailsById(int id); … TimeRange MenuItem }Restaurant aggregate ? ? 38
  39. 39. MongoDB Slide 39
  40. 40. MongoDB: persisting restaurants is easy { "_id" : ObjectId("4bddc2f49d1505567c6220a0") "name": "Ajanta", "serviceArea": ["94619", "99999"], "openingHours": [ { "dayOfWeek": 1, "open": 1130, "close": 1430 }, { "dayOfWeek": 2, "open": 1130, "close": 1430 }, … ] } 40
  41. 41. Spring Data for Mongo code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository { public static String AVAILABLE_RESTAURANTS_COLLECTION = "availableRestaurants"; @Autowired private MongoTemplate mongoTemplate; @Override public void add(Restaurant restaurant) { mongoTemplate.insert(restaurant, AVAILABLE_RESTAURANTS_COLLECTION); } @Override public Restaurant findDetailsById(int id) { return mongoTemplate.findOne(new Query(where("_id").is(id)), Restaurant.class, AVAILABLE_RESTAURANTS_COLLECTION); }} 41
  42. 42. Spring Configuration@Configurationpublic class MongoConfig extends AbstractDatabaseConfig { @Value("#{mongoDbProperties.databaseName}") private String mongoDbDatabase;@Beanpublic Mongo mongo() throws UnknownHostException, MongoException { return new Mongo(databaseHostName);} @Bean public MongoTemplate mongoTemplate(Mongo mongo) throws Exception { MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase); … return mongoTemplate; }} 42
  43. 43. Apache Cassandra 43
  44. 44. Option #1: Use a column per attribute Column Name = path/expression to access property value Column Family: RestaurantDetails openingHours[0].dayOfWeek Monday name Ajanta serviceArea[0] 946191 openingHours[0].open 1130 type indian serviceArea[1] 94707 openingHours[0].close 1430 Egg openingHours[0].dayOfWeek Monday name serviceArea[0] 94611 shop2 Break openingHours[0].open 0830 type serviceArea[1] 94619 Fast openingHours[0].close 1430
  45. 45. Option #2: Use a single column Column value = serialized object graph, e.g. JSON Column Family: RestaurantDetails 2 attributes: { name: “Montclair Eggshop”, … } 1 attributes { name: “Ajanta”, …} 2 attributes { name: “Eggshop”, …}Can’t use secondary indexes but they aren’t helpful for these use cases ✔ 45
  46. 46. Cassandra code: wrapper around Hectorpublic class AvailableRestaurantRepositoryCassandraKeyImpl implements AvailableRestaurantRepository {@Autowired Home grownprivate final CassandraTemplate cassandraTemplate; wrapper classpublic void add(Restaurant restaurant) { cassandraTemplate.insertEntity(keyspace, RESTAURANT_DETAILS_CF, restaurant);}public Restaurant findDetailsById(int id) { String key = Integer.toString(id); return cassandraTemplate.findEntity(Restaurant.class, keyspace, key, RESTAURANT_DETAILS_CF); …}… http://en.wikipedia.org/wiki/Hector 46
  47. 47. Slide 47
  48. 48. Using VoltDB  Use the original schema  Standard SQL statements BUT YOU MUST  Write stored procedures and invoke them using proprietary interface  Partition your data Slide 48
  49. 49. About VoltDB stored procedures  Key part of VoltDB  Replication = executing stored procedure on replica  Logging = log stored procedure invocation  Stored procedure invocation = transaction Slide 49
  50. 50. About partitioning Partition column RESTAURANT table ID Name … 1 Ajanta 2 Eggshop … Partition 1 Partition 2 ID Name … ID Name … 1 Ajanta 2 Eggshop … … Slide 50
  51. 51. Example VoltDB cluster (3 servers) x (2 partitions per server) ÷ 3 partitions (2 replicas per partition) Partition 1a Partition 2a Partition 3a ID Name … ID Name … ID Name … 1 Ajanta 2 Eggshop … .. … … … Partition 3b Partition 1b Partition 2b ID Name … ID Name … ID Name … … .. 1 Ajanta 2 Eggshop … … …VoltDB Server A VoltDB Server B VoltDB Server C Slide 51
  52. 52. Single partition procedure: FAST SELECT * FROM RESTAURANT WHERE ID = 1High-performance lock free code Stored Partition procedure column parameter ID Name … ID Name … ID Name … 1 Ajanta 1 Eggshop … .. … … … … … … VoltDB Server A VoltDB Server B VoltDB Server C Slide 52
  53. 53. Multi-partition procedure: SLOWER SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’ Communication/Coordination overhead ID Name … ID Name … ID Name … 1 Ajanta 1 Eggshop … .. … … … … … …VoltDB Server A VoltDB Server B VoltDB Server C Slide 53
  54. 54. Chosen partitioning scheme<partitions> <partition table="restaurant" column="id"/> <partition table="service_area" column="restaurant_id"/> <partition table="menu_item" column="restaurant_id"/> <partition table="time_range" column="restaurant_id"/> <partition table="available_time_range" column="restaurant_id"/></partitions> Performance is excellent: much faster than MySQL 54
  55. 55. Stored procedure – AddRestaurant@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extends VoltProcedure { public final SQLStmt insertRestaurant = new SQLStmt("INSERT INTO Restaurant VALUES (?,?);"); public final SQLStmt insertServiceArea = new SQLStmt("INSERT INTO service_area VALUES (?,?);"); public final SQLStmt insertOpeningTimes = new SQLStmt("INSERT INTO time_range VALUES (?,?,?,?);"); public final SQLStmt insertMenuItem = new SQLStmt("INSERT INTO menu_item VALUES (?,?,?);");public long run(int id, String name, String[] serviceArea, long[] daysOfWeek, long[] openingTimes, long[] closingTimes, String[] names, double[] prices) { voltQueueSQL(insertRestaurant, id, name); for (String zipCode : serviceArea) voltQueueSQL(insertServiceArea, id, zipCode); for (int i = 0; i < daysOfWeek.length ; i++) voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]); for (int i = 0; i < names.length ; i++) voltQueueSQL(insertMenuItem, id, names[i], prices[i]); voltExecuteSQL(true); return 0; }} 55
  56. 56. VoltDb repository – add()@Repositorypublic class AvailableRestaurantRepositoryVoltdbImpl implements AvailableRestaurantRepository { @Autowired private VoltDbTemplate voltDbTemplate; @Override public void add(Restaurant restaurant) { invokeRestaurantProcedure("AddRestaurant", restaurant); } private void invokeRestaurantProcedure(String procedureName, Restaurant restaurant) { Object[] serviceArea = restaurant.getServiceArea().toArray(); long[][] openingHours = toArray(restaurant.getOpeningHours()); Flatten Object[][] menuItems = toArray(restaurant.getMenuItems()); Restaurant voltDbTemplate.update(procedureName, restaurant.getId(), restaurant.getName(), serviceArea, openingHours[0], openingHours[1], openingHours[2], menuItems[0], menuItems[1]);} 56
  57. 57. VoltDbTemplate wrapper classpublic class VoltDbTemplate { private Client client; VoltDB client API public VoltDbTemplate(Client client) { this.client = client; } public void update(String procedureName, Object... params) { try { ClientResponse x = client.callProcedure(procedureName, params); … } catch (Exception e) { throw new RuntimeException(e); } } 57
  58. 58. VoltDb server configuration<?xml version="1.0"?> <deployment><project> <cluster hostcount="1" <info> <name>Food To Go</name> sitesperhost="5" kfactor="0" /> ... </info> </deployment> <database> <schemas> <schema path=schema.sql /> </schemas> <partitions> <partition table="restaurant" column="id"/> ... </partitions> <procedures> <procedure class=net.chrisrichardson.foodToGo.newsql.voltdb.procs.AddRestaurant /> ... </procedures> </database></project>voltcompiler target/classes src/main/resources/sql/voltdb-project.xml foodtogo.jarbin/voltdb leader localhost catalog foodtogo.jar deployment deployment.xml 58
  59. 59. Agenda  Why NoSQL? NewSQL?  Persisting entities  Implementing queries Slide 59
  60. 60. Finding available restaurantsAvailable restaurants = Serve the zip code of the delivery addressAND Are open at the delivery timepublic interface AvailableRestaurantRepository { List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime); …} 60
  61. 61. Finding available restaurants on Monday,6.15pm for 94619 zipselect r.* Straightforwardfrom restaurant r three-way join inner join restaurant_time_range tr on r.id =tr.restaurant_id inner join restaurant_zipcode sa on r.id = sa.restaurant_idWhere ’94619’ = sa.zip_codeand tr.day_of_week=’monday’and tr.openingtime <= 1815and 1815 <= tr.closingtime 61
  62. 62. MongoDB Slide 62
  63. 63. MongoDB = easy to query{ serviceArea:"94619", Find a openingHours: { $elemMatch : { restaurant "dayOfWeek" : "Monday", "open": {$lte: 1815}, that serves } "close": {$gte: 1815} the 94619 zip} } code and is open atDBCursor cursor = collection.find(qbeObject);while (cursor.hasNext()) { 6.15pm on a DBObject o = cursor.next(); … Monday }db.availableRestaurants.ensureIndex({serviceArea: 1}) 63
  64. 64. MongoTemplate-based code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {@Autowired private final MongoTemplate mongoTemplate;@Overridepublic List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); Query query = new Query(where("serviceArea").is(deliveryAddress.getZip()) .and("openingHours”).elemMatch(where("dayOfWeek").is(dayOfWeek) .and("openingTime").lte(timeOfDay) .and("closingTime").gte(timeOfDay))); return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);} mongoTemplate.ensureIndex(“availableRestaurants”, new Index().on("serviceArea", Order.ASCENDING)); 64
  65. 65. But how do this withApache Cassandra??! 65
  66. 66. Slice is equivalent to a simple SELECTkeyVal colName colValue select key, colName, colValueX Y Z from columnFamily… where key = keyVal and colName >= startVal and colName <= endVal Column Family columnFamily.slice(key=keyVal, startColumn=startVal, endColumn=endVal) X Y Z
  67. 67. We need to implement an indexthat can be queried using a slice Queries instead of data model drives NoSQL database design Slide 67
  68. 68. Simplification #1: DenormalizationRestaurant_id Day_of_week Open_time Close_time Zip_code1 Monday 1130 1430 947071 Monday 1130 1430 946191 Monday 1730 2130 947071 Monday 1730 2130 946192 Monday 0700 1430 94619… SELECT restaurant_id FROM time_range_zip_code WHERE day_of_week = ‘Monday’ Simpler query: AND zip_code = 94619   No joins   Two = and two < AND 1815 < close_time AND open_time < 1815 68
  69. 69. Simplification #2: Application filteringSELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Monday’ Even simpler query AND zip_code = 94619 •  No joins AND 1815 < close_time •  Two = and one < AND open_time < 1815 69
  70. 70. Simplification #3: Eliminate multiple =’s withconcatenation Restaurant_id Zip_dow Open_time Close_time 1 94707:Monday 1130 1430 1 94619:Monday 1130 1430 1 94707:Monday 1730 2130 1 94619:Monday 1730 2130 2 94619:Monday 0700 1430 … SELECT restaurant_id, open_time FROM time_range_zip_code WHERE zip_code_day_of_week = ‘94619:Monday’ AND 1815 < close_time Row key range 70
  71. 71. Column family as an index Restaurant_id Zip_dow Open_time Close_time 1 94707:Monday 1130 1430 1 94619:Monday 1130 1430 1 94707:Monday 1730 2130 1 94619:Monday 1730 2130 2 94619:Monday 0700 1430 … Column Family: AvailableRestaurants JSON FOR JSON FOR (1430,0700,2) (2130,1730,1)94619:Monday EGG AJANTA JSON FOR (1430,1130,1) AJANTA
  72. 72. Querying with a slice Column Family: AvailableRestaurants JSON FOR JSON FOR (1430,0700,2) (2130,1730,1) EGG AJANTA94619:Monday JSON FOR (1430,1130,1) AJANTA slice(key= 94619:Monday, sliceStart = (1815, *, *), sliceEnd = (2359, *, *)) JSON FOR (2130,1730,1)94619:Monday AJANTA 18:15 is after 17:30  {Ajanta} 72
  73. 73. Needs a few pages of code private void insertAvailability(Restaurant restaurant) { for (String zipCode : (Set<String>) restaurant.getServiceArea()) {@Override for (TimeRange tr : (Set<TimeRange>) restaurant.getOpeningHours()) { public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { String dayOfWeek = format2(tr.getDayOfWeek()); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); String openingTime = format4(tr.getOpeningTime()); int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); String closingTime = format4(tr.getClosingTime()); String zipCode = deliveryAddress.getZip(); String key = formatKey(zipCode, format2(dayOfWeek)); String restaurantId = format8(restaurant.getId()); HSlicePredicate<Composite> predicate = new HSlicePredicate<Composite>(new CompositeSerializer()); String key = formatKey(zipCode, dayOfWeek); Composite start = new Composite(); Composite finish = new Composite(); String columnValue = toJson(restaurant); start.addComponent(0, format4(timeOfDay), ComponentEquality.GREATER_THAN_EQUAL); finish.addComponent(0, format4(2359), ComponentEquality.GREATER_THAN_EQUAL); Composite columnName = new Composite(); predicate.setRange(start, finish, false, 100); columnName.add(0, closingTime); final List<AvailableRestaurantIndexEntry> closingAfter = new ArrayList<AvailableRestaurantIndexEntry>(); columnName.add(1, openingTime); columnName.add(2, restaurantId); ColumnFamilyRowMapper<String, Composite, Object> mapper = new ColumnFamilyRowMapper<String, Composite, Object>() { @Override ColumnFamilyUpdater<String, Composite> updater public Object mapRow(ColumnFamilyResult<String, Composite> results) { = compositeCloseTemplate.createUpdater(key); for (Composite columnName : results.getColumnNames()) { String openTime = columnName.get(1, new StringSerializer()); String restaurantId = columnName.get(2, new StringSerializer()); updater.setString(columnName, columnValue); closingAfter.add(new AvailableRestaurantIndexEntry(openTime, restaurantId, results.getString(columnName))); } return null; } }; compositeCloseTemplate.update(updater); } compositeCloseTemplate.queryColumns(key, predicate, mapper); } List<AvailableRestaurant> result = new LinkedList<AvailableRestaurant>(); } for (AvailableRestaurantIndexEntry trIdAndAvailableRestaurant : closingAfter) { if (trIdAndAvailableRestaurant.isOpenBefore(timeOfDay)) result.add(trIdAndAvailableRestaurant.getAvailableRestaurant()); } return result; } 73
  74. 74. What did I just do to query the data?  Wrote code to maintain an index  Reduced performance due to extra writes 74
  75. 75. But what would you rather implement? “Complex” query logic OR Multi-datacenter, multi- master database infrastructure Slide 75
  76. 76. Slide 76
  77. 77. VoltDB - attempt #0@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }SELECT r.*FROM restaurant r,time_range tr, service_area saWHERE ? = sa.zip_code and sa.restaurant_id= tr.restaurant_id AND r.restaurant_id = sa.restaurant_id and tr.day_of_week=? AND tr.open_time <= ? AND ? <= tr.close_time Slow  Slide 77
  78. 78. VoltDB - attempt #1@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }create index idx_service_area_zip_code on service_area(zip_code);..ERROR 10:12:03,251 [main] COMPILER: Failed to plan for statementtype(findAvailableRestaurants_with_join) select r.* from restaurantr,time_range tr, service_area sa Where ? = sa.zip_code andsa.restaurant_id= tr.restaurant_id and r.restaurant_id =sa.restaurant_id and tr.day_of_week=? and tr.open_time <= ? and ?<= tr.close_time Error: "Unable to plan for statement. Possibly joiningpartitioned tables in a multi-partition procedure using a column that isnot the partitioning attribute or a non-equality operator. This isstatement not supported at this time."2012-03-19 08:19:31,743 ERROR [main] COMPILER: Catalogcompilation failed. #fail Slide 78
  79. 79. VoltDB - attempt #2@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extends VoltProcedure { public final SQLStmt insertAvailable= new SQLStmt("INSERT INTO available_time_range VALUES (?,?,?, ?, ?, ?);"); public long run(....) { ... for (int i = 0; i < daysOfWeek.length ; i++) { voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]); for (String zipCode : serviceArea) { voltQueueSQL(insertAvailable, id, daysOfWeek[i], openingTimes[i], closingTimes[i], zipCode, name); } } ... public final SQLStmt findAvailableRestaurants_denorm = new SQLStmt( voltExecuteSQL(true); "select restaurant_id, name from available_time_range tr " + return 0; "where ? = tr.zip_code " + } "and tr.day_of_week=? " +} "and tr.open_time <= ? " + " and ? <= tr.close_time "); Works but queries are only slightly faster than MySQL!  Slide 79
  80. 80. VoltDB - attempt #3<partitions> ... <partition table="available_time_range" column="zip_code"/></partitions>@ProcInfo( singlePartition = false, ...)public class AddRestaurant extends VoltProcedure { ... }@ProcInfo( singlePartition = true, partitionInfo = "available_time_range.zip_code: 0")public class FindAvailableRestaurants extends VoltProcedure { ... } Queries are really fast  But inserts are not  80
  81. 81. VoltDB – key lessonA given partitioning scheme can be good for some use cases but bad for others Slide 81
  82. 82. Summary Different databases = Different tradeoffs 82
  83. 83. … Summary… Benefits DrawbacksRDBMS • SQL • Lack of performance and scalability • ACID • Difficult to distribute • Familiar • Schema updates • Handling semi-structured dataNoSQL • Scalability • Lack of ACID • Performance • Limited querying (some) • Schema-lessNewSQL • ACID • Proprietary API • SQL • Schema updates • Familiar • Handling semi-structured data • Scalability • Not all use cases are performant • Performance Slide 83
  84. 84. … Summary  Very carefully pick the NewSQL/ NoSQL DB for your application  Consider a polyglot persistence architecture  Encapsulate your data access code so you can switch  Startups = avoid NewSQL/NoSQL for shorter time to market? Slide 84
  85. 85. Thank you! My contact info: chris.richardson@springsource.com @crichardson Blog: http://plainoldobjects.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×