Polyglot persistence for Java Developers - August 2011 / @Oakjug
Relational databases have long been considered the one true way to persist enterprise data. But today, NoSQL databases are emerging as a viable alternative for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But NoSQL databases are very different from the ACID/SQL/JDBC/JPA world that we have become accustomed to. In this presentation, you will learn about our experience implementing a use case from POJOs in Action using popular NoSQL databases: Redis, MongoDB, and Cassandra. We will compare and contrast each database’s data model and Java API. You will learn about the benefits and drawbacks of using NoSQL.

Polyglot persistence for Java Developers - August 2011 / @Oakjug Presentation Transcript

  • 1. Polyglot persistence for Java developers - moving out of the relational comfort zone
    Chris Richardson
    Author of POJOs in Action
    Founder of CloudFoundry.com
    chris@chrisrichardson.net
    @crichardson
  • 2. Overall presentation goal
    The joy and pain of building Java applications that use NoSQL
    8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 2
  • 3. About Chris
    • Grew up in England and live in Oakland, CA
    • 25+ years of software development experience, including 14+ years of Java
    • Speaker at JavaOne, SpringOne, PhillyETE, Devoxx, etc.
    • Organize the Oakland JUG and the Groovy Grails meetup
    http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
  • 4. Agenda
    • Why NoSQL?
    • Overview of NoSQL databases
    • Introduction to Spring Data
    • Case study: POJOs in Action & NoSQL
  • 5. Relational databases are great
    • SQL = Rich, declarative query language
    • Database enforces referential integrity
    • ACID semantics
    • Well understood by developers
    • Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
    • Well understood by operations
        • Configuration
        • Care and feeding
        • Backups
        • Tuning
        • Failure and recovery
        • Performance characteristics
    • But….
  • 6. Problem: Complex object graphs
    • Object/relational impedance mismatch
    • Complicated to map rich domain model to relational schema
    • Performance issues
        • Many rows in many tables
        • Many joins
  • 7. Problem: Semi-structured data
    • Relational schema doesn’t easily handle semi-structured data:
        • Varying attributes
        • Custom attributes on a customer record
    • Common solution = Name/value table
        • Poor performance
        • E.g. Finding specific attributes for customers satisfying some criteria = multi-way outer JOIN
        • Lack of constraints
    • Another solution = Serialize as blob
        • Fewer joins
        • BUT can’t be queried
  • 8. Problem: Schema evolution
    • For example:
        • Add attributes to an object => add columns to table
    • Schema changes =
        • Holding locks for a long time => application downtime
        • $$
  • 9. Problem: Scaling
    • Scaling reads:
        • Master/slave
        • But beware of consistency issues
    • Scaling writes
        • Extremely difficult/impossible/expensive
        • Vertical scaling is limited and requires $$
        • Horizontal scaling is limited/requires $$
  • 10. Solution: Buy high end technology http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
  • 11. Solution: Hire more developers
    • Application-level sharding
    • Build your own middleware
    • …
    http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
  • 12. Solution: Use NewSQL
    • Led by Stonebraker
        • Current databases are designed for 1970s hardware and for both OLTP and data warehouses
        • http://www.slideshare.net/VoltDB/sql-myths-webinar
    • NewSQL
        • Next generation SQL databases, e.g. VoltDB
        • Leverage multi-core, commodity hardware
        • In-memory
        • Horizontally scalable
        • Transparently shardable
        • ACID
  • 13. NoSQL databases are emerging…
    Each one offers some combination of:
    • Higher performance
    • Higher scalability
    • Richer data-model
    • Schema-less
    In return for:
    • Limited transactions
    • Relaxed consistency
    • Unconstrained data
    • …
  • 14. … but there are few commonalities
    • Everyone and their dog has written one
    • Different data models
        • Key-value
        • Column
        • Document
        • Graph
    • Different APIs
    • No JDBC, Hibernate, JPA (generally)
    “Same sorry state as the database market in the 1970s before SQL was invented” http://queue.acm.org/detail.cfm?id=1961297
  • 15. Future = multi-paradigm data storage for enterprise applications
    IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
  • 16. Agenda
    • Why NoSQL?
    • Overview of NoSQL databases
    • Introduction to Spring Data
    • Case study: POJOs in Action & NoSQL
  • 17. Redis
    • Advanced key-value store
        • Values can be binary strings, Lists, Sets, Sorted Sets, Hashes, …
        • Data-type specific operations
    • Very fast
        • ~100K operations/second on entry-level hardware
        • In-memory operations
    • Persistent
        • Periodic snapshots of memory OR append commands to log file
    • Transactions within a single server
        • Atomic execution of batched commands
        • Optimistic locking
    [Diagram: key-value pairs K1 => V1, K2 => V2, K3 => V2]
  • 18. Redis CLI
    Sorted set member = value + score
    redis> zadd mysortedset 5.0 a
    (integer) 1
    redis> zadd mysortedset 10.0 b
    (integer) 1
    redis> zadd mysortedset 1.0 c
    (integer) 1
    redis> zrange mysortedset 0 1
    1) "c"
    2) "a"
    redis> zrangebyscore mysortedset 1 6
    1) "c"
    2) "a"
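The sorted-set behaviour in the CLI session above can be mimicked in plain Java to see what the commands compute. This is a minimal in-memory sketch, not Redis and not a client API: the class `SortedSetSketch` and its method names are hypothetical, chosen only to echo ZADD, ZRANGE, and ZRANGEBYSCORE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a Redis sorted set (hypothetical helper, no Redis involved).
// Each member has a score; members are ordered by score, ties broken by member name.
class SortedSetSketch {
    private final Map<String, Double> scores = new HashMap<>();

    // Like ZADD: returns 1 if the member is new, 0 if only its score changed.
    int zadd(double score, String member) {
        return scores.put(member, score) == null ? 1 : 0;
    }

    // Entries sorted by score, then by member name.
    private List<Map.Entry<String, Double>> ordered() {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        return entries;
    }

    // Like ZRANGE: members by rank, both indexes inclusive (non-negative indexes only).
    List<String> zrange(int start, int stop) {
        List<String> result = new ArrayList<>();
        List<Map.Entry<String, Double>> entries = ordered();
        for (int i = start; i <= stop && i < entries.size(); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Like ZRANGEBYSCORE: members whose score lies in [min, max].
    List<String> zrangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : ordered()) {
            if (e.getValue() >= min && e.getValue() <= max) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Replaying the session (scores 5.0, 10.0, and 1.0 for a, b, and c) gives "c" then "a" for both the rank range 0..1 and the score range 1..6, matching the CLI output on the slide.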
  • 19. Scaling Redis
    • Master/slave replication
        • Tree of Redis servers
        • Non-persistent master can replicate to a persistent slave
        • Use slaves for read-only queries
    • Sharding
        • Client-side only - consistent hashing based on key
        • Server-side sharding - coming one day
    • Run multiple servers per physical host
        • Server is single threaded => Leverage multiple CPUs
        • 32 bit more efficient than 64 bit
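Client-side sharding with consistent hashing, as mentioned on the slide above, can be sketched in a few lines. The following is an illustrative ring, not the algorithm of any particular Redis client: the class name, the use of MD5, and the virtual-node count are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Client-side sharding sketch: a consistent-hash ring mapping keys to servers.
// Hypothetical class; assumes at least one server is supplied.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> servers, int virtualNodes) {
        for (String server : servers) {
            // Several points per server smooth out the key distribution.
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    // Walk clockwise from the key's hash to the first server point, wrapping around.
    String serverFor(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of an MD5 digest, used only to spread keys (not for security).
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The reason to prefer a ring over `hash(key) % serverCount` is that adding a server only moves the keys that now land on the new server; every other key keeps its old home.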
  • 20. Downsides of Redis
    • Low-level API compared to SQL
    • Single threaded:
        • Multiple cores => multiple Redis servers
    • Master/slave failover is manual
    • Partitioning is done by the client
    • Dataset has to fit in memory
  • 21. Redis use cases
    • Drop-in replacement for Memcached
        • Session state
        • Cache of data retrieved from SOR
    • Replica of SOR for queries needing high-performance
    • Miscellaneous yet important
        • Counting using INCR command, e.g. hit counts
        • Most recent N items - LPUSH and LTRIM
        • Randomly selecting an item - SRANDMEMBER
        • Queuing - Lists with LPOP, RPUSH, ….
        • High score tables - Sorted sets and ZINCRBY
        • …
    • Notable users: github, guardian.co.uk, ….
  • 22. Cassandra
    • An Apache open-source project
    • Developed by Facebook for inbox search
    • Column-oriented database/Extensible row store
        • The data model will hurt your brain
        • Row = map or map of maps
    • Fast writes = append to a log
    • Extremely scalable
        • Transparent and dynamic clustering
        • Rack and datacenter aware data replication
    • Tunable read/write consistency per operation
        • Writes: any, one replica, quorum of replicas, …, all
        • Read: one, quorum, …, all
    • CQL = “SQL”-like DDL and DML
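The "tunable consistency" bullet above rests on simple replica arithmetic. As a rough sketch (the class and method names are hypothetical, and real clusters add complications such as failed writes and repair that this ignores): with N replicas, a quorum is a strict majority, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number read exceeds N.

```java
// Sketch of the replica arithmetic behind tunable consistency (hypothetical helper).
class ConsistencySketch {
    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // The read set and the write set must share at least one replica
    // for a read to be guaranteed to see the latest acknowledged write.
    static boolean readSeesLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }
}
```

With three replicas, quorum writes plus quorum reads (2 + 2 > 3) overlap, whereas writing to one replica and reading from one (1 + 1) does not; that is the trade between latency and consistency that each operation can choose.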
  • 23. Cassandra data model
    My Column family (within a key space)
    Keys  Columns
    a     colA: value1   colB: value2   colC: value3
    b     colA: value    colD: value    colE: value
    A column has a timestamp too
    • 4-D map: keySpace x key x columnFamily x column => value
    • Arbitrary number of columns
    • Column names are dynamic; can contain data
    • Columns for a row are stored on disk in order determined by comparator
    • One CF row = one DDD aggregate
  • 24. Cassandra data model - insert/update
    My Column family (within a key space)
    Keys  Columns
    a     colA: value1   colB: value2   colC: value3
    b     colA: value    colD: value    colE: value
    Insert(key=a, columnName=colZ, value=foo)
    Keys  Columns
    a     colA: value1   colB: value2   colC: value3   colZ: foo
    b     colA: value    colD: value    colE: value
    • Transaction = updates to a row within a ColumnFamily
    • Idempotent
  • 25. Cassandra query example - slice
    Keys  Columns
    a     colA: value1   colB: value2   colC: value3   colZ: foo
    b     colA: value    colD: value    colE: value
    slice(key=a, startColumn=colA, endColumnName=colC)
    Keys  Columns
    a     colA: value1   colB: value2
    You can also do a rangeSlice which returns a range of keys - less efficient
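Because the columns of a row are kept in comparator order, a slice is essentially a range lookup on a sorted map. The sketch below models one row with a `TreeMap`; the class `RowSliceSketch` is hypothetical, and treating both slice bounds as inclusive is an assumption of the sketch rather than something the slide states.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// One row of a column family modeled as a sorted map: column name -> value.
// Hypothetical in-memory model, not the Cassandra API.
class RowSliceSketch {
    private final NavigableMap<String, String> columns = new TreeMap<>();

    // Insert or overwrite a column; repeating the same call leaves the row
    // unchanged, which is why the insert is idempotent.
    void insert(String columnName, String value) {
        columns.put(columnName, value);
    }

    int columnCount() {
        return columns.size();
    }

    // Columns whose names fall between start and end (both inclusive here),
    // returned in comparator order.
    NavigableMap<String, String> slice(String start, String end) {
        return columns.subMap(start, true, end, true);
    }
}
```

A different comparator (for example one that orders numerically or by timestamp) changes which columns count as "between" the two bounds, which is why the comparator choice is part of the data model.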
  • 26. Super Column Families - one more dimension
    My Column family (within a key space)
    Keys  Super columns (ScA, ScB)
    a     colA: value1   colB: value2   colC: value3
    b     colA: value    colD: value    colE: value
    Insert(key=a, superColumn=scB, columnName=colZ, value=foo)
    keySpace x key x columnFamily x superColumn x column -> value
    Keys  Super columns (ScA, ScB)
    a     colA: value1   colB: value2   colC: value3   colZ: foo
    b     colA: value    colD: value    colE: value
  • 27. Getting data with super slice
    My Column family (within a key space)
    Keys  Super columns (ScA, ScB)
    a     colA: value1   colB: value2   colC: value3
    b     colA: value    colD: value    colE: value
    superSlice(key=a, startColumn=scB, endColumnName=scC)
    Keys  Super columns (ScB)
    a     colC: value3
  • 28. Cassandra CLI
    $ bin/cassandra-cli -h localhost
    Connected to: "Test Cluster" on localhost/9160
    Welcome to cassandra CLI.
    [default@unknown] use Keyspace1;
    Authenticated to keyspace: Keyspace1
    [default@Keyspace1] list restaurantDetails;
    Using default limit of 100
    -------------------
    RowKey: 1
    => (super_column=attributes,
         (column=json, value={"id": 1,"name":"Ajanta","menuItems"....
    [default@Keyspace1] get restaurantDetails[1]['attributes'];
    => (column=json, value={"id": 1,"name":"Ajanta","menuItems"....
  • 29. Scaling Cassandra
    • Client connects to any node
    • Dynamically add/remove nodes
    • Reads/Writes specify how many nodes
    • Configurable # of replicas
        • adjacent nodes
        • rack and data center aware
    [Diagram: ring of four nodes, each replicating to the next. Node 1: Token = A, Keys = [D, A]; Node 2: Token = B, Keys = [A, B]; Node 3: Token = C, Keys = [B, C]; Node 4: Token = D, Keys = [C, D]]
  • 30. Downsides of Cassandra
    • Learning curve
    • Still maturing, currently v0.8.4
    • Limited queries, i.e. KV lookup
    • Transactions limited to a column family row
    • Lacks an easy to use API
  • 31. Cassandra use cases
    • Use cases
        • Big data
        • Multiple Data Center distributed database
        • Persistent cache
        • (Write intensive) Logging
        • High-availability (writes)
    • Who is using it
        • Digg, Facebook, Twitter, Reddit, Rackspace
        • Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
        • The largest production cluster has over 100 TB of data in over 150 machines. - Cassandra web site
  • 32. MongoDB
    • Document-oriented database
        • JSON-style documents: Lists, Maps, primitives
        • Documents organized into collections (~table)
        • Schema-less
    • Rich query language for dynamic queries
    • Asynchronous, configurable writes:
        • No wait
        • Wait for replication
        • Wait for write to disk
    • Very fast
    • Highly scalable and available:
        • Replica sets (generalized master/slave)
        • Sharding
        • Transparent to client
  • 33. Data Model = Binary JSON documents
    One document = one DDD aggregate
    {
      "name" : "Sahn Maru",
      "type" : "Korean",
      "serviceArea" : [ "94619", "94618" ],
      "openingHours" : [
        { "dayOfWeek" : "Wednesday", "open" : 1730, "close" : 2230 }
      ],
      "_id" : ObjectId("4bddc2f49d1505567c6220a0")
    }
    DBObject o = new BasicDBObject();
    o.put("name", "Sahn Maru");
    DBObject mi = new BasicDBObject();
    mi.put("name", "Daeji Bulgogi");
    …
    List<DBObject> mis = Collections.singletonList(mi);
    o.put("menuItems", mis);
    • Sequence of bytes on disk = fast I/O
        • No joins/seeks
        • In-place updates when possible => no index updates
    • Transaction = update of single document
  • 34. MongoDB CLI
    $ bin/mongo
    > use mydb
    > r1 = {name: "Ajanta"}
    {name: "Ajanta"}
    > r2 = {name: "Montclair Egg Shop"}
    {name: "Montclair Egg Shop"}
    > db.restaurants.save(r1)
    > r1
    { _id: ObjectId("98…"), name: "Ajanta"}
    > db.restaurants.save(r2)
    > r2
    { _id: ObjectId("66…"), name: "Montclair Egg Shop"}
    > db.restaurants.find({name: /^A/})
    { _id: ObjectId("98…"), name: "Ajanta"}
    > db.restaurants.update({name: "Ajanta"}, {name: "Ajanta Restaurant"})
  • 35. MongoDB query by example
    Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday
    {
      serviceArea: "94619",
      openingHours: {
        $elemMatch : {
          "dayOfWeek" : "Monday",
          "open": {$lte: 1800},
          "close": {$gte: 1800}
        }
      }
    }
    DBCursor cursor = collection.find(qbeObject);
    while (cursor.hasNext()) {
      DBObject o = cursor.next();
      …
    }
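What the `$elemMatch` query above asks for can be spelled out in plain Java: the zip code must appear in `serviceArea`, and a single `openingHours` entry must satisfy all three conditions at once. This sketch evaluates that rule over `Map`-based documents; `ElemMatchSketch` is a hypothetical helper written for illustration and does not use the MongoDB driver.

```java
import java.util.List;
import java.util.Map;

// Plain-Java evaluation of the query-by-example on the slide (hypothetical helper).
class ElemMatchSketch {
    // True when the document serves the zip code and ONE openingHours entry
    // matches the day, has opened, and has not yet closed.
    @SuppressWarnings("unchecked")
    static boolean matches(Map<String, Object> restaurant, String zip, String day, int time) {
        List<String> serviceArea = (List<String>) restaurant.get("serviceArea");
        List<Map<String, Object>> hours = (List<Map<String, Object>>) restaurant.get("openingHours");
        if (serviceArea == null || hours == null || !serviceArea.contains(zip)) {
            return false;
        }
        for (Map<String, Object> h : hours) {
            boolean sameDay = day.equals(h.get("dayOfWeek"));
            boolean opened = (Integer) h.get("open") <= time;
            boolean notClosed = (Integer) h.get("close") >= time;
            if (sameDay && opened && notClosed) {
                return true;
            }
        }
        return false;
    }
}
```

The per-element check is the important part: a restaurant open Monday 11:30 to 14:30 and Tuesday 17:30 to 21:30 should not match "Monday at 18:00", even though each individual condition is satisfied by some entry.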
  • 36. Scaling MongoDB
    [Diagram: a client talks to mongos; config server mongod processes; Shard 1 and Shard 2, each made up of one mongod master and two mongod replicas]
    • A shard consists of a replica set = generalization of master slave
    • Collections spread over multiple shards
  • 37. Mongo Downsides
    • Server has a global write lock
        • Single writer OR multiple readers => Long running queries block writers
    • Great that writes are not synchronous
        • BUT perhaps an asynchronous response would be better than a synchronous getLastError()
    Interesting story: http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
  • 38. MongoDB use cases
    • Use cases
        • High volume writes
        • Complex data
        • Semi-structured data
    • Who is using it?
        • Shutterfly, Foursquare
        • Bit.ly, Intuit
        • SourceForge, NY Times
        • GILT Groupe, Evite,
        • SugarCRM
  • 39. Other NoSQL databases
    Type                                  Examples
    Extensible columns/Column-oriented    Hbase, SimpleDB
    Graph                                 Neo4j
    Key-value                             Membase
    Document                              CouchDb
    http://nosql-database.org/ lists 122+ NoSQL databases
  • 40. Picking a database
    Application requirement      Solution
    Complex transactions/ACID    Relational database
    Scaling                      NoSQL
    Social data                  Graph database
    Multiple datacenters         Cassandra
    Highly-available writes      Cassandra
    Flexible data                Document store
    High write volumes           Mongo, Cassandra
    Super fast cache             Redis
    Adhoc queries                Relational or Mongo
    …
    http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html
  • 41. Proceed with caution
    • Don’t commit to a NoSQL DB until you have done a significant POC
    • Encapsulate your data access code so you can switch
    • Hope that one day you won’t need ACID
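One way to read "encapsulate your data access code so you can switch" is to hide the store behind a small interface and keep a trivial implementation around for tests. The sketch below is a simplified illustration, not the repository used later in the deck: `AvailableRestaurantFinder`, `InMemoryRestaurantFinder`, and their signatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// The application codes against this interface only (hypothetical, simplified).
interface AvailableRestaurantFinder {
    List<String> findAvailableRestaurants(String zipCode, int dayOfWeek, int timeOfDay);
}

// In-memory implementation; a Redis-, Cassandra- or MongoDB-backed class
// could implement the same interface without changing any caller.
class InMemoryRestaurantFinder implements AvailableRestaurantFinder {
    // One denormalized entry per (restaurant, zip code, opening period).
    private static final class OpenPeriod {
        final String name;
        final String zipCode;
        final int dayOfWeek;
        final int open;
        final int close;

        OpenPeriod(String name, String zipCode, int dayOfWeek, int open, int close) {
            this.name = name;
            this.zipCode = zipCode;
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    private final List<OpenPeriod> periods = new ArrayList<>();

    void add(String name, String zipCode, int dayOfWeek, int open, int close) {
        periods.add(new OpenPeriod(name, zipCode, dayOfWeek, open, close));
    }

    @Override
    public List<String> findAvailableRestaurants(String zipCode, int dayOfWeek, int timeOfDay) {
        List<String> names = new ArrayList<>();
        for (OpenPeriod p : periods) {
            boolean isOpen = p.open <= timeOfDay && timeOfDay <= p.close;
            boolean served = p.zipCode.equals(zipCode) && p.dayOfWeek == dayOfWeek;
            if (served && isOpen && !names.contains(p.name)) {
                names.add(p.name);
            }
        }
        return names;
    }
}
```

Callers hold only an `AvailableRestaurantFinder`, so a proof of concept against a different database becomes a new class plus a wiring change rather than a rewrite of the calling code.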
  • 42. Agenda
    • Why NoSQL?
    • Overview of NoSQL databases
    • Introduction to Spring Data
    • Case study: POJOs in Action & NoSQL
  • 43. NoSQL Java APIs
    Database     Libraries
    Redis        Jedis, JRedis, JDBC-Redis, RJC
    Cassandra    Raw Thrift if you are a masochist; Hector, …
    MongoDB      MongoDB provides a Java driver
    • Some are not so easy to use
    • Stylistic differences
    • Boilerplate code
    • …
  • 44. Spring Data Project Goals
    Bring classic Spring value propositions to a wide range of NoSQL databases =>
        • Productivity
        • Programming model consistency: E.g. <NoSQL>Template classes
        • “Portability”
    http://www.springsource.org/spring-data
  • 45. Spring Data sub-projects
    • Commons: Polyglot persistence
    • Key-Value: Redis, Riak
    • Document: MongoDB, CouchDB
    • Graph: Neo4j
    • GORM for NoSQL
    • Various milestone releases
        • Redis 1.0.0.M4 (July 20th, 2011)
        • Document 1.0.0.M2 (April 9, 2011)
        • Graph - Neo4j Support 1.0.0 (April 19, 2011)
        • …
  • 46. MongoTemplate
    • Simplifies data access
    • Translates exceptions
    • POJO <-> DBObject mapping
    [Class diagram]
    MongoTemplate
        Properties: databaseName, userId, password, defaultCollectionName, writeConcern, writeResultChecking
        Methods: save(), insert(), remove(), updateFirst(), findOne(), find(), …
        Uses: Mongo (Java Driver class), MongoConverter
    <<interface>> MongoConverter
        Methods: write(Object, DBObject), read(Class, DBObject)
        Implementations: SimpleMongoConverter, MongoMappingConverter
  • 47. Richer mapping
    • Annotations define mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value
    • Map fields instead of properties => no getters or setters required
    • Non-default constructor
    • Index generation
    @Document
    public class Person {
      @Id
      private ObjectId id;
      private String firstname;
      @Indexed
      private String lastname;
      @PersistenceConstructor
      public Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
      }
      ….
    }
  • 48. Generic Mongo Repositories
    interface PersonRepository extends MongoRepository<Person, ObjectId> {
      List<Person> findByLastname(String lastName);
    }
    <beans>
      <mongo:repositories base-package="net.chrisrichardson.mongodb.example.mongorepository"
                          mongo-template-ref="mongoTemplate" />
    </beans>
    Person p = new Person("John", "Doe");
    personRepository.save(p);
    Person p2 = personRepository.findOne(p.getId());
    List<Person> johnDoes = personRepository.findByLastname("Doe");
    assertEquals(1, johnDoes.size());
  • 49. Support for the QueryDSL project
    • Generated from domain model class
    • Type-safe composable queries
    QPerson person = QPerson.person;
    Predicate predicate = person.homeAddress.street1.eq("1 High Street")
        .and(person.firstname.eq("John"));
    List<Person> people = personRepository.findAll(predicate);
    assertEquals(1, people.size());
    assertPersonEquals(p, people.get(0));
  • 50. Cross-store/polyglot persistence
    @Entity
    public class Person {
      // In Database
      @Id private Long id;
      private String firstname;
      private String lastname;
      // In MongoDB
      @RelatedDocument private Address address;
    }
    Person person = new Person(…);
    entityManager.persist(person);
    Person p2 = entityManager.find(…)
    {
      "_id" : ObjectId("….."),
      "_entity_id" : NumberLong(1),
      "_entity_class" : "net.. Person",
      "_entity_field_name" : "address",
      "zip" : "94611",
      "street1" : "1 High Street",
      …
    }
  • 51. Agenda
    • Why NoSQL?
    • Overview of NoSQL databases
    • Introduction to Spring Data
    • Case study: POJOs in Action & NoSQL
  • 52. Food to Go - placing a takeout order
    • Customer enters delivery address and delivery time
    • System displays available restaurants = restaurants that serve the zip code of the delivery address AND are open at the delivery time
    class Restaurant {
      long id;
      String name;
      Set<String> serviceArea;
      Set<TimeRange> openingHours;
      List<MenuItem> menuItems;
    }
    class TimeRange {
      long id;
      int dayOfWeek;
      int openingTime;
      int closingTime;
    }
    class MenuItem {
      String name;
      double price;
    }
  • 53. Database schema
    RESTAURANT table
    ID  Name               …
    1   Ajanta
    2   Montclair Eggshop
    RESTAURANT_ZIPCODE table
    Restaurant_id  zipcode
    1              94707
    1              94619
    2              94611
    2              94619
    RESTAURANT_TIME_RANGE table
    Restaurant_id  dayOfWeek  openTime  closeTime
    1              Monday     1130      1430
    1              Monday     1730      2130
    2              Tuesday    1130      …
  • 54. Finding available restaurants on Monday, 7:30pm for 94619 zip
    Straightforward three-way join
    select r.*
    from restaurant r
      inner join restaurant_time_range tr on r.id = tr.restaurant_id
      inner join restaurant_zipcode sa on r.id = sa.restaurant_id
    where '94619' = sa.zip_code
      and tr.day_of_week = 'monday'
      and tr.openingtime <= 1930
      and 1930 <= tr.closingtime
  • 55. Redis - Persisting restaurants is “easy”
    Multiple KV pairs:
    rest:1:details        [ name: “Ajanta”, … ]
    rest:1:serviceArea    [ “94619”, “94611”, … ]
    rest:1:openingHours   [10, 11]
    timerange:10          [ “dayOfWeek”: “Monday”, .. ]
    timerange:11          [ “dayOfWeek”: “Tuesday”, .. ]
    OR single KV hash:
    rest:1   [ name: “Ajanta”, “serviceArea:0” : “94611”, “serviceArea:1” : “94619”, “menuItem:0:name”, “Chicken Vindaloo”, … ]
    OR single KV String:
    rest:1   { .. A BIG STRING/BYTE ARRAY, E.G. JSON }
  • 56. BUT…
    • … we can only retrieve them via primary key
    => We need to implement indexes
    => Queries, instead of the data model, drive NoSQL database design
    • But how can a key-value store support a query that has:
        • A 3-way join
        • Multiple =
        • > and <
  • 57. Simplification #1: Denormalization
    Restaurant_id  Day_of_week  Open_time  Close_time  Zip_code
    1              Monday       1130       1430        94707
    1              Monday       1130       1430        94619
    1              Monday       1730       2130        94707
    1              Monday       1730       2130        94619
    2              Monday       0700       1430        94619
    …
    SELECT restaurant_id, open_time
    FROM time_range_zip_code
    WHERE day_of_week = 'Monday'
      AND zip_code = 94619
      AND 1815 < close_time
      AND open_time < 1815
    Simpler query:
    • No joins
    • Two = and two <
  • 58. Simplification #2: Application filtering
    SELECT restaurant_id, open_time
    FROM time_range_zip_code
    WHERE day_of_week = 'Monday'
      AND zip_code = 94619
      AND 1815 < close_time
      AND open_time < 1815
    Even simpler query:
    • No joins
    • Two = and one <
  • 59. Simplification #3: Eliminate multiple =’s with concatenation
    Restaurant_id  Zip_dow       Open_time  Close_time
    1              94707:Monday  1130       1430
    1              94619:Monday  1130       1430
    1              94707:Monday  1730       2130
    1              94619:Monday  1730       2130
    2              94619:Monday  0700       1430
    …
    SELECT …
    FROM time_range_zip_code
    WHERE zip_code_day_of_week = '94619:Monday'   (key)
      AND 1815 < close_time                       (range)
  • 60. Sorted sets support range queries
    Key = zipCode:dayOfWeek; Member = OpeningTime_RestaurantId; Score = ClosingTime
    Key             Sorted Set [ Entry:Score, …]
    94707:Monday    [1130_1:1430, 1730_1:2130]
    94619:Monday    [0700_2:1430, 1130_1:1430, 1730_1:2130]
    ZRANGEBYSCORE 94619:Monday 1815 2359 => {1730_1}
    1730 is before 1815 => Ajanta is open
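The three simplifications and the sorted-set layout above combine into a small index that can be modeled without Redis. In this sketch the key is `zip:day`, the member is the zero-padded opening time plus restaurant ID, and the score is the closing time; a score-range scan stands in for ZRANGEBYSCORE and the opening-time check is done in application code. The class `AvailabilityIndexSketch` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory model of the index: key "zip:day" -> (member -> score), where
// member = "openingTime_restaurantId" and score = closingTime.
// Hypothetical sketch; times are HHMM integers between 0 and 2359.
class AvailabilityIndexSketch {
    private final Map<String, Map<String, Integer>> sortedSets = new HashMap<>();

    // Index maintenance: every time range must be written under every zip it serves.
    void addTimeRange(String zip, String day, int open, int close, int restaurantId) {
        String member = String.format("%04d_%d", open, restaurantId);
        sortedSets.computeIfAbsent(zip + ":" + day, k -> new HashMap<>()).put(member, close);
    }

    Set<String> findOpenRestaurantIds(String zip, String day, int time) {
        String paddedTime = String.format("%04d", time);
        Set<String> ids = new TreeSet<>();
        Map<String, Integer> set =
                sortedSets.getOrDefault(zip + ":" + day, Collections.<String, Integer>emptyMap());
        for (Map.Entry<String, Integer> e : set.entrySet()) {
            // Step 1 (ZRANGEBYSCORE time 2359): ranges closing at or after the time.
            boolean closesLater = e.getValue() >= time && e.getValue() <= 2359;
            // Step 2 (application filter): ranges that have already opened.
            boolean alreadyOpen = e.getKey().substring(0, 4).compareTo(paddedTime) <= 0;
            if (closesLater && alreadyOpen) {
                ids.add(e.getKey().substring(e.getKey().lastIndexOf('_') + 1));
            }
        }
        return ids;
    }
}
```

Zero-padding the opening time is what makes the string comparison in step 2 equivalent to a numeric one ("0700" sorts before "1130"). Loading the rows from slide 59 and querying `94619`, `Monday`, `1815` yields only restaurant 1, as on the slide.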
  • 61. What did I just do to query the data?
  • 62. What did I just do to query the data?
    • Wrote code to maintain an index
    • Reduced performance due to extra writes
  • 63. RedisTemplate-based code
    @Repository
    public class AvailableRestaurantRepositoryRedisImpl implements AvailableRestaurantRepository {
      @Autowired
      private StringRedisTemplate redisTemplate;
      private BoundZSetOperations<String, String> closingTimes(int dayOfWeek, String zipCode) {
        return redisTemplate.boundZSetOps(AvailableRestaurantKeys.closingTimesKey(dayOfWeek, zipCode));
      }
      public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
        String zipCode = deliveryAddress.getZip();
        int timeOfDay = timeOfDay(deliveryTime);
        int dayOfWeek = dayOfWeek(deliveryTime);
        // Time ranges that close at or after the delivery time
        Set<String> closingTrs = closingTimes(dayOfWeek, zipCode).rangeByScore(timeOfDay, 2359);
        Set<String> restaurantIds = new HashSet<String>();
        String paddedTimeOfDay = FormattingUtil.format4(timeOfDay);
        for (String trId : closingTrs) {
          // Keep only the time ranges that have already opened
          if (trId.substring(0, 4).compareTo(paddedTimeOfDay) <= 0)
            restaurantIds.add(StringUtils.substringAfterLast(trId, "_"));
        }
        Collection<String> jsonForRestaurants = redisTemplate.opsForValue().multiGet(
            AvailableRestaurantKeys.timeRangeRestaurantInfoKeys(restaurantIds));
        List<AvailableRestaurant> restaurants = new ArrayList<AvailableRestaurant>();
        for (String json : jsonForRestaurants) {
          restaurants.add(AvailableRestaurant.fromJson(json));
        }
        return restaurants;
      }
    }
  • 64. Redis - Spring configuration
    @Configuration
    public class RedisConfiguration extends AbstractDatabaseConfig {
      @Bean
      public RedisConnectionFactory jedisConnectionFactory() {
        JedisConnectionFactory factory = new JedisConnectionFactory();
        factory.setHostName(databaseHostName);
        factory.setPort(6379);
        factory.setUsePool(true);
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxActive(1000);
        factory.setPoolConfig(poolConfig);
        return factory;
      }
      @Bean
      public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
        StringRedisTemplate template = new StringRedisTemplate();
        template.setConnectionFactory(factory);
        return template;
      }
    }
  • 65. Cassandra: Easy to store restaurants
    Column Family: RestaurantDetails
    Keys  Columns
    1     name: Ajanta               type: Indian      …
    2     name: Montclair Egg Shop   type: Breakfast   …
    OR
    Column Family: RestaurantDetails
    Keys  Columns
    1     details: { JSON DOCUMENT }
    2     details: { JSON DOCUMENT }
  • 66. Querying using Cassandra
    • Similar challenges to using Redis
    • Limited querying options
        • Row key - exact or range
        • Column name - exact or range
    • Use composite/concatenated keys
        • Prefix - equality match
        • Suffix - can be range scan
    • No joins => denormalize
  • 67. Cassandra: Find restaurants that close after the delivery time and then filter
    Keys       Super Columns (named by closing time)
    94619:Mon  1430: { 1130_1: JSON FOR AJANTA }   1430: { 0700_2: JSON FOR EGG }   2130: { 1730_1: JSON FOR AJANTA }
    SuperSlice key = 94619:Mon, SliceStart = 1815, SliceEnd = 2359
    Keys       Super Columns
    94619:Mon  2130: { 1730_1: JSON FOR AJANTA }
    18:15 is after 17:30 => {Ajanta}
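The super-slice lookup above follows the same two-step pattern as the Redis index: a range scan on the closing time, then a filter on the opening time. Nested sorted maps are enough to model it. This is a hypothetical in-memory sketch (`SuperSliceSketch` is not part of Hector or Cassandra), and it assumes that all entries sharing a closing time live in one super column and that times are zero-padded four-digit strings no later than "2359".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// row key -> super column (closing time) -> column (openingTime_restaurantId) -> JSON.
// Hypothetical in-memory model of the layout on the slide.
class SuperSliceSketch {
    private final Map<String, NavigableMap<String, NavigableMap<String, String>>> rows = new TreeMap<>();

    void insert(String rowKey, String closeTime, String column, String json) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(closeTime, k -> new TreeMap<>())
            .put(column, json);
    }

    // JSON of every time range open at the given time ("HHMM", at most "2359").
    List<String> findOpen(String rowKey, String time) {
        List<String> result = new ArrayList<>();
        NavigableMap<String, NavigableMap<String, String>> row = rows.get(rowKey);
        if (row == null) {
            return result;
        }
        // Super slice: super columns whose name (closing time) is in [time, "2359"].
        for (NavigableMap<String, String> columns : row.subMap(time, true, "2359", true).values()) {
            for (Map.Entry<String, String> c : columns.entrySet()) {
                // Application filter: the column-name prefix is the opening time.
                if (c.getKey().substring(0, 4).compareTo(time) <= 0) {
                    result.add(c.getValue());
                }
            }
        }
        return result;
    }
}
```

As with the Redis version, the query shape dictates the storage layout: the row key carries the equality conditions, the super column name carries the range condition, and whatever cannot be expressed as a range is filtered by the caller.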
  • 68. Cassandra/Hector code
    import me.prettyprint.hector.api.Cluster;
    public class CassandraHelper {
      @Autowired
      private Cluster cluster;
      public <T> List<T> getSuperSlice(String keyspace, String columnFamily, String key,
          String sliceStart, String sliceEnd, SuperSliceResultMapper<T> resultMapper) {
        SuperSliceQuery<String, String, String, String> q =
            HFactory.createSuperSliceQuery(HFactory.createKeyspace(keyspace, cluster),
                StringSerializer.get(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        q.setColumnFamily(columnFamily);
        q.setKey(key);
        q.setRange(sliceStart, sliceEnd, false, 10000);
        QueryResult<SuperSlice<String, String, String>> qr = q.execute();
        SuperColumnRowProcessor<T> rowProcessor = new SuperColumnRowProcessor<T>(resultMapper);
        for (HSuperColumn<String, String, String> superColumn : qr.get().getSuperColumns()) {
          List<HColumn<String, String>> columns = superColumn.getColumns();
          rowProcessor.processRow(key, superColumn.getName(), columns);
        }
        return rowProcessor.getResult();
      }
    }
  • 69. MongoDB = easy to store
    {
      "_id": "1234",
      "name": "Ajanta",
      "serviceArea": ["94619", "99999"],
      "openingHours": [
        { "dayOfWeek": 1, "open": 1130, "close": 1430 },
        { "dayOfWeek": 2, "open": 1130, "close": 1430 },
        …
      ]
    }
  • 70. MongoDB = easy to query
    {
      "serviceArea": "94619",
      "openingHours": {
        "$elemMatch": {
          "open": { "$lte": 1815 },
          "dayOfWeek": 4,
          "close": { "$gte": 1815 }
        }
      }
    }
    db.availableRestaurants.ensureIndex({serviceArea: 1})
  • 71. MongoTemplate-based code
    @Repository
    public class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {
      @Autowired
      private MongoTemplate mongoTemplate;
      @Override
      public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
        int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
        int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
        Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
            .and("openingHours").elemMatch(where("dayOfWeek").is(dayOfWeek)
                .and("openingTime").lte(timeOfDay)
                .and("closingTime").gte(timeOfDay)));
        return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);
      }
    }
    mongoTemplate.ensureIndex("availableRestaurants", new Index().on("serviceArea", Order.ASCENDING));
  • 72. MongoDB - Spring Configuration
    @Configuration
    public class MongoConfig extends AbstractDatabaseConfig {
      private @Value("#{mongoDbProperties.databaseName}") String mongoDbDatabase;
      public @Bean MongoFactoryBean mongo() {
        MongoFactoryBean factory = new MongoFactoryBean();
        factory.setHost(databaseHostName);
        MongoOptions options = new MongoOptions();
        options.connectionsPerHost = 500;
        factory.setMongoOptions(options);
        return factory;
      }
      public @Bean MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
        MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
        mongoTemplate.setWriteConcern(WriteConcern.SAFE);
        mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
        return mongoTemplate;
      }
    }
  • 73. Summary
    • Relational databases are great but
        • Object/relational impedance mismatch
        • Relational schema is rigid
        • Extremely difficult/impossible to scale writes
        • Performance can be suboptimal
    • Each NoSQL database can solve some combination of those problems BUT
        • Limited transactions
        • One day needing ACID => major rewrite
        • Query-driven, denormalized database design
        • …
    =>
    • Carefully pick the NoSQL DB for your application
    • Consider a polyglot persistence architecture
  • 74. Thank you!
    My contact info: chris@chrisrichardson.net @crichardson