Polyglot persistence for Java Developers - August 2011 / @Oakjug

Relational databases have long been considered the one true way to persist enterprise data. But today, NoSQL databases are emerging as a viable alternative for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But NoSQL databases are very different from the ACID/SQL/JDBC/JPA world that we have become accustomed to. In this presentation, you will learn about our experience implementing a use case from POJOs in Action using popular NoSQL databases: Redis, MongoDB, and Cassandra. We will compare and contrast each database’s data model and Java API. You will learn about the benefits and drawbacks of using NoSQL.

Transcript

  • 1. Polyglot persistence for Java developers - moving out of the relational comfort zone
      Chris Richardson
      Author of POJOs in Action
      Founder of CloudFoundry.com
      chris@chrisrichardson.net
      @crichardson
  • 2. Overall presentation goal
      The joy and pain of building Java applications that use NoSQL
      8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
  • 3. About Chris
      • Grew up in England and live in Oakland, CA
      • 25+ years of software development experience, including 14+ years of Java
      • Speaker at JavaOne, SpringOne, PhillyETE, Devoxx, etc.
      • Organize the Oakland JUG and the Groovy Grails meetup
      http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
  • 4. Agenda
      • Why NoSQL?
      • Overview of NoSQL databases
      • Introduction to Spring Data
      • Case study: POJOs in Action & NoSQL
  • 5. Relational databases are great
      • SQL = rich, declarative query language
      • Database enforces referential integrity
      • ACID semantics
      • Well understood by developers
      • Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
      • Well understood by operations
          • Configuration
          • Care and feeding
          • Backups
          • Tuning
          • Failure and recovery
          • Performance characteristics
      • But…
  • 6. Problem: Complex object graphs
      • Object/relational impedance mismatch
      • Complicated to map rich domain model to relational schema
      • Performance issues
          • Many rows in many tables
          • Many joins
  • 7. Problem: Semi-structured data
      • Relational schema doesn’t easily handle semi-structured data:
          • Varying attributes
          • Custom attributes on a customer record
      • Common solution = name/value table
          • Poor performance
          • E.g. finding specific attributes for customers satisfying some criteria = multi-way outer JOIN
          • Lack of constraints
      • Another solution = serialize as blob
          • Fewer joins
          • BUT can’t be queried
  • 8. Problem: Schema evolution
      • For example:
          • Add attributes to an object => add columns to table
      • Schema changes =
          • Holding locks for a long time => application downtime
          • $$
  • 9. Problem: Scaling
      • Scaling reads:
          • Master/slave
          • But beware of consistency issues
      • Scaling writes:
          • Extremely difficult/impossible/expensive
          • Vertical scaling is limited and requires $$
          • Horizontal scaling is limited/requires $$
  • 10. Solution: Buy high end technology http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
  • 11. Solution: Hire more developers
      • Application-level sharding
      • Build your own middleware
      • …
      http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
  • 12. Solution: Use NewSQL
      • Led by Stonebraker
          • Current databases are designed for 1970s hardware and for both OLTP and data warehouses
          • http://www.slideshare.net/VoltDB/sql-myths-webinar
      • NewSQL
          • Next generation SQL databases, e.g. VoltDB
          • Leverage multi-core, commodity hardware
          • In-memory
          • Horizontally scalable
          • Transparently shardable
          • ACID
  • 13. NoSQL databases are emerging…
      Each one offers some combination of:
      • Higher performance
      • Higher scalability
      • Richer data-model
      • Schema-less
      In return for:
      • Limited transactions
      • Relaxed consistency
      • Unconstrained data
      • …
  • 14. … but there are few commonalities
      • Everyone and their dog has written one
      • Different data models
          • Key-value
          • Column
          • Document
          • Graph
      • Different APIs
      • No JDBC, Hibernate, JPA (generally)
      “Same sorry state as the database market in the 1970s before SQL was invented” http://queue.acm.org/detail.cfm?id=1961297
  • 15. Future = multi-paradigm data storage for enterprise applications
      IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
  • 16. Agenda
      • Why NoSQL?
      • Overview of NoSQL databases
      • Introduction to Spring Data
      • Case study: POJOs in Action & NoSQL
  • 17. Redis
      • Advanced key-value store
          • Values can be binary strings, Lists, Sets, Sorted Sets, Hashes, …
          • Data-type specific operations
      • Very fast
          • ~100K operations/second on entry-level hardware
          • In-memory operations
      • Persistent
          • Periodic snapshots of memory OR append commands to log file
      • Transactions within a single server
          • Atomic execution of batched commands
          • Optimistic locking
      (Diagram: keys K1, K2, K3 mapped to values V1, V2, V2)
  • 18. Redis CLI
      Sorted set member = value + score
      redis> zadd mysortedset 5.0 a
      (integer) 1
      redis> zadd mysortedset 10.0 b
      (integer) 1
      redis> zadd mysortedset 1.0 c
      (integer) 1
      redis> zrange mysortedset 0 1
      1) "c"
      2) "a"
      redis> zrangebyscore mysortedset 1 6
      1) "c"
      2) "a"
  • 19. Scaling Redis
      • Master/slave replication
          • Tree of Redis servers
          • Non-persistent master can replicate to a persistent slave
          • Use slaves for read-only queries
      • Sharding
          • Client-side only - consistent hashing based on key
          • Server-side sharding - coming one day
      • Run multiple servers per physical host
          • Server is single threaded => leverage multiple CPUs
          • 32 bit more efficient than 64 bit
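
The client-side sharding mentioned on this slide can be pictured with a small hash ring. The following is only a sketch under stated assumptions: the class name `ConsistentHashRing`, the server addresses, the number of virtual nodes and the CRC32 hash are all illustrative choices, and no real Redis client is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical client-side helper: picks the Redis server responsible for a key.
class ConsistentHashRing {
    // Ring position -> server address, kept sorted so we can walk clockwise.
    private final TreeMap<Long, String> ring = new TreeMap<Long, String>();

    ConsistentHashRing(List<String> servers, int virtualNodesPerServer) {
        for (String server : servers) {
            // Several points per server spread its share of keys around the ring.
            for (int i = 0; i < virtualNodesPerServer; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    // CRC32 is an arbitrary choice for this sketch.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // First ring point at or after the key's hash, wrapping around to the start.
    String serverFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(point);
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(
                Arrays.asList("redis-a:6379", "redis-b:6379", "redis-c:6379"), 100);
        System.out.println("rest:1 -> " + ring.serverFor("rest:1"));
    }
}
```

The point of the ring is that removing a server only remaps the keys that lived on that server; keys owned by the remaining servers stay where they were.
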
  • 20. Downsides of Redis
      • Low-level API compared to SQL
      • Single threaded:
          • Multiple cores => multiple Redis servers
      • Master/slave failover is manual
      • Partitioning is done by the client
      • Dataset has to fit in memory
  • 21. Redis use cases
      • Drop-in replacement for Memcached
          • Session state
          • Cache of data retrieved from SOR
      • Replica of SOR for queries needing high performance
      • Miscellaneous yet important
          • Counting using INCR command, e.g. hit counts
          • Most recent N items - LPUSH and LTRIM
          • Randomly selecting an item - SRANDMEMBER
          • Queuing - Lists with LPOP, RPUSH, …
          • High score tables - Sorted sets and ZINCRBY
          • …
      • Notable users: github, guardian.co.uk, …
  • 22. Cassandra
      • An Apache open-source project
      • Developed by Facebook for inbox search
      • Column-oriented database/Extensible row store
          • The data model will hurt your brain
          • Row = map or map of maps
      • Fast writes = append to a log
      • Extremely scalable
          • Transparent and dynamic clustering
          • Rack and datacenter aware data replication
      • Tunable read/write consistency per operation
          • Writes: any, one replica, quorum of replicas, …, all
          • Read: one, quorum, …, all
      • CQL = “SQL”-like DDL and DML
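
To make the "tunable consistency" bullet concrete, here is a small arithmetic sketch. It assumes the usual majority definition of a quorum and the general rule that a read overlaps the latest acknowledged write when the number of replicas written plus the number read exceeds the replication factor. The class and method names are made up for illustration, and nothing here calls Cassandra.

```java
// Hypothetical helper illustrating quorum arithmetic; not part of any Cassandra API.
class QuorumSketch {
    // Smallest number of replicas that forms a majority.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read set must share at least one replica with every write set.
    static boolean readOverlapsWrite(int replicasWritten, int replicasRead, int replicationFactor) {
        return replicasWritten + replicasRead > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("quorum of " + n + " replicas = " + q);
        System.out.println("quorum write + quorum read overlap: " + readOverlapsWrite(q, q, n));
        System.out.println("write ONE + read ONE overlap: " + readOverlapsWrite(1, 1, n));
    }
}
```

With three replicas, quorum writes plus quorum reads always overlap, whereas writing to one replica and reading from one replica trades that guarantee for lower latency.
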
  • 23. Cassandra data model
      My column family (within a key space)
      Keys   Columns
      a      colA: value1   colB: value2   colC: value3
      b      colA: value    colD: value    colE: value
      A column has a timestamp too
      • 4-D map: keySpace x key x columnFamily x column => value
      • Arbitrary number of columns
      • Column names are dynamic; can contain data
      • Columns for a row are stored on disk in order determined by comparator
      • One CF row = one DDD aggregate
  • 24. Cassandra data model - insert/update
      My column family (within a key space)
      Keys   Columns
      a      colA: value1   colB: value2   colC: value3
      b      colA: value    colD: value    colE: value
      Insert(key=a, columnName=colZ, value=foo)
      Keys   Columns
      a      colA: value1   colB: value2   colC: value3   colZ: foo
      b      colA: value    colD: value    colE: value
      • Transaction = updates to a row within a ColumnFamily
      • Idempotent
  • 25. Cassandra query example - slice
      Keys   Columns
      a      colA: value1   colB: value2   colC: value3   colZ: foo
      b      colA: value    colD: value    colE: value
      slice(key=a, startColumn=colA, endColumnName=colC)
      Keys   Columns
      a      colA: value1   colB: value2
      You can also do a rangeSlice which returns a range of keys - less efficient
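
One way to build intuition for insert and slice is to model a column family as a sorted map of sorted maps. This is a rough in-memory sketch, not Cassandra code: `ColumnFamilySketch` and its methods are hypothetical names, timestamps are left out, and treating both slice bounds as inclusive is an assumption of this sketch.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one column family.
class ColumnFamilySketch {
    // Row key -> (column name -> value); column names stay sorted, like a comparator would keep them.
    private final TreeMap<String, TreeMap<String, String>> rows =
            new TreeMap<String, TreeMap<String, String>>();

    // Insert doubles as update; repeating the same insert leaves the row unchanged (idempotent).
    void insert(String key, String columnName, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<String, String>()).put(columnName, value);
    }

    // Columns of one row whose names fall between the two bounds (both inclusive here).
    SortedMap<String, String> slice(String key, String startColumn, String endColumn) {
        TreeMap<String, String> row = rows.get(key);
        if (row == null) {
            return new TreeMap<String, String>();
        }
        return row.subMap(startColumn, true, endColumn, true);
    }

    public static void main(String[] args) {
        ColumnFamilySketch cf = new ColumnFamilySketch();
        cf.insert("a", "colA", "value1");
        cf.insert("a", "colB", "value2");
        cf.insert("a", "colC", "value3");
        cf.insert("a", "colZ", "foo");
        System.out.println(cf.slice("a", "colA", "colB"));
    }
}
```

Because the columns of a row are already sorted, a slice is a cheap range read within a single row, which is why the later slides push query conditions into row keys and column names.
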
  • 26. Super Column Families - one more dimension
      My column family (within a key space)
      Keys   Super columns (ScA, ScB)
      a      ScA: colA: value1, colB: value2   ScB: colC: value3
      b      colA: value   colD: value   colE: value
      Insert(key=a, superColumn=scB, columnName=colZ, value=foo)
      keySpace x key x columnFamily x superColumn x column -> value
      Keys   Super columns (ScA, ScB)
      a      ScA: colA: value1, colB: value2   ScB: colC: value3, colZ: foo
      b      colA: value   colD: value   colE: value
  • 27. Getting data with super slice
      My column family (within a key space)
      Keys   Super columns (ScA, ScB)
      a      ScA: colA: value1, colB: value2   ScB: colC: value3
      b      colA: value   colD: value   colE: value
      superSlice(key=a, startColumn=scB, endColumnName=scC)
      Keys   Super columns
      a      ScB: colC: value3
  • 28. Cassandra CLI
      $ bin/cassandra-cli -h localhost
      Connected to: "Test Cluster" on localhost/9160
      Welcome to cassandra CLI.
      [default@unknown] use Keyspace1;
      Authenticated to keyspace: Keyspace1
      [default@Keyspace1] list restaurantDetails;
      Using default limit of 100
      -------------------
      RowKey: 1
      => (super_column=attributes, (column=json, value={"id": 1,"name":"Ajanta","menuItems"....
      [default@Keyspace1] get restaurantDetails[1]['attributes'];
      => (column=json, value={"id": 1,"name":"Ajanta","menuItems"....
  • 29. Scaling Cassandra
      • Client connects to any node
      • Dynamically add/remove nodes
      • Reads/writes specify how many nodes
      • Configurable # of replicas
          • adjacent nodes
          • rack and data center aware
      (Ring diagram: each node replicates to adjacent nodes)
      Node 1: Token = A, Keys = [D, A]
      Node 2: Token = B, Keys = [A, B]
      Node 3: Token = C, Keys = [B, C]
      Node 4: Token = D, Keys = [C, D]
  • 30. Downsides of Cassandra
      • Learning curve
      • Still maturing, currently v0.8.4
      • Limited queries, i.e. KV lookup
      • Transactions limited to a column family row
      • Lacks an easy to use API
  • 31. Cassandra use cases
      • Use cases
          • Big data
          • Multiple data center distributed database
          • Persistent cache
          • (Write intensive) logging
          • High-availability (writes)
      • Who is using it
          • Digg, Facebook, Twitter, Reddit, Rackspace
          • Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
          • The largest production cluster has over 100 TB of data in over 150 machines. - Cassandra web site
  • 32. MongoDB
      • Document-oriented database
          • JSON-style documents: Lists, Maps, primitives
          • Documents organized into collections (~table)
          • Schema-less
      • Rich query language for dynamic queries
      • Asynchronous, configurable writes:
          • No wait
          • Wait for replication
          • Wait for write to disk
      • Very fast
      • Highly scalable and available:
          • Replica sets (generalized master/slave)
          • Sharding
          • Transparent to client
  • 33. Data Model = Binary JSON documents
      One document = one DDD aggregate

      {
        "name" : "Sahn Maru",
        "type" : "Korean",
        "serviceArea" : [ "94619", "94618" ],
        "openingHours" : [
          { "dayOfWeek" : "Wednesday", "open" : 1730, "close" : 2230 }
        ],
        "_id" : ObjectId("4bddc2f49d1505567c6220a0")
      }

      DBObject o = new BasicDBObject();
      o.put("name", "Sahn Maru");
      DBObject mi = new BasicDBObject();
      mi.put("name", "Daeji Bulgogi");
      …
      List<DBObject> mis = Collections.singletonList(mi);
      o.put("menuItems", mis);

      • Sequence of bytes on disk = fast I/O
          • No joins/seeks
          • In-place updates when possible => no index updates
      • Transaction = update of single document
  • 34. MongoDB CLI
      $ bin/mongo
      > use mydb
      > r1 = {name: "Ajanta"}
      {name: "Ajanta"}
      > r2 = {name: "Montclair Egg Shop"}
      {name: "Montclair Egg Shop"}
      > db.restaurants.save(r1)
      > r1
      { _id: ObjectId("98…"), name: "Ajanta"}
      > db.restaurants.save(r2)
      > r2
      { _id: ObjectId("66…"), name: "Montclair Egg Shop"}
      > db.restaurants.find({name: /^A/})
      { _id: ObjectId("98…"), name: "Ajanta"}
      > db.restaurants.update({name: "Ajanta"}, {name: "Ajanta Restaurant"})
  • 35. MongoDB query by example
      Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday

      {
        serviceArea: "94619",
        openingHours: {
          $elemMatch: {
            "dayOfWeek": "Monday",
            "open": {$lte: 1800},
            "close": {$gte: 1800}
          }
        }
      }

      DBCursor cursor = collection.find(qbeObject);
      while (cursor.hasNext()) {
        DBObject o = cursor.next();
        …
      }
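
The important detail in this query is `$elemMatch`: all three conditions must hold for the same element of the `openingHours` array, not for different elements. The sketch below imitates that matching rule in plain Java over in-memory objects. `ElemMatchSketch`, `OpeningHours` and `matches` are names invented for illustration, and no MongoDB driver is used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory imitation of the query-by-example semantics.
class ElemMatchSketch {
    static final class OpeningHours {
        final String dayOfWeek;
        final int open;
        final int close;

        OpeningHours(String dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    // Mirrors: serviceArea contains zip AND some single openingHours element
    // has the given day, open <= time and close >= time.
    static boolean matches(List<String> serviceArea, List<OpeningHours> openingHours,
                           String zip, String day, int time) {
        if (!serviceArea.contains(zip)) {
            return false;
        }
        for (OpeningHours h : openingHours) {
            if (h.dayOfWeek.equals(day) && h.open <= time && h.close >= time) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> serviceArea = Arrays.asList("94619", "94618");
        List<OpeningHours> hours = Arrays.asList(
                new OpeningHours("Monday", 1130, 1430),
                new OpeningHours("Tuesday", 1730, 2130));
        System.out.println(matches(serviceArea, hours, "94619", "Monday", 1200));
        System.out.println(matches(serviceArea, hours, "94619", "Monday", 1800));
    }
}
```

In the usage above, the second call does not match even though "Monday" appears in one element and a range covering 1800 appears in another, which is exactly the case `$elemMatch` is meant to rule out.
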
  • 36. Scaling MongoDB
      (Diagram: a client talks to mongos; Shard 1 and Shard 2 each consist of one master mongod and two replica mongods; the config server consists of three mongod processes)
      • A shard consists of a replica set = generalization of master/slave
      • Collections spread over multiple shards
  • 37. Mongo Downsides
      • Server has a global write lock
          • Single writer OR multiple readers => long running queries block writers
      • Great that writes are not synchronous
          • BUT perhaps an asynchronous response would be better than a synchronous getLastError()
      Interesting story: http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
  • 38. MongoDB use cases
      • Use cases
          • High volume writes
          • Complex data
          • Semi-structured data
      • Who is using it?
          • Shutterfly, Foursquare
          • Bit.ly, Intuit
          • SourceForge, NY Times
          • GILT Groupe, Evite
          • SugarCRM
  • 39. Other NoSQL databases
      Type                                  Examples
      Extensible columns/Column-oriented    HBase, SimpleDB
      Graph                                 Neo4j
      Key-value                             Membase
      Document                              CouchDB
      http://nosql-database.org/ lists 122+ NoSQL databases
  • 40. Picking a database
      Application requirement       Solution
      Complex transactions/ACID     Relational database
      Scaling                       NoSQL
      Social data                   Graph database
      Multiple datacenters          Cassandra
      Highly-available writes       Cassandra
      Flexible data                 Document store
      High write volumes            Mongo, Cassandra
      Super fast cache              Redis
      Adhoc queries                 Relational or Mongo
      …
      http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html
  • 41. Proceed with caution
      • Don’t commit to a NoSQL DB until you have done a significant POC
      • Encapsulate your data access code so you can switch
      • Hope that one day you won’t need ACID
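
The advice to encapsulate data access can be sketched as a small interface that the rest of the application depends on, with each database hidden behind its own implementation. Everything below is an illustrative assumption: `RestaurantFinder`, `InMemoryRestaurantFinder` and `OrderService` are made-up names, the method signature is simplified, and the in-memory class merely stands in for a Redis, Cassandra or MongoDB implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical port: callers only ever see this interface.
interface RestaurantFinder {
    List<String> findAvailableRestaurantNames(String zipCode, int dayOfWeek, int timeOfDay);
}

// One interchangeable implementation; a database-backed class would implement the same interface.
class InMemoryRestaurantFinder implements RestaurantFinder {
    private static final class Row {
        final String name;
        final String zipCode;
        final int dayOfWeek;
        final int open;
        final int close;

        Row(String name, String zipCode, int dayOfWeek, int open, int close) {
            this.name = name;
            this.zipCode = zipCode;
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    private final List<Row> rows = new ArrayList<Row>();

    void add(String name, String zipCode, int dayOfWeek, int open, int close) {
        rows.add(new Row(name, zipCode, dayOfWeek, open, close));
    }

    @Override
    public List<String> findAvailableRestaurantNames(String zipCode, int dayOfWeek, int timeOfDay) {
        List<String> names = new ArrayList<String>();
        for (Row r : rows) {
            boolean open = r.open <= timeOfDay && timeOfDay <= r.close;
            if (r.zipCode.equals(zipCode) && r.dayOfWeek == dayOfWeek && open && !names.contains(r.name)) {
                names.add(r.name);
            }
        }
        return names;
    }
}

// Application code depends on the interface, so swapping databases does not touch it.
class OrderService {
    private final RestaurantFinder finder;

    OrderService(RestaurantFinder finder) {
        this.finder = finder;
    }

    boolean canDeliver(String zipCode, int dayOfWeek, int timeOfDay) {
        return !finder.findAvailableRestaurantNames(zipCode, dayOfWeek, timeOfDay).isEmpty();
    }
}
```

Switching stores then means writing another `RestaurantFinder` implementation and changing the wiring, while `OrderService` and its tests stay as they are.
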
  • 42. Agenda
      • Why NoSQL?
      • Overview of NoSQL databases
      • Introduction to Spring Data
      • Case study: POJOs in Action & NoSQL
  • 43. NoSQL Java APIs
      Database     Libraries
      Redis        Jedis, JRedis, JDBC-Redis, RJC
      Cassandra    Raw Thrift if you are a masochist; Hector, …
      MongoDB      MongoDB provides a Java driver
      • Some are not so easy to use
      • Stylistic differences
      • Boilerplate code
      • …
  • 44. Spring Data Project Goals
      Bring classic Spring value propositions to a wide range of NoSQL databases =>
      • Productivity
      • Programming model consistency: e.g. <NoSQL>Template classes
      • “Portability”
      http://www.springsource.org/spring-data
  • 45. Spring Data sub-projects
      • Commons: Polyglot persistence
      • Key-Value: Redis, Riak
      • Document: MongoDB, CouchDB
      • Graph: Neo4j
      • GORM for NoSQL
      • Various milestone releases
          • Redis 1.0.0.M4 (July 20th, 2011)
          • Document 1.0.0.M2 (April 9, 2011)
          • Graph - Neo4j Support 1.0.0 (April 19, 2011)
          • …
  • 46. MongoTemplate
      • Simplifies data access
      • Translates exceptions
      • POJO <=> DBObject mapping
      (Class diagram)
      MongoTemplate
          Properties: databaseName, userId, password, defaultCollectionName, writeConcern, writeResultChecking
          Methods: save(), insert(), remove(), updateFirst(), findOne(), find(), …
          Uses: MongoConverter, Mongo (Java driver class)
      <<interface>> MongoConverter
          write(Object, DBObject)
          read(Class, DBObject)
          Implementations: SimpleMongoConverter, MongoMappingConverter
  • 47. Richer mapping
      • Annotations define mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value
      • Map fields instead of properties => no getters or setters required
      • Non-default constructor
      • Index generation

      @Document
      public class Person {
        @Id
        private ObjectId id;
        private String firstname;
        @Indexed
        private String lastname;

        @PersistenceConstructor
        public Person(String firstname, String lastname) {
          this.firstname = firstname;
          this.lastname = lastname;
        }
        ….
      }
  • 48. Generic Mongo Repositories
      interface PersonRepository extends MongoRepository<Person, ObjectId> {
        List<Person> findByLastname(String lastName);
      }

      <beans>
        <mongo:repositories
            base-package="net.chrisrichardson.mongodb.example.mongorepository"
            mongo-template-ref="mongoTemplate" />
      </beans>

      Person p = new Person("John", "Doe");
      personRepository.save(p);
      Person p2 = personRepository.findOne(p.getId());
      List<Person> johnDoes = personRepository.findByLastname("Doe");
      assertEquals(1, johnDoes.size());
  • 49. Support for the QueryDSL project
      • Query classes generated from domain model class
      • Type-safe composable queries

      QPerson person = QPerson.person;
      Predicate predicate =
          person.homeAddress.street1.eq("1 High Street")
              .and(person.firstname.eq("John"));
      List<Person> people = personRepository.findAll(predicate);
      assertEquals(1, people.size());
      assertPersonEquals(p, people.get(0));
  • 50. Cross-store/polyglot persistence
      @Entity
      public class Person {
        // In Database
        @Id private Long id;
        private String firstname;
        private String lastname;

        // In MongoDB
        @RelatedDocument private Address address;

      Person person = new Person(…);
      entityManager.persist(person);
      Person p2 = entityManager.find(…)

      { "_id" : ObjectId("….."),
        "_entity_id" : NumberLong(1),
        "_entity_class" : "net.. Person",
        "_entity_field_name" : "address",
        "zip" : "94611",
        "street1" : "1 High Street",
        … }
  • 51. Agenda
      • Why NoSQL?
      • Overview of NoSQL databases
      • Introduction to Spring Data
      • Case study: POJOs in Action & NoSQL
  • 52. Food to Go - placing a takeout order
      • Customer enters delivery address and delivery time
      • System displays available restaurants = restaurants that serve the zip code of the delivery address AND are open at the delivery time

      class Restaurant {
        long id;
        String name;
        Set<String> serviceArea;
        Set<TimeRange> openingHours;
        List<MenuItem> menuItems;
      }

      class TimeRange {
        long id;
        int dayOfWeek;
        int openingTime;
        int closingTime;
      }

      class MenuItem {
        String name;
        double price;
      }
  • 53. Database schema
      RESTAURANT table
      ID   Name   …
      1    Ajanta
      2    Montclair Eggshop

      RESTAURANT_ZIPCODE table
      Restaurant_id   zipcode
      1               94707
      1               94619
      2               94611
      2               94619

      RESTAURANT_TIME_RANGE table
      Restaurant_id   dayOfWeek   openTime   closeTime
      1               Monday      1130       1430
      1               Monday      1730       2130
      2               Tuesday     1130       …
  • 54. Finding available restaurants on Monday, 7:30pm, for the 94619 zip code
      Straightforward three-way join

      select r.*
      from restaurant r
        inner join restaurant_time_range tr on r.id = tr.restaurant_id
        inner join restaurant_zipcode sa on r.id = sa.restaurant_id
      where '94619' = sa.zip_code
        and tr.day_of_week = 'monday'
        and tr.openingtime <= 1930
        and 1930 <= tr.closingtime
  • 55. Redis - Persisting restaurants is “easy”
      Multiple KV pairs:
          rest:1:details [ name: "Ajanta", … ]
          rest:1:serviceArea [ "94619", "94611", … ]
          rest:1:openingHours [10, 11]
          timerange:10 ["dayOfWeek": "Monday", ..]
          timerange:11 ["dayOfWeek": "Tuesday", ..]
      OR single KV hash:
          rest:1 [ name: "Ajanta", "serviceArea:0" : "94611", "serviceArea:1" : "94619", "menuItem:0:name", "Chicken Vindaloo", … ]
      OR single KV String:
          rest:1 { .. A BIG STRING/BYTE ARRAY, E.G. JSON }
  • 56. BUT…
      • … we can only retrieve them via primary key
          => We need to implement indexes
          => Queries, rather than the data model, drive NoSQL database design
      • But how can a key-value store support a query that has:
          • A 3-way join
          • Multiple =
          • > and <
  • 57. Simplification #1: Denormalization
      Restaurant_id   Day_of_week   Open_time   Close_time   Zip_code
      1               Monday        1130        1430         94707
      1               Monday        1130        1430         94619
      1               Monday        1730        2130         94707
      1               Monday        1730        2130         94619
      2               Monday        0700        1430         94619
      …

      SELECT restaurant_id, open_time
      FROM time_range_zip_code
      WHERE day_of_week = 'Monday'
        AND zip_code = 94619
        AND 1815 < close_time
        AND open_time < 1815

      Simpler query:
      • No joins
      • Two = and two <
  • 58. Simplification #2: Application filtering
      SELECT restaurant_id, open_time
      FROM time_range_zip_code
      WHERE day_of_week = 'Monday'
        AND zip_code = 94619
        AND 1815 < close_time

      The remaining condition, open_time < 1815, is applied by the application to the returned rows.

      Even simpler query:
      • No joins
      • Two = and one <
  • 59. Simplification #3: Eliminate multiple =’s with concatenation
      Restaurant_id   Zip_dow        Open_time   Close_time
      1               94707:Monday   1130        1430
      1               94619:Monday   1130        1430
      1               94707:Monday   1730        2130
      1               94619:Monday   1730        2130
      2               94619:Monday   0700        1430
      …

      SELECT …
      FROM time_range_zip_code
      WHERE zip_code_day_of_week = '94619:Monday'    (key)
        AND 1815 < close_time                        (range)
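
The denormalized table with a concatenated `zip:day` key can be derived mechanically from a restaurant's service area and opening hours. The following sketch shows that derivation under stated assumptions: `DenormalizeSketch`, `TimeRange` and `IndexRow` are hypothetical names, and the day of week is kept as a string to match the table above rather than the `int` used in the domain classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that flattens one restaurant into index rows.
class DenormalizeSketch {
    static final class TimeRange {
        final String dayOfWeek;
        final int open;
        final int close;

        TimeRange(String dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    static final class IndexRow {
        final long restaurantId;
        final String zipDow;   // concatenated "zip:dayOfWeek" key
        final int open;
        final int close;

        IndexRow(long restaurantId, String zipDow, int open, int close) {
            this.restaurantId = restaurantId;
            this.zipDow = zipDow;
            this.open = open;
            this.close = close;
        }
    }

    // One row per (time range, zip code) pair: the join is paid once, at write time.
    static List<IndexRow> denormalize(long restaurantId, List<String> serviceArea,
                                      List<TimeRange> openingHours) {
        List<IndexRow> rows = new ArrayList<IndexRow>();
        for (TimeRange tr : openingHours) {
            for (String zip : serviceArea) {
                rows.add(new IndexRow(restaurantId, zip + ":" + tr.dayOfWeek, tr.open, tr.close));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<IndexRow> rows = denormalize(1L,
                Arrays.asList("94707", "94619"),
                Arrays.asList(new TimeRange("Monday", 1130, 1430), new TimeRange("Monday", 1730, 2130)));
        for (IndexRow r : rows) {
            System.out.println(r.restaurantId + " " + r.zipDow + " " + r.open + " " + r.close);
        }
    }
}
```

The trade-off is that every change to a restaurant's zip codes or hours has to regenerate these rows, which is the index maintenance cost the talk returns to later.
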
  • 60. Sorted sets support range queries
      Key             Sorted Set [ Entry:Score, … ]
      94707:Monday    [1130_1:1430, 1730_1:2130]
      94619:Monday    [0700_2:1430, 1130_1:1430, 1730_1:2130]

      Key = zipCode:dayOfWeek
      Member = OpeningTime_RestaurantId
      Score = ClosingTime

      ZRANGEBYSCORE 94619:Monday 1815 2359 => {1730_1}
      1730 is before 1815 => Ajanta is open
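
The lookup on this slide has two steps: a range query by score (closing time), then an application-side check on the opening time encoded in each member. The sketch below imitates both steps with an in-memory stand-in for Redis sorted sets. `AvailabilityIndexSketch` and its methods are hypothetical; `zadd` and `zrangeByScore` only mimic the Redis commands of the same name and do not talk to a server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the sorted-set index.
class AvailabilityIndexSketch {
    private static final class Entry implements Comparable<Entry> {
        final String member;
        final int score;

        Entry(String member, int score) {
            this.member = member;
            this.score = score;
        }

        // Order by score first, then by member, like a sorted set.
        public int compareTo(Entry other) {
            int byScore = Integer.compare(score, other.score);
            return byScore != 0 ? byScore : member.compareTo(other.member);
        }
    }

    private final Map<String, TreeSet<Entry>> sortedSets = new HashMap<String, TreeSet<Entry>>();

    // Adds a member, replacing any previous score for the same member.
    void zadd(String key, int score, String member) {
        TreeSet<Entry> set = sortedSets.get(key);
        if (set == null) {
            set = new TreeSet<Entry>();
            sortedSets.put(key, set);
        }
        set.removeIf(e -> e.member.equals(member));
        set.add(new Entry(member, score));
    }

    // Members whose score lies in [min, max], in score order.
    List<String> zrangeByScore(String key, int min, int max) {
        List<String> members = new ArrayList<String>();
        TreeSet<Entry> set = sortedSets.get(key);
        if (set == null) {
            return members;
        }
        for (Entry e : set) {
            if (e.score >= min && e.score <= max) {
                members.add(e.member);
            }
        }
        return members;
    }

    // Key = zip:day, member = paddedOpeningTime_restaurantId, score = closing time.
    void addTimeRange(String zipCode, String dayOfWeek, int open, int close, int restaurantId) {
        zadd(zipCode + ":" + dayOfWeek, close, String.format("%04d_%d", open, restaurantId));
    }

    // Step 1: ranges that close at or after the time. Step 2: keep those already open.
    Set<String> findOpenRestaurantIds(String zipCode, String dayOfWeek, int timeOfDay) {
        String paddedTime = String.format("%04d", timeOfDay);
        Set<String> ids = new LinkedHashSet<String>();
        for (String member : zrangeByScore(zipCode + ":" + dayOfWeek, timeOfDay, 2359)) {
            if (member.substring(0, 4).compareTo(paddedTime) <= 0) {
                ids.add(member.substring(member.lastIndexOf('_') + 1));
            }
        }
        return ids;
    }
}
```

Zero-padding the opening time to four digits is what makes the plain string comparison in the second step behave like a numeric comparison.
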
  • 61. What did I just do to query the data?
  • 62. What did I just do to query the data?
      • Wrote code to maintain an index
      • Reduced performance due to extra writes
  • 63. RedisTemplate-based code
      @Repository
      public class AvailableRestaurantRepositoryRedisImpl implements AvailableRestaurantRepository {

        // Not final: the field is assigned by Spring injection, not in a constructor
        @Autowired
        private StringRedisTemplate redisTemplate;

        private BoundZSetOperations<String, String> closingTimes(int dayOfWeek, String zipCode) {
          return redisTemplate.boundZSetOps(AvailableRestaurantKeys.closingTimesKey(dayOfWeek, zipCode));
        }

        public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
          String zipCode = deliveryAddress.getZip();
          int timeOfDay = timeOfDay(deliveryTime);
          int dayOfWeek = dayOfWeek(deliveryTime);

          // Time ranges for this zip code and day that close at or after the delivery time
          Set<String> closingTrs = closingTimes(dayOfWeek, zipCode).rangeByScore(timeOfDay, 2359);

          // Keep only the time ranges that have already opened
          Set<String> restaurantIds = new HashSet<String>();
          String paddedTimeOfDay = FormattingUtil.format4(timeOfDay);
          for (String trId : closingTrs) {
            if (trId.substring(0, 4).compareTo(paddedTimeOfDay) <= 0)
              restaurantIds.add(StringUtils.substringAfterLast(trId, "_"));
          }

          Collection<String> jsonForRestaurants =
              redisTemplate.opsForValue().multiGet(AvailableRestaurantKeys.timeRangeRestaurantInfoKeys(restaurantIds));

          List<AvailableRestaurant> restaurants = new ArrayList<AvailableRestaurant>();
          for (String json : jsonForRestaurants) {
            restaurants.add(AvailableRestaurant.fromJson(json));
          }
          return restaurants;
        }
      }
  • 64. Redis - Spring configuration
      @Configuration
      public class RedisConfiguration extends AbstractDatabaseConfig {

        @Bean
        public RedisConnectionFactory jedisConnectionFactory() {
          JedisConnectionFactory factory = new JedisConnectionFactory();
          factory.setHostName(databaseHostName);
          factory.setPort(6379);
          factory.setUsePool(true);
          JedisPoolConfig poolConfig = new JedisPoolConfig();
          poolConfig.setMaxActive(1000);
          factory.setPoolConfig(poolConfig);
          return factory;
        }

        @Bean
        public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
          StringRedisTemplate template = new StringRedisTemplate();
          template.setConnectionFactory(factory);
          return template;
        }
      }
  • 65. Cassandra: Easy to store restaurants
      Column Family: RestaurantDetails
      Keys   Columns
      1      name: Ajanta               type: Indian      …
      2      name: Montclair Egg Shop   type: Breakfast   …
      OR
      Column Family: RestaurantDetails
      Keys   Columns
      1      details: { JSON DOCUMENT }
      2      details: { JSON DOCUMENT }
  • 66. Querying using Cassandra
      • Similar challenges to using Redis
      • Limited querying options
          • Row key - exact or range
          • Column name - exact or range
      • Use composite/concatenated keys
          • Prefix - equality match
          • Suffix - can be range scan
      • No joins => denormalize
  • 67. Cassandra: Find restaurants that close after the delivery time and then filter
      Keys        Super columns
      94619:Mon   1430: { 1130_1: JSON FOR AJANTA }   1430: { 0700_2: JSON FOR EGG }   2130: { 1730_1: JSON FOR AJANTA }

      SuperSlice key = 94619:Mon, SliceStart = 1815, SliceEnd = 2359

      Keys        Super columns
      94619:Mon   2130: { 1730_1: JSON FOR AJANTA }

      18:15 is after 17:30 => {Ajanta}
  • 68. Cassandra/Hector code
      import me.prettyprint.hector.api.Cluster;

      public class CassandraHelper {

        // Not final: the field is assigned by Spring injection, not in a constructor
        @Autowired
        private Cluster cluster;

        public <T> List<T> getSuperSlice(String keyspace, String columnFamily, String key,
            String sliceStart, String sliceEnd, SuperSliceResultMapper<T> resultMapper) {

          SuperSliceQuery<String, String, String, String> q =
              HFactory.createSuperSliceQuery(HFactory.createKeyspace(keyspace, cluster),
                  StringSerializer.get(), StringSerializer.get(),
                  StringSerializer.get(), StringSerializer.get());
          q.setColumnFamily(columnFamily);
          q.setKey(key);
          q.setRange(sliceStart, sliceEnd, false, 10000);

          QueryResult<SuperSlice<String, String, String>> qr = q.execute();

          // Hand each super column of the row to the result mapper
          SuperColumnRowProcessor<T> rowProcessor = new SuperColumnRowProcessor<T>(resultMapper);
          for (HSuperColumn<String, String, String> superColumn : qr.get().getSuperColumns()) {
            List<HColumn<String, String>> columns = superColumn.getColumns();
            rowProcessor.processRow(key, superColumn.getName(), columns);
          }
          return rowProcessor.getResult();
        }
      }
  • 69. MongoDB = easy to store
      {
        "_id": "1234",
        "name": "Ajanta",
        "serviceArea": ["94619", "99999"],
        "openingHours": [
          { "dayOfWeek": 1, "open": 1130, "close": 1430 },
          { "dayOfWeek": 2, "open": 1130, "close": 1430 },
          …
        ]
      }
  • 70. MongoDB = easy to query
      {
        "serviceArea": "94619",
        "openingHours": {
          "$elemMatch": {
            "open": { "$lte": 1815 },
            "dayOfWeek": 4,
            "close": { "$gte": 1815 }
          }
        }
      }

      db.availableRestaurants.ensureIndex({serviceArea: 1})
  • 71. MongoTemplate-based code
      @Repository
      public class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {

        // Not final: the field is assigned by Spring injection, not in a constructor
        @Autowired
        private MongoTemplate mongoTemplate;

        @Override
        public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
          int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
          int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
          Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
              .and("openingHours").elemMatch(where("dayOfWeek").is(dayOfWeek)
                  .and("openingTime").lte(timeOfDay)
                  .and("closingTime").gte(timeOfDay)));
          return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);
        }
      }

      Index creation:
      mongoTemplate.ensureIndex("availableRestaurants", new Index().on("serviceArea", Order.ASCENDING));
  • 72. MongoDB - Spring Configuration
      @Configuration
      public class MongoConfig extends AbstractDatabaseConfig {

        private @Value("#{mongoDbProperties.databaseName}") String mongoDbDatabase;

        public @Bean MongoFactoryBean mongo() {
          MongoFactoryBean factory = new MongoFactoryBean();
          factory.setHost(databaseHostName);
          MongoOptions options = new MongoOptions();
          options.connectionsPerHost = 500;
          factory.setMongoOptions(options);
          return factory;
        }

        public @Bean MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
          MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
          mongoTemplate.setWriteConcern(WriteConcern.SAFE);
          mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
          return mongoTemplate;
        }
      }
  • 73. Summary
      • Relational databases are great, but
          • Object/relational impedance mismatch
          • Relational schema is rigid
          • Extremely difficult/impossible to scale writes
          • Performance can be suboptimal
      • Each NoSQL database can solve some combination of those problems, BUT
          • Limited transactions
          • One day needing ACID => major rewrite
          • Query-driven, denormalized database design
          • …
      =>
      • Carefully pick the NoSQL DB for your application
      • Consider a polyglot persistence architecture
  • 74. Thank you!
      My contact info: chris@chrisrichardson.net @crichardson