Redis – Spring configuration: @Configuration public class RedisConfiguration extends AbstractDatabaseConfig { @Bean pub...
Cassandra: Easy to store restaurants. Column Family: RestaurantDetails ...
Querying using Cassandra: • Similar challenges to using Redis • Limited querying options - Row key – exact or range - ...
Cassandra: Find restaurants that close after the delivery time and then filter. Keys | Super Columns ...
Cassandra/Hector code: import me.prettyprint.hector.api.Cluster; public class CassandraHelper { @Autowired private final Clu...
MongoDB = easy to store: { "_id": "1234", "name": "Ajanta", "serviceArea": ["94619", "99999"], "openingHours": [ ...
MongoDB = easy to query: { "serviceArea": "94619", "openingHours": { "$elemMatch": { "open": { "$lte": ...
MongoTemplate-based code: @Repository public class AvailableRestaurantRepositoryMongoDbImpl imp...
MongoDB – Spring Configuration: @Configuration public class MongoConfig extends AbstractDatabaseConfig { private @Value("#{mo...
Summary: • Relational databases are great but - Object/relational impedance mismatch - Relational schema is rig...
Thank you! My contact info: ch...

Polyglot persistence for Java Developers - August 2011 / @Oakjug


Relational databases have long been considered the one true way to persist enterprise data. But today, NoSQL databases are emerging as a viable alternative for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But NoSQL databases are very different from the ACID/SQL/JDBC/JPA world that we have become accustomed to. In this presentation, you will learn about our experience implementing a use case from POJOs in Action using popular NoSQL databases: Redis, MongoDB, and Cassandra. We will compare and contrast each database’s data model and Java API. You will learn about the benefits and drawbacks of using NoSQL.

Published in: Technology

Transcript of "Polyglot persistence for Java Developers - August 2011 / @Oakjug"

  1. Polyglot persistence for Java developers - moving out of the relational comfort zone
     Chris Richardson
     Author of POJOs in Action
     Founder of CloudFoundry.com
     chris@chrisrichardson.net
     @crichardson
  2. Overall presentation goal
     The joy and pain of building Java applications that use NoSQL
     8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
  3. About Chris
     • Grew up in England and live in Oakland, CA
     • 25+ years of software development experience, including 14+ years of Java
     • Speaker at JavaOne, SpringOne, PhillyETE, Devoxx, etc.
     • Organize the Oakland JUG and the Groovy Grails meetup
     http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
  4. Agenda
     • Why NoSQL?
     • Overview of NoSQL databases
     • Introduction to Spring Data
     • Case study: POJOs in Action & NoSQL
  5. Relational databases are great
     • SQL = rich, declarative query language
     • Database enforces referential integrity
     • ACID semantics
     • Well understood by developers
     • Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
     • Well understood by operations
       - Configuration
       - Care and feeding
       - Backups
       - Tuning
       - Failure and recovery
       - Performance characteristics
     • But…
  6. Problem: Complex object graphs
     • Object/relational impedance mismatch
     • Complicated to map rich domain model to relational schema
     • Performance issues
       - Many rows in many tables
       - Many joins
  7. Problem: Semi-structured data
     • Relational schema doesn’t easily handle semi-structured data:
       - Varying attributes
       - Custom attributes on a customer record
     • Common solution = name/value table
       - Poor performance
       - E.g. finding specific attributes for customers satisfying some criteria = multi-way outer JOIN
       - Lack of constraints
     • Another solution = serialize as blob
       - Fewer joins
       - BUT can’t be queried
  8. Problem: Schema evolution
     • For example:
       - Add attributes to an object => add columns to table
     • Schema changes =
       - Holding locks for a long time => application downtime
       - $$
  9. Problem: Scaling
     • Scaling reads:
       - Master/slave
       - But beware of consistency issues
     • Scaling writes
       - Extremely difficult/impossible/expensive
       - Vertical scaling is limited and requires $$
       - Horizontal scaling is limited/requires $$
  10. Solution: Buy high end technology
      http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
  11. Solution: Hire more developers
      • Application-level sharding
      • Build your own middleware
      • …
      http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
  12. Solution: Use NewSQL
      • Led by Stonebraker
        - Current databases are designed for 1970s hardware and for both OLTP and data warehouses
        - http://www.slideshare.net/VoltDB/sql-myths-webinar
      • NewSQL
        - Next generation SQL databases, e.g. VoltDB
        - Leverage multi-core, commodity hardware
        - In-memory
        - Horizontally scalable
        - Transparently shardable
        - ACID
  13. NoSQL databases are emerging…
      Each one offers some combination of:
      • Higher performance
      • Higher scalability
      • Richer data-model
      • Schema-less
      In return for:
      • Limited transactions
      • Relaxed consistency
      • Unconstrained data
      • …
  14. … but there are few commonalities
      • Everyone and their dog has written one
      • Different data models
        - Key-value
        - Column
        - Document
        - Graph
      • Different APIs
      • No JDBC, Hibernate, JPA (generally)
      “Same sorry state as the database market in the 1970s before SQL was invented”
      http://queue.acm.org/detail.cfm?id=1961297
  15. Future = multi-paradigm data storage for enterprise applications
      IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
  16. Agenda
      • Why NoSQL?
      • Overview of NoSQL databases
      • Introduction to Spring Data
      • Case study: POJOs in Action & NoSQL
  17. Redis
      • Advanced key-value store
        - Values can be binary strings, Lists, Sets, Sorted Sets, Hashes, …
        - Data-type specific operations
      • Very fast
        - ~100K operations/second on entry-level hardware
        - In-memory operations
      • Persistent
        - Periodic snapshots of memory OR append commands to log file
      • Transactions within a single server
        - Atomic execution of batched commands
        - Optimistic locking
      (Diagram: key-value table K1 => V1, K2 => V2, K3 => V2)
  18. Redis CLI
      Sorted set member = value + score
          redis> zadd mysortedset 5.0 a
          (integer) 1
          redis> zadd mysortedset 10.0 b
          (integer) 1
          redis> zadd mysortedset 1.0 c
          (integer) 1
          redis> zrange mysortedset 0 1
          1) "c"
          2) "a"
          redis> zrangebyscore mysortedset 1 6
          1) "c"
          2) "a"
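The sorted-set commands in the CLI session can be mimicked in plain Java to make the semantics concrete. This is only an in-memory sketch, not a Redis client: the class `SortedSetSketch` and its method names are hypothetical, and it assumes non-negative ranks and inclusive score bounds.

```java
import java.util.*;

/** In-memory stand-in for one Redis sorted set; illustrative only, not a Redis client. */
final class SortedSetSketch {
    // member -> score; each member appears at most once, as in a Redis sorted set
    private final Map<String, Double> scores = new HashMap<>();

    /** Like ZADD: returns 1 if the member is new, 0 if only its score was updated. */
    int zadd(double score, String member) {
        return scores.put(member, score) == null ? 1 : 0;
    }

    /** Members ordered by ascending score, ties broken by member name. */
    private List<String> ordered() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort((a, b) -> {
            int byScore = Double.compare(scores.get(a), scores.get(b));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return members;
    }

    /** Like ZRANGE with non-negative ranks: both ends inclusive. */
    List<String> zrange(int start, int stop) {
        List<String> members = ordered();
        int to = Math.min(stop, members.size() - 1);
        if (start > to) return new ArrayList<>();
        return new ArrayList<>(members.subList(start, to + 1));
    }

    /** Like ZRANGEBYSCORE: members whose score lies in [min, max]. */
    List<String> zrangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (String member : ordered()) {
            double score = scores.get(member);
            if (score >= min && score <= max) result.add(member);
        }
        return result;
    }
}
```

Replaying the session (scores 5.0 for a, 10.0 for b, 1.0 for c) yields c then a for both the rank range 0..1 and the score range 1..6, matching the CLI output on the slide.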
  19. Scaling Redis
      • Master/slave replication
        - Tree of Redis servers
        - Non-persistent master can replicate to a persistent slave
        - Use slaves for read-only queries
      • Sharding
        - Client-side only – consistent hashing based on key
        - Server-side sharding – coming one day
      • Run multiple servers per physical host
        - Server is single threaded => Leverage multiple CPUs
        - 32 bit more efficient than 64 bit
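Client-side sharding by consistent hashing can be sketched in a few lines of plain Java. This is a hedged illustration, not how any particular Redis client implements it: the class `ConsistentHashRing`, the server names, the use of MD5, and the number of virtual nodes are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

/** Hypothetical client-side shard picker: maps a key to one of several Redis servers. */
final class ConsistentHashRing {
    // position on the ring -> server that owns the arc ending at that position
    private final SortedMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(Collection<String> servers, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String server : servers) addServer(server);
    }

    /** Places several virtual points per server so keys spread more evenly. */
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(server + "#" + i), server);
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(server + "#" + i));
    }

    /** The first ring position at or after the key's hash owns the key (wrapping around). */
    String serverFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no servers");
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    /** First 8 bytes of an MD5 digest, interpreted as a long. */
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (digest[i] & 0xff);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the ring is that removing one server only remaps the keys that server owned; keys on the other servers stay where they were.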
  20. Downsides of Redis
      • Low-level API compared to SQL
      • Single threaded:
        - Multiple cores => multiple Redis servers
      • Master/slave failover is manual
      • Partitioning is done by the client
      • Dataset has to fit in memory
  21. Redis use cases
      • Drop-in replacement for Memcached
        - Session state
        - Cache of data retrieved from the system of record (SOR)
      • Replica of SOR for queries needing high performance
      • Miscellaneous yet important
        - Counting using INCR command, e.g. hit counts
        - Most recent N items – LPUSH and LTRIM
        - Randomly selecting an item – SRANDMEMBER
        - Queuing – Lists with LPOP, RPUSH, …
        - High score tables – Sorted sets and ZINCRBY
        - …
      • Notable users: github, guardian.co.uk, …
  22. Cassandra
      • An Apache open-source project
      • Developed by Facebook for inbox search
      • Column-oriented database/Extensible row store
        - The data model will hurt your brain
        - Row = map or map of maps
      • Fast writes = append to a log
      • Extremely scalable
        - Transparent and dynamic clustering
        - Rack and datacenter aware data replication
      • Tunable read/write consistency per operation
        - Writes: any, one replica, quorum of replicas, …, all
        - Read: one, quorum, …, all
      • CQL = “SQL”-like DDL and DML
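The arithmetic behind tunable consistency is worth spelling out. The slide does not state the rule, so this applies the usual quorum reasoning: with N replicas, a quorum is a majority, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number read exceeds N. The class and method names below are hypothetical.

```java
/** Quorum arithmetic for a replicated store; a sketch, not Cassandra API. */
final class QuorumMath {
    /** Majority of the replicas: floor(N / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set (W + R > N). */
    static boolean readOverlapsWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }
}
```

With three replicas, quorum writes plus quorum reads (2 + 2 > 3) overlap, and so do writes to all replicas with reads from one (3 + 1 > 3), whereas writing to one replica and reading from one (1 + 1) does not.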
  23. Cassandra data model
      My Column family (within a key space)
          Keys | Columns
          a    | colA: value1 | colB: value2 | colC: value3
          b    | colA: value  | colD: value  | colE: value
      A column has a timestamp too
      • 4-D map: keySpace x key x columnFamily x column => value
      • Arbitrary number of columns
      • Column names are dynamic; can contain data
      • Columns for a row are stored on disk in order determined by comparator
      • One CF row = one DDD aggregate
  24. Cassandra data model – insert/update
      My Column family (within a key space)
          Keys | Columns
          a    | colA: value1 | colB: value2 | colC: value3
          b    | colA: value  | colD: value  | colE: value
      Insert(key=a, columnName=colZ, value=foo)
      • Transaction = updates to a row within a ColumnFamily
      • Idempotent
          Keys | Columns
          a    | colA: value1 | colB: value2 | colC: value3 | colZ: foo
          b    | colA: value  | colD: value  | colE: value
  25. Cassandra query example – slice
          Keys | Columns
          a    | colA: value1 | colB: value2 | colC: value3 | colZ: foo
          b    | colA: value  | colD: value  | colE: value
      slice(key=a, startColumn=colA, endColumnName=colC)
          Keys | Columns
          a    | colA: value1 | colB: value2
      You can also do a rangeSlice, which returns a range of keys – less efficient
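One way to picture a column family is as a map from row key to a sorted map of columns, where a slice is a sub-range of one row's columns. The sketch below uses only JDK collections; `ColumnFamilySketch` is a hypothetical name, it ignores timestamps and keyspaces, and it treats both slice bounds as inclusive, which is an assumption of this example.

```java
import java.util.*;

/** Toy model of one column family: row key -> (column name -> value), columns sorted by name. */
final class ColumnFamilySketch {
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    /** Insert or overwrite a column; repeating the same insert leaves the row unchanged. */
    void insert(String key, String columnName, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Number of columns in a row; rows may have different column sets. */
    int columnCount(String key) {
        NavigableMap<String, String> row = rows.get(key);
        return row == null ? 0 : row.size();
    }

    /** Columns of one row whose names fall in [startColumn, endColumn] (inclusive by assumption). */
    NavigableMap<String, String> slice(String key, String startColumn, String endColumn) {
        NavigableMap<String, String> row = rows.get(key);
        if (row == null) return new TreeMap<>();
        return new TreeMap<>(row.subMap(startColumn, true, endColumn, true));
    }
}
```

Because columns are kept sorted, a slice is a cheap range scan within a single row, which is why column names are often chosen to carry data that you want to range over.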
  26. Super Column Families – one more dimension
      My Column family (within a key space)
          Keys | Super columns (ScA, ScB)
          a    | ScA { colA: value1, colB: value2 } | ScB { colC: value3 }
          b    | colA: value, colD: value, colE: value
      Insert(key=a, superColumn=scB, columnName=colZ, value=foo)
      keySpace x key x columnFamily x superColumn x column -> value
          Keys | Super columns (ScA, ScB)
          a    | ScA { colA: value1, colB: value2 } | ScB { colC: value3, colZ: foo }
          b    | colA: value, colD: value, colE: value
  27. Getting data with super slice
      My Column family (within a key space)
          Keys | Super columns (ScA, ScB)
          a    | ScA { colA: value1, colB: value2 } | ScB { colC: value3 }
          b    | colA: value, colD: value, colE: value
      superSlice(key=a, startColumn=scB, endColumnName=scC)
          Keys | Super columns
          a    | ScB { colC: value3 }
  28. Cassandra CLI
          $ bin/cassandra-cli -h localhost
          Connected to: "Test Cluster" on localhost/9160
          Welcome to cassandra CLI.
          [default@unknown] use Keyspace1;
          Authenticated to keyspace: Keyspace1
          [default@Keyspace1] list restaurantDetails;
          Using default limit of 100
          -------------------
          RowKey: 1
          => (super_column=attributes,
               (column=json, value={"id": 1,"name":"Ajanta","menuItems"....
          [default@Keyspace1] get restaurantDetails[1]['attributes'];
          => (column=json, value={"id": 1,"name":"Ajanta","menuItems"....
  29. Scaling Cassandra
      • Client connects to any node
      • Dynamically add/remove nodes
      • Reads/Writes specify how many nodes
      • Configurable # of replicas
        - adjacent nodes
        - rack and data center aware
      (Diagram: ring of four nodes, each replicating to an adjacent node: Node 1, Token = A, Keys = [D, A]; Node 2, Token = B, Keys = [A, B]; Node 3, Token = C, Keys = [B, C]; Node 4, Token = D, Keys = [C, D])
  30. Downsides of Cassandra
      • Learning curve
      • Still maturing, currently v0.8.4
      • Limited queries, i.e. KV lookup
      • Transactions limited to a column family row
      • Lacks an easy to use API
  31. Cassandra use cases
      • Use cases
        - Big data
        - Multiple Data Center distributed database
        - Persistent cache
        - (Write intensive) Logging
        - High-availability (writes)
      • Who is using it
        - Digg, Facebook, Twitter, Reddit, Rackspace
        - Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
        - The largest production cluster has over 100 TB of data in over 150 machines. – Cassandra web site
  32. MongoDB
      • Document-oriented database
        - JSON-style documents: Lists, Maps, primitives
        - Documents organized into collections (~table)
        - Schema-less
      • Rich query language for dynamic queries
      • Asynchronous, configurable writes:
        - No wait
        - Wait for replication
        - Wait for write to disk
      • Very fast
      • Highly scalable and available:
        - Replica sets (generalized master/slave)
        - Sharding
        - Transparent to client
  33. Data Model = Binary JSON documents
      One document = one DDD aggregate
          {
            "name" : "Sahn Maru",
            "type" : "Korean",
            "serviceArea" : [ "94619", "94618" ],
            "openingHours" : [
              { "dayOfWeek" : "Wednesday", "open" : 1730, "close" : 2230 }
            ],
            "_id" : ObjectId("4bddc2f49d1505567c6220a0")
          }

          DBObject o = new BasicDBObject();
          o.put("name", "Sahn Maru");
          DBObject mi = new BasicDBObject();
          mi.put("name", "Daeji Bulgogi");
          …
          List<DBObject> mis = Collections.singletonList(mi);
          o.put("menuItems", mis);
      • Sequence of bytes on disk = fast I/O
        - No joins/seeks
        - In-place updates when possible => no index updates
      • Transaction = update of single document
  34. MongoDB CLI
          $ bin/mongo
          > use mydb
          > r1 = {name: "Ajanta"}
          {name: "Ajanta"}
          > r2 = {name: "Montclair Egg Shop"}
          {name: "Montclair Egg Shop"}
          > db.restaurants.save(r1)
          > r1
          { _id: ObjectId("98…"), name: "Ajanta"}
          > db.restaurants.save(r2)
          > r2
          { _id: ObjectId("66…"), name: "Montclair Egg Shop"}
          > db.restaurants.find({name: /^A/})
          { _id: ObjectId("98…"), name: "Ajanta"}
          > db.restaurants.update({name: "Ajanta"}, {name: "Ajanta Restaurant"})
  35. MongoDB query by example
      Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday
          {
            serviceArea: "94619",
            openingHours: {
              $elemMatch: {
                "dayOfWeek": "Monday",
                "open": {$lte: 1800},
                "close": {$gte: 1800}
              }
            }
          }

          DBCursor cursor = collection.find(qbeObject);
          while (cursor.hasNext()) {
            DBObject o = cursor.next();
            …
          }
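The reason the query uses $elemMatch is that all three conditions must hold for the same openingHours entry, not for different entries of the array. The plain-Java sketch below spells out that matching rule over documents modelled as maps; it does not use the MongoDB driver, and `ElemMatchSketch` and its helper methods are hypothetical names for illustration.

```java
import java.util.*;

/** Evaluates the slide's query against map-based documents; illustrative, no MongoDB driver. */
final class ElemMatchSketch {
    /** One openingHours entry, shaped like the embedded documents on the slide. */
    static Map<String, Object> hours(String dayOfWeek, int open, int close) {
        Map<String, Object> entry = new HashMap<>();
        entry.put("dayOfWeek", dayOfWeek);
        entry.put("open", open);
        entry.put("close", close);
        return entry;
    }

    static Map<String, Object> restaurant(String name, List<String> serviceArea,
                                          List<Map<String, Object>> openingHours) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", name);
        doc.put("serviceArea", serviceArea);
        doc.put("openingHours", openingHours);
        return doc;
    }

    /** serviceArea contains zip AND some single openingHours entry satisfies all three conditions. */
    @SuppressWarnings("unchecked")
    static boolean matches(Map<String, Object> doc, String zip, String day, int time) {
        List<String> serviceArea = (List<String>) doc.get("serviceArea");
        if (!serviceArea.contains(zip)) return false;
        List<Map<String, Object>> openingHours = (List<Map<String, Object>>) doc.get("openingHours");
        for (Map<String, Object> entry : openingHours) {
            boolean sameDay = day.equals(entry.get("dayOfWeek"));
            boolean opened = (Integer) entry.get("open") <= time;     // "open": {$lte: time}
            boolean notClosed = (Integer) entry.get("close") >= time; // "close": {$gte: time}
            if (sameDay && opened && notClosed) return true;
        }
        return false;
    }
}
```

A restaurant open 1130-1430 and 1730-2130 on Monday matches at 1800 but not at 1500, even though 1500 is after one entry's opening time and before the other entry's closing time.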
  36. Scaling MongoDB
      (Diagram: a client talks to mongos, which routes to Shard 1 and Shard 2; each shard has one mongod master and two mongod replicas; the config servers are three mongod processes)
      • A shard consists of a replica set = generalization of master slave
      • Collections spread over multiple shards
  37. Mongo Downsides
      • Server has a global write lock
        - Single writer OR multiple readers => long running queries block writers
      • Great that writes are not synchronous
        - BUT perhaps an asynchronous response would be better than a synchronous getLastError()
      Interesting story: http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
  38. MongoDB use cases
      • Use cases
        - High volume writes
        - Complex data
        - Semi-structured data
      • Who is using it?
        - Shutterfly, Foursquare
        - Bit.ly, Intuit
        - SourceForge, NY Times
        - GILT Groupe, Evite
        - SugarCRM
  39. Other NoSQL databases
          Type                               | Examples
          Extensible columns/Column-oriented | HBase, SimpleDB
          Graph                              | Neo4j
          Key-value                          | Membase
          Document                           | CouchDB
      http://nosql-database.org/ lists 122+ NoSQL databases
  40. Picking a database
          Application requirement   | Solution
          Complex transactions/ACID | Relational database
          Scaling                   | NoSQL
          Social data               | Graph database
          Multiple datacenters      | Cassandra
          Highly-available writes   | Cassandra
          Flexible data             | Document store
          High write volumes        | Mongo, Cassandra
          Super fast cache          | Redis
          Adhoc queries             | Relational or Mongo
          …
      http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html
  41. Proceed with caution
      • Don’t commit to a NoSQL DB until you have done a significant POC
      • Encapsulate your data access code so you can switch
      • Hope that one day you won’t need ACID
  42. Agenda
      • Why NoSQL?
      • Overview of NoSQL databases
      • Introduction to Spring Data
      • Case study: POJOs in Action & NoSQL
  43. NoSQL Java APIs
          Database  | Libraries
          Redis     | Jedis, JRedis, JDBC-Redis, RJC
          Cassandra | Raw Thrift if you are a masochist; Hector, …
          MongoDB   | MongoDB provides a Java driver
      • Some are not so easy to use
      • Stylistic differences
      • Boilerplate code
      • …
  44. Spring Data Project Goals
      Bring classic Spring value propositions to a wide range of NoSQL databases =>
      • Productivity
      • Programming model consistency: e.g. <NoSQL>Template classes
      • “Portability”
      http://www.springsource.org/spring-data
  45. Spring Data sub-projects
      • Commons: Polyglot persistence
      • Key-Value: Redis, Riak
      • Document: MongoDB, CouchDB
      • Graph: Neo4j
      • GORM for NoSQL
      • Various milestone releases
        - Redis 1.0.0.M4 (July 20th, 2011)
        - Document 1.0.0.M2 (April 9, 2011)
        - Graph - Neo4j Support 1.0.0 (April 19, 2011)
        - …
  46. MongoTemplate
      • Simplifies data access
      • Translates exceptions
      • POJO <=> DBObject mapping
      (Class diagram)
      MongoTemplate
        - Properties: databaseName, userId, password, defaultCollectionName, writeConcern, writeResultChecking
        - Methods: save(), insert(), remove(), updateFirst(), findOne(), find(), …
        - Uses Mongo (Java Driver class)
        - Uses <<interface>> MongoConverter: write(Object, DBObject), read(Class, DBObject)
      MongoConverter implementations: SimpleMongoConverter, MongoMappingConverter
  47. Richer mapping
      Annotations define mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value
      • Map fields instead of properties => no getters or setters required
      • Non-default constructor
      • Index generation
          @Document
          public class Person {
            @Id
            private ObjectId id;
            private String firstname;
            @Indexed
            private String lastname;

            @PersistenceConstructor
            public Person(String firstname, String lastname) {
              this.firstname = firstname;
              this.lastname = lastname;
            }
            ….
          }
  48. Generic Mongo Repositories
          interface PersonRepository extends MongoRepository<Person, ObjectId> {
            List<Person> findByLastname(String lastName);
          }

          <beans>
            <mongo:repositories
                base-package="net.chrisrichardson.mongodb.example.mongorepository"
                mongo-template-ref="mongoTemplate" />
          </beans>

          Person p = new Person("John", "Doe");
          personRepository.save(p);
          Person p2 = personRepository.findOne(p.getId());
          List<Person> johnDoes = personRepository.findByLastname("Doe");
          assertEquals(1, johnDoes.size());
  49. Support for the QueryDSL project
      • QPerson is generated from the domain model class
      • Type-safe composable queries
          QPerson person = QPerson.person;
          Predicate predicate =
              person.homeAddress.street1.eq("1 High Street")
                  .and(person.firstname.eq("John"));
          List<Person> people = personRepository.findAll(predicate);
          assertEquals(1, people.size());
          assertPersonEquals(p, people.get(0));
  50. Cross-store/polyglot persistence
          @Entity
          public class Person {
            // In Database
            @Id private Long id;
            private String firstname;
            private String lastname;

            // In MongoDB
            @RelatedDocument private Address address;
          }

          Person person = new Person(…);
          entityManager.persist(person);
          Person p2 = entityManager.find(…)

          { "_id" : ObjectId("….."), "_entity_id" : NumberLong(1), "_entity_class" : "net.. Person", "_entity_field_name" : "address", "zip" : "94611", "street1" : "1 High Street", …}
  51. Agenda
      • Why NoSQL?
      • Overview of NoSQL databases
      • Introduction to Spring Data
      • Case study: POJOs in Action & NoSQL
  52. Food to Go – placing a takeout order
      • Customer enters delivery address and delivery time
      • System displays available restaurants = restaurants that serve the zip code of the delivery address AND are open at the delivery time
          class Restaurant {
            long id;
            String name;
            Set<String> serviceArea;
            Set<TimeRange> openingHours;
            List<MenuItem> menuItems;
          }

          class TimeRange {
            long id;
            int dayOfWeek;
            int openingTime;
            int closingTime;
          }

          class MenuItem {
            String name;
            double price;
          }
  53. Database schema
      RESTAURANT table
          ID | Name              | …
          1  | Ajanta            |
          2  | Montclair Eggshop |
      RESTAURANT_ZIPCODE table
          Restaurant_id | zipcode
          1             | 94707
          1             | 94619
          2             | 94611
          2             | 94619
      RESTAURANT_TIME_RANGE table
          Restaurant_id | dayOfWeek | openTime | closeTime
          1             | Monday    | 1130     | 1430
          1             | Monday    | 1730     | 2130
          2             | Tuesday   | 1130     | …
  54. Finding available restaurants on Monday, 7:30pm for the 94619 zip code
      Straightforward three-way join
          select r.*
          from restaurant r
            inner join restaurant_time_range tr on r.id = tr.restaurant_id
            inner join restaurant_zipcode sa on r.id = sa.restaurant_id
          where '94619' = sa.zip_code
            and tr.day_of_week = 'monday'
            and tr.openingtime <= 1930
            and 1930 <= tr.closingtime
  55. Redis - Persisting restaurants is “easy”
      Multiple KV value pairs:
          rest:1:details        [ name: “Ajanta”, … ]
          rest:1:serviceArea    [ “94619”, “94611”, … ]
          rest:1:openingHours   [10, 11]
          timerange:10          [“dayOfWeek”: “Monday”, ..]
          timerange:11          [“dayOfWeek”: “Tuesday”, ..]
      OR single KV hash:
          rest:1   [ name: “Ajanta”, “serviceArea:0” : “94611”, “serviceArea:1” : “94619”, “menuItem:0:name”, “Chicken Vindaloo”, …]
      OR single KV String:
          rest:1   { .. A BIG STRING/BYTE ARRAY, E.G. JSON }
  56. BUT…
      • … we can only retrieve them via primary key
        => We need to implement indexes
        => Queries, rather than the data model, drive NoSQL database design
      • But how can a key-value store support a query that has:
        - A 3-way join
        - Multiple =
        - > and <
  57. Simplification #1: Denormalization
          Restaurant_id | Day_of_week | Open_time | Close_time | Zip_code
          1             | Monday      | 1130      | 1430       | 94707
          1             | Monday      | 1130      | 1430       | 94619
          1             | Monday      | 1730      | 2130       | 94707
          1             | Monday      | 1730      | 2130       | 94619
          2             | Monday      | 0700      | 1430       | 94619
          …

          SELECT restaurant_id, open_time
          FROM time_range_zip_code
          WHERE day_of_week = 'Monday'
            AND zip_code = 94619
            AND 1815 < close_time
            AND open_time < 1815
      Simpler query:
      • No joins
      • Two = and two <
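The denormalized table is the cross product of each restaurant's time ranges and its zip codes, which can be sketched in plain Java. This is an in-memory illustration only: `DenormalizeSketch`, `TimeRange`, and `Row` are hypothetical names, and the query method mirrors the slide's strict comparisons.

```java
import java.util.*;

/** Builds the denormalized (time range x zip code) rows and runs the join-free query in memory. */
final class DenormalizeSketch {
    static final class TimeRange {
        final String dayOfWeek; final int open; final int close;
        TimeRange(String dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek; this.open = open; this.close = close;
        }
    }

    static final class Row {
        final long restaurantId; final String dayOfWeek; final int open; final int close; final String zipCode;
        Row(long restaurantId, TimeRange range, String zipCode) {
            this.restaurantId = restaurantId; this.dayOfWeek = range.dayOfWeek;
            this.open = range.open; this.close = range.close; this.zipCode = zipCode;
        }
    }

    /** One row per (time range, zip code) pair of a restaurant. */
    static List<Row> denormalize(long restaurantId, List<TimeRange> hours, List<String> zipCodes) {
        List<Row> rows = new ArrayList<>();
        for (TimeRange range : hours)
            for (String zip : zipCodes)
                rows.add(new Row(restaurantId, range, zip));
        return rows;
    }

    /** Two equality tests and two strict range tests, no joins. */
    static Set<Long> findOpen(List<Row> rows, String day, String zip, int time) {
        Set<Long> ids = new TreeSet<>();
        for (Row row : rows)
            if (row.dayOfWeek.equals(day) && row.zipCode.equals(zip) && time < row.close && row.open < time)
                ids.add(row.restaurantId);
        return ids;
    }
}
```

Two time ranges times two zip codes gives the four rows for restaurant 1 in the table above; the price of the simpler query is that every change to a restaurant's hours or service area has to rewrite several rows.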
  58. Simplification #2: Application filtering

      SELECT restaurant_id, open_time
      FROM time_range_zip_code
      WHERE day_of_week = 'Monday'
        AND zip_code = 94619
        AND 1815 < close_time
        AND open_time < 1815    -- applied by the application, not the database

      Even simpler query:
        -  No joins
        -  Two = and one <
  59. Simplification #3: Eliminate multiple ='s with concatenation

      Restaurant_id  Zip_dow       Open_time  Close_time
      1              94707:Monday  1130       1430
      1              94619:Monday  1130       1430
      1              94707:Monday  1730       2130
      1              94619:Monday  1730       2130
      2              94619:Monday  0700       1430
      …

      SELECT …
      FROM time_range_zip_code
      WHERE zip_code_day_of_week = '94619:Monday'    -- key
        AND 1815 < close_time                        -- range
  60. Sorted sets support range queries

      Key (zipCode:dayOfWeek)  Sorted Set [ Entry:Score, … ]
      94707:Monday             [1130_1:1430, 1730_1:2130]
      94619:Monday             [0700_2:1430, 1130_1:1430, 1730_1:2130]

      Member: OpeningTime_RestaurantId
      Score: ClosingTime

      ZRANGEBYSCORE 94619:Monday 1815 2359 => {1730_1}
      1730 is before 1815 => Ajanta is open
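The lookup above has two steps: a range query on the score (closing time), followed by an application-side check of the opening time encoded in each member. The sketch below walks through both steps using an in-memory stand-in for Redis sorted sets, so it runs without a server. The class `AvailabilityIndexSketch` and its `zadd` and `zrangeByScore` methods are simplified imitations written for this example, not Redis client calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for the sorted sets on the slide (an assumption for
// illustration; a real implementation would talk to Redis).
class AvailabilityIndexSketch {

    // key "zipCode:dayOfWeek" -> (score = closing time -> members "openingTime_restaurantId")
    private final Map<String, TreeMap<Integer, TreeSet<String>>> sortedSets =
            new HashMap<String, TreeMap<Integer, TreeSet<String>>>();

    // Simplified ZADD: does not handle re-adding an existing member with a new score.
    void zadd(String key, int score, String member) {
        TreeMap<Integer, TreeSet<String>> byScore = sortedSets.get(key);
        if (byScore == null) {
            byScore = new TreeMap<Integer, TreeSet<String>>();
            sortedSets.put(key, byScore);
        }
        TreeSet<String> members = byScore.get(score);
        if (members == null) {
            members = new TreeSet<String>();
            byScore.put(score, members);
        }
        members.add(member);
    }

    // Simplified ZRANGEBYSCORE: members with min <= score <= max, ordered by score.
    List<String> zrangeByScore(String key, int min, int max) {
        List<String> result = new ArrayList<String>();
        TreeMap<Integer, TreeSet<String>> byScore = sortedSets.get(key);
        if (byScore != null) {
            for (TreeSet<String> members : byScore.subMap(min, true, max, true).values()) {
                result.addAll(members);
            }
        }
        return result;
    }

    // Step 1: range query on closing time. Step 2: filter on opening time in the application.
    // Assumes time is a valid HHMM value no greater than 2359.
    List<String> findOpenRestaurantIds(String zipCode, String dayOfWeek, int time) {
        String paddedTime = String.format("%04d", time);
        List<String> ids = new ArrayList<String>();
        for (String member : zrangeByScore(zipCode + ":" + dayOfWeek, time, 2359)) {
            String openingTime = member.substring(0, 4);
            if (openingTime.compareTo(paddedTime) <= 0) {
                ids.add(member.substring(member.indexOf('_') + 1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        AvailabilityIndexSketch index = new AvailabilityIndexSketch();
        index.zadd("94619:Monday", 1430, "0700_2");
        index.zadd("94619:Monday", 1430, "1130_1");
        index.zadd("94619:Monday", 2130, "1730_1");
        System.out.println(index.findOpenRestaurantIds("94619", "Monday", 1815));
    }
}
```

Zero-padding the opening time to four digits is what makes the string comparison agree with numeric order, so "0700" sorts before "1815".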
  61. What did I just do to query the data?
  62. What did I just do to query the data?
      o  Wrote code to maintain an index
      o  Reduced performance due to extra writes
  63. RedisTemplate-based code

      @Repository
      public class AvailableRestaurantRepositoryRedisImpl implements AvailableRestaurantRepository {

        // Not final: a field-injected dependency has no initializer
        @Autowired
        private StringRedisTemplate redisTemplate;

        // Sorted set for one (day, zip code) pair; each member is scored by its closing time
        private BoundZSetOperations<String, String> closingTimes(int dayOfWeek, String zipCode) {
          return redisTemplate.boundZSetOps(AvailableRestaurantKeys.closingTimesKey(dayOfWeek, zipCode));
        }

        public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
          String zipCode = deliveryAddress.getZip();
          int timeOfDay = timeOfDay(deliveryTime);
          int dayOfWeek = dayOfWeek(deliveryTime);

          // Range query: time ranges that close at or after the delivery time
          Set<String> closingTrs = closingTimes(dayOfWeek, zipCode).rangeByScore(timeOfDay, 2359);

          // Application-side filter: keep only time ranges that have already opened
          Set<String> restaurantIds = new HashSet<String>();
          String paddedTimeOfDay = FormattingUtil.format4(timeOfDay);
          for (String trId : closingTrs) {
            if (trId.substring(0, 4).compareTo(paddedTimeOfDay) <= 0)
              restaurantIds.add(StringUtils.substringAfterLast(trId, "_"));
          }

          // Fetch the JSON for all matching restaurants with a single multi-get
          Collection<String> jsonForRestaurants =
              redisTemplate.opsForValue().multiGet(AvailableRestaurantKeys.timeRangeRestaurantInfoKeys(restaurantIds));

          List<AvailableRestaurant> restaurants = new ArrayList<AvailableRestaurant>();
          for (String json : jsonForRestaurants) {
            restaurants.add(AvailableRestaurant.fromJson(json));
          }
          return restaurants;
        }
      }
  64. Redis – Spring configuration

      @Configuration
      public class RedisConfiguration extends AbstractDatabaseConfig {

        @Bean
        public RedisConnectionFactory jedisConnectionFactory() {
          JedisConnectionFactory factory = new JedisConnectionFactory();
          factory.setHostName(databaseHostName);
          factory.setPort(6379);
          factory.setUsePool(true);
          JedisPoolConfig poolConfig = new JedisPoolConfig();
          poolConfig.setMaxActive(1000);
          factory.setPoolConfig(poolConfig);
          return factory;
        }

        @Bean
        public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
          StringRedisTemplate template = new StringRedisTemplate();
          template.setConnectionFactory(factory);
          return template;
        }
      }
  65. Cassandra: Easy to store restaurants

      Column Family: RestaurantDetails
      Keys  Columns
      1     name: Ajanta              type: Indian     …
      2     name: Montclair Egg Shop  type: Breakfast  …

      OR

      Column Family: RestaurantDetails
      Keys  Columns
      1     details: { JSON DOCUMENT }
      2     details: { JSON DOCUMENT }
  66. Querying using Cassandra
      o  Similar challenges to using Redis
      o  Limited querying options
         -  Row key: exact or range
         -  Column name: exact or range
      o  Use composite/concatenated keys
         -  Prefix: equality match
         -  Suffix: can be range scan
      o  No joins => denormalize
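The composite-key idea can be sketched without a Cassandra cluster: in any ordered key space, fixing the prefix and scanning a range over the suffix answers an "equals X and between A and B" query. The example below uses a `TreeMap` as a stand-in for ordered keys. The class `CompositeKeySketch`, the `zip:day:closeTime:entry` key layout, and the method names are assumptions for illustration and are not Cassandra or Hector APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory stand-in for an ordered key space (an assumption for illustration).
class CompositeKeySketch {

    private final TreeMap<String, String> rows = new TreeMap<String, String>();

    // Concatenated key: equality part first (zip:day), range part next (zero-padded close time).
    static String key(String zipCode, String day, int closeTime, String entry) {
        return zipCode + ":" + day + ":" + String.format("%04d", closeTime) + ":" + entry;
    }

    void put(String zipCode, String day, int closeTime, String entry, String value) {
        rows.put(key(zipCode, day, closeTime, entry), value);
    }

    // Prefix = equality match on zip and day; suffix = range scan on closing time.
    List<String> keysClosingBetween(String zipCode, String day, int from, int to) {
        String prefix = zipCode + ":" + day + ":";
        String start = prefix + String.format("%04d", from);
        // Appending the largest char keeps every key with closing time == to inside the range.
        String end = prefix + String.format("%04d", to) + Character.MAX_VALUE;
        return new ArrayList<String>(rows.subMap(start, true, end, true).keySet());
    }

    public static void main(String[] args) {
        CompositeKeySketch store = new CompositeKeySketch();
        store.put("94619", "Mon", 1430, "0700_2", "JSON FOR EGG");
        store.put("94619", "Mon", 1430, "1130_1", "JSON FOR AJANTA");
        store.put("94619", "Mon", 2130, "1730_1", "JSON FOR AJANTA");
        store.put("94707", "Mon", 2130, "1730_1", "JSON FOR AJANTA");
        System.out.println(store.keysClosingBetween("94619", "Mon", 1815, 2359));
    }
}
```

The range part must be fixed-width (zero-padded here) so that lexicographic key order matches numeric order; otherwise the scan would return the wrong slice.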
  67. Cassandra: Find restaurants that close after the delivery time and then filter

      Keys       Super Columns (closing time)  Columns
      94619:Mon  1430                          0700_2: JSON FOR EGG
                 1430                          1130_1: JSON FOR AJANTA
                 2130                          1730_1: JSON FOR AJANTA

      SuperSlice key = 94619:Mon, SliceStart = 1815, SliceEnd = 2359

      Keys       Super Columns (closing time)  Columns
      94619:Mon  2130                          1730_1: JSON FOR AJANTA

      18:15 is after 17:30 => {Ajanta}
  68. Cassandra/Hector code

      import me.prettyprint.hector.api.Cluster;

      public class CassandraHelper {

        // Not final: a field-injected dependency has no initializer
        @Autowired
        private Cluster cluster;

        public <T> List<T> getSuperSlice(String keyspace, String columnFamily, String key,
                                         String sliceStart, String sliceEnd,
                                         SuperSliceResultMapper<T> resultMapper) {
          // Serializers for the row key, super column name, column name, and column value
          SuperSliceQuery<String, String, String, String> q =
              HFactory.createSuperSliceQuery(HFactory.createKeyspace(keyspace, cluster),
                  StringSerializer.get(), StringSerializer.get(),
                  StringSerializer.get(), StringSerializer.get());
          q.setColumnFamily(columnFamily);
          q.setKey(key);
          // Slice of super columns from sliceStart to sliceEnd, not reversed, at most 10000
          q.setRange(sliceStart, sliceEnd, false, 10000);

          QueryResult<SuperSlice<String, String, String>> qr = q.execute();

          SuperColumnRowProcessor<T> rowProcessor = new SuperColumnRowProcessor<T>(resultMapper);
          for (HSuperColumn<String, String, String> superColumn : qr.get().getSuperColumns()) {
            List<HColumn<String, String>> columns = superColumn.getColumns();
            rowProcessor.processRow(key, superColumn.getName(), columns);
          }
          return rowProcessor.getResult();
        }
      }
  69. MongoDB = easy to store

      {
        "_id": "1234",
        "name": "Ajanta",
        "serviceArea": ["94619", "99999"],
        "openingHours": [
          { "dayOfWeek": 1, "open": 1130, "close": 1430 },
          { "dayOfWeek": 2, "open": 1130, "close": 1430 },
          …
        ]
      }
  70. MongoDB = easy to query

      {
        "serviceArea": "94619",
        "openingHours": {
          "$elemMatch": {
            "open": { "$lte": 1815 },
            "dayOfWeek": 4,
            "close": { "$gte": 1815 }
          }
        }
      }

      db.availableRestaurants.ensureIndex({serviceArea: 1})
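The `$elemMatch` operator requires a single array element to satisfy all of the listed conditions. Without it, conditions on fields of an array of embedded documents can each be satisfied by a different element. The plain-Java sketch below imitates both behaviours to show why that matters for opening hours; the class `ElemMatchSketch` and its methods are hypothetical and involve no MongoDB driver calls.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java imitation of two matching semantics (an assumption for illustration).
class ElemMatchSketch {

    static final class OpeningHours {
        final int dayOfWeek;
        final int open;
        final int close;

        OpeningHours(int dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    // $elemMatch-style: one element must satisfy every condition.
    static boolean elemMatch(List<OpeningHours> hours, int dayOfWeek, int time) {
        for (OpeningHours h : hours) {
            if (h.dayOfWeek == dayOfWeek && h.open <= time && h.close >= time) {
                return true;
            }
        }
        return false;
    }

    // Field-by-field style: each condition may be satisfied by a different element.
    static boolean fieldByField(List<OpeningHours> hours, int dayOfWeek, int time) {
        boolean dayOk = false;
        boolean openOk = false;
        boolean closeOk = false;
        for (OpeningHours h : hours) {
            dayOk |= h.dayOfWeek == dayOfWeek;
            openOk |= h.open <= time;
            closeOk |= h.close >= time;
        }
        return dayOk && openOk && closeOk;
    }

    public static void main(String[] args) {
        // Open for lunch on day 1 and for a late dinner on day 4.
        List<OpeningHours> hours = Arrays.asList(
                new OpeningHours(1, 1130, 1430),
                new OpeningHours(4, 1900, 2200));
        System.out.println(elemMatch(hours, 4, 1815));    // false: closed at 18:15 on day 4
        System.out.println(fieldByField(hours, 4, 1815)); // true: a false positive
    }
}
```

In the example the restaurant is closed at 18:15 on day 4, yet the field-by-field check reports a match because the day-1 lunch period supplies the `open <= 1815` condition.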
  71. MongoTemplate-based code

      @Repository
      public class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {

        // Not final: a field-injected dependency has no initializer
        @Autowired
        private MongoTemplate mongoTemplate;

        @Override
        public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
          int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
          int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);

          // Match the zip code, plus one openingHours element that covers the delivery time
          Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
              .and("openingHours").elemMatch(where("dayOfWeek").is(dayOfWeek)
                  .and("openingTime").lte(timeOfDay)
                  .and("closingTime").gte(timeOfDay)));

          return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);
        }
      }

      Creating the index:

      mongoTemplate.ensureIndex("availableRestaurants", new Index().on("serviceArea", Order.ASCENDING));
  72. MongoDB – Spring Configuration

      @Configuration
      public class MongoConfig extends AbstractDatabaseConfig {

        private @Value("#{mongoDbProperties.databaseName}") String mongoDbDatabase;

        public @Bean MongoFactoryBean mongo() {
          MongoFactoryBean factory = new MongoFactoryBean();
          factory.setHost(databaseHostName);
          MongoOptions options = new MongoOptions();
          options.connectionsPerHost = 500;
          factory.setMongoOptions(options);
          return factory;
        }

        public @Bean MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
          MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
          mongoTemplate.setWriteConcern(WriteConcern.SAFE);
          mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
          return mongoTemplate;
        }
      }
  73. Summary
      o  Relational databases are great but
         -  Object/relational impedance mismatch
         -  Relational schema is rigid
         -  Extremely difficult/impossible to scale writes
         -  Performance can be suboptimal
      o  Each NoSQL database can solve some combination of those problems BUT
         -  Limited transactions
         -  One day needing ACID => major rewrite
         -  Query-driven, denormalized database design
         -  …
      =>
      o  Carefully pick the NoSQL DB for your application
      o  Consider a polyglot persistence architecture
  74. Thank you! My contact info: chris@chrisrichardson.net, @crichardson
