Polyglot persistence for Java developers: moving out of the relational comfort zone

Chris Richardson

Author of POJOs in Action
Founder of CloudFoundry.com
chris@chrisrichardson.net
@crichardson
Overall presentation goal


The joy and pain of building Java applications that use NoSQL

    8/19/11   Copyright (c) 2011 Chris Richardson. All rights reserved.
                                                                          Slide 2
About Chris
•  Grew up in England and live in Oakland, CA
•  25+ years of software development experience, including 14+ years of Java
•  Speaker at JavaOne, SpringOne, PhillyETE, Devoxx, etc.
•  Organize the Oakland JUG and the Groovy Grails meetup




                                                 http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/




Agenda
o  Why NoSQL?
o  Overview of NoSQL databases
o  Introduction to Spring Data
o  Case study: POJOs in Action & NoSQL




Relational databases are great
o  SQL = Rich, declarative query language
o  Database enforces referential integrity
o  ACID semantics
o  Well understood by developers
o  Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
o  Well understood by operations
     n    Configuration
     n    Care and feeding
     n    Backups
     n    Tuning
     n    Failure and recovery
     n    Performance characteristics
o  But…


Problem: Complex object graphs
o  Object/relational impedance mismatch
o  Complicated to map rich domain model to relational schema
o  Performance issues
  n  Many rows in many tables
  n  Many joins
Problem: Semi-structured data
o  Relational schema doesn’t easily handle semi-structured data:
  n  Varying attributes
  n  Custom attributes on a customer record
o  Common solution = Name/value table
  n  Poor performance
  n  E.g. finding specific attributes for customers satisfying some criteria = multi-way outer JOIN
  n  Lack of constraints
o  Another solution = Serialize as blob
  n  Fewer joins
  n  BUT can’t be queried
Problem: Schema evolution
o  For example:
  n  Add attributes to an object → add columns to table
o  Schema changes =
  n  Holding locks for a long time → application downtime
  n  $$
Problem: Scaling
o  Scaling reads:
  n  Master/slave
  n  But beware of consistency issues
o  Scaling writes
  n  Extremely difficult/impossible/expensive
  n  Vertical scaling is limited and requires $$
  n  Horizontal scaling is limited/requires $$
Solution: Buy high-end technology




   http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
Solution: Hire more developers
o  Application-level sharding
o  Build your own middleware
o  …




http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
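In application-level sharding, the application itself decides which database holds each row. The sketch below assumes simple modulo routing on a customer id and models each shard as an in-memory map. A real system would keep a connection pool per shard instead. ShardRouter and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical application-level shard router. Each "shard" is an in-memory
// map standing in for a separate database.
public class ShardRouter {
    private final List<Map<Long, String>> shards = new ArrayList<>();

    ShardRouter(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Modulo routing on the shard key; floorMod keeps negative ids in range
    int shardFor(long customerId) {
        return (int) Math.floorMod(customerId, (long) shards.size());
    }

    void save(long customerId, String customer) {
        shards.get(shardFor(customerId)).put(customerId, customer);
    }

    String find(long customerId) {
        return shards.get(shardFor(customerId)).get(customerId);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        router.save(42L, "Alice");
        System.out.println(router.shardFor(42L)); // 2
        System.out.println(router.find(42L));     // Alice
    }
}
```

Modulo routing is the simplest choice, but it has two costs. Changing the number of shards remaps most keys. Any query not keyed by customer id must also be sent to every shard. Dealing with both is the kind of "build your own middleware" work listed on this slide.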
Solution: Use NewSQL
o  Led by Stonebraker
  n  Current databases are designed for 1970s
      hardware and for both OLTP and data
      warehouses
  n  http://www.slideshare.net/VoltDB/sql-
      myths-webinar
o  NewSQL
  n    Next generation SQL databases, e.g. VoltDB
  n    Leverage multi-core, commodity hardware
  n    In-memory
  n    Horizontally scalable
  n    Transparently shardable
  n    ACID
NoSQL databases are emerging…
Each one offers some combination of:
o  Higher performance
o  Higher scalability
o  Richer data-model
o  Schema-less
In return for:
o  Limited transactions
o  Relaxed consistency
o  Unconstrained data
o  …

… but there are few commonalities

o  Everyone and their dog has written one
o  Different data models
  n  Key-value
  n  Column
  n  Document
  n  Graph
o  Different APIs
o  No JDBC, Hibernate, JPA (generally)

“Same sorry state as the database market in the 1970s before SQL was invented”
http://queue.acm.org/detail.cfm?id=1961297

Future = multi-paradigm data storage for enterprise applications




       IEEE Software, September/October 2010, Debasish Ghosh (Twitter: @debasishg)



Agenda
o  Why NoSQL?
o  Overview of NoSQL databases
o  Introduction to Spring Data
o  Case study: POJOs in Action & NoSQL




Redis
o  Advanced key-value store
  n  Values can be binary strings, Lists, Sets, Sorted Sets, Hashes, …
  n  Data-type specific operations
o  Very fast
  n  ~100K operations/second on entry-level hardware
  n  In-memory operations
o  Persistent
  n  Periodic snapshots of memory OR append commands to log file
o  Transactions within a single server
  n  Atomic execution of batched commands
  n  Optimistic locking

[Figure: keys K1, K2, K3 mapped to values V1, V2, V2]

Redis CLI

Sorted set member = value + score


redis> zadd mysortedset 5.0 a
(integer) 1
redis> zadd mysortedset 10.0 b
(integer) 1
redis> zadd mysortedset 1.0 c
(integer) 1
redis> zrange mysortedset 0 1
1) "c"
2) "a"
redis> zrangebyscore mysortedset 1 6
1) "c"
2) "a"
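Modeling these sorted-set operations in plain Java shows what ZADD, ZRANGE and ZRANGEBYSCORE compute. This is an in-memory sketch whose method names mirror the commands. It is not a Redis client. Members are ordered by score, and ties are broken by member name, as in Redis. Negative ZRANGE indexes are not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory model of a Redis sorted set (not a Redis client).
public class SortedSetModel {
    private final Map<String, Double> scores = new HashMap<>();
    // Ordered by score, then lexicographically by member
    private final TreeSet<String> ordered = new TreeSet<>(
        Comparator.comparingDouble((String m) -> scores.get(m))
                  .thenComparing(Comparator.naturalOrder()));

    // ZADD: returns 1 if the member is new, 0 if only its score changed
    int zadd(String member, double score) {
        boolean isNew = !scores.containsKey(member);
        if (!isNew) {
            ordered.remove(member); // remove under the old score before re-sorting
        }
        scores.put(member, score);
        ordered.add(member);
        return isNew ? 1 : 0;
    }

    // ZRANGE start stop: inclusive, 0-based rank range (non-negative indexes only)
    List<String> zrange(int start, int stop) {
        List<String> all = new ArrayList<>(ordered);
        int end = Math.min(stop, all.size() - 1);
        if (start > end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(all.subList(start, end + 1));
    }

    // ZRANGEBYSCORE min max: inclusive score range
    List<String> zrangebyscore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (String m : ordered) {
            double s = scores.get(m);
            if (s >= min && s <= max) {
                result.add(m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSetModel z = new SortedSetModel();
        z.zadd("a", 5.0);
        z.zadd("b", 10.0);
        z.zadd("c", 1.0);
        System.out.println(z.zrange(0, 1));        // [c, a]
        System.out.println(z.zrangebyscore(1, 6)); // [c, a]
    }
}
```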

Scaling Redis
o  Master/slave replication
  n  Tree of Redis servers
  n  Non-persistent master can replicate to a persistent slave
  n  Use slaves for read-only queries
o  Sharding
  n  Client-side only – consistent hashing based on key
  n  Server-side sharding – coming one day
o  Run multiple servers per physical host
  n  Server is single threaded => leverage multiple CPUs
  n  32-bit builds are more memory-efficient than 64-bit builds

Downsides of Redis
o  Low-level API compared to SQL
o  Single threaded:
  n  Multiple cores è multiple Redis servers
o  Master/slave failover is manual
o  Partitioning is done by the client
o  Dataset has to fit in memory
Redis use cases
o  Drop-in replacement for Memcached
  n  Session state
  n  Cache of data retrieved from the system of record (SOR)
o  Replica of SOR for queries needing high performance
o  Miscellaneous yet important
  n  Counting using INCR command, e.g. hit counts
  n  Most recent N items – LPUSH and LTRIM
  n  Randomly selecting an item – SRANDMEMBER
  n  Queuing – Lists with LPOP, RPUSH, …
  n  High score tables – Sorted sets and ZINCRBY
  n  …

o  Notable users: GitHub, guardian.co.uk, …
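The "most recent N items" idiom pairs LPUSH with LTRIM. LPUSH adds an item to the head of a list, and LTRIM discards everything past index N-1. Below is a minimal in-memory model of those semantics. RecentItems is a hypothetical name, not a Redis API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of:  LPUSH recent <item>  followed by  LTRIM recent 0 N-1
// (semantics only; not a Redis client)
public class RecentItems {
    private final Deque<String> list = new ArrayDeque<>();
    private final int n;

    RecentItems(int n) {
        this.n = n;
    }

    void add(String item) {
        list.addFirst(item);          // LPUSH: newest item at the head
        while (list.size() > n) {     // LTRIM 0 n-1: drop the oldest items
            list.removeLast();
        }
    }

    List<String> items() {
        return new ArrayList<>(list); // like LRANGE 0 -1: newest first
    }

    public static void main(String[] args) {
        RecentItems recent = new RecentItems(3);
        for (String page : new String[] {"home", "menu", "cart", "checkout"}) {
            recent.add(page);
        }
        System.out.println(recent.items()); // [checkout, cart, menu]
    }
}
```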
Cassandra
o  An Apache open-source project
o  Developed by Facebook for inbox search
o  Column-oriented database/Extensible row store
   n  The data model will hurt your brain
   n  Row = map or map of maps
o  Fast writes = append to a log
o  Extremely scalable
   n  Transparent and dynamic clustering
   n  Rack and datacenter aware data replication
o  Tunable read/write consistency per operation
   n  Writes: any, one replica, quorum of replicas, …, all
   n  Read: one, quorum, …, all
o  CQL = “SQL”-like DDL and DML
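Tunable consistency rests on simple arithmetic. Take N replicas, a write acknowledged by W of them and a read that consults R of them. When R + W > N, the two sets must share at least one replica, so the read includes a copy of the latest acknowledged write. Cassandra's QUORUM level is a majority: N/2 + 1, using integer division. The sketch below only encodes these two formulas and does not talk to Cassandra.

```java
// Hypothetical helper encoding the quorum arithmetic (not Cassandra code).
public class QuorumMath {

    // A strict majority of the replicas
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Read and write replica sets are guaranteed to overlap when R + W > N
    static boolean readSeesLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);                                // 2
        System.out.println(readSeesLatestWrite(n, q, q)); // true:  2 + 2 > 3
        System.out.println(readSeesLatestWrite(n, 1, 1)); // false: ONE/ONE may read stale data
        System.out.println(readSeesLatestWrite(n, 1, n)); // true:  write ALL, read ONE
    }
}
```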
Cassandra data model
My column family (within a keyspace)

   Keys     Columns
   a        colA: value1     colB: value2     colC: value3
   b        colA: value      colD: value      colE: value

   A column has a timestamp to …

o  4-D map: keySpace x key x columnFamily x column → value
o  Arbitrary number of columns
o  Column names are dynamic; can contain data
o  Columns for a row are stored on disk in order determined by comparator
o  One CF row = one DDD aggregate
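The 4-D map can be written down literally as nested Java maps. In this sketch, CassandraModel is a hypothetical class, not a Cassandra client. The innermost map is a TreeMap, which mirrors the sorted on-disk column order; plain String ordering stands in for the comparator. Column timestamps are omitted. Insert and update are the same put, so repeating an insert leaves the row unchanged, which is why Cassandra inserts are idempotent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical nested-map model of the data model (not the Cassandra API):
//   keyspace -> row key -> column family -> column name -> value
public class CassandraModel {
    private final Map<String, Map<String, Map<String, SortedMap<String, String>>>> data = new HashMap<>();

    void insert(String keyspace, String key, String columnFamily, String column, String value) {
        data.computeIfAbsent(keyspace, k -> new HashMap<>())
            .computeIfAbsent(key, k -> new HashMap<>())
            .computeIfAbsent(columnFamily, k -> new TreeMap<>()) // columns kept sorted by name
            .put(column, value); // insert == update, so repeating it is harmless
    }

    SortedMap<String, String> row(String keyspace, String key, String columnFamily) {
        return data.getOrDefault(keyspace, Map.of())
                   .getOrDefault(key, Map.of())
                   .getOrDefault(columnFamily, new TreeMap<>());
    }

    public static void main(String[] args) {
        CassandraModel db = new CassandraModel();
        db.insert("Keyspace1", "a", "MyColumnFamily", "colC", "value3");
        db.insert("Keyspace1", "a", "MyColumnFamily", "colA", "value1");
        db.insert("Keyspace1", "a", "MyColumnFamily", "colB", "value2");
        db.insert("Keyspace1", "a", "MyColumnFamily", "colZ", "foo");
        db.insert("Keyspace1", "a", "MyColumnFamily", "colZ", "foo"); // idempotent
        System.out.println(db.row("Keyspace1", "a", "MyColumnFamily"));
        // {colA=value1, colB=value2, colC=value3, colZ=foo}
    }
}
```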

Cassandra data model – insert/update
My column family (within a keyspace)

  Keys     Columns
  a        colA: value1     colB: value2     colC: value3
  b        colA: value      colD: value      colE: value

Insert(key=a, columnName=colZ, value=foo)

  Keys     Columns
  a        colA: value1     colB: value2     colC: value3     colZ: foo
  b        colA: value      colD: value      colE: value

o  Transaction = updates to a row within a ColumnFamily
o  Inserts are idempotent

Cassandra query example – slice
  Keys     Columns
  a        colA: value1     colB: value2     colC: value3     colZ: foo
  b        colA: value      colD: value      colE: value

slice(key=a, startColumn=colA, endColumnName=colC)

  Keys     Columns
  a        colA: value1     colB: value2     colC: value3

o  You can also do a rangeSlice, which returns a range of keys – less efficient
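A row's columns are kept sorted, so a slice is simply a contiguous range of that sorted order. This sketch uses TreeMap.subMap with both bounds inclusive, matching Cassandra's slice range semantics. SliceModel is a hypothetical name and not part of any Cassandra client.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical column-slice model over one row (not the Cassandra/Hector API).
public class SliceModel {

    // Both bounds are inclusive
    static SortedMap<String, String> slice(TreeMap<String, String> row, String startColumn, String endColumn) {
        return row.subMap(startColumn, true, endColumn, true);
    }

    public static void main(String[] args) {
        TreeMap<String, String> rowA = new TreeMap<>();
        rowA.put("colA", "value1");
        rowA.put("colB", "value2");
        rowA.put("colC", "value3");
        rowA.put("colZ", "foo");
        System.out.println(slice(rowA, "colA", "colC")); // {colA=value1, colB=value2, colC=value3}
    }
}
```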



Super Column Families – one more dimension
My column family (within a keyspace)

  Keys     Super columns
           ScA                              ScB
  a        colA: value1     colB: value2    colC: value3
  b        colA: value      colD: value     colE: value

Insert(key=a, superColumn=ScB, columnName=colZ, value=foo)

keySpace x key x columnFamily x superColumn x column -> value

  Keys     Super columns
           ScA                              ScB
  a        colA: value1     colB: value2    colC: value3     colZ: foo
  b        colA: value      colD: value     colE: value

Getting data with super slice
My column family (within a keyspace)

  Keys     Super columns
           ScA                              ScB
  a        colA: value1     colB: value2    colC: value3
  b        colA: value      colD: value     colE: value

superSlice(key=a, startColumn=scB, endColumnName=scC)

  Keys     Super columns
           ScB
  a        colC: value3


Cassandra CLI
$ bin/cassandra-cli -h localhost
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] list restaurantDetails;
Using default limit of 100
-------------------
RowKey: 1
=> (super_column=attributes,
     (column=json, value={"id":1,"name":"Ajanta","menuItems"....
[default@Keyspace1] get restaurantDetails['1']['attributes'];
=> (column=json, value={"id":1,"name":"Ajanta","menuItems"....

Scaling Cassandra
o  Client connects to any node
o  Dynamically add/remove nodes
o  Reads/Writes specify how many nodes
o  Configurable # of replicas
  n  adjacent nodes
  n  rack and data center aware

[Figure: a ring of four nodes; each node replicates to adjacent nodes on the ring]
  Node 1: Token = A, Keys = [D, A]
  Node 2: Token = B, Keys = [A, B]
  Node 3: Token = C, Keys = [B, C]
  Node 4: Token = D, Keys = [C, D]

Downsides of Cassandra
o  Learning curve
o  Still maturing, currently v0.8.4
o  Limited queries, i.e. KV lookup
o  Transactions limited to a column family row
o  Lacks an easy-to-use API




Cassandra use cases
o  Use cases
  n  Big data
  n  Multiple data center distributed database
  n  Persistent cache
  n  (Write intensive) Logging
  n  High-availability (writes)
o  Who is using it
  n  Digg, Facebook, Twitter, Reddit, Rackspace
  n  Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
  n  “The largest production cluster has over 100 TB of data in over 150 machines.” (Cassandra web site)

MongoDB
o  Document-oriented database
   n  JSON-style documents: Lists, Maps, primitives
   n  Documents organized into collections (~table)
   n  Schema-less
o  Rich query language for dynamic queries
o  Asynchronous, configurable writes:
   n  No wait
   n  Wait for replication
   n  Wait for write to disk
o  Very fast
o  Highly scalable and available:
   n  Replica sets (generalized master/slave)
   n  Sharding
   n  Transparent to client


Data Model = Binary JSON documents
{
    "name" : "Sahn Maru",
    "type" : "Korean",
    "serviceArea" : [
       "94619",
       "94618"
    ],
    "openingHours" : [
       {
          "dayOfWeek" : "Wednesday",
          "open" : 1730,
          "close" : 2230
       }
    ],
    "_id" : ObjectId("4bddc2f49d1505567c6220a0")
}

One document = one DDD aggregate

Building a document with the MongoDB Java driver:

import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import java.util.Collections;
import java.util.List;

// Top-level restaurant document
DBObject o = new BasicDBObject();
o.put("name", "Sahn Maru");

// A menu item is an embedded document
DBObject mi = new BasicDBObject();
mi.put("name", "Daeji Bulgogi");
…
List<DBObject> mis = Collections.singletonList(mi);
o.put("menuItems", mis);

o  Sequence of bytes on disk = fast I/O
  n  No joins/seeks
  n  In-place updates when possible → no index updates
o  Transaction = update of single document

MongoDB CLI
$ bin/mongo
> use mydb
> r1 = {name: 'Ajanta'}
{name: 'Ajanta'}
> r2 = {name: 'Montclair Egg Shop'}
{name: 'Montclair Egg Shop'}
> db.restaurants.save(r1)
> r1
{ _id: ObjectId("98…"), name: "Ajanta"}
> db.restaurants.save(r2)
> r2
{ _id: ObjectId("66…"), name: "Montclair Egg Shop"}
> db.restaurants.find({name: /^A/})
{ _id: ObjectId("98…"), name: "Ajanta"}
> db.restaurants.update({name: "Ajanta"},
{name: "Ajanta Restaurant"})


MongoDB query by example
Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday:

{
    serviceArea: "94619",
    openingHours: {
        $elemMatch: {
            "dayOfWeek": "Monday",
            "open": {$lte: 1800},
            "close": {$gte: 1800}
        }
    }
}

// qbeObject is the query document above, built as a DBObject
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
    DBObject o = cursor.next();
    …
}
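Why use $elemMatch? Without it, separate conditions on openingHours.dayOfWeek, openingHours.open and openingHours.close can each be satisfied by a different element of the array. The plain-Java sketch below contrasts the two interpretations. Its test case is a restaurant open for lunch on Monday and for dinner on Tuesday. The record types are hypothetical stand-ins for the documents (Java 16+), and no MongoDB driver is involved.

```java
import java.util.List;

// Hypothetical in-memory evaluation of the query above (not the MongoDB driver).
public class ElemMatchSketch {
    record TimeRange(String dayOfWeek, int open, int close) {}
    record Restaurant(String name, List<String> serviceArea, List<TimeRange> openingHours) {}

    // $elemMatch semantics: ONE element must satisfy all three conditions
    static boolean openAt(Restaurant r, String day, int time) {
        return r.openingHours().stream()
                .anyMatch(t -> t.dayOfWeek().equals(day) && t.open() <= time && t.close() >= time);
    }

    // Without $elemMatch: each condition may be met by a DIFFERENT element
    static boolean looseMatch(Restaurant r, String day, int time) {
        List<TimeRange> hours = r.openingHours();
        return hours.stream().anyMatch(t -> t.dayOfWeek().equals(day))
            && hours.stream().anyMatch(t -> t.open() <= time)
            && hours.stream().anyMatch(t -> t.close() >= time);
    }

    static List<Restaurant> available(List<Restaurant> all, String zip, String day, int time) {
        return all.stream()
                .filter(r -> r.serviceArea().contains(zip))
                .filter(r -> openAt(r, day, time))
                .toList();
    }

    public static void main(String[] args) {
        Restaurant lunchOnMonday = new Restaurant("Lunch Spot", List.of("94619"),
                List.of(new TimeRange("Monday", 1100, 1400), new TimeRange("Tuesday", 1700, 2200)));
        System.out.println(openAt(lunchOnMonday, "Monday", 1800));     // false
        System.out.println(looseMatch(lunchOnMonday, "Monday", 1800)); // true (wrong answer)
    }
}
```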




Scaling MongoDB
[Figure: a client connects to a mongos router; mongos uses three config server mongods and routes requests to two shards, each made up of a master mongod and two replica mongods]

o  A shard consists of a replica set = generalization of master/slave
o  Collections are spread over multiple shards



Mongo Downsides
o  Server has a global write lock
    n  Single writer OR multiple readers
        → Long-running queries block writers
o  Great that writes are not synchronous
    n  BUT perhaps an asynchronous response would be better than a synchronous getLastError()


Interesting story: http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
MongoDB use cases
o  Use cases
  n  High volume writes
  n  Complex data
  n  Semi-structured data
o  Who is using it?
  n  Shutterfly, Foursquare
  n  Bit.ly, Intuit
  n  SourceForge, NY Times
  n  GILT Groupe, Evite
  n  SugarCRM

Other NoSQL databases
Type                                    Examples
Extensible columns/Column-oriented      HBase, SimpleDB
Graph                                   Neo4j
Key-value                               Membase
Document                                CouchDB

            http://nosql-database.org/ lists 122+ NoSQL databases

Picking a database
Application requirement                                                     Solution
Complex transactions/ACID                                                   Relational database
Scaling                                                                     NoSQL
Social data                                                                 Graph database
Multiple datacenters                                                        Cassandra
Highly-available writes                                                     Cassandra
Flexible data                                                               Document store
High write volumes                                                          Mongo, Cassandra
Super fast cache                                                            Redis
Ad hoc queries                                                              Relational or Mongo
…
 http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html




Proceed with caution
o  Don’t commit to a NoSQL DB until you have done a significant POC
o  Encapsulate your data access code so you can switch
o  Hope that one day you won’t need ACID
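One way to keep that option open is to make the rest of the application depend only on a repository interface. In this sketch, RestaurantRepository, its in-memory implementation and namesServing are hypothetical names, and the record needs Java 16+. Moving to a different store after a POC would then mean writing another implementation of the interface, without touching the business logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical example of encapsulating data access behind an interface.
public class RepositoryEncapsulation {
    record Restaurant(long id, String name, String zipCode) {}

    // The rest of the application depends only on this interface
    interface RestaurantRepository {
        void save(Restaurant r);
        Optional<Restaurant> findById(long id);
        List<Restaurant> findByZipCode(String zip);
    }

    // One possible implementation; others could use JPA, MongoDB, Redis, ...
    static class InMemoryRestaurantRepository implements RestaurantRepository {
        private final Map<Long, Restaurant> byId = new HashMap<>();

        public void save(Restaurant r) {
            byId.put(r.id(), r);
        }

        public Optional<Restaurant> findById(long id) {
            return Optional.ofNullable(byId.get(id));
        }

        public List<Restaurant> findByZipCode(String zip) {
            List<Restaurant> result = new ArrayList<>();
            for (Restaurant r : byId.values()) {
                if (r.zipCode().equals(zip)) {
                    result.add(r);
                }
            }
            return result;
        }
    }

    // Business logic is written against the interface, not a particular database
    static List<String> namesServing(RestaurantRepository repo, String zip) {
        List<String> names = new ArrayList<>();
        for (Restaurant r : repo.findByZipCode(zip)) {
            names.add(r.name());
        }
        return names;
    }

    public static void main(String[] args) {
        RestaurantRepository repo = new InMemoryRestaurantRepository();
        repo.save(new Restaurant(1, "Ajanta", "94707"));
        repo.save(new Restaurant(2, "Montclair Egg Shop", "94611"));
        System.out.println(namesServing(repo, "94611")); // [Montclair Egg Shop]
    }
}
```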
Agenda
o  Why NoSQL?
o  Overview of NoSQL databases
o  Introduction to Spring Data
o  Case study: POJOs in Action & NoSQL




NoSQL Java APIs

Database              Libraries
Redis                 Jedis, JRedis, JDBC-Redis, RJC

Cassandra             Raw Thrift if you are a masochist
                      Hector, …

MongoDB               MongoDB provides a Java driver

o  Some are not so easy to use
o  Stylistic differences
o  Boilerplate code
o  …



Spring Data Project Goals
Bring classic Spring value propositions to a wide range of NoSQL databases →
   n  Productivity
   n  Programming model consistency: e.g. <NoSQL>Template classes
   n  “Portability”



http://www.springsource.org/spring-data



Spring Data sub-projects
§ Commons: Polyglot persistence
§ Key-Value: Redis, Riak
§ Document: MongoDB, CouchDB
§ Graph: Neo4j
§ GORM for NoSQL
§ Various milestone releases
  § Redis 1.0.0.M4 (July 20th, 2011)
  § Document 1.0.0.M2 (April 9, 2011)
  § Graph - Neo4j Support 1.0.0 (April 19, 2011)
  § …
MongoTemplate
o  Simplifies data access
o  Translates exceptions

MongoTemplate
   Properties: databaseName, userId, password, defaultCollectionName, writeConcern, writeResultChecking
   Operations: save(), insert(), remove(), updateFirst(), findOne(), find(), …
   Uses: Mongo (Java driver class)
   POJO <-> DBObject mapping via <<interface>> MongoConverter
      write(Object, DBObject)
      read(Class, DBObject)
      Implementations: SimpleMongoConverter, MongoMappingConverter
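"Simplifies data access" and "translates exceptions" both come from the template pattern. The template owns the boilerplate around a callback and converts driver-specific exceptions into one store-neutral type; in Spring, that type is the DataAccessException hierarchy. The sketch below illustrates the pattern with hypothetical stand-in classes (FakeClient, SimpleTemplate, DataAccessFailure). It is not Spring Data's MongoTemplate API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a <NoSQL>Template: callback + exception translation.
public class TemplateSketch {

    // Stand-in for a driver-specific checked exception
    static class DriverException extends Exception {
        DriverException(String msg) { super(msg); }
    }

    // Store-neutral unchecked exception that callers deal with instead
    static class DataAccessFailure extends RuntimeException {
        DataAccessFailure(String msg, Throwable cause) { super(msg, cause); }
    }

    // Stand-in for a low-level driver connection
    static class FakeClient {
        private final Map<String, String> data = new HashMap<>();
        boolean open = true;

        String get(String key) throws DriverException {
            if (!open) throw new DriverException("connection closed");
            return data.get(key);
        }

        void put(String key, String value) throws DriverException {
            if (!open) throw new DriverException("connection closed");
            data.put(key, value);
        }
    }

    interface ClientCallback<T> {
        T doWithClient(FakeClient client) throws DriverException;
    }

    static class SimpleTemplate {
        private final FakeClient client;

        SimpleTemplate(FakeClient client) { this.client = client; }

        // The template owns the boilerplate: run the callback, translate errors
        <T> T execute(ClientCallback<T> callback) {
            try {
                return callback.doWithClient(client);
            } catch (DriverException e) {
                throw new DataAccessFailure("Data access failed: " + e.getMessage(), e);
            }
        }

        // Convenience methods built on execute()
        void save(String key, String value) {
            execute(c -> { c.put(key, value); return null; });
        }

        String findOne(String key) {
            return execute(c -> c.get(key));
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        SimpleTemplate template = new SimpleTemplate(client);
        template.save("restaurant:1", "Ajanta");
        System.out.println(template.findOne("restaurant:1")); // Ajanta
        client.open = false;
        try {
            template.findOne("restaurant:1");
        } catch (DataAccessFailure e) {
            System.out.println(e.getMessage()); // Data access failed: connection closed
        }
    }
}
```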
Richer mapping
@Document
public class Person {

 // Mapped to the document's _id
 @Id
 private ObjectId id;
 private String firstname;

 // An index is generated for this field
 @Indexed
 private String lastname;

 // Used to instantiate the object when reading documents
 @PersistenceConstructor
 public Person(String firstname, String lastname) {
   this.firstname = firstname;
   this.lastname = lastname;
 }

 …
}

o  Annotations define mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value
o  Map fields instead of properties → no getters or setters required
o  Non-default constructor
o  Index generation


Generic Mongo Repositories
interface PersonRepository extends MongoRepository<Person, ObjectId> {
   // Query derived from the method name: persons whose lastname equals the argument
   List<Person> findByLastname(String lastName);
}


<beans>
 <!-- namespace declarations omitted -->
 <mongo:repositories
  base-package="net.chrisrichardson.mongodb.example.mongorepository"
     mongo-template-ref="mongoTemplate" />
</beans>


Person p = new Person("John", "Doe");
personRepository.save(p);

Person p2 = personRepository.findOne(p.getId());

List<Person> johnDoes = personRepository.findByLastname("Doe");
assertEquals(1, johnDoes.size());
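How can an interface with no implementation answer findByLastname? Spring Data creates the implementation at runtime and derives the query from the method name. The sketch below shows only the bare idea. A JDK dynamic proxy parses findByXxx and filters an in-memory list by the field xxx. DerivedQuerySketch and PersonFinder are hypothetical. Spring Data's actual mechanism is far more complete and generates MongoDB queries instead of filtering in memory.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of query derivation from method names.
public class DerivedQuerySketch {

    static class Person {
        final String firstname;
        final String lastname;

        Person(String firstname, String lastname) {
            this.firstname = firstname;
            this.lastname = lastname;
        }
    }

    interface PersonFinder {
        List<Person> findByLastname(String lastname);
        List<Person> findByFirstname(String firstname);
    }

    // Creates an implementation of the interface at runtime
    @SuppressWarnings("unchecked")
    static <R> R createRepository(Class<R> repositoryInterface, Class<?> entityType, List<?> store) {
        return (R) Proxy.newProxyInstance(
            repositoryInterface.getClassLoader(),
            new Class<?>[] {repositoryInterface},
            (proxy, method, args) -> {
                String name = method.getName();
                if (!name.startsWith("findBy")) {
                    throw new UnsupportedOperationException(name);
                }
                // "findByLastname" -> field "lastname"
                String property = Character.toLowerCase(name.charAt(6)) + name.substring(7);
                Field field = entityType.getDeclaredField(property);
                field.setAccessible(true);
                List<Object> result = new ArrayList<>();
                for (Object entity : store) {
                    if (args[0].equals(field.get(entity))) {
                        result.add(entity);
                    }
                }
                return result;
            });
    }

    public static void main(String[] args) {
        List<Person> store = List.of(new Person("John", "Doe"), new Person("Jane", "Roe"));
        PersonFinder finder = createRepository(PersonFinder.class, Person.class, store);
        System.out.println(finder.findByLastname("Doe").size());           // 1
        System.out.println(finder.findByFirstname("Jane").get(0).lastname); // Roe
    }
}
```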

Support for the QueryDSL project

o  QPerson is generated from the domain model class
o  Type-safe, composable queries

 QPerson person = QPerson.person;

 Predicate predicate =
        person.homeAddress.street1.eq("1 High Street")
               .and(person.firstname.eq("John"));

 List<Person> people = personRepository.findAll(predicate);

 assertEquals(1, people.size());
 assertPersonEquals(p, people.get(0));

Cross-store/polyglot persistence
@Entity
public class Person {
  // In database
  @Id private Long id;
  private String firstname;
  private String lastname;

  // In MongoDB
  @RelatedDocument private Address address;
  …
}

Person person = new Person(…);
entityManager.persist(person);

Person p2 = entityManager.find(…);

The address is stored as a separate MongoDB document:

{ "_id" : ObjectId("….."),
  "_entity_id" : NumberLong(1),
  "_entity_class" : "net.. Person",
  "_entity_field_name" : "address",
  "zip" : "94611", "street1" : "1 High Street", …}

Agenda
o  Why NoSQL?
o  Overview of NoSQL databases
o  Introduction to Spring Data
o  Case study: POJOs in Action & NoSQL




Food to Go – placing a takeout order
o  Customer enters delivery address and delivery time
o  System displays available restaurants = restaurants
   that serve the zip code of the delivery address AND
   are open at the delivery time

  class Restaurant {
    long id;
    String name;
    Set<String> serviceArea;
    Set<TimeRange> openingHours;
    List<MenuItem> menuItems;
  }

  class TimeRange {
    long id;
    int dayOfWeek;
    int openingTime;
    int closingTime;
  }

  class MenuItem {
    String name;
    double price;
  }
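As a baseline for the query designs that follow, the availability rule on this slide can be evaluated directly against the in-memory domain model. This is a minimal sketch: the constructors, the hypothetical `isAvailable` helper, the omission of `menuItems`, and the "1 = Monday" day numbering are assumptions for illustration, not code from the Food to Go application.

```java
import java.util.Set;

// Minimal in-memory sketch of the Food to Go model from the slide.
// Constructors and isAvailable() are hypothetical; menuItems is omitted.
final class FoodToGo {

  static final class TimeRange {
    final int dayOfWeek;   // assumption: 1 = Monday, 2 = Tuesday, ...
    final int openingTime; // HHMM, e.g. 1130
    final int closingTime; // HHMM, e.g. 1430

    TimeRange(int dayOfWeek, int openingTime, int closingTime) {
      this.dayOfWeek = dayOfWeek;
      this.openingTime = openingTime;
      this.closingTime = closingTime;
    }
  }

  static final class Restaurant {
    final long id;
    final String name;
    final Set<String> serviceArea;
    final Set<TimeRange> openingHours;

    Restaurant(long id, String name, Set<String> serviceArea, Set<TimeRange> openingHours) {
      this.id = id;
      this.name = name;
      this.serviceArea = serviceArea;
      this.openingHours = openingHours;
    }
  }

  // Available = serves the delivery zip code AND has a time range on that day
  // containing the delivery time (inclusive bounds, as in the SQL query).
  static boolean isAvailable(Restaurant r, String zipCode, int dayOfWeek, int timeOfDay) {
    if (!r.serviceArea.contains(zipCode)) {
      return false;
    }
    for (TimeRange tr : r.openingHours) {
      if (tr.dayOfWeek == dayOfWeek && tr.openingTime <= timeOfDay && timeOfDay <= tr.closingTime) {
        return true;
      }
    }
    return false;
  }
}
```

Scanning every restaurant like this is exactly the work the database-side designs on the following slides try to avoid.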


Database schema

RESTAURANT table
ID    Name                  …
1     Ajanta
2     Montclair Eggshop

RESTAURANT_ZIPCODE table
Restaurant_id    zipcode
1                94707
1                94619
2                94611
2                94619

RESTAURANT_TIME_RANGE table
Restaurant_id    dayOfWeek    openTime    closeTime
1                Monday       1130        1430
1                Monday       1730        2130
2                Tuesday      1130        …


Finding available restaurants on Monday, 7:30pm for the 94619 zip code

Straightforward three-way join:

select r.*
from restaurant r
 inner join restaurant_time_range tr
   on r.id = tr.restaurant_id
 inner join restaurant_zipcode sa
   on r.id = sa.restaurant_id
where '94619' = sa.zip_code
and tr.day_of_week = 'Monday'
and tr.openingtime <= 1930
and 1930 <= tr.closingtime


Redis - Persisting restaurants is "easy"

Multiple KV value pairs:
rest:1:details           [ name: "Ajanta", … ]
rest:1:serviceArea       [ "94619", "94611", …]
rest:1:openingHours      [10, 11]
timerange:10             ["dayOfWeek": "Monday", ..]
timerange:11             ["dayOfWeek": "Tuesday", ..]

                               OR

Single KV hash:
rest:1                   [ name: "Ajanta",
                           "serviceArea:0" : "94611", "serviceArea:1" : "94619",
                           "menuItem:0:name" : "Chicken Vindaloo",
                           …]

                               OR

Single KV String:
rest:1                   { .. A BIG STRING/BYTE ARRAY, E.G. JSON }
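The "single KV hash" option flattens the nested restaurant into one level of field names. The sketch below builds that field map in plain Java, following the `serviceArea:N` and `menuItem:N:name` naming shown above; the `flatten` helper and the `menuItem:N:price` field are hypothetical. With a Redis client, the resulting map could be written to the `rest:1` key with a single HMSET.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flattens a restaurant into hash fields such as
// "serviceArea:0" and "menuItem:0:name" (the ":price" field is an assumption).
final class RestaurantHashFlattener {

  static final class MenuItem {
    final String name;
    final double price;

    MenuItem(String name, double price) {
      this.name = name;
      this.price = price;
    }
  }

  static Map<String, String> flatten(String name, List<String> serviceArea, List<MenuItem> menuItems) {
    Map<String, String> fields = new LinkedHashMap<>(); // insertion order, for readability only
    fields.put("name", name);
    for (int i = 0; i < serviceArea.size(); i++) {
      fields.put("serviceArea:" + i, serviceArea.get(i));
    }
    for (int i = 0; i < menuItems.size(); i++) {
      fields.put("menuItem:" + i + ":name", menuItems.get(i).name);
      fields.put("menuItem:" + i + ":price", String.valueOf(menuItems.get(i).price));
    }
    return fields;
  }
}
```

Note that nothing in this layout lets Redis find a restaurant by zip code: the zip codes are just field values inside one key.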



BUT…
o  … we can only retrieve them via primary key
   => We need to implement indexes
   => Queries, not the data model, drive
      NoSQL database design
o  But how can a key-value store support a
   query that has all of the following?
    n  A 3-way join
    n  Multiple =
    n  > and <



Simplification #1: Denormalization
Restaurant_id   Day_of_week   Open_time   Close_time   Zip_code
1               Monday        1130        1430         94707
1               Monday        1130        1430         94619
1               Monday        1730        2130         94707
1               Monday        1730        2130         94619
2               Monday        0700        1430         94619
…

SELECT restaurant_id, open_time
 FROM time_range_zip_code
 WHERE day_of_week = 'Monday'
   AND zip_code = 94619
   AND 1815 < close_time
   AND open_time < 1815

Simpler query:
o  No joins
o  Two = and two <
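Producing the denormalized table is a cross product: each of a restaurant's opening time ranges is copied once per zip code it serves. A minimal sketch in plain Java, with hypothetical `TimeRange`/`Row` types, which reproduces the first four rows above from Ajanta's two Monday time ranges and two zip codes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of denormalizing into time_range_zip_code rows.
final class Denormalizer {

  static final class TimeRange {
    final String dayOfWeek;
    final int openTime;  // HHMM
    final int closeTime; // HHMM

    TimeRange(String dayOfWeek, int openTime, int closeTime) {
      this.dayOfWeek = dayOfWeek;
      this.openTime = openTime;
      this.closeTime = closeTime;
    }
  }

  static final class Row {
    final long restaurantId;
    final String dayOfWeek;
    final int openTime;
    final int closeTime;
    final String zipCode;

    Row(long restaurantId, TimeRange tr, String zipCode) {
      this.restaurantId = restaurantId;
      this.dayOfWeek = tr.dayOfWeek;
      this.openTime = tr.openTime;
      this.closeTime = tr.closeTime;
      this.zipCode = zipCode;
    }
  }

  // One row per (time range, zip code) pair: |hours| x |zipCodes| rows in total.
  static List<Row> denormalize(long restaurantId, List<TimeRange> hours, List<String> zipCodes) {
    List<Row> rows = new ArrayList<>();
    for (TimeRange tr : hours) {
      for (String zip : zipCodes) {
        rows.add(new Row(restaurantId, tr, zip));
      }
    }
    return rows;
  }
}
```

The cost is more rows and more writes: a restaurant with 10 time ranges that serves 5 zip codes needs 50 rows.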

Simplification #2: Application filtering

SELECT restaurant_id, open_time
 FROM time_range_zip_code
 WHERE day_of_week = 'Monday'
   AND zip_code = 94619
   AND 1815 < close_time
-- open_time < 1815 is checked by the application

Even simpler query:
o  No joins
o  Two = and one <
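With application filtering, the store evaluates only the equality predicates and one range predicate (`1815 < close_time`), and the application drops rows whose `open_time` is not before the delivery time. A minimal sketch, using a hypothetical `Candidate` type for the `(restaurant_id, open_time)` rows the query returns:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the second range predicate (open_time < deliveryTime)
// is evaluated in Java on the rows returned by the simpler query.
final class AppFilter {

  static final class Candidate {
    final long restaurantId;
    final int openTime; // HHMM

    Candidate(long restaurantId, int openTime) {
      this.restaurantId = restaurantId;
      this.openTime = openTime;
    }
  }

  static List<Long> openBefore(List<Candidate> fromStore, int deliveryTime) {
    List<Long> ids = new ArrayList<>();
    for (Candidate c : fromStore) {
      if (c.openTime < deliveryTime) {
        ids.add(c.restaurantId);
      }
    }
    return ids;
  }
}
```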




Simplification #3: Eliminate multiple ='s with concatenation

Restaurant_id    Zip_dow         Open_time    Close_time
1                94707:Monday    1130         1430
1                94619:Monday    1130         1430
1                94707:Monday    1730         2130
1                94619:Monday    1730         2130
2                94619:Monday    0700         1430
…

SELECT …
 FROM time_range_zip_code
 WHERE zip_dow = '94619:Monday'
   AND 1815 < close_time

o  zip_dow = key (equality match)
o  close_time = range
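Concatenating the two equality columns turns two `=` predicates into one key lookup, which a key-value store can do. A minimal in-memory sketch with a `HashMap` standing in for the store (the `Row` type and method names are hypothetical): rows are grouped under `zip:dayOfWeek`, and the remaining range check on `close_time` runs over that one group only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: index rows by the concatenated key "zipCode:dayOfWeek".
final class ConcatenatedKeyIndex {

  static final class Row {
    final long restaurantId;
    final String zipCode;
    final String dayOfWeek;
    final int openTime;  // HHMM
    final int closeTime; // HHMM

    Row(long restaurantId, String zipCode, String dayOfWeek, int openTime, int closeTime) {
      this.restaurantId = restaurantId;
      this.zipCode = zipCode;
      this.dayOfWeek = dayOfWeek;
      this.openTime = openTime;
      this.closeTime = closeTime;
    }
  }

  static String key(String zipCode, String dayOfWeek) {
    return zipCode + ":" + dayOfWeek; // e.g. "94619:Monday"
  }

  private final Map<String, List<Row>> index = new HashMap<>();

  void add(Row row) {
    index.computeIfAbsent(key(row.zipCode, row.dayOfWeek), k -> new ArrayList<>()).add(row);
  }

  // WHERE zip_dow = key AND time < close_time
  List<Row> closingAfter(String zipCode, String dayOfWeek, int time) {
    List<Row> result = new ArrayList<>();
    for (Row row : index.getOrDefault(key(zipCode, dayOfWeek), List.of())) {
      if (time < row.closeTime) {
        result.add(row);
      }
    }
    return result;
  }
}
```

The scan within one group is still linear; a sorted set keyed the same way replaces it with a range lookup.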

Sorted sets support range queries
Key             Sorted Set [ Entry:Score, …]
94707:Monday    [1130_1:1430, 1730_1:2130]
94619:Monday    [0700_2:1430, 1130_1:1430, 1730_1:2130]

Key    = zipCode:dayOfWeek
Member = OpeningTime_RestaurantId
Score  = ClosingTime

ZRANGEBYSCORE 94619:Monday 1815 2359
  => {1730_1}

1730 is before 1815 => Ajanta is open
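To make the index mechanics concrete, here is a minimal in-memory stand-in for Redis sorted sets (it is not a Redis client). Members are ordered by score and then lexicographically, and `zrangeByScore` is inclusive at both ends, as `ZRANGEBYSCORE key min max` is. The `indexTimeRange` and `availableRestaurantIds` helpers are hypothetical; they mirror the key and member formats on this slide. Opening times are zero-padded to four digits so that comparing them as strings agrees with numeric order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for Redis sorted sets: key -> (score -> members).
// Simplification: re-adding an existing member with a new score is not handled.
final class SortedSetIndex {
  private final Map<String, NavigableMap<Double, SortedSet<String>>> sets = new HashMap<>();

  // Like ZADD key score member
  void zadd(String key, double score, String member) {
    sets.computeIfAbsent(key, k -> new TreeMap<>())
        .computeIfAbsent(score, s -> new TreeSet<>())
        .add(member);
  }

  // Like ZRANGEBYSCORE key min max (both bounds inclusive)
  List<String> zrangeByScore(String key, double min, double max) {
    List<String> result = new ArrayList<>();
    NavigableMap<Double, SortedSet<String>> set = sets.get(key);
    if (set != null) {
      for (SortedSet<String> members : set.subMap(min, true, max, true).values()) {
        result.addAll(members);
      }
    }
    return result;
  }

  // Index write: key = zipCode:dayOfWeek, member = openingTime_restaurantId, score = closingTime
  void indexTimeRange(String zipCode, String dayOfWeek, int openingTime, long restaurantId, int closingTime) {
    zadd(zipCode + ":" + dayOfWeek, closingTime, String.format("%04d_%d", openingTime, restaurantId));
  }

  // Range lookup finds time ranges closing at or after timeOfDay;
  // the application keeps those that open at or before timeOfDay.
  Set<String> availableRestaurantIds(String zipCode, String dayOfWeek, int timeOfDay) {
    String padded = String.format("%04d", timeOfDay);
    Set<String> ids = new TreeSet<>();
    for (String member : zrangeByScore(zipCode + ":" + dayOfWeek, timeOfDay, 2359)) {
      if (member.substring(0, 4).compareTo(padded) <= 0) {
        ids.add(member.substring(member.indexOf('_') + 1));
      }
    }
    return ids;
  }
}
```

Every save of a restaurant now means one ZADD per (zip code, time range) pair, which is the extra write cost raised on the next slide.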


What did I just do to query the data?
o  Wrote code to maintain an index
o  Reduced performance due to extra
   writes




RedisTemplate-based code
@Repository
public class AvailableRestaurantRepositoryRedisImpl implements AvailableRestaurantRepository {

  // Field injection: the field cannot be final
  @Autowired private StringRedisTemplate redisTemplate;

  // Sorted set for one zipCode:dayOfWeek key (member = openingTime_restaurantId, score = closingTime)
  private BoundZSetOperations<String, String> closingTimes(int dayOfWeek, String zipCode) {
    return redisTemplate.boundZSetOps(AvailableRestaurantKeys.closingTimesKey(dayOfWeek, zipCode));
  }

  public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
    String zipCode = deliveryAddress.getZip();
    int timeOfDay = timeOfDay(deliveryTime);
    int dayOfWeek = dayOfWeek(deliveryTime);

    // ZRANGEBYSCORE: time ranges that close at or after the delivery time
    Set<String> closingTrs = closingTimes(dayOfWeek, zipCode).rangeByScore(timeOfDay, 2359);

    // Filter in the application: keep time ranges that open at or before the delivery time
    Set<String> restaurantIds = new HashSet<String>();
    String paddedTimeOfDay = FormattingUtil.format4(timeOfDay);
    for (String trId : closingTrs) {
      if (trId.substring(0, 4).compareTo(paddedTimeOfDay) <= 0)
        restaurantIds.add(StringUtils.substringAfterLast(trId, "_"));
    }

    // Fetch the JSON for the matching restaurants in one round trip (MGET)
    Collection<String> jsonForRestaurants =
        redisTemplate.opsForValue().multiGet(AvailableRestaurantKeys.timeRangeRestaurantInfoKeys(restaurantIds));
    List<AvailableRestaurant> restaurants = new ArrayList<AvailableRestaurant>();
    for (String json : jsonForRestaurants) {
      restaurants.add(AvailableRestaurant.fromJson(json));
    }
    return restaurants;
  }
}




Redis – Spring configuration
@Configuration
public class RedisConfiguration extends AbstractDatabaseConfig {

    @Bean
    public RedisConnectionFactory jedisConnectionFactory() {
      JedisConnectionFactory factory = new JedisConnectionFactory();
      factory.setHostName(databaseHostName);
      factory.setPort(6379);
      factory.setUsePool(true);
      JedisPoolConfig poolConfig = new JedisPoolConfig();
      poolConfig.setMaxActive(1000);
      factory.setPoolConfig(poolConfig);
      return factory;
    }

    @Bean
    public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
      StringRedisTemplate template = new StringRedisTemplate();
      template.setConnectionFactory(factory);
      return template;
    }
}


Cassandra: Easy to store restaurants

Column Family: RestaurantDetails
Keys   Columns
1      name: Ajanta                  type: Indian       …
2      name: Montclair Egg Shop      type: Breakfast    …

                            OR

Column Family: RestaurantDetails
Keys   Columns
1      details: { JSON DOCUMENT }
2      details: { JSON DOCUMENT }




Querying using Cassandra
o  Similar challenges to using Redis
o  Limited querying options
  n  Row key – exact or range
  n  Column name – exact or range
o  Use composite/concatenated keys
  n  Prefix - equality match
  n  Suffix - can be range scan
o  No joins è denormalize


Cassandra: Find restaurants that close after the delivery
time and then filter

Keys         Super Columns
             1430                          2130
94619:Mon    0700_2: JSON FOR EGG          1730_1: JSON FOR AJANTA
             1130_1: JSON FOR AJANTA

SuperSlice
  key        = 94619:Mon
  SliceStart = 1815
  SliceEnd   = 2359

Keys         Super Columns
             2130
94619:Mon    1730_1: JSON FOR AJANTA

18:15 is after 17:30 => {Ajanta}
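The super slice above can be mimicked with nested sorted maps: super column name (closing time) to columns (`openingTime_restaurantId` to JSON). This in-memory sketch is not the Hector API; the `SuperSliceSketch` class and its methods are hypothetical, and times are four-digit `HHMM` strings as on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one row of a super column family:
// super column (closing time) -> column (openingTime_restaurantId) -> JSON value.
final class SuperSliceSketch {
  private final NavigableMap<String, SortedMap<String, String>> row = new TreeMap<>();

  void insert(String superColumn, String column, String value) {
    row.computeIfAbsent(superColumn, k -> new TreeMap<>()).put(column, value);
  }

  // Super slice from deliveryTime to "2359" (inclusive), then keep columns whose
  // opening time (first four characters) is at or before the delivery time.
  List<String> openAt(String deliveryTime) {
    List<String> json = new ArrayList<>();
    for (SortedMap<String, String> columns : row.subMap(deliveryTime, true, "2359", true).values()) {
      for (Map.Entry<String, String> c : columns.entrySet()) {
        if (c.getKey().substring(0, 4).compareTo(deliveryTime) <= 0) {
          json.add(c.getValue());
        }
      }
    }
    return json;
  }
}
```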


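Cassandra/Hector code
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.beans.HSuperColumn;
import me.prettyprint.hector.api.beans.SuperSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SuperSliceQuery;

// SuperSliceResultMapper and SuperColumnRowProcessor are application classes (not shown)
public class CassandraHelper {
  // Field injection: the field cannot be final
  @Autowired private Cluster cluster;

  public <T> List<T> getSuperSlice(String keyspace, String columnFamily,
                                   String key, String sliceStart, String sliceEnd,
                                   SuperSliceResultMapper<T> resultMapper) {
    // Row key, super column name, column name and value are all Strings
    SuperSliceQuery<String, String, String, String> q =
        HFactory.createSuperSliceQuery(HFactory.createKeyspace(keyspace, cluster),
            StringSerializer.get(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    q.setColumnFamily(columnFamily);
    q.setKey(key);
    // Super columns from sliceStart to sliceEnd, not reversed, at most 10000
    q.setRange(sliceStart, sliceEnd, false, 10000);

    QueryResult<SuperSlice<String, String, String>> qr = q.execute();

    SuperColumnRowProcessor<T> rowProcessor = new SuperColumnRowProcessor<T>(resultMapper);

    for (HSuperColumn<String, String, String> superColumn : qr.get().getSuperColumns()) {
      List<HColumn<String, String>> columns = superColumn.getColumns();
      rowProcessor.processRow(key, superColumn.getName(), columns);
    }
    return rowProcessor.getResult();
  }
}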

MongoDB = easy to store
{
    "_id": "1234",
    "name": "Ajanta",
    "serviceArea": ["94619", "99999"],
    "openingHours": [
         {
            "dayOfWeek": 1,
            "open": 1130,
            "close": 1430
         },
         {
            "dayOfWeek": 2,
            "open": 1130,
            "close": 1430
         },
        …
     ]
}




MongoDB = easy to query

{
    "serviceArea": "94619",
    "openingHours": {
       "$elemMatch": {
          "open": { "$lte": 1815 },
          "dayOfWeek": 4,
          "close": { "$gte": 1815 }
       }
    }
}

db.availableRestaurants.ensureIndex({serviceArea: 1})
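`$elemMatch` matters here because `openingHours` is an array: it requires a single array element to satisfy all three conditions. If the conditions were written separately with dot notation (`"openingHours.dayOfWeek"`, `"openingHours.open"`, `"openingHours.close"`), each could be satisfied by a different element. The plain-Java sketch below contrasts the two matching rules on hypothetical data; it models the semantics only and does not use the MongoDB driver.

```java
import java.util.List;

// Plain-Java model of two matching rules for an array of opening hours.
final class ElemMatchSketch {

  static final class Hours {
    final int dayOfWeek;
    final int open;  // HHMM
    final int close; // HHMM

    Hours(int dayOfWeek, int open, int close) {
      this.dayOfWeek = dayOfWeek;
      this.open = open;
      this.close = close;
    }
  }

  // $elemMatch: ONE element must satisfy every condition.
  static boolean elemMatch(List<Hours> hours, int day, int time) {
    for (Hours h : hours) {
      if (h.dayOfWeek == day && h.open <= time && h.close >= time) {
        return true;
      }
    }
    return false;
  }

  // Separate dot-notation conditions: each may be satisfied by a DIFFERENT element.
  static boolean separateConditions(List<Hours> hours, int day, int time) {
    boolean dayOk = false, openOk = false, closeOk = false;
    for (Hours h : hours) {
      dayOk |= h.dayOfWeek == day;
      openOk |= h.open <= time;
      closeOk |= h.close >= time;
    }
    return dayOk && openOk && closeOk;
  }
}
```

With lunch hours on day 4 and dinner hours on day 5, the restaurant is closed at 18:15 on day 4, yet the separate-conditions rule reports a match; that false positive is what `$elemMatch` prevents.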


MongoTemplate-based code
@Repository
public class AvailableRestaurantRepositoryMongoDbImpl
                               implements AvailableRestaurantRepository {

  // Field injection: the field cannot be final
  @Autowired private MongoTemplate mongoTemplate;

  @Override
  public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress,
                                                            Date deliveryTime) {
    int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
    int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);

    // where(...) is Criteria.where, statically imported; elemMatch makes all three
    // conditions apply to the same openingHours element
    Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
        .and("openingHours").elemMatch(where("dayOfWeek").is(dayOfWeek)
            .and("openingTime").lte(timeOfDay)
            .and("closingTime").gte(timeOfDay)));

    return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query,
                              AvailableRestaurant.class);
  }
}

Index on serviceArea:
mongoTemplate.ensureIndex("availableRestaurants",
    new Index().on("serviceArea", Order.ASCENDING));
MongoDB – Spring Configuration
@Configuration
public class MongoConfig extends AbstractDatabaseConfig {
 private @Value("#{mongoDbProperties.databaseName}")
 String mongoDbDatabase;

    public @Bean MongoFactoryBean mongo() {
      MongoFactoryBean factory = new MongoFactoryBean();
      factory.setHost(databaseHostName);
      MongoOptions options = new MongoOptions();
      options.connectionsPerHost = 500;
      factory.setMongoOptions(options);
      return factory;
    }

    public @Bean
    MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
      MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
      mongoTemplate.setWriteConcern(WriteConcern.SAFE);
      mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
      return mongoTemplate;
    }
}


Summary
o  Relational databases are great but
   n    Object/relational impedance mismatch
   n    Relational schema is rigid
   n    Extremely difficult/impossible to scale writes
   n    Performance can be suboptimal
o  Each NoSQL database can solve some
   combination of those problems BUT
   n    Limited transactions
   n    One day needing ACID => major rewrite
   n    Query-driven, denormalized database design
   n    …
                         =>
o  Carefully pick the NoSQL DB for your application
o  Consider a polyglot persistence architecture


Thank you!
                                               My contact info:


                                               chris@chrisrichardson.net


                                               @crichardson





Polyglot persistence for Java Developers - August 2011 / @Oakjug

  • 1.
    Polyglot persistence forJava developers - moving out of the relational comfort zone Chris Richardson Author of POJOs in Action Founder of CloudFoundry.com chris@chrisrichardson.net @crichardson
  • 2.
    Overall presentation goal Thejoy and pain of building Java applications that use NoSQL 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 2
  • 3.
    About Chris •  Grew up in England and live in Oakland, CA •  Over 25+ years of software development experience including 14+ years of Java •  Speaker at JavaOne, SpringOne, PhillyETE, Devoxx, etc. •  Organize the Oakland JUG and the Groovy Grails meetup http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/ 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 3
  • 4.
    Agenda o  Why NoSQL? o  Overview of NoSQL databases o  Introduction to Spring Data o  Case study: POJOs in Action & NoSQL 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 4
  • 5.
    Relational databases aregreat o  SQL = Rich, declarative query language o  Database enforces referential integrity o  ACID semantics o  Well understood by developers o  Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA o  Well understood by operations n  Configuration n  Care and feeding n  Backups n  Tuning n  Failure and recovery n  Performance characteristics o  But…. 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 5
  • 6.
    Problem: Complex objectgraphs o  Object/relational impedance mismatch o  Complicated to map rich domain model to relational schema o  Performance issues n  Many rows in many tables n  Many joins
  • 7.
    Problem: Semi-structured data o Relational schema doesn’t easily handle semi-structured data: n  Varying attributes n  Custom attributes on a customer record o  Common solution = Name/value table n  Poor performance n  E.g. Finding specific attributes for customers satisfying some criteria = multi-way outer JOIN n  Lack of constraints o  Another solution = Serialize as blob n  Fewer joins n  BUT can’t be queried
  • 8.
    Problem: Schema evolution o For example: n  Add attributes to an object è add columns to table o  Schema changes = n  Holding locks for a long time è application downtime n  $$
  • 9.
    Problem: Scaling o  Scalingreads: n  Master/slave n  But beware of consistency issues o  Scaling writes n  Extremely difficult/impossible/expensive n  Vertical scaling is limited and requires $$ n  Horizontal scaling is limited/requires $$
  • 10.
    Solution: Buy highend technology http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
  • 11.
    Solution: Hire moredevelopers o  Application-level sharding o  Build your own middleware o  … http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
  • 12.
    Solution: Use NewSQL o Led by Stonebraker n  Current databases are designed for 1970s hardware and for both OLTP and data warehouses n  http://www.slideshare.net/VoltDB/sql- myths-webinar o  NewSQL n  Next generation SQL databases, e.g. VoltDB n  Leverage multi-core, commodity hardware n  In-memory n  Horizontally scalable n  Transparently shardable n  ACID
  • 13.
    NoSQL databases areemerging… Each one offers some combination of: o  Higher performance o  Higher scalability o  Richer data-model o  Schema-less In return for: o  Limited transactions o  Relaxed consistency o  Unconstrained data o  … 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 13
  • 14.
    … but thereare few commonalities o  Everyone and their dog has written one o  Different data models n  Key-value “Same sorry state as the database market in the 1970s before SQL was n  Column invented” http://queue.acm.org/detail.cfm? n  Document id=1961297 n  Graph o  Different APIs o  No JDBC, Hibernate, JPA (generally) 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 14
  • 15.
    Future = multi-paradigmdata storage for enterprise applications IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 15
  • 16.
    Agenda o  Why NoSQL? o Overview of NoSQL databases o  Introduction to Spring Data o  Case study: POJOs in Action & NoSQL 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 16
  • 17.
    Redis o  Advanced key-valuestore n  Values can be binary strings, Lists, Sets, Sorted Sets, Hashes, … n  Data-type specific operations o  Very fast n  ~100K operations/second on entry-level hardware n  In-memory operations K1 V1 o  Persistent K2 V2 n  Periodic snapshots of memory OR K3 V2 append commands to log file o  Transactions within a single server n  Atomic execution of batched commands n  Optimistic locking 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 17
  • 18.
    Redis CLI Sorted set member = value + score redis> zadd mysortedset 5.0 a (integer) 1 redis> zadd mysortedset 10.0 b (integer) 1 redis> zadd mysortedset 1.0 c (integer) 1 redis> zrange mysortedset 0 1 1) "c" 2) "a" redis> zrangebyscore mysortedset 1 6 1) "c" 2) "a" 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 18
  • 19.
    Scaling Redis o  Master/slavereplication n  Tree of Redis servers n  Non-persistent master can replicate to a persistent slave n  Use slaves for read-only queries o  Sharding n  Client-side only – consistent hashing based on key n  Server-side sharding – coming one day o  Run multiple servers per physical host n  Server is single threaded => Leverage multiple CPUs n  32 bit more efficient than 64 bit 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 19
  • 20.
    Downsides of Redis o Low-level API compared to SQL o  Single threaded: n  Multiple cores è multiple Redis servers o  Master/slave failover is manual o  Partitioning is done by the client o  Dataset has to fit in memory
  • 21.
    Redis use cases o Drop-in replacement for Memcached n  Session state n  Cache of data retrieved from SOR o  Replica of SOR for queries needing high- performance o  Miscellaneous yet important n  Counting using INCR command, e.g. hit counts n  Most recent N items - LPUSH and LTRIM n  Randomly selecting an item – SRANDMEMBER n  Queuing – Lists with LPOP, RPUSH, …. n  High score tables – Sorted sets and ZINCRBY n  … o  Notable users: github, guardian.co.uk, …. 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 21
  • 22.
    Cassandra o  An Apacheopen-source project o  Developed by Facebook for inbox search o  Column-oriented database/Extensible row store n  The data model will hurt your brain n  Row = map or map of maps o  Fast writes = append to a log o  Extremely scalable n  Transparent and dynamic clustering n  Rack and datacenter aware data replication o  Tunable read/write consistency per operation n  Writes: any, one replica, quorum of replicas, …, all n  Read: one, quorum, …, all o  CQL = “SQL”-like DDL and DML 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 22
  • 23.
    Cassandra data model My Column family (within a key space) Keys Columns a colA: value1 colB: value2 colC: value3 b colA: value colD: value colE: value A column has a timestamp to o  4-D map: keySpace x key x columnFamily x column è value o  Arbitrary number of columns o  Column names are dynamic; can contain data o  Columns for a row are stored on disk in order determined by comparator o  One CF row = one DDD aggregate 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 23
  • 24.
    Cassandra data model– insert/update My Column family (within a key space) Keys Columns a colA: value1 colB: value2 colC: value3 Transaction = updates to a row within a b colA: value colD: value colE: value ColumnFamily Insert(key=a, columName=colZ, value=foo) Idempotent Keys Columns a colA: value1 colB: value2 colC: value3 colZ: foo b colA: value colD: value colE: value 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 24
  • 25.
    Cassandra query example– slice Key Columns s colA: colB: colC: colZ: a value1 value2 value3 foo colA: colD: colE: b value value value slice(key=a, startColumn=colA, endColumnName=colC) Key Columns You can also do a s rangeSlice which colA: colB: a value1 value2 returns a range of keys – less efficient 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 25
  • 26.
    Super Column Families– one more dimension My Column family (within a key space) Keys Super columns ScA ScB a colA: value1 colB: value2 colC: value3 b colA: value colD: value colE: value Insert(key=a, superColumn=scB, columName=colZ, value=foo) keySpace x key x columnFamily x superColumn x column -> value Keys Super columns ScA ScB a colA: value1 colB: value2 colC:colZ: foo value3 b colA: value colD: value colE: value 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 26
  • 27.
    Getting data withsuper slice My Column family (within a key space) Keys Super columns ScA ScB a colA: value1 colB: value2 colC: value3 b colA: value colD: value colE: value superSlice(key=a, startColumn=scB, endColumnName=scC) Keys Super columns ScB a colC: value3 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 27
  • 28.
    Cassandra CLI $ bin/cassandra-cli-h localhost Connected to: "Test Cluster" on localhost/9160 Welcome to cassandra CLI. [default@unknown] use Keyspace1; Authenticated to keyspace: Keyspace1 [default@Keyspace1] list restaurantDetails; Using default limit of 100 ------------------- RowKey: 1 => (super_column=attributes, (column=json, value={"id": 1,"name":"Ajanta","menuItems".... [default@Keyspace1] get restaurantDetails['1'] ['attributes’]; => (column=json, value={"id": 1,"name":"Ajanta","menuItems".... 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 28
  • 29.
    Scaling Cassandra • Client connects to any node • Dynamically add/remove nodes Keys = [D, A] Node 1 • Reads/Writes specify how many nodes • Configurable # of replicas Token = A •  adjacent nodes •  rack and data center aware replicates replicates Node 4 Node 2 Keys = [A, B] Token = D Token = B replicates Keys = [C, D] replicates Replicates to Node 3 Token = C Keys = [B, C] 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 29
  • 30.
    Downsides of Cassandra o Learning curve o  Still maturing, currently v0.8.4 o  Limited queries, i.e. KV lookup o  Transactions limited to a column family row o  Lacks an easy to use API 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 30
  • 31.
    Cassandra use cases o Use cases •  Big data •  Multiple Data Center distributed database •  Persistent cache •  (Write intensive) Logging •  High-availability (writes) o  Who is using it n  Digg, Facebook, Twitter, Reddit, Rackspace n  Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX n  The largest production cluster has over 100 TB of data in over 150 machines. – Casssandra web site 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 31
  • 32.
    MongoDB o  Document-oriented database n  JSON-style documents: Lists, Maps, primitives n  Documents organized into collections (~table) n  Schema-less o  Rich query language for dynamic queries o  Asynchronous, configurable writes: n  No wait n  Wait for replication n  Wait for write to disk o  Very fast o  Highly scalable and available: n  Replica sets (generalized master/slave) n  Sharding n  Transparent to client 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 32
  • 33.
    Data Model =Binary JSON documents { "name" : "Sahn Maru", One document "type" : ”Korean", "serviceArea" : [ = "94619", "94618" one DDD aggregate ], "openingHours" : [ { DBObject o = new BasicDBObject(); "dayOfWeek" : "Wednesday", o.put("name", ”Sahn Maru"); "open" : 1730, "close" : 2230 DBObject mi = new BasicDBObject(); } mi.put("name", "Daeji Bulgogi"); ], … "_id" : ObjectId("4bddc2f49d1505567c6220a0") List<DBObject> mis = Collections.singletonList(mi); } o.put("menuItems", mis); o  Sequence of bytes on disk = fast I/O n  No joins/seeks n  In-place updates when possible è no index updates o  Transaction = update of single document 8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 33

MongoDB CLI

    $ bin/mongo
    > use mydb
    > r1 = {name: 'Ajanta'}
    {name: 'Ajanta'}
    > r2 = {name: 'Montclair Egg Shop'}
    {name: 'Montclair Egg Shop'}
    > db.restaurants.save(r1)
    > r1
    { _id: ObjectId("98…"), name: "Ajanta"}
    > db.restaurants.save(r2)
    > r2
    { _id: ObjectId("66…"), name: "Montclair Egg Shop"}
    > db.restaurants.find({name: /^A/})
    { _id: ObjectId("98…"), name: "Ajanta"}
    > db.restaurants.update({name: "Ajanta"}, {name: "Ajanta Restaurant"})

MongoDB query by example
Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday:

    {
      serviceArea: "94619",
      openingHours: {
        $elemMatch: {
          "dayOfWeek" : "Monday",
          "open": {$lte: 1800},
          "close": {$gte: 1800}
        }
      }
    }

    DBCursor cursor = collection.find(qbeObject);
    while (cursor.hasNext()) {
      DBObject o = cursor.next();
      …
    }
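
The $elemMatch operator matters because openingHours is an array. The JDK-only sketch below models the two possible semantics on hypothetical in-memory classes (not driver classes): with $elemMatch a single opening-hours entry must satisfy every condition, whereas separate conditions on openingHours.dayOfWeek, openingHours.open and openingHours.close may each be satisfied by a different entry.

```java
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-ins for stored documents; not MongoDB driver classes.
class TimeRange {
    final String dayOfWeek;
    final int open;
    final int close;

    TimeRange(String dayOfWeek, int open, int close) {
        this.dayOfWeek = dayOfWeek;
        this.open = open;
        this.close = close;
    }
}

class Restaurant {
    final Set<String> serviceArea;
    final List<TimeRange> openingHours;

    Restaurant(Set<String> serviceArea, List<TimeRange> openingHours) {
        this.serviceArea = serviceArea;
        this.openingHours = openingHours;
    }
}

class ElemMatchSemantics {
    // serviceArea: "94619" matches when the array contains the value.
    // $elemMatch: ONE openingHours element must satisfy all three conditions.
    static boolean elemMatch(Restaurant r, String zip, String day, int time) {
        if (!r.serviceArea.contains(zip)) return false;
        for (TimeRange tr : r.openingHours) {
            if (tr.dayOfWeek.equals(day) && tr.open <= time && tr.close >= time) return true;
        }
        return false;
    }

    // Separate dotted-field conditions (no $elemMatch): each condition may be
    // satisfied by a DIFFERENT element of the array.
    static boolean separateConditions(Restaurant r, String zip, String day, int time) {
        if (!r.serviceArea.contains(zip)) return false;
        boolean dayOk = false, openOk = false, closeOk = false;
        for (TimeRange tr : r.openingHours) {
            dayOk |= tr.dayOfWeek.equals(day);
            openOk |= tr.open <= time;
            closeOk |= tr.close >= time;
        }
        return dayOk && openOk && closeOk;
    }
}
```

For a restaurant open Monday 1130-1430 and Tuesday 1730-2130, `separateConditions(r, "94619", "Monday", 1800)` returns true even though it is closed at 6pm on Monday, while `elemMatch` correctly returns false.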

Scaling MongoDB
[Diagram: clients connect to mongos, which uses the config servers (mongod) to route requests to Shard 1, Shard 2, …]
o  A shard consists of a replica set = generalization of master/slave (one master mongod plus replica mongods)
o  Collections are spread over multiple shards

Mongo Downsides
o  Server has a global write lock
     n  Single writer OR multiple readers → long-running queries block writers
o  Great that writes are not synchronous
     n  BUT perhaps an asynchronous response would be better than a synchronous getLastError()
o  Interesting story: http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
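
A toy model of "single writer OR multiple readers", using the JDK's ReentrantReadWriteLock as a stand-in for a server-wide lock. This is an illustration under that assumption, not MongoDB's implementation; the class names are hypothetical. While one thread runs a slow query, a second reader is admitted but a writer is refused.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of a server-wide readers/writer lock; illustration only, not MongoDB code.
class GloballyLockedStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> data = new HashMap<>();

    // Readers share the lock, so many queries can run at once.
    String read(String key) {
        lock.readLock().lock();
        try { return data.get(key); } finally { lock.readLock().unlock(); }
    }

    // Runs `query` while holding the shared lock (simulates a long-running query).
    void runQuery(Runnable query) {
        lock.readLock().lock();
        try { query.run(); } finally { lock.readLock().unlock(); }
    }

    // A writer needs the lock exclusively; tryLock fails while any reader holds it.
    boolean tryWrite(String key, String value) {
        if (!lock.writeLock().tryLock()) return false;
        try { data.put(key, value); return true; } finally { lock.writeLock().unlock(); }
    }
}

class GlobalLockDemo {
    // Returns "<value seen by 2nd reader>,<writer admitted during query>,<writer admitted after>".
    static String run() {
        GloballyLockedStore store = new GloballyLockedStore();
        store.tryWrite("k", "v1");
        CountDownLatch queryStarted = new CountDownLatch(1);
        CountDownLatch finishQuery = new CountDownLatch(1);
        Thread slowQuery = new Thread(() -> store.runQuery(() -> {
            queryStarted.countDown();
            awaitQuietly(finishQuery);                   // keep holding the read lock
        }));
        slowQuery.start();
        awaitQuietly(queryStarted);
        String seen = store.read("k");                   // another reader gets in...
        boolean duringQuery = store.tryWrite("k", "v2"); // ...but a writer does not
        finishQuery.countDown();
        try { slowQuery.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        boolean afterQuery = store.tryWrite("k", "v2");
        return seen + "," + duringQuery + "," + afterQuery;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

`GlobalLockDemo.run()` returns "v1,false,true". A real writer would block in `lock()` instead of giving up in `tryLock()`, so it would stall for the whole duration of the long-running query.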

MongoDB use cases
o  Use cases
     n  High volume writes
     n  Complex data
     n  Semi-structured data
o  Who is using it?
     n  Shutterfly, Foursquare
     n  Bit.ly, Intuit
     n  SourceForge, NY Times
     n  GILT Groupe, Evite
     n  SugarCRM

Other NoSQL databases

    Type                                   Examples
    Extensible columns/Column-oriented     HBase, SimpleDB
    Graph                                  Neo4j
    Key-value                              Membase
    Document                               CouchDB

http://nosql-database.org/ lists 122+ NoSQL databases

Picking a database

    Application requirement        Solution
    Complex transactions/ACID      Relational database
    Scaling                        NoSQL
    Social data                    Graph database
    Multiple datacenters           Cassandra
    Highly-available writes        Cassandra
    Flexible data                  Document store
    High write volumes             Mongo, Cassandra
    Super fast cache               Redis
    Ad hoc queries                 Relational or Mongo
    …

http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html

Proceed with caution
o  Don't commit to a NoSQL DB until you have done a significant POC
o  Encapsulate your data access code so you can switch
o  Hope that one day you won't need ACID
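
One way to encapsulate data access is to hide each store behind an interface, so that switching databases means writing a new implementation rather than touching callers. The names below are hypothetical; the case study later in this deck follows the same idea with an AvailableRestaurantRepository interface that has Redis and MongoDB implementations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names. Callers depend only on this interface, so the store
// behind it (relational, Redis, MongoDB, ...) can be replaced later.
interface RestaurantRepository {
    void add(String zipCode, String restaurantName);
    List<String> findByZipCode(String zipCode);
}

// One interchangeable implementation; a JDBC-, Redis- or MongoDB-backed class
// would implement the same interface.
class InMemoryRestaurantRepository implements RestaurantRepository {
    private final Map<String, List<String>> namesByZip = new HashMap<>();

    @Override
    public void add(String zipCode, String restaurantName) {
        namesByZip.computeIfAbsent(zipCode, z -> new ArrayList<>()).add(restaurantName);
    }

    @Override
    public List<String> findByZipCode(String zipCode) {
        return new ArrayList<>(namesByZip.getOrDefault(zipCode, Collections.emptyList()));
    }
}

// Business logic never names a concrete store.
class DeliveryService {
    private final RestaurantRepository repository;

    DeliveryService(RestaurantRepository repository) { this.repository = repository; }

    boolean canDeliverTo(String zipCode) { return !repository.findByZipCode(zipCode).isEmpty(); }
}
```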

Agenda
o  Why NoSQL?
o  Overview of NoSQL databases
o  Introduction to Spring Data
o  Case study: POJOs in Action & NoSQL

NoSQL Java APIs

    Database     Libraries
    Redis        Jedis, JRedis, JDBC-Redis, RJC
    Cassandra    Raw Thrift if you are a masochist
                 Hector, …
    MongoDB      MongoDB provides a Java driver

o  Some are not so easy to use
o  Stylistic differences
o  Boilerplate code
o  …

Spring Data Project Goals
o  Bring classic Spring value propositions to a wide range of NoSQL databases:
     n  Productivity
     n  Programming model consistency: e.g. <NoSQL>Template classes
     n  "Portability"
o  http://www.springsource.org/spring-data
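
The <NoSQL>Template classes follow Spring's template/callback style: the template owns the boilerplate (acquiring and releasing a connection, translating store-specific exceptions) and the caller supplies only the interesting work. A JDK-only sketch of that pattern, with hypothetical classes standing in for a real driver and for Spring's exception hierarchy:

```java
import java.io.IOException;

// Hypothetical stand-in for a driver connection that throws checked exceptions.
class FakeConnection implements AutoCloseable {
    boolean closed;

    String get(String key) throws IOException {
        if (key.isEmpty()) throw new IOException("empty key");
        return "value-of-" + key;
    }

    @Override
    public void close() { closed = true; }
}

// Unchecked, store-neutral exception (in the spirit of Spring's DataAccessException).
class DataAccessFailure extends RuntimeException {
    DataAccessFailure(String message, Throwable cause) { super(message, cause); }
}

// The caller supplies only the "interesting" part of the work.
interface StoreCallback<T> {
    T doWithConnection(FakeConnection connection) throws Exception;
}

// Template: owns opening/closing the connection and exception translation.
class MiniTemplate {
    FakeConnection lastConnection;   // exposed only so the demo can check it was closed

    <T> T execute(StoreCallback<T> callback) {
        try (FakeConnection connection = new FakeConnection()) {
            lastConnection = connection;
            return callback.doWithConnection(connection);
        } catch (Exception e) {
            throw new DataAccessFailure("data access failed", e);
        }
    }

    // Convenience method built on execute(), like the helpers on a *Template class.
    String get(String key) {
        return execute(connection -> connection.get(key));
    }
}
```

Callers of `get()` never see the checked IOException or forget to close the connection; both concerns live in `execute()` once.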

Spring Data sub-projects
o  Commons: Polyglot persistence
o  Key-Value: Redis, Riak
o  Document: MongoDB, CouchDB
o  Graph: Neo4j
o  GORM for NoSQL
o  Various milestone releases
     n  Redis 1.0.0.M4 (July 20th, 2011)
     n  Document 1.0.0.M2 (April 9, 2011)
     n  Graph - Neo4j Support 1.0.0 (April 19, 2011)
     n  …

MongoTemplate
o  Simplifies data access
o  Translates exceptions
o  POJO ↔ DBObject mapping
o  Properties: databaseName, userId, password, defaultCollectionName, writeConcern, writeResultChecking
o  Operations: save(), insert(), remove(), updateFirst(), findOne(), find(), …
o  Uses the <<interface>> MongoConverter
     n  write(Object, DBObject)
     n  read(Class, DBObject)
     n  Implementations: SimpleMongoConverter, MongoMappingConverter
o  Uses Mongo (Java driver class)

Richer mapping
o  Annotations define mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value
o  Map fields instead of properties → no getters or setters required
o  Non-default constructor
o  Index generation

    @Document
    public class Person {

      @Id
      private ObjectId id;

      private String firstname;

      @Indexed
      private String lastname;

      @PersistenceConstructor
      public Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
      }
      …
    }
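
"Map fields instead of properties" means the mapper reads and writes private fields directly via reflection, so the class needs no getters or setters. A toy sketch of that idea with java.lang.reflect, using a Map as a stand-in for DBObject. `FieldMapper` is hypothetical, and unlike the @PersistenceConstructor support above it simply relies on a private no-argument constructor.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy converter: copies instance fields to/from a Map standing in for a DBObject.
// Hypothetical; Spring Data's real converters do far more.
class FieldMapper {
    static Map<String, Object> write(Object source) {
        Map<String, Object> document = new LinkedHashMap<>();
        try {
            for (Field field : source.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) continue;
                field.setAccessible(true);                 // private field, no getter needed
                document.put(field.getName(), field.get(source));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return document;
    }

    static <T> T read(Class<T> type, Map<String, Object> document) {
        try {
            Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);               // constructor may be private
            T target = constructor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || !document.containsKey(field.getName())) continue;
                field.setAccessible(true);                 // no setter needed
                field.set(target, document.get(field.getName()));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Example entity: private fields, no getters or setters.
class Person {
    private String firstname;
    private String lastname;

    private Person() { }   // used only by FieldMapper.read

    Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
    }

    @Override
    public String toString() { return firstname + " " + lastname; }
}
```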

Generic Mongo Repositories

    interface PersonRepository extends MongoRepository<Person, ObjectId> {
      List<Person> findByLastname(String lastName);
    }

    <beans>
      <mongo:repositories
          base-package="net.chrisrichardson.mongodb.example.mongorepository"
          mongo-template-ref="mongoTemplate" />
    </beans>

    Person p = new Person("John", "Doe");
    personRepository.save(p);

    Person p2 = personRepository.findOne(p.getId());

    List<Person> johnDoes = personRepository.findByLastname("Doe");
    assertEquals(1, johnDoes.size());
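
Repository methods such as findByLastname have no implementation in user code; the query is derived from the method name. The toy sketch below shows one way such a mechanism can work: a JDK dynamic proxy turns findByXxx(value) into an equality filter on field xxx over an in-memory list. It is an illustration under strong assumptions (single property, equality only, no persistence), not how Spring Data implements repositories; `DerivedQueries` is hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Toy query derivation: findByXxx(value) returns entities whose field "xxx" equals value.
class DerivedQueries {
    @SuppressWarnings("unchecked")
    static <R> R repositoryFor(Class<R> repositoryInterface, List<?> entities) {
        return (R) Proxy.newProxyInstance(
            repositoryInterface.getClassLoader(),
            new Class<?>[] { repositoryInterface },
            (proxy, method, args) -> {
                String name = method.getName();
                if (method.getDeclaringClass() == Object.class) {
                    // equals/hashCode/toString called on the proxy itself
                    if (name.equals("equals")) return proxy == args[0];
                    if (name.equals("hashCode")) return System.identityHashCode(proxy);
                    return "DerivedQueries proxy for " + repositoryInterface.getSimpleName();
                }
                if (!name.startsWith("findBy") || args == null || args.length != 1) {
                    throw new UnsupportedOperationException(name);
                }
                // "findByLastname" -> "lastname"
                String property = Character.toLowerCase(name.charAt(6)) + name.substring(7);
                List<Object> matches = new ArrayList<>();
                for (Object entity : entities) {
                    Field field = entity.getClass().getDeclaredField(property);
                    field.setAccessible(true);
                    if (args[0].equals(field.get(entity))) matches.add(entity);
                }
                return matches;
            });
    }
}

interface PersonRepository {
    List<Person> findByLastname(String lastname);
}

class Person {
    private final String firstname;
    private final String lastname;

    Person(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
    }
}
```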

Support for the QueryDSL project
o  QPerson is generated from the domain model class
o  Type-safe, composable queries

    QPerson person = QPerson.person;

    Predicate predicate =
        person.homeAddress.street1.eq("1 High Street")
              .and(person.firstname.eq("John"));

    List<Person> people = personRepository.findAll(predicate);

    assertEquals(1, people.size());
    assertPersonEquals(p, people.get(0));

Cross-store/polyglot persistence

    @Entity
    public class Person {
      // In Database
      @Id private Long id;
      private String firstname;
      private String lastname;

      // In MongoDB
      @RelatedDocument private Address address;
      …
    }

    Person person = new Person(…);
    entityManager.persist(person);

    Person p2 = entityManager.find(…);

The related address document in MongoDB:

    { "_id" : ObjectId("….."),
      "_entity_id" : NumberLong(1),
      "_entity_class" : "net.. Person",
      "_entity_field_name" : "address",
      "zip" : "94611",
      "street1" : "1 High Street",
      …}

Agenda
o  Why NoSQL?
o  Overview of NoSQL databases
o  Introduction to Spring Data
o  Case study: POJOs in Action & NoSQL

Food to Go: placing a takeout order
o  Customer enters delivery address and delivery time
o  System displays available restaurants = restaurants that serve the zip code of the delivery address AND are open at the delivery time

    class Restaurant {
      long id;
      String name;
      Set<String> serviceArea;
      Set<TimeRange> openingHours;
      List<MenuItem> menuItems;
    }

    class TimeRange {
      long id;
      int dayOfWeek;
      int openingTime;
      int closingTime;
    }

    class MenuItem {
      String name;
      double price;
    }

Database schema

    RESTAURANT table
    ID   Name                 …
    1    Ajanta
    2    Montclair Eggshop

    RESTAURANT_ZIPCODE table
    Restaurant_id   zipcode
    1               94707
    1               94619
    2               94611
    2               94619

    RESTAURANT_TIME_RANGE table
    Restaurant_id   dayOfWeek   openTime   closeTime
    1               Monday      1130       1430
    1               Monday      1730       2130
    2               Tuesday     1130       …

Finding available restaurants on Monday, 7:30pm for the 94619 zip code
Straightforward three-way join:

    select r.*
    from restaurant r
      inner join restaurant_time_range tr on r.id = tr.restaurant_id
      inner join restaurant_zipcode sa on r.id = sa.restaurant_id
    where '94619' = sa.zip_code
      and tr.day_of_week = 'monday'
      and tr.openingtime <= 1930
      and 1930 <= tr.closingtime

Redis - Persisting restaurants is "easy"

Multiple key-value pairs:
    rest:1:details        [ name: "Ajanta", … ]
    rest:1:serviceArea    [ "94619", "94611", …]
    rest:1:openingHours   [10, 11]
    timerange:10          ["dayOfWeek": "Monday", ..]
    timerange:11          ["dayOfWeek": "Tuesday", ..]

OR Single KV hash:
    rest:1   [ name: "Ajanta",
               "serviceArea:0" : "94611", "serviceArea:1" : "94619",
               "menuItem:0:name" : "Chicken Vindaloo",
               …]

OR Single KV String:
    rest:1   { .. A BIG STRING/BYTE ARRAY, E.G. JSON }
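
The single-hash option flattens the nested restaurant into one level of field names such as serviceArea:0 or menuItem:0:name. A minimal sketch of that flattening over plain Maps and Lists; `HashFlattener` is a hypothetical helper, and the resulting field map is what would then be written to the rest:1 hash (for example with HMSET).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flattens nested Maps/Lists into "a:0:b"-style field names.
class HashFlattener {
    static Map<String, String> flatten(Map<String, ?> document) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : document.entrySet()) {
            flattenInto(entry.getKey(), entry.getValue(), fields);
        }
        return fields;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> fields) {
        if (value instanceof Map) {
            // Nested object: append each property name to the prefix.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                flattenInto(prefix + ":" + entry.getKey(), entry.getValue(), fields);
            }
        } else if (value instanceof List) {
            // Array: append the element index, e.g. serviceArea:0, serviceArea:1.
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(prefix + ":" + i, list.get(i), fields);
            }
        } else {
            // Leaf value: stored as a string field value.
            fields.put(prefix, String.valueOf(value));
        }
    }
}
```

The trade-off versus the single JSON string is that individual fields can be read or updated in place, at the cost of reassembling the aggregate from field names.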

BUT…
o  … we can only retrieve them via primary key
     n  → We need to implement indexes
     n  → Queries, instead of the data model, drive NoSQL database design
o  But how can a key-value store support a query that has:
     n  A 3-way join
     n  Multiple =
     n  > and <

Simplification #1: Denormalization

    Restaurant_id   Day_of_week   Open_time   Close_time   Zip_code
    1               Monday        1130        1430         94707
    1               Monday        1130        1430         94619
    1               Monday        1730        2130         94707
    1               Monday        1730        2130         94619
    2               Monday        0700        1430         94619
    …

    SELECT restaurant_id, open_time
    FROM time_range_zip_code
    WHERE day_of_week = 'Monday'
      AND zip_code = 94619
      AND 1815 < close_time
      AND open_time < 1815

Simpler query:
o  No joins
o  Two = and two <
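
Denormalization does the join once, at write time: each restaurant contributes one row per (time range, zip code) pair. A sketch with hypothetical in-memory types that reproduces the rows in the table above and runs the join-free query against them:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types. One Row = one line of the time_range_zip_code table.
class Row {
    final long restaurantId;
    final String dayOfWeek;
    final int openTime;
    final int closeTime;
    final String zipCode;

    Row(long restaurantId, String dayOfWeek, int openTime, int closeTime, String zipCode) {
        this.restaurantId = restaurantId;
        this.dayOfWeek = dayOfWeek;
        this.openTime = openTime;
        this.closeTime = closeTime;
        this.zipCode = zipCode;
    }
}

class TimeRange {
    final String dayOfWeek;
    final int openTime;
    final int closeTime;

    TimeRange(String dayOfWeek, int openTime, int closeTime) {
        this.dayOfWeek = dayOfWeek;
        this.openTime = openTime;
        this.closeTime = closeTime;
    }
}

class Denormalizer {
    // The join is done at write time: |timeRanges| x |zipCodes| rows per restaurant.
    static List<Row> rowsFor(long restaurantId, List<TimeRange> timeRanges, List<String> zipCodes) {
        List<Row> rows = new ArrayList<>();
        for (TimeRange tr : timeRanges) {
            for (String zip : zipCodes) {
                rows.add(new Row(restaurantId, tr.dayOfWeek, tr.openTime, tr.closeTime, zip));
            }
        }
        return rows;
    }

    // The simplified query: two equalities and two range conditions, no joins.
    static List<Long> availableAt(List<Row> rows, String day, String zip, int time) {
        Set<Long> ids = new LinkedHashSet<>();   // a restaurant may match via several rows
        for (Row r : rows) {
            if (r.dayOfWeek.equals(day) && r.zipCode.equals(zip)
                    && time < r.closeTime && r.openTime < time) {
                ids.add(r.restaurantId);
            }
        }
        return new ArrayList<>(ids);
    }
}
```

Restaurant 1, with two Monday time ranges and two zip codes, yields the four rows shown above; the price is that every change to a restaurant rewrites all of its rows. Note that times are written as plain ints (700, not 0700, which Java would read as octal).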

Simplification #2: Application filtering

    SELECT restaurant_id, open_time
    FROM time_range_zip_code
    WHERE day_of_week = 'Monday'
      AND zip_code = 94619
      AND 1815 < close_time

    Application filter on the returned rows: open_time < 1815

Even simpler query
o  No joins
o  Two = and one <

Simplification #3: Eliminate multiple ='s with concatenation

    Restaurant_id   Zip_dow         Open_time   Close_time
    1               94707:Monday    1130        1430
    1               94619:Monday    1130        1430
    1               94707:Monday    1730        2130
    1               94619:Monday    1730        2130
    2               94619:Monday    0700        1430
    …

    SELECT …
    FROM time_range_zip_code
    WHERE zip_code_day_of_week = '94619:Monday'   -- key
      AND 1815 < close_time                       -- range

Sorted sets support range queries

    Key (zipCode:dayOfWeek)   Sorted Set [ Entry:Score, …]
    94707:Monday              [1130_1:1430, 1730_1:2130]
    94619:Monday              [0700_2:1430, 1130_1:1430, 1730_1:2130]

o  Member: OpeningTime_RestaurantId
o  Score: ClosingTime

    ZRANGEBYSCORE 94619:Monday 1815 2359 → {1730_1}

1730 is before 1815 → Ajanta is open
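
The sorted-set scheme can be simulated in memory to check the logic: index each time range under zipCode:dayOfWeek with member OpeningTime_RestaurantId and score ClosingTime, run a ZRANGEBYSCORE-style range query on closing time, then filter on opening time in the application. `SortedSetIndex` is a hypothetical stand-in, not a Redis client, and it assumes opening times are zero-padded to four digits as in 0700_2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for Redis sorted sets; NOT a Redis client.
class SortedSetIndex {
    // key -> (member -> score), e.g. "94619:Monday" -> {"1730_1" -> 2130}
    private final Map<String, Map<String, Integer>> sets = new HashMap<>();

    // Like ZADD: each call is one extra index write when a restaurant is saved.
    void zadd(String key, int score, String member) {
        sets.computeIfAbsent(key, k -> new HashMap<>()).put(member, score);
    }

    // Like ZRANGEBYSCORE key min max: min <= score <= max, ordered by score, then member.
    List<String> zrangeByScore(String key, int min, int max) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sets.getOrDefault(key, Collections.emptyMap()).entrySet()) {
            if (e.getValue() >= min && e.getValue() <= max) hits.add(e);
        }
        hits.sort(Map.Entry.<String, Integer>comparingByValue()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, Integer> e : hits) members.add(e.getKey());
        return members;
    }

    // Range query on closing time, then application-side filter on opening time.
    Set<String> openRestaurants(String zip, String day, int timeOfDay) {
        Set<String> restaurantIds = new LinkedHashSet<>();
        for (String member : zrangeByScore(zip + ":" + day, timeOfDay, 2359)) {
            int openingTime = Integer.parseInt(member.substring(0, 4));
            if (openingTime <= timeOfDay) {
                restaurantIds.add(member.substring(member.indexOf('_') + 1));
            }
        }
        return restaurantIds;
    }
}
```

Loaded with the entries from the table, `zrangeByScore("94619:Monday", 1815, 2359)` returns [1730_1] and `openRestaurants("94619", "Monday", 1815)` returns {1}, i.e. Ajanta. Every zadd is index-maintenance work that must be repeated whenever a restaurant's hours or service area change.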

What did I just do to query the data?
o  Wrote code to maintain an index
o  Reduced performance due to extra writes

RedisTemplate-based code

    @Repository
    public class AvailableRestaurantRepositoryRedisImpl implements AvailableRestaurantRepository {

      // Field injection cannot target a final field, so the field is non-final
      @Autowired
      private StringRedisTemplate redisTemplate;

      // Sorted set keyed by zipCode:dayOfWeek; member = openingTime_restaurantId, score = closingTime
      private BoundZSetOperations<String, String> closingTimes(int dayOfWeek, String zipCode) {
        return redisTemplate.boundZSetOps(AvailableRestaurantKeys.closingTimesKey(dayOfWeek, zipCode));
      }

      public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
        String zipCode = deliveryAddress.getZip();
        int timeOfDay = timeOfDay(deliveryTime);
        int dayOfWeek = dayOfWeek(deliveryTime);

        // Range query: time ranges that close at or after the delivery time
        Set<String> closingTrs = closingTimes(dayOfWeek, zipCode).rangeByScore(timeOfDay, 2359);

        // Application filtering: keep time ranges that open at or before the delivery time
        Set<String> restaurantIds = new HashSet<String>();
        String paddedTimeOfDay = FormattingUtil.format4(timeOfDay);
        for (String trId : closingTrs) {
          if (trId.substring(0, 4).compareTo(paddedTimeOfDay) <= 0)
            restaurantIds.add(StringUtils.substringAfterLast(trId, "_"));
        }

        // Fetch the JSON for all available restaurants in one round trip
        Collection<String> jsonForRestaurants =
            redisTemplate.opsForValue().multiGet(AvailableRestaurantKeys.timeRangeRestaurantInfoKeys(restaurantIds));

        List<AvailableRestaurant> restaurants = new ArrayList<AvailableRestaurant>();
        for (String json : jsonForRestaurants) {
          restaurants.add(AvailableRestaurant.fromJson(json));
        }
        return restaurants;
      }
    }

Redis - Spring configuration

    @Configuration
    public class RedisConfiguration extends AbstractDatabaseConfig {

      @Bean
      public RedisConnectionFactory jedisConnectionFactory() {
        JedisConnectionFactory factory = new JedisConnectionFactory();
        factory.setHostName(databaseHostName);
        factory.setPort(6379);
        factory.setUsePool(true);
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxActive(1000);
        factory.setPoolConfig(poolConfig);
        return factory;
      }

      @Bean
      public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
        StringRedisTemplate template = new StringRedisTemplate();
        template.setConnectionFactory(factory);
        return template;
      }
    }

Cassandra: Easy to store restaurants

Column Family: RestaurantDetails
    Keys   Columns
    1      name: Ajanta                type: Indian       …
    2      name: Montclair Egg Shop    type: Breakfast    …

OR

Column Family: RestaurantDetails
    Keys   Columns
    1      details: { JSON DOCUMENT }
    2      details: { JSON DOCUMENT }

Querying using Cassandra
o  Similar challenges to using Redis
o  Limited querying options
     n  Row key: exact or range
     n  Column name: exact or range
o  Use composite/concatenated keys
     n  Prefix: equality match
     n  Suffix: can be range scan
o  No joins → denormalize
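
With composite keys, the prefix must match exactly while the suffix can be scanned as a range, which only works if the suffix sorts correctly as text. A sketch that models a sorted name space (column names within a row, or row keys under an order-preserving partitioner) with a TreeMap of Strings. `CompositeKeyScan` is hypothetical, not the Hector API, and it zero-pads the numeric suffix so that lexicographic order matches numeric order.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helpers; a TreeMap of Strings models a sorted name space.
class CompositeKeyScan {
    // Builds zip:day:closingTime:member, zero-padding the numeric part so that
    // lexicographic order equals numeric order (0930 < 1430 < 2130).
    static String key(String zip, String day, int closingTime, String member) {
        return zip + ":" + day + ":" + String.format(Locale.ROOT, "%04d", closingTime) + ":" + member;
    }

    // Prefix zip:day is matched exactly; the closing-time suffix is scanned from
    // `from` to `to` inclusive. ';' is the ASCII character right after ':', so the
    // exclusive upper bound sorts after every key whose suffix starts with `to`.
    static NavigableMap<String, String> scan(TreeMap<String, String> names,
                                             String zip, String day, int from, int to) {
        String prefix = zip + ":" + day + ":";
        return names.subMap(prefix + String.format(Locale.ROOT, "%04d", from), true,
                            prefix + String.format(Locale.ROOT, "%04d", to) + ";", false);
    }
}
```

Without the padding, "930" would sort after "2130" as text, so range scans over closing times would return the wrong entries.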

Cassandra: Find restaurants that close after the delivery time and then filter

Row key = zipCode:day, super column name = closing time, column = openingTime_restaurantId:
    Keys        Super Columns
    94619:Mon   1430: { 0700_2: JSON FOR EGG SHOP, 1130_1: JSON FOR AJANTA }
                2130: { 1730_1: JSON FOR AJANTA }

SuperSlice: key = 94619:Mon, SliceStart = 1815, SliceEnd = 2359

    Keys        Super Columns
    94619:Mon   2130: { 1730_1: JSON FOR AJANTA }

18:15 is after 17:30 => {Ajanta}

Cassandra/Hector code

    import me.prettyprint.hector.api.Cluster;

    public class CassandraHelper {

      // Field injection cannot target a final field, so the field is non-final
      @Autowired
      private Cluster cluster;

      public <T> List<T> getSuperSlice(String keyspace, String columnFamily, String key,
                                       String sliceStart, String sliceEnd,
                                       SuperSliceResultMapper<T> resultMapper) {
        // Row key, super column name, column name and value are all Strings
        SuperSliceQuery<String, String, String, String> q =
            HFactory.createSuperSliceQuery(HFactory.createKeyspace(keyspace, cluster),
                StringSerializer.get(), StringSerializer.get(),
                StringSerializer.get(), StringSerializer.get());
        q.setColumnFamily(columnFamily);
        q.setKey(key);
        // Super columns from sliceStart to sliceEnd, not reversed, at most 10000
        q.setRange(sliceStart, sliceEnd, false, 10000);

        QueryResult<SuperSlice<String, String, String>> qr = q.execute();

        SuperColumnRowProcessor<T> rowProcessor = new SuperColumnRowProcessor<T>(resultMapper);
        for (HSuperColumn<String, String, String> superColumn : qr.get().getSuperColumns()) {
          List<HColumn<String, String>> columns = superColumn.getColumns();
          rowProcessor.processRow(key, superColumn.getName(), columns);
        }
        return rowProcessor.getResult();
      }
    }

MongoDB = easy to store

    {
      "_id": "1234",
      "name": "Ajanta",
      "serviceArea": ["94619", "99999"],
      "openingHours": [
        { "dayOfWeek": 1, "open": 1130, "close": 1430 },
        { "dayOfWeek": 2, "open": 1130, "close": 1430 },
        …
      ]
    }

MongoDB = easy to query

    {
      "serviceArea": "94619",
      "openingHours": {
        "$elemMatch": {
          "open": { "$lte": 1815 },
          "dayOfWeek": 4,
          "close": { "$gte": 1815 }
        }
      }
    }

    db.availableRestaurants.ensureIndex({serviceArea: 1})

MongoTemplate-based code

    @Repository
    public class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {

      // Field injection cannot target a final field, so the field is non-final
      @Autowired
      private MongoTemplate mongoTemplate;

      @Override
      public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
        int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
        int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);

        // Serves the zip code AND one opening-hours entry covers the delivery time
        Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
            .and("openingHours").elemMatch(where("dayOfWeek").is(dayOfWeek)
                .and("openingTime").lte(timeOfDay)
                .and("closingTime").gte(timeOfDay)));

        return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);
      }
    }

Index creation (run once, e.g. at startup):

    mongoTemplate.ensureIndex("availableRestaurants", new Index().on("serviceArea", Order.ASCENDING));

MongoDB - Spring Configuration

    @Configuration
    public class MongoConfig extends AbstractDatabaseConfig {

      private @Value("#{mongoDbProperties.databaseName}") String mongoDbDatabase;

      public @Bean MongoFactoryBean mongo() {
        MongoFactoryBean factory = new MongoFactoryBean();
        factory.setHost(databaseHostName);
        MongoOptions options = new MongoOptions();
        options.connectionsPerHost = 500;
        factory.setMongoOptions(options);
        return factory;
      }

      public @Bean MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
        MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
        mongoTemplate.setWriteConcern(WriteConcern.SAFE);
        mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
        return mongoTemplate;
      }
    }

Summary
o  Relational databases are great but
     n  Object/relational impedance mismatch
     n  Relational schema is rigid
     n  Extremely difficult/impossible to scale writes
     n  Performance can be suboptimal
o  Each NoSQL database can solve some combination of those problems BUT
     n  Limited transactions
     n  One day needing ACID → major rewrite
     n  Query-driven, denormalized database design
     n  …
→
o  Carefully pick the NoSQL DB for your application
o  Consider a polyglot persistence architecture

Thank you!
My contact info:
chris@chrisrichardson.net
@crichardson