Chicago, October 19 - 22, 2010


Using Spring with NoSQL
databases
Mark Pollack
Chris Richardson
Goal of this talk



• The benefits and drawbacks of
       NoSQL databases
  • How the Spring Data project
   simplifies the development of
       NoSQL applications
About Chris


              •Grew up in England and live in Oakland, CA
              •Over 25+ years of software development
              experience including 14 years of Java
              •Speaker at JavaOne, SpringOne, NFJS,
              JavaPolis, Spring Experience, etc.
              •Organize the Oakland JUG and the Groovy
              Grails meetup




               http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
About Mark

• Spring committer since 2003
• Founder of Spring.NET
• Work on some other new open source projects
   – Spring Gemfire
   – Spring AMQP
• Used to work on large scale data collection/analysis in
  High Energy Physics (~10yrs ago)
Agenda

•   Why NoSQL?
•   Some NoSQL Databases
•   Spring NoSQL projects
•   Demos/Code/Deep dive
Why NoSQL?
Relational databases are great

• Well understood by developers
• Well supported by frameworks and tools, e.g. Spring JDBC,
  Hibernate, JPA
• Well understood by operations
   –   Configuration
   –   Care and feeding
   –   Backups
   –   Tuning
   –   Failure and recovery
   –   Performance characteristics



• But….
The trouble with relational database

• Object/relational impedance mismatch
   – Complicated to map rich domain model to relational schema
   – Relational schema is rigid
• Extremely difficult/impossible to scale writes:
   – Vertical scaling is limited/expensive
   – Horizontal scaling is limited or requires $$
• Performance can be suboptimal
Why NoSQL?
 Internet scale is the primary driver in the use of non-relational DBs
    Large Data Size – TBs to PBs
    High Transaction rates
    Need for High Availability




                                                Pictures courtesy of Gilt Group
Relational DBs
       Harder to scale to larger size
       Transactions and joins reduce access rates
Different priorities

• Relational Databases with ACID semantics
   – Atomic, Consistent, Isolated, Durable
• Internet Systems with BASE semantics
   – Basically Available, Scalable, Eventually Consistent

• NoSQL/BASE values remove standard DB behaviors
   – No Joins
   – Very limited "transactions"
   – Different Durability/consistency guarantees

• NoSQL datastores emerge as “point solutions”
   – Amazon/Google papers
   – Facebook, LinkedIn …
The NoSQL Landscape
   There are a variety of non-relational datastores
        Many are Open Source
        Many with companies behind them
        Rapidly changing feature sets
                                                                             Scalaris
Neo4J        db4o          Objectivity    Amazon Dynamo

                                                Terrastore        Gemfire     Sones
                   HBase           CouchDB
InfoGrid                                                RavenDB
           Hadoop       Scalaris                                      Tokyo Cabinet
                                          MongoDB
      Hypertable           Cassandra                    Azure Table         Coherence
Mnesia              Voldemort            Gigaspaces
           Riak
                                                                 Amazon SimpleDB
                      Redis      BerkeleyDB     MemcacheDB
           Tamino
                                                 InfiniteGraph        Chordless
  Expect consolidation over time
  With so many choices, picking which one to use can be difficult
Stops on a tour through NoSQL-land
 What is the data model?
 How to I access my data?
 What are the primary use cases?
 What notable companies that are using it?
Data Model
 Key Value
                                K1   V1
   Key -> value                 K2   V2
   Redis, Riak, Voldemort…      K3   V2

   “Amazon Dynamo inspired”


 Column
   Key -> ‘column families’
   HBase, Cassandra
   “Google Bigtable inspired”
Data Model
Document
  Collections containing semi-       { id: ‘4b2b9f67a1f631733d917a7b"),’
                                       author: ‘joe’,
  structured data                      tags : [‘example’, ‘db’],
                                       comments : [ { author: 'jim', comment: 'OK' },
                                                    { author: ‘ida', comment: ‘Bad' }
        key-values , JSON, XML,                   ]

  CouchDB, MongoDB                   { id: ‘4b2b9f67a1f631733d917a7c"),
                                       author: ‘ida’, ...
                                     { id: ‘4b2b9f67a1f631733d917a7d"),
                                       author: ‘jim’, ...




Graph
  Graph: nodes and edges
        Edges can be directional
        Each can contain key-value
        attributes
  Neo4j, Sones, InfiniteGraph
How to I access my data?
How to I access my data?
 Query Mechanism
   Key lookup
   Map-Reduce
   Query by example
   Query language
 ‘out of box’ query power usually linked to data model
 Increasing power
   KV -> Column -> Document/Graph
 Counter examples
   Riak is key-value and supports Map-Reduce
   Spring Data Mapper…


    .
Reality Check - Relational DBs are not going away

    NoSQL usage small
    by comparison…
    But growing…
Future = multi-paradigm data storage for
 enterprise applications




IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
Agenda

•   Why NoSQL?
•   Some NoSQL Databases
•   Spring NoSQL projects
•   Demos/Code/Deep dive
Redis
• Advanced key-value store
   – Think memcached on steroids (the good kind)
   – Values can be binary strings, Lists, Sets, Ordered    K1   V1
     Sets, Hash maps, ..
   – Operations for each data type, e.g. appending         K2   V2
     to a list, adding to a set, retrieving a slice of a
                                                           K3   V2
     list, …
• Very fast: ~100K operations/second on entry-
  level hardware
• Persistent
   – Periodic snapshots of memory OR append
     commands to log file
   – Limits are size of keys retained in memory.
• Has “transactions”
• Libraries – Many languages, all separate
  open source projects
Redis CLI
redis> sadd myset a
(integer) 1
redis> sadd myset b
(integer) 1
redis> smembers myset
1. "a"
2. "b"
redis> srem myset a
(integer) 1
redis> smembers myset
1. "b"
Scaling Redis

• Master/slave replication
   – Tree of Redis servers
   – Non-persistent master can replicate to a persistent slave
   – Use slaves for read-only queries
• Sharding
   – Client-side only – consistent hashing based on key
   – Server-side sharding – coming one day
• Run multiple servers per physical host
   – Leverage multiple CPUs
   – 32 bit more efficient than 64 bit
• Optional "virtual memory"
   – Ideally data should fit in RAM
   – Values (not keys) written to disc
Redis use cases

• Use in conjunction with another database as the SOR
• Drop-in replacement for Memcached
    – Session state
    – Cache of data retrieved from SOR
•   Denormalized datastore for high-performance queries
•   Hit counts using INCR command
•   Randomly selecting an item – SRANDMEMBER
•   Queuing – Lists with LPOP, RPUSH, ….
•   High score tables – Sorted sets

• Notable users: github, guardian.co.uk, ….
Cassandra

• Originally developed by Facebook for inbox search
   – Now an Apache open-source project
• Extremely scalable
• Data is replicated and sharded
• The data model will hurt your brain
   • Column-oriented database
   • 4 or 5-dimensional hash map
Cassandra data model – insert/update
  My Column family (within a key space)
    Keys    Columns

     a      colA: value1    colB: value2   colC: value3          This can
                                                                 be data!
     b       colA: value    colD: value     colE: value



                      Insert(key=a, columName=colZ, value=foo)

    Keys    Columns

     a      colA: value1    colB: value2   colC: value3     colZ: foo


     b       colA: value    colD: value     colE: value
Cassandra query – range slice
   Keys    Columns

    a       colA: value1   colB: value2   colC: value3   colZ: foo


    b       colA: value    colD: value    colE: value



          rangeSlice(startKey=a, endKey=b,
                     startColumn=colA, endColumnName=colB)

   Keys    Columns

    a       colA: value1   colB: value2

    b       colA: value
Cassandra CLI

$ bin/cassandra-cli
cassandra> connect localhost/9160
Connected to "Test Cluster" on localhost/9160
cassandra> get Keyspace1.Standard1['foo']
Returned 0 results
cassandra> set Keyspace1.Standard1['foo']['a'] = '2'
Value inserted
cassandra> get Keyspace1.Standard1['foo']
=> (Column=61, value=1, timestamp...)
cassandra> set Keyspace1.Standard1['foo']['bar'] = 'abc'
Value inserted
cassandra> get Keyspace1.Standard1['foo']
=> (Column=626172, value=abc, timestamp...)
=> (Column=61, value=1, timestamp...)
cassandra> count Keyspace1.Standard1['foo']
2 columns
cassandra>
Scaling Cassandra through sharding/replication
                Keys = [D, A]                       •Client connects to any node
                                                    •Dynamically add/remove
                                                    nodes
                                Node 1
                                                    •Configurable # replicas
                           Token = A                •Reads/Writes specify how
                                                    many nodes
                Replicates to     Replicates to        Keys = [A, B]

           Node 4                                 Node 2
         Token = D                            Token = B

Keys = [C, D]   Replicates to                Replicates to

                                Node 3
                           Token = C

                                       Keys = [B, C]
Cassandra use cases

• Use cases
   • Big data
   • Persistent cache
   • (Write intensive) Logging

• Who is using it
   – Digg, Facebook, Twitter, Reddit, Rackspace
   – Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
   – “The largest production cluster has over 100 TB of data in
     over 150 machines.“ – Casssandra web site
MongoDB

• JSON-style documents
    – Lists, Maps, Primitive values
• Documents organized into collections (~table)
• Full or partial document updates
    – Transactional update in place on one document
    – Atomic Modifiers
•   GridFS for efficiently storing large files
•   Index support – secondary and compound
•   Rich query language for dynamic queries
•   Map/Reduce
•   Replication and Auto Sharding
MongoDB Data Model
                               {
                                 "name" : "Ajanta",
• BSON data types                "type" : "Indian",
                                 "serviceArea" : [
• Documents types                   "94619",
  – Data                            "94618"
                                 ],
  – Query                        "openingHours" : [
  – Modifier for updates            {
• Query and Modify                     "dayOfWeek" : Monday,
                                       "open" : 1730,
  operators                            "close" : 2130
  – $tl, $gt, $ne, $in, $or,        }
    $elemMatch, $orderBy,        ],
    $group,                      "_id" :
                               ObjectId("4bddc2f49d1505567c
  – $where + javascript
                               6220a0")
  – $inc, $set, $push, $pull   }
MongoDB query by example

• Find a restaurant that serves the 94619 zip code and is
  open at 6pm on a Monday
   {
     serviceArea:"94619",
     openingHours: {
       $elemMatch : {
             "dayOfWeek" : Monday,
             "open": {$lte: 1800},
             "close": {$gte: 1800}
        }
     }
   }       DBCursor cursor = collection.find(qbeObject);
           while (cursor.hasNext()) {
              DBObject o = cursor.next();
              …
            }
MongoDB CLI
$ bin/mongo
> use mydb
> r1 = {name: 'Ajanta'}
{name: 'Ajanta'}
> r2 = {name: 'Montclair Egg Shop'}
{name: 'Montclair Egg Shop'}
> db.restaurants.save(r1)
> r1
{ _id: ObjectId("98…"), name: "Ajanta"}
> db.restaurants.save(r2)
> r2
{ _id: ObjectId("66…"), name: "Montclair Egg Shop"}
> db.restaurants.find()
{ _id: ObjectId("98…"), name: "Ajanta"}
{ _id: ObjectId("66…"), name: "Montclair Egg Shop"}
> db.restaurants.find(name: /^A/)
{ _id: ObjectId("98…"), name: "Ajanta"}
Scaling MongoDB
   Shard 1                         Shard 2
             mongod                          mongod


     mongod      mongod              mongod      mongod



                                                         A shard
                                                         consists of a
Config Server                                            replica set =
                                                         generalization
  mongod              mongos            mongos           of master slave

  mongod
                                                       Collections
  mongod                                               spread over
                          client                      multiple shards
MongoDB use cases

• Use cases
   –   Real-time analytics
   –   Content management systems
   –   Single document partial update
   –   Caching
   –   High volume writes
• Who is using it?
   –   Shutterfly, Foursquare
   –   Bit.ly Intuit
   –   SourceForge, NY Times
   –   GILT Groupe, Evite,
   –   SugarCRM
CouchDB

• JSON documents
• Database is flat collection of documents
• Queries are implemented as views using Map/Reduce in
  Javascript
    – Queries typically defined up front
    – Views are indexed and incrementally updated
• MVCC based
    – Need to deal with reconciliation
•   Multi-Master (bi) replication
•   Scale out with replication
•   Mobile version - Android
•   Use as a Server-Side JavaScript application server
    – “CouchApps”
• Cloudant provides Java VM based views
CouchDB with curl

$ curl -X PUT http://localhost:5984/contacts
{"ok":true}

$ curl -X PUT http://localhost:5984/contacts/markpollack -d
'{"firstName":"Mark",
  "lastName":"Pollack",
  "email":"mpollack@vmware.com"}‘
{"ok":true,"id":"markpollack","rev":"1-7bb43f219a8d4306614dbd526ed577fe"}

$curl -X GET http://localhost:5984/contacts/markpollack
{"_id":"markpollack",
 "_rev":"1-7bb43f219a8d4306614dbd526ed577fe",
"firstName":"Mark",
"lastName":"Pollack",
"email":"mpollack@vmware.com"}
Futon
Futon
CouchDB use cases

• Use cases
   – Synchronization
      • master-master replication
      • Offline replication for mobile devices: “Ground Computing”
   – Web apps that consume JSON
      • Server side JavaScript using CouchApps
   – Content management systems
   – Caching
• Who is using it?
   – Mozilla, BBC, Palm, Apple,
   – IBM, Ubuntu, meebo, Engine Yard
   – Cloudant, UNICEF, Credit Suisse
Neo4j

• Graph Database
• DB is a collection of graph nodes, relationships
   – Nodes and relationships have properties
• Querying is done via a traversal API
• Indexes on node/relationship properties
• Written in Java, can be embedded
   – Also a REST API
• Transactional (ACID)
• Master-Slave replication in next release
Neo4j Data Model
Neo4j Use Cases

• Use Cases
   – Social networking
   – Bioinformatics
   – Fraud detection
• Who is using it?
   – Fanbox, Jayway
   – Large companies that wish to remain anonymous
Agenda

•   Why NoSQL?
•   Some NoSQL Databases
•   Spring NoSQL projects
•   Demos/Code/Deep dive
Spring Data Project Goals

• Bring classic Spring value propositions to NoSQL
   – Productivity
   – Programming model consistency
• Support for a wide range of NoSQL databases
• Many entry points to use
   –   Low level data access APIs
   –   Opinionated APIs (Think JdbcTemplate)
   –   Object Mapping (Java and GORM)
   –   Cross Store Persistence Programming model
   –   Productivity support in Roo and Grails
   –   Vertical Applications
   –   Guidance
Spring Data

• Working together with various projects/companies
   –   Redis / VMware
   –   Riak / Basho
   –   CouchDB / CouchOne
   –   Mongo / 10gen
   –   Neo4j / Neo Technologies
Spring Data Projects
 Data Commons                 Planned
   Polyglot persistence         Guidance Docs
 Datastore Key-Value               The big picture
   Redis                        Datastore Column
                                   Cassandra, HBase
 Datastore Document
                                Hadoop
   MongoDB, CouchDB
                                Datastore Key-Value
 Datastore Graph
                                   Riak
   Neo4j
                                Datastore Mapping
 Datastore Mapping
                                   Additional DB Map
   Redis, Gemfire
                                Blob storage
                                   Amazon, Atmos, Azure
 Reaching M1 release in Q4.     SQL - Generic DAOs
                                Grails/Roo support
Spring Datastore Key-Value (Redis)
 Interface based low-level Redis Client API
    Method names are Redis command names
    Pluggable implementations (Jedis, JRedis)
    Translation to Spring DAO exception hierarchy
    Connection Pooling
 RedisTemplate
    Descriptive method names, grouped into natural categories
       ServerOps, SetOps
       Convenience methods – ‘redis recipies’
    getAndConvert, convertAndSet for payload/value serialization
       Java, JSON, XML
       Protocol Buffers, Avro, to come…
 Redis backed java.util.Set, List, Map, Capped Collections
 RedisPlatformTransactionManager
 Redis Messaging
    RedisMessageTemplate, RedisMessageListenerContainer
Spring Redis Examples
RedisClientFactory clientFactory = new JedisClientFactory();
RedisClient client = clientFactory.createClient();
client.sadd("s1", "1"); client.sadd("s1", "2");
client.sadd("s1", "3");
client.sadd("s2", "2"); client.sadd("s2", "3");
Set<String> intersection = client.sinter("s1", "s2");
System.out.println(intersection); // [3,2]

RedisTemplate redisTemplate = new RedisTemplate(client)
Person p = new Person("Joe", "Trader", 33);
redisTemplate.convertAndSet("trader:1", p);
p = redisTemplate.getAndConvert("trader:1", Person.class);
intersection =
  redisTemplate.getSetOps().getIntersectionOfSets(“s1”,”s2”)

RedisSet s1 = new RedisSet(template, "s1");
s1.add("1"); s1.add("2"); s1.add("3");
RedisSet s2 = new RedisSet(template, "s2");
s2.add("2"); s2.add("3");
Set s3 = s1.intersection("s3", s1, s2); // [3,2]
Spring MongoDB

• MongoTemplate
  – ‘RowMapper’ like functionality for mapping Mongo query
    results.
  – Basic POJO mapping support for simple CRUD
     • Leverage Spring 3.0 TypeConverters and SpEL
  – Connection affinity callback
• Support common map-reduce operations from Mongo
  Cookbook
• Higher level integration
  – File Upload
  – Spring MVC activity monitoring and analysis
Spring MongoDB Example
Mongo mongo = new Mongo();
DB db = mongo.getDB("mvc");
MongoTemplate mongoTemplate = new MongoTemplate(db,
                                                "people");
Person p = new Person("Joe", "Trader", 33);
mongoTemplate.save(p);
List<People> people =
       mongoTemplate.queryForCollection(Person.class);
Spring CouchDB

 CouchTemplate
   RestTemplate based API for CouchDB
   Basic JSON/POJO mapping support
      CouchDbMappingJacksonHttpMessageConverter
   Android implementation
   .NET implementation planned
 Repository base classes
   Out of the box CRUD
   Embedded View Definitions
   Automatic view generation
 Plans to work with ektrop OS project
   http://code.google.com/p/ektorp/
Spring CouchDB Example
RestTemplate restTemplate = new RestTemplate();
restTemplate.setMessageConverters(couchDbConverter);

List<Restaurants> rests = (List<Restaurant>)
       restTemplate.getForObject(getDbURL() +
                "_design/demo/_view/all", Restaurant.class);


UserAccount userAccount = new UserAccount("u1");
restTemplate.put(getDbURL() + userAccount.getUserName(),
                 userAccount);

//Note, not handling response to get new revision string…
Ektrop CouchDB Example
Spring Neo4j

• Easy POJO based programming model
• Support for graph Entities, Relationships, and Properties
    – Properties leverage Spring 3.0 converters
• DAO finder support
    – findAll, count, findById, findByPropertyValue,
      findAllByTraversal

See Graph Database Persistence Using Neo4j with Spring and Roo
Tomorrow @ 10:15am in Lilac C
Spring Cross Store

• Use Case
   – An existing application want to use NOSQL database as a
     part of its existing SQL-based application
      • Pick a problem space for which NOSQL is well suited
• How to link together the two worlds?
   – Developer application code task
   – No longer a single domain entity
• Spring Cross store allows for a single entity to be
  persisted across multiple datastores
   – AspectJ manages NOSQL entities/fields in change sets
   – Change sets are flushed when JPA commits
   – Interception to rehydrate when performing JPA queries
Spring Cross Store Example
Spring Cross Store Example
Spring Cross Store Example
Spring Mapping

• GORM for Redis, Gemfire….
   – Built on pure Java
• Mapping metadata is not tied to GORM
   – Can use annotations, XML etc.
• Will be working on making this mapping framework
  available to Java developers
Spring Mapping
Spring Mapping
Agenda

•   Why NoSQL?
•   Some NoSQL Databases
•   Spring NoSQL projects
•   Demos/Code/Deep dive
Redis meets POJOs in Action
• Finding restaurants that are open at the delivery time (e.g.
  94619) and serve the delivery zip code (e.g. 6pm)
select r.*
from FTGO_RESTAURANT_ZIPCODE zip,
  FTGO_RESTAURANT_TIME_RANGE tr, FTGO_RESTAURANT r
where r.RESTAURANT_ID = zip.RESTAURANT_ID
  and tr.RESTAURANT_ID = r.RESTAURANT_ID;
  and zip.ZIPCODE = '94619'
  and tr.day_Of_week = 'Monday'
                                                     Joins and
  and tr.open_time <= 1800
                                                      complex
  and 1800<= tr.close_time                         where clause!


   No way Redis can handle this! Right?
Redis – denormalizing the data is the key
 Tr_id      Day of      Open_time Close_tim       Zip code
            Week                  e
 Ajanta10   Monday      1130        1430          94707
 Ajanta10   Monday      1130        1430          94619
 Ajanta11   Monday      1730        2130          94707
 Ajanta11   Monday      1730        2130          94619
 Egg30      Monday      0700        1430          94707
 …                                         Two simpler queries
(SELECT tr_id
 FROM time_range_zip_code
 where day_of_week = ? and zip_code = ? open_time < ?)
INTERSECTION
(SELECT tr_id
 FROM time_range_zip_code
 where day_of_week = ? and zip_code = ? and ? < close_time)
Redis – storing the opening hours using sorted
 sets
Sorted Set
                  Keys     Members = TimeRangeId with opening time as score

   open:94707:Monday       Ajanta10:1130       Ajanta11:1730

   open:94619:Monday       Egg30:0700      Ajanta10:1130        Ajanta11:1730




                                ZRANGEBYSCORE open:94619:Monday 0 1800



       {Egg30, Ajanta10, Ajanta11, …}      Finds all time ranges that open
                                           before 6pm
Redis – storing closing times with sorted sets

Ordered Set
                 Keys     Members = TimeRangeId with closing time as score

  close:94707:Monday      Ajanta10:1430      Ajanta11:2130

  close:94619:Monday      Egg30:1430      Ajanta10:1430      Ajanta11:2130




                        ZRANGEBYSCORE close:94619:Monday 1800 2400

                          Finds all time ranges that close
       {Ajanta11, …}      after 6pm
Redis – computing the intersection

             {Egg30, Ajanta10, Ajanta11, …}

                          Intersection

                     {Ajanta11, …}
                               =

                     {Ajanta11, …}



• This is super-cool!
• Illustrates the feasibility of using Redis as a queryable
  datastore to improve scalability
Cassandra – Food to Go
                            AvailableRestaurants Column Family
        Keys   Columns

               1430
    94619_
                                                       Super
    Monday_       Egg30: {…}                           column
     0700


               1430
    94619_
    Monday_      Ajanta10: {…}
     1130


               2130
    94619_
    Monday_      Ajanta11: {…}
     1730
Cassandra – Food to Go
   rangeSlice(startKey=94619_Monday_0000,
             endKey=94619_Monday_1800,
             startColumn=1800,
             endColumnName=2359)




        Keys   Columns


                 2130
    94619_
    Monday_         Ajanta11: {…}
     1730
Food to Go - MongoDB
{                               {
  "name" : "Ajanta",                serviceArea:"94619",
  "type" : "Indian",                openingHours: {
  "serviceArea" : [                   $elemMatch : {
     "94619",                               "dayOfWeek" : Monday,
     "94618"                                "open": {$lte: 1800},
  ],                                        "close": {$gte: 1800}
  "openingHours" : [                   }
     {                              }
        "dayOfWeek" : Monday,   }
        "open" : 1730,
        "close" : 2130
     }
  ],                            Straightforward to model a
  "_id" :                       restaurant aggregate as a MongoDB
ObjectId("4bddc2f49d1505567c    document and execute queries
6220a0")
}
MongoDB – MVC Analytics

• Monitoring page view and web application activity is
  critical to analyzing effectiveness of
   – Marketing campaigns
   – Site design / new features
• Standard solutions are problematic aside from scaling
   – Google Analytics
      • not real time, hard to setup
   – MySQL
      • Downtime for schema changes
• Solutions needs to handle high write volume, handle large
  amounts of data, allow for flexible data exploration.
Web Analytics - SQL
                                     Still difficult to
                                     scale writes

            Application    MySQL
              Server       Master



            Reporting
                          MySQL
             Server        MySQL
                          Slave
                            MySQL
                           Slave
                             Slave
Web Analytics - MongoDB


           Application
             Server

                          MongoD
            Reporting
             Server
Web Analytics - MongoDB


           Application             MongoD
             Server

                          MongoS
            Reporting
             Server                MongoD
Spring MVC based solution

• Custom Handler Adapter with a ActionInterceptor
• ActionInterceptor has full context of web action
   –   Controller class, method name, and parameters
   –   WebRequest
   –   ModelAndView
   –   Exception
• Similar to Ruby on Rails/ASP.NET MVC Action Filters
• Collect data in MongoDB
   – MvcEvent
   – Increment counters per controller and action method
• AnalyticsController to display results
   – Action method parameter information is critical for reporting


• Also a good fit for an Insight plugin!
Collected Sample Data - Counters


{ "_id" : ObjectId("4cba8e0a5a49000000004961"),
 "name" : "SignUpController",
 "count" : 8,
 "methods" : { "create" : 2, "createForm" : 2, "show" : 4 }
}
{ "_id" : ObjectId("4cba8ea85a49000000004962"),
 "name" : "RestaurantController",
 "count" : 9,
 "methods" : { "addFavoriteRestaurant" : 2, "list" : 5, "show" : 2 }
}
Collected Sample Data
{
  "_id" : ObjectId("4cba929e4b445bc779b65c5f"),
  "action" : "addFavoriteRestaurant",
  "controller" : "RestaurantController",
  "date" : "Sun Oct 17 2010 02:07:26 GMT-0400 (Eastern Daylight Time)",
  "parameters" : {
    "p1" : "5",
    "p2" : "1",
    "p3" : "{currentUserAccountId= 1, userAccount_birthdate_date_format=M/d/yy,
useraccount=Id: 1, Version: 1, UserName: mpollack, FirstName: Mark, LastName:
Pollack, BirthDate: 2010-10-27 00:00:00.0, Favorites: 1, itemId=5}" },
  "remoteUser" : "mpollack",
  "requestAddress" : "127.0.0.1",
  "requestUri" : "/myrestaurants-analytics/restaurants/5/1"
}
Restaurants Web Application
Favorite Restaurants Analysis
Controller Invocation
CouchDB – Mobile offline access and sync

•   Smartphones have huge data capacity
•   Can carry large data sets with you.
•   Mobile devices are not always connected
•   Data sync is a difficult problem to solve
    – CouchOne Mobile handles the dirty bits of sync
    – Available for Android
• Same programming model as on server side
• Spring RestTemplate based CouchDB API
    – On Server and Phone
• Can also be used a full-stack environment on mobile
    – Database, JavaScript, Web Server
CouchDB – Offline data access and sync


               Continuous
               bi-directional
               replication      Application
                                  Server
Sync Favorite Restaurants
Summary

• Many new options for persistence
   – A particular use case in your application might be a good fit
   – Polyglot persistence will be increasingly common
• Trade-offs with
   – Normalization
   – Transactions
   – Familiarity
• The Spring Data project
   – Make it easier to build Spring-powered applications that use
     these new technologies
• More info at http://www.springframework.org/spring-data
   – Welcome contributors, comments…
Q&A



Chris Richardson         Mark Pollack
crichardson@vmware.com   mpollack@vmware.com
@crichardson             @markpollack

Spring one2gx2010 spring-nonrelational_data

  • 1.
    Chicago, October 19- 22, 2010 Using Spring with NoSQL databases Mark Pollack Chris Richardson
  • 2.
    Goal of thistalk • The benefits and drawbacks of NoSQL databases • How the Spring Data project simplifies the development of NoSQL applications
  • 3.
    About Chris •Grew up in England and live in Oakland, CA •Over 25+ years of software development experience including 14 years of Java •Speaker at JavaOne, SpringOne, NFJS, JavaPolis, Spring Experience, etc. •Organize the Oakland JUG and the Groovy Grails meetup http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
  • 4.
    About Mark • Springcommitter since 2003 • Founder of Spring.NET • Work on some other new open source projects – Spring Gemfire – Spring AMQP • Used to work on large scale data collection/analysis in High Energy Physics (~10yrs ago)
  • 5.
    Agenda • Why NoSQL? • Some NoSQL Databases • Spring NoSQL projects • Demos/Code/Deep dive
  • 6.
  • 7.
    Relational databases aregreat • Well understood by developers • Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA • Well understood by operations – Configuration – Care and feeding – Backups – Tuning – Failure and recovery – Performance characteristics • But….
  • 8.
    The trouble withrelational database • Object/relational impedance mismatch – Complicated to map rich domain model to relational schema – Relational schema is rigid • Extremely difficult/impossible to scale writes: – Vertical scaling is limited/expensive – Horizontal scaling is limited or requires $$ • Performance can be suboptimal
  • 9.
    Why NoSQL? Internetscale is the primary driver in the use of non-relational DBs Large Data Size – TBs to PBs High Transaction rates Need for High Availability Pictures courtesy of Gilt Group Relational DBs Harder to scale to larger size Transactions and joins reduce access rates
  • 10.
    Different priorities • RelationalDatabases with ACID semantics – Atomic, Consistent, Isolated, Durable • Internet Systems with BASE semantics – Basically Available, Scalable, Eventually Consistent • NoSQL/BASE values remove standard DB behaviors – No Joins – Very limited "transactions" – Different Durability/consistency guarantees • NoSQL datastores emerge as “point solutions” – Amazon/Google papers – Facebook, LinkedIn …
  • 11.
    The NoSQL Landscape There are a variety of non-relational datastores Many are Open Source Many with companies behind them Rapidly changing feature sets Scalaris Neo4J db4o Objectivity Amazon Dynamo Terrastore Gemfire Sones HBase CouchDB InfoGrid RavenDB Hadoop Scalaris Tokyo Cabinet MongoDB Hypertable Cassandra Azure Table Coherence Mnesia Voldemort Gigaspaces Riak Amazon SimpleDB Redis BerkeleyDB MemcacheDB Tamino InfiniteGraph Chordless Expect consolidation over time With so many choices, picking which one to use can be difficult
  • 12.
    Stops on atour through NoSQL-land What is the data model? How to I access my data? What are the primary use cases? What notable companies that are using it?
  • 13.
    Data Model KeyValue K1 V1 Key -> value K2 V2 Redis, Riak, Voldemort… K3 V2 “Amazon Dynamo inspired” Column Key -> ‘column families’ HBase, Cassandra “Google Bigtable inspired”
  • 14.
    Data Model Document Collections containing semi- { id: ‘4b2b9f67a1f631733d917a7b"),’ author: ‘joe’, structured data tags : [‘example’, ‘db’], comments : [ { author: 'jim', comment: 'OK' }, { author: ‘ida', comment: ‘Bad' } key-values , JSON, XML, ] CouchDB, MongoDB { id: ‘4b2b9f67a1f631733d917a7c"), author: ‘ida’, ... { id: ‘4b2b9f67a1f631733d917a7d"), author: ‘jim’, ... Graph Graph: nodes and edges Edges can be directional Each can contain key-value attributes Neo4j, Sones, InfiniteGraph
  • 15.
    How to Iaccess my data?
  • 16.
    How to Iaccess my data? Query Mechanism Key lookup Map-Reduce Query by example Query language ‘out of box’ query power usually linked to data model Increasing power KV -> Column -> Document/Graph Counter examples Riak is key-value and supports Map-Reduce Spring Data Mapper… .
  • 17.
    Reality Check -Relational DBs are not going away NoSQL usage small by comparison… But growing…
  • 18.
    Future = multi-paradigmdata storage for enterprise applications IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
  • 19.
    Agenda • Why NoSQL? • Some NoSQL Databases • Spring NoSQL projects • Demos/Code/Deep dive
  • 20.
    Redis • Advanced key-valuestore – Think memcached on steroids (the good kind) – Values can be binary strings, Lists, Sets, Ordered K1 V1 Sets, Hash maps, .. – Operations for each data type, e.g. appending K2 V2 to a list, adding to a set, retrieving a slice of a K3 V2 list, … • Very fast: ~100K operations/second on entry- level hardware • Persistent – Periodic snapshots of memory OR append commands to log file – Limits are size of keys retained in memory. • Has “transactions” • Libraries – Many languages, all separate open source projects
  • 21.
    Redis CLI redis> saddmyset a (integer) 1 redis> sadd myset b (integer) 1 redis> smembers myset 1. "a" 2. "b" redis> srem myset a (integer) 1 redis> smembers myset 1. "b"
  • 22.
    Scaling Redis • Master/slavereplication – Tree of Redis servers – Non-persistent master can replicate to a persistent slave – Use slaves for read-only queries • Sharding – Client-side only – consistent hashing based on key – Server-side sharding – coming one day • Run multiple servers per physical host – Leverage multiple CPUs – 32 bit more efficient than 64 bit • Optional "virtual memory" – Ideally data should fit in RAM – Values (not keys) written to disc
  • 23.
    Redis use cases •Use in conjunction with another database as the SOR • Drop-in replacement for Memcached – Session state – Cache of data retrieved from SOR • Denormalized datastore for high-performance queries • Hit counts using INCR command • Randomly selecting an item – SRANDMEMBER • Queuing – Lists with LPOP, RPUSH, …. • High score tables – Sorted sets • Notable users: github, guardian.co.uk, ….
  • 24.
    Cassandra • Originally developedby Facebook for inbox search – Now an Apache open-source project • Extremely scalable • Data is replicated and sharded • The data model will hurt your brain • Column-oriented database • 4 or 5-dimensional hash map
  • 25.
    Cassandra data model– insert/update My Column family (within a key space) Keys Columns a colA: value1 colB: value2 colC: value3 This can be data! b colA: value colD: value colE: value Insert(key=a, columName=colZ, value=foo) Keys Columns a colA: value1 colB: value2 colC: value3 colZ: foo b colA: value colD: value colE: value
  • 26.
    Cassandra query –range slice Keys Columns a colA: value1 colB: value2 colC: value3 colZ: foo b colA: value colD: value colE: value rangeSlice(startKey=a, endKey=b, startColumn=colA, endColumnName=colB) Keys Columns a colA: value1 colB: value2 b colA: value
  • 27.
    Cassandra CLI $ bin/cassandra-cli cassandra>connect localhost/9160 Connected to "Test Cluster" on localhost/9160 cassandra> get Keyspace1.Standard1['foo'] Returned 0 results cassandra> set Keyspace1.Standard1['foo']['a'] = '2' Value inserted cassandra> get Keyspace1.Standard1['foo'] => (Column=61, value=1, timestamp...) cassandra> set Keyspace1.Standard1['foo']['bar'] = 'abc' Value inserted cassandra> get Keyspace1.Standard1['foo'] => (Column=626172, value=abc, timestamp...) => (Column=61, value=1, timestamp...) cassandra> count Keyspace1.Standard1['foo'] 2 columns cassandra>
  • 28.
    Scaling Cassandra throughsharding/replication Keys = [D, A] •Client connects to any node •Dynamically add/remove nodes Node 1 •Configurable # replicas Token = A •Reads/Writes specify how many nodes Replicates to Replicates to Keys = [A, B] Node 4 Node 2 Token = D Token = B Keys = [C, D] Replicates to Replicates to Node 3 Token = C Keys = [B, C]
  • 29.
    Cassandra use cases •Use cases • Big data • Persistent cache • (Write intensive) Logging • Who is using it – Digg, Facebook, Twitter, Reddit, Rackspace – Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX – “The largest production cluster has over 100 TB of data in over 150 machines.“ – Casssandra web site
  • 30.
    MongoDB • JSON-style documents – Lists, Maps, Primitive values • Documents organized into collections (~table) • Full or partial document updates – Transactional update in place on one document – Atomic Modifiers • GridFS for efficiently storing large files • Index support – secondary and compound • Rich query language for dynamic queries • Map/Reduce • Replication and Auto Sharding
  • 31.
    MongoDB Data Model { "name" : "Ajanta", • BSON data types "type" : "Indian", "serviceArea" : [ • Documents types "94619", – Data "94618" ], – Query "openingHours" : [ – Modifier for updates { • Query and Modify "dayOfWeek" : Monday, "open" : 1730, operators "close" : 2130 – $tl, $gt, $ne, $in, $or, } $elemMatch, $orderBy, ], $group, "_id" : ObjectId("4bddc2f49d1505567c – $where + javascript 6220a0") – $inc, $set, $push, $pull }
  • 32.
    MongoDB query byexample • Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday { serviceArea:"94619", openingHours: { $elemMatch : { "dayOfWeek" : Monday, "open": {$lte: 1800}, "close": {$gte: 1800} } } } DBCursor cursor = collection.find(qbeObject); while (cursor.hasNext()) { DBObject o = cursor.next(); … }
  • 33.
    MongoDB CLI $ bin/mongo >use mydb > r1 = {name: 'Ajanta'} {name: 'Ajanta'} > r2 = {name: 'Montclair Egg Shop'} {name: 'Montclair Egg Shop'} > db.restaurants.save(r1) > r1 { _id: ObjectId("98…"), name: "Ajanta"} > db.restaurants.save(r2) > r2 { _id: ObjectId("66…"), name: "Montclair Egg Shop"} > db.restaurants.find() { _id: ObjectId("98…"), name: "Ajanta"} { _id: ObjectId("66…"), name: "Montclair Egg Shop"} > db.restaurants.find(name: /^A/) { _id: ObjectId("98…"), name: "Ajanta"}
  • 34.
    Scaling MongoDB Shard 1 Shard 2 mongod mongod mongod mongod mongod mongod A shard consists of a Config Server replica set = generalization mongod mongos mongos of master slave mongod Collections mongod spread over client multiple shards
  • 35.
    MongoDB use cases •Use cases – Real-time analytics – Content management systems – Single document partial update – Caching – High volume writes • Who is using it? – Shutterfly, Foursquare – Bit.ly Intuit – SourceForge, NY Times – GILT Groupe, Evite, – SugarCRM
  • 36.
    CouchDB • JSON documents •Database is flat collection of documents • Queries are implemented as views using Map/Reduce in Javascript – Queries typically defined up front – Views are indexed and incrementally updated • MVCC based – Need to deal with reconciliation • Multi-Master (bi) replication • Scale out with replication • Mobile version - Android • Use as a Server-Side JavaScript application server – “CouchApps” • Cloudant provides Java VM based views
  • 37.
    CouchDB with curl $curl -X PUT http://localhost:5984/contacts {"ok":true} $ curl -X PUT http://localhost:5984/contacts/markpollack -d '{"firstName":"Mark", "lastName":"Pollack", "email":"mpollack@vmware.com"}‘ {"ok":true,"id":"markpollack","rev":"1-7bb43f219a8d4306614dbd526ed577fe"} $curl -X GET http://localhost:5984/contacts/markpollack {"_id":"markpollack", "_rev":"1-7bb43f219a8d4306614dbd526ed577fe", "firstName":"Mark", "lastName":"Pollack", "email":"mpollack@vmware.com"}
  • 38.
  • 39.
  • 40.
    CouchDB use cases •Use cases – Synchronization • master-master replication • Offline replication for mobile devices: “Ground Computing” – Web apps that consume JSON • Server side JavaScript using CouchApps – Content management systems – Caching • Who is using it? – Mozilla, BBC, Palm, Apple, – IBM, Ubuntu, meebo, Engine Yard – Cloudant, UNICEF, Credit Suisse
  • 41.
    Neo4j • Graph Database •DB is a collection of graph nodes, relationships – Nodes and relationships have properties • Querying is done via a traversal API • Indexes on node/relationship properties • Written in Java, can be embedded – Also a REST API • Transactional (ACID) • Master-Slave replication in next release
  • 42.
  • 43.
    Neo4j Use Cases •Use Cases – Social networking – Bioinformatics – Fraud detection • Who is using it? – Fanbox, Jayway – Large companies that wish to remain anonymous
  • 44.
    Agenda • Why NoSQL? • Some NoSQL Databases • Spring NoSQL projects • Demos/Code/Deep dive
  • 45.
    Spring Data ProjectGoals • Bring classic Spring value propositions to NoSQL – Productivity – Programming model consistency • Support for a wide range of NoSQL databases • Many entry points to use – Low level data access APIs – Opinionated APIs (Think JdbcTemplate) – Object Mapping (Java and GORM) – Cross Store Persistence Programming model – Productivity support in Roo and Grails – Vertical Applications – Guidance
  • 46.
    Spring Data • Workingtogether with various projects/companies – Redis / VMware – Riak / Basho – CouchDB / CouchOne – Mongo / 10gen – Neo4j / Neo Technologies
  • 47.
    Spring Data Projects Data Commons Planned Polyglot persistence Guidance Docs Datastore Key-Value The big picture Redis Datastore Column Cassandra, HBase Datastore Document Hadoop MongoDB, CouchDB Datastore Key-Value Datastore Graph Riak Neo4j Datastore Mapping Datastore Mapping Additional DB Map Redis, Gemfire Blob storage Amazon, Atmos, Azure Reaching M1 release in Q4. SQL - Generic DAOs Grails/Roo support
  • 48.
    Spring Datastore Key-Value(Redis) Interface based low-level Redis Client API Method names are Redis command names Pluggable implementations (Jedis, JRedis) Translation to Spring DAO exception hierarchy Connection Pooling RedisTemplate Descriptive method names, grouped into natural categories ServerOps, SetOps Convenience methods – ‘redis recipies’ getAndConvert, convertAndSet for payload/value serialization Java, JSON, XML Protocol Buffers, Avro, to come… Redis backed java.util.Set, List, Map, Capped Collections RedisPlatformTransactionManager Redis Messaging RedisMessageTemplate, RedisMessageListenerContainer
  • 49.
    Spring Redis Examples RedisClientFactoryclientFactory = new JedisClientFactory(); RedisClient client = clientFactory.createClient(); client.sadd("s1", "1"); client.sadd("s1", "2"); client.sadd("s1", "3"); client.sadd("s2", "2"); client.sadd("s2", "3"); Set<String> intersection = client.sinter("s1", "s2"); System.out.println(intersection); // [3,2] RedisTemplate redisTemplate = new RedisTemplate(client) Person p = new Person("Joe", "Trader", 33); redisTemplate.convertAndSet("trader:1", p); p = redisTemplate.getAndConvert("trader:1", Person.class); intersection = redisTemplate.getSetOps().getIntersectionOfSets(“s1”,”s2”) RedisSet s1 = new RedisSet(template, "s1"); s1.add("1"); s1.add("2"); s1.add("3"); RedisSet s2 = new RedisSet(template, "s2"); s2.add("2"); s2.add("3"); Set s3 = s1.intersection("s3", s1, s2); // [3,2]
  • 50.
    Spring MongoDB • MongoTemplate – ‘RowMapper’ like functionality for mapping Mongo query results. – Basic POJO mapping support for simple CRUD • Leverage Spring 3.0 TypeConverters and SpEL – Connection affinity callback • Support common map-reduce operations from Mongo Cookbook • Higher level integration – File Upload – Spring MVC activity monitoring and analysis
  • 51.
    Spring MongoDB Example Mongomongo = new Mongo(); DB db = mongo.getDB("mvc"); MongoTemplate mongoTemplate = new MongoTemplate(db, "people"); Person p = new Person("Joe", "Trader", 33); mongoTemplate.save(p); List<People> people = mongoTemplate.queryForCollection(Person.class);
  • 52.
    Spring CouchDB CouchTemplate RestTemplate based API for CouchDB Basic JSON/POJO mapping support CouchDbMappingJacksonHttpMessageConverter Android implementation .NET implementation planned Repository base classes Out of the box CRUD Embedded View Definitions Automatic view generation Plans to work with ektrop OS project http://code.google.com/p/ektorp/
  • 53.
    Spring CouchDB Example RestTemplaterestTemplate = new RestTemplate(); restTemplate.setMessageConverters(couchDbConverter); List<Restaurants> rests = (List<Restaurant>) restTemplate.getForObject(getDbURL() + "_design/demo/_view/all", Restaurant.class); UserAccount userAccount = new UserAccount("u1"); restTemplate.put(getDbURL() + userAccount.getUserName(), userAccount); //Note, not handling response to get new revision string…
  • 54.
  • 55.
    Spring Neo4j • EasyPOJO based programming model • Support for graph Entities, Relationships, and Properties – Properties leverage Spring 3.0 converters • DAO finder support – findAll, count, findById, findByPropertyValue, findAllByTraversal See Graph Database Persistence Using Neo4j with Spring and Roo Tomorrow @ 10:15am in Lilac C
  • 56.
    Spring Cross Store •Use Case – An existing application want to use NOSQL database as a part of its existing SQL-based application • Pick a problem space for which NOSQL is well suited • How to link together the two worlds? – Developer application code task – No longer a single domain entity • Spring Cross store allows for a single entity to be persisted across multiple datastores – AspectJ manages NOSQL entities/fields in change sets – Change sets are flushed when JPA commits – Interception to rehydrate when performing JPA queries
  • 57.
  • 58.
  • 59.
  • 60.
    Spring Mapping • GORMfor Redis, Gemfire…. – Built on pure Java • Mapping metadata is not tied to GORM – Can use annotations, XML etc. • Will be working on making this mapping framework available to Java developers
  • 61.
  • 62.
  • 63.
    Agenda • Why NoSQL? • Some NoSQL Databases • Spring NoSQL projects • Demos/Code/Deep dive
  • 64.
    Redis meets POJOsin Action • Finding restaurants that are open at the delivery time (e.g. 94619) and serve the delivery zip code (e.g. 6pm) select r.* from FTGO_RESTAURANT_ZIPCODE zip, FTGO_RESTAURANT_TIME_RANGE tr, FTGO_RESTAURANT r where r.RESTAURANT_ID = zip.RESTAURANT_ID and tr.RESTAURANT_ID = r.RESTAURANT_ID; and zip.ZIPCODE = '94619' and tr.day_Of_week = 'Monday' Joins and and tr.open_time <= 1800 complex and 1800<= tr.close_time where clause! No way Redis can handle this! Right?
  • 65.
    Redis – denormalizingthe data is the key Tr_id Day of Open_time Close_tim Zip code Week e Ajanta10 Monday 1130 1430 94707 Ajanta10 Monday 1130 1430 94619 Ajanta11 Monday 1730 2130 94707 Ajanta11 Monday 1730 2130 94619 Egg30 Monday 0700 1430 94707 … Two simpler queries (SELECT tr_id FROM time_range_zip_code where day_of_week = ? and zip_code = ? open_time < ?) INTERSECTION (SELECT tr_id FROM time_range_zip_code where day_of_week = ? and zip_code = ? and ? < close_time)
  • 66.
    Redis – storingthe opening hours using sorted sets Sorted Set Keys Members = TimeRangeId with opening time as score open:94707:Monday Ajanta10:1130 Ajanta11:1730 open:94619:Monday Egg30:0700 Ajanta10:1130 Ajanta11:1730 ZRANGEBYSCORE open:94619:Monday 0 1800 {Egg30, Ajanta10, Ajanta11, …} Finds all time ranges that open before 6pm
  • 67.
    Redis – storingclosing times with sorted sets Ordered Set Keys Members = TimeRangeId with closing time as score close:94707:Monday Ajanta10:1430 Ajanta11:2130 close:94619:Monday Egg30:1430 Ajanta10:1430 Ajanta11:2130 ZRANGEBYSCORE close:94619:Monday 1800 2400 Finds all time ranges that close {Ajanta11, …} after 6pm
  • 68.
    Redis – computingthe intersection {Egg30, Ajanta10, Ajanta11, …} Intersection {Ajanta11, …} = {Ajanta11, …} • This is super-cool! • Illustrates the feasibility of using Redis as a queryable datastore to improve scalability
  • 69.
    Cassandra – Foodto Go AvailableRestaurants Column Family Keys Columns 1430 94619_ Super Monday_ Egg30: {…} column 0700 1430 94619_ Monday_ Ajanta10: {…} 1130 2130 94619_ Monday_ Ajanta11: {…} 1730
  • 70.
    Cassandra – Foodto Go rangeSlice(startKey=94619_Monday_0000, endKey=94619_Monday_1800, startColumn=1800, endColumnName=2359) Keys Columns 2130 94619_ Monday_ Ajanta11: {…} 1730
  • 71.
    Food to Go- MongoDB { { "name" : "Ajanta", serviceArea:"94619", "type" : "Indian", openingHours: { "serviceArea" : [ $elemMatch : { "94619", "dayOfWeek" : Monday, "94618" "open": {$lte: 1800}, ], "close": {$gte: 1800} "openingHours" : [ } { } "dayOfWeek" : Monday, } "open" : 1730, "close" : 2130 } ], Straightforward to model a "_id" : restaurant aggregate as a MongoDB ObjectId("4bddc2f49d1505567c document and execute queries 6220a0") }
  • 72.
    MongoDB – MVCAnalytics • Monitoring page view and web application activity is critical to analyzing effectiveness of – Marketing campaigns – Site design / new features • Standard solutions are problematic aside from scaling – Google Analytics • not real time, hard to setup – MySQL • Downtime for schema changes • Solutions needs to handle high write volume, handle large amounts of data, allow for flexible data exploration.
  • 73.
    Web Analytics -SQL Still difficult to scale writes Application MySQL Server Master Reporting MySQL Server MySQL Slave MySQL Slave Slave
  • 74.
    Web Analytics -MongoDB Application Server MongoD Reporting Server
  • 75.
    Web Analytics -MongoDB Application MongoD Server MongoS Reporting Server MongoD
  • 76.
    Spring MVC basedsolution • Custom Handler Adapter with a ActionInterceptor • ActionInterceptor has full context of web action – Controller class, method name, and parameters – WebRequest – ModelAndView – Exception • Similar to Ruby on Rails/ASP.NET MVC Action Filters • Collect data in MongoDB – MvcEvent – Increment counters per controller and action method • AnalyticsController to display results – Action method parameter information is critical for reporting • Also a good fit for an Insight plugin!
  • 77.
    Collected Sample Data- Counters { "_id" : ObjectId("4cba8e0a5a49000000004961"), "name" : "SignUpController", "count" : 8, "methods" : { "create" : 2, "createForm" : 2, "show" : 4 } } { "_id" : ObjectId("4cba8ea85a49000000004962"), "name" : "RestaurantController", "count" : 9, "methods" : { "addFavoriteRestaurant" : 2, "list" : 5, "show" : 2 } }
  • 78.
    Collected Sample Data { "_id" : ObjectId("4cba929e4b445bc779b65c5f"), "action" : "addFavoriteRestaurant", "controller" : "RestaurantController", "date" : "Sun Oct 17 2010 02:07:26 GMT-0400 (Eastern Daylight Time)", "parameters" : { "p1" : "5", "p2" : "1", "p3" : "{currentUserAccountId= 1, userAccount_birthdate_date_format=M/d/yy, useraccount=Id: 1, Version: 1, UserName: mpollack, FirstName: Mark, LastName: Pollack, BirthDate: 2010-10-27 00:00:00.0, Favorites: 1, itemId=5}" }, "remoteUser" : "mpollack", "requestAddress" : "127.0.0.1", "requestUri" : "/myrestaurants-analytics/restaurants/5/1" }
  • 79.
  • 80.
  • 81.
  • 82.
    CouchDB – Mobileoffline access and sync • Smartphones have huge data capacity • Can carry large data sets with you. • Mobile devices are not always connected • Data sync is a difficult problem to solve – CouchOne Mobile handles the dirty bits of sync – Available for Android • Same programming model as on server side • Spring RestTemplate based CouchDB API – On Server and Phone • Can also be used a full-stack environment on mobile – Database, JavaScript, Web Server
  • 83.
    CouchDB – Offlinedata access and sync Continuous bi-directional replication Application Server
  • 84.
  • 85.
    Summary • Many newoptions for persistence – A particular use case in your application might be a good fit – Polyglot persistence will be increasingly common • Trade-offs with – Normalization – Transactions – Familiarity • The Spring Data project – Make it easier to build Spring-powered applications that use these new technologies • More info at http://www.springframework.org/spring-data – Welcome contributors, comments…
  • 86.
    Q&A Chris Richardson Mark Pollack crichardson@vmware.com mpollack@vmware.com @crichardson @markpollack