NoSQL / Spring Data

Polyglot Persistence – An introduction to Spring Data
Pronam Chatterjee
pronamc@vmware.com




© 2011 VMware Inc. All rights reserved
Presentation goal



How Spring Data simplifies the development of NoSQL applications




Agenda

•   Why NoSQL?
•   Overview of NoSQL databases
•   Introduction to Spring Data
•   Database APIs
      - MongoDB
      - HyperSQL
      - Neo4j




Relational databases are great

• SQL = Rich, declarative query language
• Database enforces referential integrity
• ACID semantics
• Well understood by developers
• Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA
• Well understood by operations
 • Configuration
 • Care and feeding
 • Backups
 • Tuning
 • Failure and recovery
 • Performance characteristics
• But…




The trouble with relational databases

• Object/relational impedance mismatch
 - Complicated to map rich domain model to relational schema
• Relational schema is rigid
 - Difficult to handle semi-structured data, e.g. varying attributes
 - Schema changes = downtime or $$
• Extremely difficult/impossible to scale writes:
 - Vertical scaling is limited/requires $$
 - Horizontal scaling is limited or requires $$
• Performance can be suboptimal for some use cases




NoSQL databases have emerged…

Each one offers some combination of:
• High performance
• High scalability
• Rich data-model
• Schema-less
In return for:
• Limited transactions
• Relaxed consistency
•…




… but there are few commonalities

• Everyone and their dog has written one
• Different data models
 - Key-value
 - Column
 - Document
 - Graph
• Different APIs – No JDBC, Hibernate, JPA (generally)
• “Same sorry state as the database market in the 1970s before SQL was
    invented” http://queue.acm.org/detail.cfm?id=1961297




NoSQL databases have emerged…

    • NoSQL usage small by comparison…
    • But growing…




Agenda
• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
       - MongoDB
       - HyperSQL
       - Neo4j




Redis
• Advanced key-value store
 - Think memcached on steroids (the good kind)
 - Values can be binary strings, Lists, Sets, Sorted Sets, Hashes, …
 - Operations for each data type, e.g. appending to a list, adding to a
   set, retrieving a slice of a list, …
 - Provides pub/sub-based messaging

• Very fast:
 - In-memory operations
 - ~100K operations/second on entry-level hardware

• Persistent
 - Periodic snapshots of memory OR append commands to log file
 - Limits are size of keys retained in memory.
• Has “transactions”
 - Commands can be batched and executed atomically




Scaling Redis

• Master/slave replication
 - Tree of Redis servers
 - Non-persistent master can replicate to a persistent slave
 - Use slaves for read-only queries
• Sharding
 - Client-side only – consistent hashing based on key
 - Server-side sharding – coming one day
• Run multiple servers per physical host
 - Server is single threaded => Leverage multiple CPUs
 - 32-bit builds use less memory than 64-bit builds (smaller pointers)
• Optional "virtual memory"
 - Ideally data should fit in RAM
 - Values (not keys) written to disc
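
Client-side sharding with consistent hashing can be sketched in plain Java. This is a minimal illustration, not the algorithm of any particular Redis client: the class name `ConsistentHashRing`, the use of MD5, and the 100 virtual points per server are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical client-side ring that maps each key to one of several servers. */
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Places every server on the ring at `replicas` pseudo-random points. */
    public ConsistentHashRing(List<String> servers, int replicas) {
        if (servers.isEmpty() || replicas < 1) {
            throw new IllegalArgumentException("need at least one server and one replica");
        }
        for (String server : servers) {
            for (int i = 0; i < replicas; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    /** Returns the server owning the first ring point at or after the key's hash. */
    public String serverFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey(); // wrap around
        return ring.get(point);
    }

    /** First 8 bytes of the MD5 digest, read as a long (assumed hash choice). */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Server names are placeholders, not real hosts.
        ConsistentHashRing ring = new ConsistentHashRing(
                Arrays.asList("redis-a:6379", "redis-b:6379", "redis-c:6379"), 100);
        for (String key : Arrays.asList("user:1", "user:2", "session:abc")) {
            System.out.println(key + " -> " + ring.serverFor(key));
        }
    }
}
```

The point of the ring is that removing a server only remaps the keys that lived on it; keys owned by the remaining servers stay where they were.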




Redis use cases
• Use in conjunction with another database as the system of record (SOR)
• Drop-in replacement for Memcached
  - Session state
  - Cache of data retrieved from SOR
  - Denormalized datastore for high-performance queries
• Hit counts using INCR command
• Randomly selecting an item – SRANDMEMBER
• Queuing – Lists with LPOP, RPUSH, …
• High score tables – Sorted sets

Notable users: GitHub, guardian.co.uk, …




vFabric GemFire - Elastic data fabric
• High performance data grid
• Enhanced parallel disk persistence
• Non-disruptive up/down scalability
• Session state
  - Cache of data retrieved from SOR
  - Denormalized datastore for high-performance queries
• Heterogeneous data sharing
  - Java
  - .NET
  - C++
• Co-located transactions




GemFire - Use Cases

 • Ultra-low-latency, high-throughput applications
 • As an L2 cache in Hibernate
 • Distributed batch processing
 • Session state
   - Tomcat
   - tc Server
 • Wide-area replication




Neo4j

 • Graph data model
  - Collection of graph nodes
  - Typed relationships between nodes
  - Nodes and relationships have properties
 • High performance traversal API from roots
  - Breadth first/depth first
 • Query to find root nodes
  - Indexes on node/relationship properties
  - Pluggable - Lucene is the default
 • Graph algorithms: shortest path, …
 • Transactional (ACID) including 2PC
 • Deployment modes
  - Embedded – written in Java
  - Server with REST API


Neo4j Data Model




Neo4j Use Cases

 • Use Cases
  -    Anything social
  -    Cloud/Network management, i.e. tracking/managing physical/virtual resources
  -    Any kind of geospatial data
  -    Master data management
  -    Bioinformatics
  -    Fraud detection
  -    Metadata management
 • Who is using it?
  -    StudiVZ (the largest social network in Europe)
  -    Fanbox
  -    The Swedish military
  -    And big organizations in datacom, intelligence, and finance that wish to remain anonymous




MongoDB

• Document-oriented database
  - JSON-style documents: Lists, Maps, primitives
  - Documents organized into collections (~table)
• Full or partial document updates
  - Transactional update in place on one document
  - Atomic Modifiers
• Rich query language for dynamic queries
• Index support – secondary and compound
• GridFS for efficiently storing large files
• Map/Reduce




Data Model = Binary JSON documents

 {
      "name" : "Ajanta",
      "type" : "Indian",
      "serviceArea" : [
           "94619",
           "94618"
      ],
      "openingHours" : [
           {
                "dayOfWeek" : "Monday",
                "open" : 1730,
                "close" : 2130
           }
      ],
      "_id" : ObjectId("4bddc2f49d1505567c6220a0")
 }

 One document = one DDD aggregate

 • Sequence of bytes on disk = fast I/O
  - No joins/seeks
  - In-place updates when possible => no index updates
 • Transaction = update of single document




MongoDB query by example

 • Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday

  {
       serviceArea: "94619",
       openingHours: {
           $elemMatch: {
                  "dayOfWeek": "Monday",
                  "open": {$lte: 1800},
                  "close": {$gte: 1800}
           }
       }
  }

  // qbeObject is a DBObject holding the query document above
  DBCursor cursor = collection.find(qbeObject);
  while (cursor.hasNext()) {
      DBObject o = cursor.next();
      …
  }
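
What the query matches can be spelled out in plain Java. The sketch below does not use the MongoDB driver; it is only an in-memory mirror of the matching rule, and the class names `Restaurant` and `OpeningHours` are assumptions made for the example. The key point is that `$elemMatch` requires a single array element to satisfy all of its conditions at once.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java mirror of the restaurant query; all names here are illustrative. */
public class RestaurantQuerySketch {

    static final class OpeningHours {
        final String dayOfWeek;
        final int open;
        final int close;

        OpeningHours(String dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    static final class Restaurant {
        final String name;
        final List<String> serviceArea;
        final List<OpeningHours> openingHours;

        Restaurant(String name, List<String> serviceArea, List<OpeningHours> openingHours) {
            this.name = name;
            this.serviceArea = serviceArea;
            this.openingHours = openingHours;
        }
    }

    /** True when the zip code is served and ONE opening-hours entry meets all three conditions. */
    static boolean matches(Restaurant r, String zip, String day, int time) {
        if (!r.serviceArea.contains(zip)) {          // serviceArea: "94619"
            return false;
        }
        for (OpeningHours h : r.openingHours) {      // $elemMatch
            if (h.dayOfWeek.equals(day)              // "dayOfWeek": "Monday"
                    && h.open <= time                // "open": {$lte: 1800}
                    && h.close >= time) {            // "close": {$gte: 1800}
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Restaurant ajanta = new Restaurant("Ajanta",
                Arrays.asList("94619", "94618"),
                Arrays.asList(new OpeningHours("Monday", 1730, 2130)));
        System.out.println(matches(ajanta, "94619", "Monday", 1800));
    }
}
```

A restaurant open Monday 0900 to 1200 and Tuesday 1700 to 2200 would not match a Monday 6pm query, because no single entry satisfies the day and both time bounds together.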




MongoDB use cases

 • Use cases
     -    Real-time analytics
     -    Content management systems
     -    Single document partial update
     -    Caching
     -    High volume writes
 • Who is using it?
     -    Shutterfly, Foursquare
     -    Bit.ly, Intuit
     -    SourceForge, NY Times
     -    GILT Groupe, Evite
     -    SugarCRM




 Copyright (c) 2011 Chris Richardson. All rights reserved.


Other NoSQL databases

• SimpleDB – “key-value”
• Cassandra – column-oriented database
• CouchDB – document-oriented
• Membase – key-value
• Riak – key-value + links
• HBase – column-oriented…




http://nosql-database.org/ has a list of 122 NoSQL databases



Agenda

 • Why NoSQL?
 • Overview of NoSQL databases
 • Introduction to Spring Data
 • Database APIs
       - MongoDB
       - HyperSQL
       - Neo4j




NoSQL Java APIs

Database                  Libraries
Redis                     Jedis, JRedis, JDBC-Redis, RJC
Neo4j                     Vendor-provided
MongoDB                   Vendor-provided Java driver
GemFire                   Pure Java map API, Spring-GemFire templates

But
• Usage patterns
• Tedious configuration
• Repetitive code
• Error-prone code
•…




Spring Data Project Goals

 • Bring classic Spring value propositions to a wide range of NoSQL databases:
  - Productivity
  - Programming model consistency: E.g. <NoSQL>Template classes
  - “Portability”




Spring Data sub-projects

 •   Commons: Polyglot persistence
 •   Key-Value: Redis, Riak
 •   Document: MongoDB, CouchDB
 •   Graph: Neo4j
 •   GORM for NoSQL



                                 http://www.springsource.org/spring-data




Many entry points to use

 • Auto-generated repository implementations
 • Opinionated APIs (Think JdbcTemplate)
 • Object Mapping (Java and GORM)
 • Cross Store Persistence Programming model
 • Productivity support in Roo and Grails




Cloud Foundry supports NoSQL




 MongoDB and Redis are provided as services
 => Deploy your MongoDB and Redis applications in seconds




Agenda

• Why NoSQL?
• Overview of NoSQL databases
• Introduction to Spring Data
• Database APIs
      - MongoDB
      - HyperSQL
      - Neo4j




Three databases for today’s talk


• Document database (MongoDB)
• Relational database (HyperSQL)
• Graph database (Neo4j)




Three persistence strategies for today’s talk

• Lower level template approach
• Convention-based persistence (Hades)
• Cross-Store persistence using JPA and a NoSQL datastore




Spring Template Patterns

• Resource Management
• Callback methods
• Exception Translation
• Simple Query API
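
The four template patterns can be shown together in a few lines of plain Java. This is a sketch of the general pattern only, not code from Spring or Spring Data: `StoreTemplate`, `Connection`, `ConnectionCallback` and `StoreAccessException` are hypothetical stand-ins for a real template, driver connection, callback interface and translated exception.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TemplateSketch {

    /** Stand-in for a low-level driver connection that throws checked exceptions. */
    static class Connection implements AutoCloseable {
        private final List<String> log;

        Connection(List<String> log) {
            this.log = log;
            log.add("open");
        }

        String get(String key) throws IOException {
            if (key.isEmpty()) {
                throw new IOException("empty key");
            }
            return "value-of-" + key;
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    /** Unchecked exception that driver errors are translated into. */
    static class StoreAccessException extends RuntimeException {
        StoreAccessException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Callback: the only code the caller has to write. */
    interface ConnectionCallback<T> {
        T doInConnection(Connection c) throws IOException;
    }

    static class StoreTemplate {
        final List<String> log = new ArrayList<>();

        <T> T execute(ConnectionCallback<T> action) {
            try (Connection c = new Connection(log)) {   // resource management
                return action.doInConnection(c);          // callback method
            } catch (IOException e) {                     // exception translation
                throw new StoreAccessException("store access failed", e);
            }
        }

        /** Simple query API built on top of execute(). */
        String get(String key) {
            return execute(c -> c.get(key));
        }
    }

    public static void main(String[] args) {
        StoreTemplate template = new StoreTemplate();
        System.out.println(template.get("k1"));
        System.out.println(template.log);
    }
}
```

Callers use `get` for the common case and drop down to `execute` with their own callback when they need the raw connection; in both cases the connection is closed and checked driver exceptions never leak out.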




Repository Implementation




HyperSQL

• Also known as HSQLDB or Hypersonic SQL
• Relational Database
• Table oriented data model
• SQL used for queries
• … you know the rest…




Spring Data Repository Support

• Eliminate boilerplate code – declare only finder methods, e.g. findByLastName
• Specifications for type-safe queries
• JPA CriteriaBuilder integration
• QueryDSL integration
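
The idea of declaring only finder methods can be illustrated without Spring. The sketch below uses a JDK dynamic proxy to derive a property name from a method such as `findByLastName` and filter an in-memory list. It is an assumption-laden toy, not how Spring Data is implemented: the real project translates method names into store-specific queries, and `createRepository`, `Person` and `PersonRepository` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DerivedFinderSketch {

    public static class Person {
        final String firstName;
        final String lastName;

        public Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** The developer writes only this interface; no implementation class exists. */
    public interface PersonRepository {
        List<Person> findByLastName(String lastName);
        List<Person> findByFirstName(String firstName);
    }

    /** Builds an implementation that derives the property to compare from the method name. */
    static PersonRepository createRepository(List<Person> store) {
        return (PersonRepository) Proxy.newProxyInstance(
                PersonRepository.class.getClassLoader(),
                new Class<?>[] { PersonRepository.class },
                (proxy, method, args) -> {
                    String name = method.getName();
                    if (!name.startsWith("findBy") || args == null || args.length != 1) {
                        throw new UnsupportedOperationException(name);
                    }
                    // "findByLastName" -> "LastName" -> "lastName"
                    String prop = name.substring("findBy".length());
                    prop = Character.toLowerCase(prop.charAt(0)) + prop.substring(1);
                    Field field = Person.class.getDeclaredField(prop);
                    field.setAccessible(true);
                    List<Person> result = new ArrayList<>();
                    for (Person p : store) {
                        if (args[0].equals(field.get(p))) {
                            result.add(p);
                        }
                    }
                    return result;
                });
    }

    public static void main(String[] args) {
        List<Person> store = Arrays.asList(
                new Person("Ada", "Lovelace"),
                new Person("Alan", "Turing"));
        PersonRepository repo = createRepository(store);
        System.out.println(repo.findByLastName("Turing").size());
    }
}
```

Adding a new finder is then a one-line change to the interface, which is the productivity argument the slide makes.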




QueryDSL

• Type-safe queries for multiple backends including JPA, SQL and MongoDB in Java
• Generate Query classes using Java APT
• Code completion in IDE
• Domain types and properties can be referenced safely
• Adapts better to refactoring changes in domain types



http://www.querydsl.com




QueryDSL




 • Repository Support
 • Spring Data JPA
 • Spring Data Mongo
 • Spring Data JDBC extensions
 • QueryDslJdbcTemplate




Spring Data Neo4j

•    Uses AspectJ support to provide a new programming model
•    Use annotations to define POJO entities
•    Constructor advice automatically handles entity creation
•    Entity field state persisted to graph using aspects
•    Leverage graph database APIs from POJO model
•    Annotation-driven indexing of entities for search




Spring Data Graph Neo4j cross-store

• JPA data and “NOSQL” data can share a data model
• Separate the persistence provider by using annotations
– could be the entire Entity
– or, some of the fields of an Entity
• We call this cross-store persistence
– One transaction manager to coordinate the “NOSQL” store with the JPA relational database
– AspectJ support to manage the “NOSQL” entities and fields
• Holds on to changed values in “change sets” until the transaction commits, for non-transactional data stores
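
The change-set idea can be sketched as a small write buffer in front of a store that has no transactions of its own. This is an illustration of the concept only, not the Spring Data cross-store implementation; `ChangeSet`, `SimpleStore` and their methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChangeSetSketch {

    /** Stand-in for a non-transactional store: writes take effect immediately. */
    static class SimpleStore {
        final Map<String, Object> data = new HashMap<>();
    }

    /** Buffers field writes and applies them only when the surrounding transaction commits. */
    static class ChangeSet {
        private final Map<String, Object> pending = new LinkedHashMap<>();
        private final SimpleStore store;

        ChangeSet(SimpleStore store) {
            this.store = store;
        }

        void set(String field, Object value) {
            pending.put(field, value);              // nothing reaches the store yet
        }

        Object get(String field) {
            // Reads see pending changes first, then the stored value.
            return pending.containsKey(field) ? pending.get(field) : store.data.get(field);
        }

        void commit() {
            store.data.putAll(pending);             // flush on transaction commit
            pending.clear();
        }

        void rollback() {
            pending.clear();                        // discard on transaction rollback
        }
    }

    public static void main(String[] args) {
        SimpleStore store = new SimpleStore();
        ChangeSet changes = new ChangeSet(store);
        changes.set("rating", 5);
        System.out.println(store.data.containsKey("rating")); // not written before commit
        changes.commit();
        System.out.println(store.data.get("rating"));
    }
}
```

In a cross-store setup the transaction manager would call `commit` or `rollback` alongside the JPA transaction, so both stores move together.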




A cross-store scenario ...


     You have a traditional web app using JPA to persist data to a relational
     database ...




JPA Data Model





      8/3/11     Slide 46
Cross-Store Data Model




