Infinispan
Data Grids, NoSQL, Cloud Storage & JSR-347


  Manik Surtani
  Founder and Project Lead, Infinispan
  Red Hat, Inc.
Who is Manik?
• Hacker@JBoss, Red Hat’s middleware division
• Founder and Project Lead, Infinispan
• Spec lead, JSR 347
  • Data Grids for Java
• EG representative, JSR 107
  • Temporary Caching for Java

• http://blog.infinispan.org
• http://twitter.com/maniksurtani
Agenda
• A brief introduction to Infinispan
• Understanding Data Grids
• ... and NoSQL
• Their role in Cloud Storage
• JSR 347 and related standards
What is Infinispan?
• An open source data grid platform
• Written in Java and Scala
  • Not just for the JVM though
• Distributed key/value store
  • Transactional (JTA)
  • Low-latency (in-memory)
  • Optionally persisted to disk
  • Feature-rich
P2P Embedded Architecture
Client/Server Architecture
Supported Protocols
• REST
• Memcached
• Hot Rod
WTF is Hot Rod?
• Wire protocol for client/server communications
• Open
• Language independent
• Built-in failover and load balancing
• Smart routing
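
Smart routing means the client works out which server owns a key and sends the request straight there. The slides do not describe how Hot Rod does this, so the following is only a minimal sketch of one common technique, a consistent hash ring, written against the Java standard library. `SmartRouter`, `ownerOf`, `remove`, and the use of CRC32 are assumptions made for illustration and are not part of the Hot Rod protocol or client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical client-side router: maps each key to the server that owns it. */
class SmartRouter {
    // Positions on the hash ring, each pointing at the server placed there.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Places every server on the ring at several positions (virtual nodes). */
    SmartRouter(List<String> servers, int virtualNodes) {
        for (String server : servers) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    /** Returns the first server at or after the key's position, wrapping around. */
    String ownerOf(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Failover: drops a server so that its keys fall to the next server on the ring. */
    void remove(String server) {
        ring.values().removeIf(server::equals);
    }

    // CRC32 is an arbitrary choice for this sketch; any well-spread hash would do.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

A client would build the router from the current server list, call `ownerOf(key)` before each request, and call `remove` when a server drops out. Only the keys owned by the removed server move; every other key keeps its owner, which is what makes dynamic failover cheap.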
Server Endpoint Comparison

Endpoint    Protocol   Client Libraries     Clustered?   Smart Routing   Load Balancing/Failover
REST        Text       N/A                  Yes          No              Any HTTP load balancer
Memcached   Text       Plenty               Yes          No              Only with predefined server list
Hot Rod     Binary     Java, Python, Ruby   Yes          Yes             Dynamic
Understanding Data Grids
Data Grids. What Are They?
• An evolution of distributed caches
Why use distributed caches?
• Cache data that is expensive to retrieve/calculate
  • E.g., from a database
• The need for fast, low-latency data access
  • Performance or time-sensitive applications
• Very commonly used in:
  • Financial Services industry
  • Telcos
  • Highly scalable e-commerce
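
The first bullet above, caching data that is expensive to retrieve or calculate, can be sketched in a few lines. This is a single-JVM illustration using only the Java standard library, not Infinispan code; `ReadThroughCache`, `invalidate`, and `loadCount` are hypothetical names, and the loader function stands in for a database call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical in-memory cache in front of an expensive lookup. */
class ReadThroughCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                   // the expensive call, e.g. a database query
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only on a miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    /** Drops an entry so that the next read goes back to the source. */
    void invalidate(K key) {
        entries.remove(key);
    }

    /** Number of times the expensive loader has actually run. */
    int loadCount() {
        return loads.get();
    }
}
```

Repeated calls to `get` with the same key hit the loader once; after `invalidate`, the next `get` loads again. A distributed cache applies the same idea across many JVMs.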
Data grids as clustering toolkits
• To introduce high availability and failover to frameworks
  • Commercial and open source frameworks
  • In-house frameworks and reusable architectures
• Delegate all state management to the data grid
  • Framework becomes stateless and hence elastic
But Data Grids > Distributed Caches
• Querying
• Task execution and map/reduce
• Control over data co-location
Understanding NoSQL
What is NoSQL?
• An alternative form of typically disk-based data
  storage
• Free from relational structure
  • Usually key/value or document-based
• Allows for greater scalability and easier
  clustering/distribution
NoSQL and Consistency
• BASE not ACID
  • Relax consistency in exchange for high availability
    and partition tolerance
• Usually eventually consistent
  • Which means applications need to be designed with
    this in mind
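
What "designed with this in mind" involves can be shown with a small sketch. Two replicas accept writes independently, may briefly disagree, and converge once they exchange state. The conflict rule used here (last write wins by timestamp, ties broken by value) is just one simple policy chosen for illustration; `Replica`, `syncFrom`, and the timestamps are assumptions, not the behavior of any particular NoSQL engine.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replica that resolves conflicts with a last-write-wins rule. */
class Replica {
    /** A value tagged with the logical time at which it was written. */
    private static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    /** Accepts a write, keeping whichever version is newer. */
    void put(String key, String value, long timestamp) {
        data.merge(key, new Versioned(value, timestamp), Replica::newer);
    }

    /** Reads the local copy, which may be stale until the next sync. */
    String get(String key) {
        Versioned v = data.get(key);
        return v == null ? null : v.value;
    }

    /** Anti-entropy step: pulls every entry from another replica. */
    void syncFrom(Replica other) {
        other.data.forEach((key, v) -> put(key, v.value, v.timestamp));
    }

    // Ties on timestamp are broken by value so that all replicas pick the same winner.
    private static Versioned newer(Versioned a, Versioned b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        return a.value.compareTo(b.value) >= 0 ? a : b;
    }
}
```

Between a write and the next `syncFrom`, a reader of the other replica sees the old value. The application has to tolerate that window, which is the trade-off BASE accepts in exchange for availability.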
Data Grids and NoSQL used as Cloud Storage
Cloud Storage
• Traditional mechanisms (RDBMSs and file
  systems) are hard to deal with
• Clouds are ephemeral
• All cloud components are expected to be:
  • elastic
  • highly available
• Data grids and NoSQL win over traditional
  storage mechanisms in the cloud
• Data grids and NoSQL are fast converging in
  feature sets
  • E.g., Data grids can write through to disk; many
    NoSQL engines would also cache in memory
JSR 347
JSR 347: Data Grids for the Java Platform
• A new JSR for proposed inclusion in Java EE 8
  • to make enterprise Java more cloud-friendly
• Standardize data grid APIs and behavior for the
  Java platform
• Does not define NoSQL
  • Data grids primarily used from within a JVM
  • NoSQL primarily used via client connectors over a
    socket
     • Standardizing wire protocols is beyond the scope of the JCP
• Extends JSR 107 (Temporary Caching for Java)
• Adds:
  • Asynchronous, non-blocking API
  • Grouping API to control co-location
  • Distributed code execution and Map/Reduce APIs
  • Eventually consistent API
  • Possibly more
• Still very much work in progress
  • Participate!
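
The slides name an asynchronous, non-blocking API without showing one. As a rough sketch of the programming style, the following uses `java.util.concurrent.CompletableFuture` from the standard library. `AsyncStore`, `putAsync`, and `getAsync` are hypothetical names chosen for this example; they are not taken from the JSR 347 draft, which was still in progress.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical key/value store whose operations return futures instead of blocking. */
class AsyncStore {
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

    /** Starts the write on another thread and completes with the previous value, if any. */
    CompletableFuture<String> putAsync(String key, String value) {
        return CompletableFuture.supplyAsync(() -> data.put(key, value));
    }

    /** Starts the read on another thread and completes with the value, or null if absent. */
    CompletableFuture<String> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> data.get(key));
    }
}
```

Callers chain work instead of waiting, for example `store.putAsync("a", "1").thenCompose(previous -> store.getAsync("a"))`, and block with `join()` only at the point where the result is actually needed.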
Related standards and efforts
• JSR 107
  • A temporary caching API that defines:
    • Basic interaction
    • JTA compatibility
    • Persistence: write-through and write-behind
    • Listeners
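
The difference between write-through and write-behind persistence can be sketched as follows. A plain map stands in for the disk or database, and the flush is triggered by hand so the behavior is easy to follow. `PersistentCache`, `Mode`, and `flush` are hypothetical names for illustration and are not the JSR 107 API.

```java
import java.util.LinkedHashSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical cache that persists writes either immediately or later. */
class PersistentCache {
    enum Mode { WRITE_THROUGH, WRITE_BEHIND }

    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> store;               // stand-in for a disk or database
    private final Set<String> dirtyKeys = new LinkedHashSet<>();
    private final Mode mode;

    PersistentCache(Map<String, String> store, Mode mode) {
        this.store = store;
        this.mode = mode;
    }

    /** Write-through persists before returning; write-behind only marks the key dirty. */
    void put(String key, String value) {
        memory.put(key, value);
        if (mode == Mode.WRITE_THROUGH) {
            store.put(key, value);
        } else {
            dirtyKeys.add(key);
        }
    }

    /** Reads are always served from memory. */
    String get(String key) {
        return memory.get(key);
    }

    /** Writes all dirty entries to the store and returns how many were flushed. */
    int flush() {
        int flushed = dirtyKeys.size();
        for (String key : dirtyKeys) {
            store.put(key, memory.get(key));
        }
        dirtyKeys.clear();
        return flushed;
    }
}
```

Write-through keeps the store consistent with memory at the cost of slower writes. Write-behind makes writes fast and batches repeated updates to the same key, but anything not yet flushed is lost if the node fails.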
• Hibernate OGM
  • JPA for key/value stores!
  • Common and familiar paradigm for persisting data
    • Except persistence is made to a data grid or NoSQL store
• Contexts and Dependency Injection
  • Interaction with caches defined in JSR 107
  • Familiar and well proven programming model
  • Works well with JPA and hence Hibernate OGM
  • Works well even for direct access to key/value data grids
Where does Infinispan fit in?

• Will implement JSR 107
  • Currently implements most of this at least in concept
• Will implement JSR 347
  • Currently serves as a “donor” for most of JSR 347’s features and APIs
• Is already the reference backend for Hibernate OGM
• Already supports CDI integration
To Summarize


• What data grids and distributed caches are
• Where NoSQL came from and main differences between NoSQL and data grids
• Cloud storage challenges
• JSR 347: Data Grids for the Java Platform
• Infinispan and where it sits in all this
Questions & More Info

• http://github.com/datagrids/spec/wiki
• http://groups.google.com/group/jsr347
• http://twitter.com/jsr347

• http://www.infinispan.org
• http://twitter.com/infinispan
• http://hibernate.org/subprojects/ogm.html


Editor's Notes

  • #2 Welcome to this session on Infinispan. I hope you find it both informative and amusing.
  • #3 A bit about me: founder and project lead of Infinispan.
  • #7 Embedded setup. The app runs in a JVM and starts an Infinispan (ISPN) instance. Instances form a cluster. The app stores all its state in ISPN, so the app is now highly available, can be load balanced, etc. This is how to build clustered frameworks and app servers.
  • #8 Infinispan nodes form a P2P cluster as usual, sharing state and communicating with each other. Each node also opens a network socket for client communications and attaches an encoder and decoder (Netty). Clients talk to Infinispan instances via sockets, so clients no longer need to be in a JVM.
  • #9 Explain Hot Rod.
  • #10 Talk about protocols and endpoints.
  • #11 Let’s talk about data grids in general.
  • #15 Offers more than just what distributed caches do.
  • #16 Strictly distributed NoSQL, leaving out the likes of CouchDB, Redis, etc.
  • #17 Alternative to an RDBMS. Unstructured data. Primary goal: scalability and elasticity.
  • #18 RDBMSs strive for ACIDity (Atomic, Consistent, Isolated, Durable). BASE (Basic Availability, Soft-state, Eventually consistent). Eric Brewer’s CAP theorem.