Galder Zamarreño gave a presentation on Infinispan, an open source data grid platform designed for cloud computing. He discussed why traditional databases fit poorly in cloud environments: cloud nodes are stateless and ephemeral, while databases remain a bottleneck and a single point of failure. Data grids are better suited because they are highly scalable, have no single point of failure, and work with ephemeral cloud nodes. Infinispan is a new data grid that improves on an earlier product, JBoss Cache, with a more scalable architecture and features like a simple map API, client/server support, and integration with Hibernate and Lucene. Future plans for Infinispan include enhanced replication, distributed execution capabilities, and support for cloud-based data
2. Infinispan: New Kid on the NoSQL Block
Galder Zamarreño
Senior Engineer, Red Hat
14th October 2010, Lausanne JUG
galder@jboss.org | twitter.com/galderz | zamarreno.com
Monday, October 18, 2010
3. “There is a need for a viable cloud-ready data store. People need to rethink the way they organize, store and access data.”
4. Who is Galder?
• R&D engineer (Red Hat Inc):
• Infinispan developer
• JBoss Cache developer
• Contributor and committer:
• JBoss AS, Hibernate, JGroups, JBoss Portal,...etc
• Blog: zamarreno.com
• Twitter: @galderz
5. Agenda
• Cloud computing and data storage
• And why you should care!
• Data grids and cloud storage
• Introducing Infinispan
6. Clouds are today!
• Clouds are happening
• *aaS
• You cannot escape them!
• Public: Amazon, Google, Rackspace, ...
• Private: Red Hat, Oracle, VMWare, ...
• Clouds will become mainstream
• Traditional data centers become marginalized
7. Why are clouds popular?
• Piecemeal costs, perfect utilization
• Pay for what you use, no more!
• Massive economies of scale
• High availability = Implicit backups!
• Very fast provisioning -> Elasticity
• Familiar charging model, controllable costs
• Operational expenditure versus capital expenditure
8. Why should I care?
• My favorite platform is still relevant
• Java, Java EE
• Python, Ruby, .NET,... whatever!
• My favorite OS is still relevant:
• Linux
• Solaris, ...etc.
9. Data Storage
• Databases on clouds:
• not a match made in heaven!
• Traditional modes of data storage won't work
• Clouds are inherently stateless, ephemeral
• Cloud deployments should scale
• ... but databases still are a bottleneck
• … and single point of failure!
10. RDBMS on clouds: your options
• Non-ephemeral storage
• Restrictive
• Highly specialized hardware
• E.g., a SAN for Oracle RAC, ExaLogic?
• Hardly commodity hardware!
• Native database clustering
• Unreliable, expensive
11. Another solution: Data Grids!
• Data grids are perfect for clouds
• Highly scalable
• No single point of failure
• Works with ephemeral cloud nodes
• Very low latency
12. Data Grids and other vendors
• Data grids
• Amazon SimpleDB uses Dynamo
• Google BigTable
• Infinispan
• Many other commercial and OSS offerings
13. In-Memory Data Grids - Speed!
• Low latency
• minimal disk lookup
• Memory 2 orders of magnitude faster than disk
• especially for frequently used data
• Concurrency, hardware threads
• Disk IO is always a concurrency bottleneck
• Memory offers far greater concurrency
15. Introducing Infinispan
• Scalable data grid platform
• open source - LGPL
• based on some JBoss Cache code ... but mostly all-new
• JBoss Cache...
• ... is a clustered caching library
• ... exposes a tree-structured API
• Infinispan has a Map-like API - (JSR-107 JCACHE)
• ... so, primarily key/value NoSQL
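The difference between a tree-structured API and a Map-like one can be sketched with plain JDK collections. This is only an illustration of the two addressing styles: `TreeStyleStore` and `MapStyleStore` are hypothetical stand-ins, not the JBoss Cache or Infinispan APIs, and the keys and values are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in, not the real JBoss Cache API.
class TreeStyleStore {
    // Tree-structured: a path names a node, and each node holds its own attribute map.
    private final ConcurrentMap<String, ConcurrentMap<String, String>> nodes = new ConcurrentHashMap<>();

    void put(String path, String attribute, String value) {
        nodes.computeIfAbsent(path, p -> new ConcurrentHashMap<>()).put(attribute, value);
    }

    String get(String path, String attribute) {
        Map<String, String> node = nodes.get(path);
        return node == null ? null : node.get(attribute);
    }
}

// Hypothetical stand-in, not the real Infinispan API.
class MapStyleStore {
    // Map-like: one flat key/value dictionary, addressed by a single key.
    private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<>();

    String put(String key, String value) { return entries.put(key, value); }
    String get(String key) { return entries.get(key); }
    String putIfAbsent(String key, String value) { return entries.putIfAbsent(key, value); }
}

class MapVersusTreeDemo {
    public static void main(String[] args) {
        TreeStyleStore tree = new TreeStyleStore();
        tree.put("/users/42", "name", "Ada");           // two-level lookup: node, then attribute
        System.out.println(tree.get("/users/42", "name"));

        MapStyleStore map = new MapStyleStore();
        map.put("user:42", "Ada");                      // single lookup by key
        System.out.println(map.get("user:42"));
    }
}
```

With the flat form, callers get the familiar `java.util.Map` idioms (`put`, `get`, `putIfAbsent`) and the store has a single key to hash per entry.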
17. Infinispan != JBoss Cache 4
• New architecture
• Brand new data container design
• Cutting edge algorithms
• New, completely different, APIs
• Not backward-compatible
• Although a code-level compatibility layer is available
• New expectations
• Designed for a far wider scope of purpose
18. More scalable than JBC
• Internal structures more memory-efficient
• Data organised in Map-like dictionaries
• As opposed to a tree
• Making better use of CAS
• Minimizing synchronized blocks, mutexes
• Highly precise and low overhead data eviction
• Uses JBoss Marshalling
• smaller payloads + poolable streams = faster RPC
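To illustrate the "CAS instead of synchronized blocks" point, here is a minimal sketch using `java.util.concurrent.atomic.AtomicLong`. It is not taken from Infinispan's source; the counter classes and the `runConcurrently` helper are hypothetical names chosen for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-based variant: every caller serializes on the object's monitor.
class LockedCounter {
    private long value;
    synchronized long increment() { return ++value; }
    synchronized long get() { return value; }
}

// CAS-based variant: no lock is held; a thread that loses the race simply retries.
class CasCounter {
    private final AtomicLong value = new AtomicLong();

    long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            // compareAndSet succeeds only if nobody changed the value since we read it.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    long get() { return value.get(); }
}

class CasDemo {
    // Runs the action from several threads and waits for all of them to finish.
    static void runConcurrently(int threads, int callsPerThread, Runnable action) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) action.run();
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CasCounter cas = new CasCounter();
        LockedCounter locked = new LockedCounter();
        runConcurrently(4, 10_000, cas::increment);
        runConcurrently(4, 10_000, locked::increment);
        System.out.println(cas.get() + " " + locked.get());
    }
}
```

Both counters end up with the same total; the difference is that the CAS version never blocks a thread while another one holds a monitor.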
19. “Borrowed” from JBoss Cache
• JTA transactions
• Replicated data structure
• Fine-grained replication
• Eviction, cache persistence
• Notifications and eventing API
• JMX reporting and Query API
• MVCC locking
• Non-blocking state transfer techniques
20. … and new features!
• Consistent hash based data distribution
• Much simpler Map API (JSR-107 compliant)
• Ability to be consumed by non-JVM platforms
• Client/server module
• Memcached compatibility
• Hot Rod - binary protocol supporting “smart clients”
• JavaScript access via WebSocket server
• REST API
21. … and new features!
• JOPR based GUI management console
• JPA-like API
• Distributed execution
• Map/reduce made easy!
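The map/reduce idea can be sketched on a single JVM with a plain `ExecutorService`. This is an assumption-laden illustration, not Infinispan's distributed execution API: each inner list stands in for the entries held by one grid node, and `LocalMapReduce.wordCount` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalMapReduce {
    // Each inner list stands in for the data owned by one node (an assumption for illustration).
    static Map<String, Integer> wordCount(List<List<String>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<String> partition : partitions) {
                // Map phase: each task counts words in its own partition only.
                Callable<Map<String, Integer>> task = () -> {
                    Map<String, Integer> counts = new HashMap<>();
                    for (String word : partition) counts.merge(word, 1, Integer::sum);
                    return counts;
                };
                partials.add(pool.submit(task));
            }
            // Reduce phase: the caller merges the partial counts.
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                for (Map.Entry<String, Integer> e : partial.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("grid", "cloud", "grid"),
                Arrays.asList("cloud", "cache"));
        System.out.println(wordCount(partitions));
    }
}
```

In a real grid the map tasks would run on the nodes that own the data, so only the small partial results travel over the network.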
22. Data distribution
• Consistent hash based data distribution
• Locating entries very efficient
• No network calls, no need for metadata
• Will allow us to scale to bigger clusters
• Goal of efficient scaling to 1000’s of nodes
• Lightweight, “L1” cache for efficient reads
• On writes, “L1” gets invalidated
• Dynamic rebalancing
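Why locating an entry needs no network call can be shown with a generic consistent hash ring. This is a textbook sketch, not Infinispan's actual hashing implementation: the MD5-based hash, the virtual node count, and the `ConsistentHashRing` class are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    // Ring positions mapped to node names; every member can build the same ring locally.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    void addNode(String node) {
        // Several positions per node smooth out the key distribution.
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i), node);
    }

    // The owner is the first node position at or after the key's hash, wrapping around.
    String ownerOf(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes in the ring");
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of an MD5 digest, used only as a well-mixed hash (an arbitrary choice).
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (digest[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(64);
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println("key-1 lives on " + ring.ownerOf("key-1"));
        ring.removeNode("node-c");  // only keys that lived on node-c move
        System.out.println("key-1 lives on " + ring.ownerOf("key-1"));
    }
}
```

The owner is computed purely from the key and the current membership, so no lookup table or metadata server is involved, and removing a node only relocates the keys that node owned.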
23. JPA-like API, fine-grained replication
• Successor to POJO Cache
• JPA-like interface: persist, find, remove...
• Will not rely on AOP, javassist, etc
• More robust and easier to use/debug
• Familiar JPA-like interface
• Easy migration from existing, “traditional” data stores!
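What a JPA-like facade over a key/value store could look like is sketched below. Everything here is hypothetical: the slides only name the operations (persist, find, remove), so `GridEntityStore`, its signatures, the explicit `id` argument, and the `Person` class are assumptions for illustration, with a `ConcurrentHashMap` standing in for the grid.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical facade; the method names follow the slide, not a real Infinispan API.
class GridEntityStore {
    // Stand-in for a key/value cache.
    private final Map<String, Object> grid = new ConcurrentHashMap<>();

    // Entities are keyed by type name plus id, so different types can reuse the same id.
    private static String key(Class<?> type, Object id) {
        return type.getName() + "#" + id;
    }

    // Simplification: the id is passed explicitly instead of being read from an annotation.
    void persist(Object id, Object entity) {
        grid.put(key(entity.getClass(), id), entity);
    }

    <T> T find(Class<T> type, Object id) {
        return type.cast(grid.get(key(type, id)));  // cast(null) yields null when nothing is stored
    }

    void remove(Class<?> type, Object id) {
        grid.remove(key(type, id));
    }
}

// Plain object with no bytecode weaving or proxies involved.
class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class JpaLikeDemo {
    public static void main(String[] args) {
        GridEntityStore store = new GridEntityStore();
        store.persist(1L, new Person("Ada"));
        System.out.println(store.find(Person.class, 1L).name);
        store.remove(Person.class, 1L);
        System.out.println(store.find(Person.class, 1L));
    }
}
```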
24. Management
• Uses JOPR, a rich web-based GUI
• Simple WAR file
• Open Source (LGPL)
• Infinispan exposes data, operations in JMX
• Infinispan-JOPR plugin represents this graphically
• Other plugins can be built for other tools
• HP OpenView, Hyperic, etc.
25. So why is Infinispan sexy?
26. Why is Infinispan sexy?
• Transparent horizontal scalability
• Elastic in both directions
• Fast, low latency data access
• Ability to address a very large heap
• Cloud-ready datastore
• Not just for Java
• Free and doesn't suck!
28. Roadmap
• Infinispan 4.0.0 Starobrno (Released Feb 2010)
• New Map API
• Async API
• Distributed cache mode
• Management tooling
• REST API
• Hibernate 2nd level cache
29. Roadmap
• Infinispan 4.1.0 Radegast (Released August 2010)
• Client/server
• Memcached protocol
• Hot Rod protocol
• Smart clients using Hot Rod
• WebSocket server
• Lucene Directory
• LIRS adaptive, recency-based eviction
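For readers unfamiliar with recency-based eviction, the sketch below shows the simpler LRU policy using the JDK's `LinkedHashMap` in access order. It is deliberately not LIRS, which additionally tracks how recently an entry was reused, and it is not Infinispan code; `LruCache` is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain LRU, shown as a simpler stand-in for recency-based eviction (not LIRS).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);  // true = iterate in access order, least recently used first
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}

class EvictionDemo {
    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // touch "a" so that "b" becomes the least recently used entry
        cache.put("c", 3);  // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```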
30. Roadmap
• Infinispan 4.2.0 Ursus
• Collocated nodes in DIST
• Cassandra based cache store
• Infinispan 5.0.0 Pagoa
• JPA-like API + fine-grained replication
• Distributed executors
• Map/reduce programming model
31. To sum it up
• Clouds are becoming mainstream
• Need to think about challenges
• DBs and clouds pose many challenges
• Data grids offer a good alternative
• Infinispan, a new open source data grid
• Viable cloud data store but not just for clouds
• removes bottlenecks, single points of failure in non-cloud architectures too
32. How can YOU participate?
• Download and try it out!
• Report bugs in code, even docs, wikis, etc.
• Suggest new features!
• Test with your own use cases and tell us how you use it!!
• Lend a hand with development
• Open and democratic dev process
• Helps prioritize features you want!
• Several non-Red Hat core committers already!