A Universal Data & State Sharing Fabric
March 2013

Shay Hassidim
Deputy CTO
Agenda
• Company Snapshot
• Shared Data & State Fabric
   – Problem Statement
   – Existing Solutions – Pros & Cons
   – IMDG – Fundamentals & Case Study
GigaSpaces – Telco Customers and Partners




Founded 2000, 200 customers, Sales: US, EU, APAC, R&D: TLV, Main investor: Intel
Problem Description
• Internal telecom systems currently communicate with each
  other through a variety of MOM (AMQP, XMPP), IM protocols,
  network-based protocols, application-level protocols (SIP),
  HTTP-based protocols (REST, SOAP) or databases (RDBMS, NoSQL).
    – This introduces complexity and cost/ROI concerns, increases
      time to market and hurts customer satisfaction


• A universal data & state sharing fabric connecting the different
  products / systems / components is required
   – Real time response
   – Total reliability
   – Scalable & Highly Available
Possible Solutions - #1: Database
• Pros
  – Everybody knows databases. No need to
    educate the developers.
• Cons
  – Users need to reshape the data structure to fit
    a specific model (table, document). Lock-in.
  – No proactive event based messaging fabric.
    Requires external messaging integration
  – Management overhead
  – Not designed for real-time communication.
    Mostly disk based storage medium.
Possible Solutions - #2: ESB
• Pros
  – Designed for data transformation and
    routing between heterogeneous systems
  – Protocol translation focused
• Cons
  – Does not maintain state. No state sharing.
    Getting the latest state requires a DB/Cache
  – Not designed for real-time communication
What if we could combine both DB and ESB but avoid all the cons?
Enter IMDG (In-Memory Data Grid)
• Roots in the Tuple Space concept, RDBMS, OODBMS,
  MOM
• Pros
   – State sharing in real time at massive scale
   – Built-in proactive event-based fabric
         • Fused to the engine core. No external subsystem
   – Supports key/value access, SQL-based data access and
     message delivery modes with the same API
         • Topic, Queue, Pub/Sub
   – Supports locality of reference
         • Collocate data and business logic
• Cons
   – No full SQL query support as in an RDBMS
   – Requires some education/ramp-up
GigaSpaces IMDG Basic Operations
[Diagram: an Application invoking each operation on a Space]
• Write, WriteMultiple
• Read, ReadMultiple
• Take, TakeMultiple
• Execute
• Notify
• Change
   – set
   – putInMap
   – increment
   – decrement
   – addToCollection

http://www.gigaspaces.com/docs/JavaDoc9.1/org/openspaces/core/GigaSpace.html
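
The operations listed above can be illustrated with a toy, single-process model. This is a minimal sketch and not the GigaSpaces API (the JavaDoc link above is the reference for that): `ToySpace` and all of its method names are hypothetical, and templates are assumed to be plain dicts matched by field equality. It only shows the semantics the operation names suggest: read returns a matching entry and leaves it in place, take removes it, and change updates it in place.

```python
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


class ToySpace:
    """Hypothetical in-memory stand-in for a space; not the GigaSpaces API."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def write(self, entry: Entry) -> None:
        # Store a copy so later caller-side mutation does not leak in.
        self._entries.append(dict(entry))

    def write_multiple(self, entries: List[Entry]) -> None:
        for entry in entries:
            self.write(entry)

    @staticmethod
    def _matches(entry: Entry, template: Entry) -> bool:
        # A template matches when every field it specifies is equal.
        return all(entry.get(k) == v for k, v in template.items())

    def read(self, template: Entry) -> Optional[Entry]:
        # Non-destructive: the entry stays in the space.
        for entry in self._entries:
            if self._matches(entry, template):
                return dict(entry)
        return None

    def read_multiple(self, template: Entry) -> List[Entry]:
        return [dict(e) for e in self._entries if self._matches(e, template)]

    def take(self, template: Entry) -> Optional[Entry]:
        # Destructive: the first matching entry is removed and returned.
        for i, entry in enumerate(self._entries):
            if self._matches(entry, template):
                return self._entries.pop(i)
        return None

    def change_increment(self, template: Entry, field: str, delta: int) -> int:
        # In-place update of matching entries; returns how many were changed.
        changed = 0
        for entry in self._entries:
            if self._matches(entry, template):
                entry[field] = entry.get(field, 0) + delta
                changed += 1
        return changed


space = ToySpace()
space.write_multiple([
    {"device": "d1", "status": "up", "calls": 3},
    {"device": "d2", "status": "down", "calls": 0},
])
space.change_increment({"device": "d1"}, "calls", 1)
d1 = space.read({"device": "d1"})        # still in the space afterwards
taken = space.take({"status": "down"})   # removed from the space
remaining = space.read_multiple({})      # empty template matches everything
```

In a real grid the same calls would cross process or network boundaries and be subject to transactions and timeouts; none of that is modeled here.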
IMDG Real-Time Interoperability
[Diagram: a single IMDG accessed through Document, POJO, JavaSpace, Map, JPA, JMS and JDBC interfaces]
Application-IMDG Topologies
• Remote
   – The application process reaches a separate IMDG process through remote calls
• Collocated
   – The application and the IMDG instance share one application process and use local calls
IMDG Real-Time Low-Latency Performance
[Chart: GigaSpaces IMDG Latency Benchmark, latency in microseconds (axis 0-300)]
• Bars: colocated read/write latency, remote read/write latency, colocated notification latency, remote notification latency, remote replication latency
• Data labels shown, left to right: 5, 200, 10, 300, 180
• Annotations: "No serialization, no network usage"; "Async operation"
Benchmark using Cisco UCS, 2.93GHz CPU
 http://blog.gigaspaces.com/2010/12/06/possible-impossibility-the-race-to-zero-latency/
Cassandra vs. GigaSpaces
[Chart: Cassandra vs. GigaSpaces Read Benchmark, TPS vs. # of client threads (1-10); GigaSpaces axis 0-50,000,000 TPS, Cassandra axis 0-25,000 TPS]
• GigaSpaces is 1000-2000 times faster

[Chart: Cassandra vs. GigaSpaces Write Benchmark, TPS vs. # of client threads (1-10); GigaSpaces axis 0-1,400,000 TPS, Cassandra axis 0-180,000 TPS]
• GigaSpaces is 3-7 times faster

Benchmark using Cisco UCS, 2.93GHz CPU
ESB Performance
http://esbperformance.org/display/comparison/ESB+Performance - August 2012
• 2 orders of magnitude slower than IMDG
IMDG Use Cases
•   Database Performance, Scaling
•   State replication for correlation
•   Distributed Stateful Processing
•   State replication over the WAN
Database Performance, Scaling
• Description
   – Device activity history
   – Last n activities
• Challenge
   – Central DB bottleneck
• Desired solution
   – Use a distributed cache to front-end the database
   – Preload the data from the DB
   – Keep the DB in sync
• Requirements
   – High consistency
   – High Performance (<10,000 TX/Sec)
Database Performance, Scaling
Solution Architecture

• Use IMDG to front-end the
  database
• Keep the DB in sync
    – Load the data into memory
      using an object-based model
    – Write behind all updates
• Use the IMDG as the system
  of record
    – Rich query (SQL, ...)
    – Transaction support

[Diagram: the Application works against the IMDG, which loads from and writes behind to Database 1, Database 2 and Database 3]
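
The load and write-behind flow on this slide can be sketched as follows. This is an illustration under stated assumptions, not GigaSpaces code: `WriteBehindCache` is a hypothetical class, a plain dict stands in for the database, and the flush is triggered manually rather than by a background thread.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class WriteBehindCache:
    """Toy write-behind cache; the dict `db` stands in for a real database."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self._db = db
        self._memory: Dict[str, Any] = {}
        self._pending: Deque[Tuple[str, Any]] = deque()

    def preload(self) -> None:
        # Initial load: copy existing rows from the database into memory.
        self._memory.update(self._db)

    def get(self, key: str) -> Any:
        # Reads are served from memory only.
        return self._memory.get(key)

    def put(self, key: str, value: Any) -> None:
        # Update memory immediately and queue the database write for later.
        self._memory[key] = value
        self._pending.append((key, value))

    def flush(self) -> int:
        # Drain queued updates to the database in arrival order.
        flushed = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._db[key] = value
            flushed += 1
        return flushed


database = {"device-1": ["call", "sms"]}
cache = WriteBehindCache(database)
cache.preload()
cache.put("device-1", ["call", "sms", "data"])
cache.put("device-2", ["call"])
db_before_flush = dict(database)  # the database has not seen the updates yet
flushed = cache.flush()           # now both updates reach the database
```

The application sees its own writes at memory speed, while the database catches up asynchronously; the trade-off is that queued updates must be made durable some other way (for example by replication) if the process holding them fails.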
State Replication for correlation
• Description
   – State correlation
• Challenge
   – High availability
• Desired solution
   – Peer model, multi-master architecture
• Requirements
   – CAP based consistency
State Replication for correlation
Solution Architecture

• Store correlation state in
  the IMDG
• Can use a replicated or
  partitioned IMDG topology
• Can use a client-side cache
  to further optimize read
  operations

[Diagram: several Application instances sharing the IMDG]
Distributed Stateful Processing
• Description
   – Workflow of distributed process
   – 5-10 steps per process
• Challenge
   – State information is too big to transfer
• Desired solution
   – Shared consistent state
• Requirements
   – Scale to a large number of nodes (>1000)
Distributed Stateful Processing
Solution Architecture

• Store workflow state in the
  IMDG
• Use pub/sub model to
  synchronize the state
• Use template matching /
  SQL for querying the state
  of a particular object in the
  IMDG

[Diagram: State A, Process A, State B, Process B, State C, Process C in sequence]
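
The combination of shared workflow state, pub/sub and template matching can be sketched as below. This is a toy model, not the GigaSpaces API: `NotifyingStore`, `subscribe`, `write` and `query` are hypothetical names, templates are assumed to be dicts matched by field equality, and notifications are delivered synchronously in one process.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Listener = Callable[[State], None]


class NotifyingStore:
    """Toy shared-state store with template-based notifications (hypothetical)."""

    def __init__(self) -> None:
        self._states: Dict[str, State] = {}
        self._listeners: List[Tuple[State, Listener]] = []

    @staticmethod
    def _matches(state: State, template: State) -> bool:
        # A template matches when every field it specifies is equal.
        return all(state.get(k) == v for k, v in template.items())

    def subscribe(self, template: State, listener: Listener) -> None:
        self._listeners.append((template, listener))

    def write(self, key: str, state: State) -> None:
        # Store the new state, then notify every subscriber whose template matches.
        self._states[key] = dict(state)
        for template, listener in self._listeners:
            if self._matches(state, template):
                listener(dict(state))

    def query(self, template: State) -> List[State]:
        return [dict(s) for s in self._states.values() if self._matches(s, template)]


store = NotifyingStore()
seen_by_b: List[str] = []
seen_by_c: List[str] = []


def process_b(state: State) -> None:
    # Process B reacts to "A-done" and publishes its own result.
    seen_by_b.append(state["order"])
    store.write(state["order"], {**state, "step": "B-done"})


def process_c(state: State) -> None:
    # Process C reacts to "B-done".
    seen_by_c.append(state["order"])


store.subscribe({"step": "A-done"}, process_b)
store.subscribe({"step": "B-done"}, process_c)

# Process A finishes its step; only the small state record is written.
store.write("order-1", {"order": "order-1", "step": "A-done"})
current = store.query({"order": "order-1"})
```

Each process only subscribes to the state it cares about and reads the record from the shared store, so the full document never has to travel with the process from node to node.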
Sharing Device State Across Different DC
• Description
   – Systems/devices running across different data centers
   – Need to share state in real time
• Challenge
   – Requires reliable and scalable data replication over the
     WAN
• Desired solution
   – Simple, secured point-to-point state replication over the WAN
• Requirements
   – Support low-bandwidth, high-latency networks
Replication over the WAN Support
Solution Architecture

[Diagram: replication between London, New York and Hong Kong sites]
GigaSpaces IMDG WAN Replication Features
• Any Replication Topology
• High Availability
• Data Filtering
• Cloud Enabled
• Security
• Fully Transactional
• Custom Conflict Resolution
• Interoperable
• Single Click Bootstrapping
• Optimized Connection
XAP – One Product for In Memory Computing

• Java-.NET-C++; Spring, JPA, JMS; JDBC, Map; Schema-Free
• Customize Application Management, Rules & Workflows
• One Model for all components
   – Clustering
   – Security
   – HA
   – OA&M
• Virtualize All Middleware Components
• Real-Time Automated Deployment, Monitoring, Management
Consistent Management & Monitoring Module (“Cloudify”)
• Application recipe uses a domain-specific language (DSL) to describe the
  application life cycle
• Configuration and setup are separate from the application recipe
• All necessary plumbing is provided out of the box
A Typical App…
Extensive Platform Support




Summary – GigaSpaces IMDG
• Designed for real-time data-driven
  interoperability
• Management of data-centric architecture
  made easy
• Supports high-end, complex applications
• Blazing fast. Highly-Available.
Thank You!
  www.gigaspaces.com


Editor's Notes

  • #17 A classic caching use case: they want to use a cache to decouple Apama from a database (Oracle) used for reference data lookup as part of a data enrichment pattern. They store the history of the last 10 card (credit/debit) transactions done on all the ATMs. Data is pre-loaded into the database but updated during the uptime of the correlator through the cache. The primary driver is performance: delivering in-memory speed to complement the correlator story. The data size is ~500k transactions a day. For use case one, they could also change the architecture to use a data grid instead, which acts as the system of record (instead of Oracle). The grid needs to provide capabilities to load data from an RDBMS.
  • #19 Use a data grid as a data store for state replication of the correlator and the application running therein, with the goal of delivering high availability. The expectation is that no transaction is lost. The vision for use case two is to have a distributed correlator that is represented by a set of nodes (>2) that run the Apama contexts and that seamlessly hand over processing from one node to the next. You will want to do away with the classic two-correlator active-passive or active-active architectures and move to a peer model of masters (N>1) and slaves (M>1). This will go hand in hand with configurable CAP tradeoffs and seamless failover of the nodes.
  • #21 Highly distributed processing: a worldwide organization has 3000 nodes. A given process moves asynchronously across these nodes in a non-deterministic way. At any stop along the way, it may need access to the current state of a ‘record’ (let’s say an order or a complex document). The record is too big to transfer with the process, but you want to have access to it at any step. There would typically be 5-10 steps where aspects of the complex document are required. The entire document is (usually) not necessary at any given step.
  • #25 To sum up, the following replication features address enterprise multisite data replication needs:
     – Any replication topology is possible, so replication can be adjusted to the business processes it serves
     – The gateway components are highly available and self-healing, to ensure replication continuity
     – Data can be filtered so only subsets of the data are replicated
     – The entire solution is cloud enabled, so it can run on any private or public cloud without further effort or adjustments
     – The replication is transactional, to allow for data integrity and consistency
     – Users can customize the conflict resolution algorithm to match any business logic
     – The data is fully interoperable and can be read and written by .NET, Java and REST systems
     – One site can remotely bootstrap other sites
  • #29 Add Puppet too!