NoSQL meetup July 2011
 

Real-Time processing with In-Memory-Data-Grid and NoSQL Database

    NoSQL meetup July 2011 Presentation Transcript

    • NoSQL meetup
      July 2011
      Real-Time processing with In-Memory-Data-Grid and NoSQL
      Shay Hassidim
      Deputy CTO
      GigaSpaces Inc.
      shay@gigaspaces.com
    • Agenda
      Slides – 30 min
      Live Demos – 45 min
      Q&A – 15 min
      2
    • Real-Time Processing Use Cases
      Risk – Calculation engines
      Call Center Management
      E-commerce – auction monitoring, inventory
      Gaming – multi-user, on-line gaming
      On-line marketing – Improve conversion rate
      Weather reporting
      Traffic analysis
      Supply-Chain optimization
      Manufacturing - Quality management in
      Shipment & Delivery Monitoring
      Fraud Detection
      3
    • Note the Time dimension
      4
    • Data resolution & processing models
      5
    • Traditional Processing - RDBMS
      Scale-up Database
      Use traditional RDBMS
      Stored procedure
      Flash memory to reduce I/O
      Read-only replica
      Limitations
      Doesn’t scale on write
      Extremely expensive (HW + SW)
      6
    • Traditional Processing - CEP
      Process the data as it comes
      Maintain a small fraction of the data in-memory
      Pros:
      Low-latency
      Relatively low-cost
      Cons
      Hard to scale (Mostly limited to scale-up)
      Not agile - Queries must be pre-generated
      Fairly complex
      7
    • In-Memory Database
      Scale up
      Pros
      Scale both on write & read
      Fits the event-driven model (CEP style) , ad-hoc query model
      SQL
      Cons
      • Cost of memory vs. disk
      • Memory capacity is limited
      • SQL
      8
    • NoSQL DB
      Distributed database
      HBase, Cassandra, MongoDB
      Pros
      Scale on write/read
      Elastic
      Cons
      High latency on Read (tunable)
      Consistency tradeoffs are hard
      Non-Transactional
      9
    • Hadoop Map/Reduce
      Distributed batch processing
      Pros
      Designed to process massive amounts of data
      Mature
      Low cost
      Cons
      Not real-time
      New Programming Model
      HDFS must be carefully tuned to improve data locality
      10
    • So what’s the bottom line?
      A one-size-fits-all model doesn’t cut it.
      The solution has to be a combination of several technologies and patterns.
      11
    • About GigaSpaces XAP…
      MW
      • Application Platform
      • Java, .Net, C++
      • Real-Time processing
      Free Edition
      • All Functionality
      • Limited Capacity
      Open
      • Entire client side source code provided
      12
    • GigaSpaces
      GigaSpaces delivers middleware that gives enterprises and ISVs end-to-end application scalability and cloud-enablement for mission-critical applications, serving hundreds of tier-1 organizations worldwide.
      13
    • GigaSpaces XAP Components
      Java, .Net, C++
      Ruby, Groovy, Jython, Spring, JPA, JMS, JDBC
      Schema-Free
      Customize Application Management Rules & Workflows
      1 Clustering Model for all components
      Run entire application in-memory… transaction-safe
      In-Memory Data Grid
      Real-Time Automated Deployment
      Monitoring
      Management
      Virtualize All Middleware Components
      14
    • Other Solutions…
      App Server
      WebLogic, WebSphere, JBoss AS, Tomcat …
      Orchestration
      Chef, Puppet, RightScale, Nolio …
      JMS
      AQ, MQ, ActiveMQ …
      CEP
      Esper, Aleri, StreamBase …
      Caching
      • Alachisoft
      • IBM eXtreme Scale
      • Microsoft Velocity
      • Oracle Coherence
      • JBoss Infinispan
      • ScaleOut Software
      • Terracotta-EHCache
      • Tibco ActiveSpaces
      • VMware GemFire
      • GridGain
      • Hazelcast
      15
    • RT Processing with IMDG and NoSQL DB
      - In Memory Data Grid
      - RT Processing Grid
      • Light Event Processing
      • Map-reduce
      • Event-driven
      • Execute code with data
      • Transactional
      • Secured
      • Elastic
      Event Sources
      Write behind
      NoSQL DB
      • Low-cost storage
      • Write/Read scalability
      • Dynamic scaling
      • Raw Data and aggregated Data
      Analytics Application
      Generate Patterns
      16
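The write-behind link on this slide (writes land in the in-memory grid first and reach the NoSQL DB asynchronously) can be sketched roughly as below. This is an illustration only, not GigaSpaces XAP code: the class `WriteBehindStore`, its methods, the batch size, and the plain dict standing in for the NoSQL DB are all assumptions made for the example.

```python
import queue
import threading


class WriteBehindStore:
    """Hypothetical in-memory store that persists writes asynchronously."""

    def __init__(self, backend, batch_size=10):
        self.memory = {}            # the "data grid": serves all reads
        self.backend = backend      # stand-in for the NoSQL DB (a dict here)
        self.batch_size = batch_size
        self._pending = queue.Queue()
        flusher = threading.Thread(target=self._flush_loop, daemon=True)
        flusher.start()

    def write(self, key, value):
        # The caller only waits for the in-memory update.
        self.memory[key] = value
        self._pending.put((key, value))

    def read(self, key):
        return self.memory.get(key)

    def _flush_loop(self):
        while True:
            # Block for the first change, then drain up to one batch.
            batch = [self._pending.get()]
            while len(batch) < self.batch_size:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            # One bulk write per batch; later writes to a key win.
            self.backend.update(batch)
            for _ in batch:
                self._pending.task_done()

    def drain(self):
        """Wait until every queued change has reached the backend."""
        self._pending.join()


nosql_db = {}
store = WriteBehindStore(nosql_db)
for i in range(25):
    store.write(f"event-{i}", {"seq": i})
store.drain()
```

Batching is the point of the design: the event sources see memory-speed writes, while the database receives fewer, larger writes. A production implementation would also need retry and durability for the pending queue, which this sketch leaves out.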
    • Use Case
      Calculation Engine Design Patterns
      With XAP
      17
    • Main Features Used
      Data Partitioning: Transparent content-based data partitioning to evenly and intelligently distribute data across your data-grid cluster
      Querying: Sophisticated query engine with support for SQL and example based queries
      Indexing: Predefined and ad-hoc property indexing for blazing fast data access
      Write Behind: Asynchronous and reliable propagation of data to any external data source
      Locking Support: Locking and transaction isolation for robust and hassle-free data access
      Master-Worker Support: Intuitive and highly scalable master-worker implementation for distributing computation-intensive tasks
      Distributed Code Execution: Dynamic code shipment and map/reduce execution across the grid for optimized processing and data access
      Content Based Routing: Routing of events to relevant cluster members based on their content
      Workflow Support: Implement complex workflows using event propagation and sophisticated event filtering
      Admin API: Comprehensive and intuitive API for monitoring and controlling every aspect of your cluster and application
      18
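Data partitioning and content-based routing both come down to deriving a partition from a field of the object. A minimal sketch follows, assuming a fixed partition count and a CRC32 hash; the names `partition_for` and `route` are hypothetical and do not come from the XAP API, and the real product may use a different hashing scheme.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(routing_key, num_partitions=NUM_PARTITIONS):
    """Map a routing key to a partition index with a stable hash."""
    # zlib.crc32 gives the same value on every run and every machine,
    # unlike the built-in hash() for strings.
    return zlib.crc32(routing_key.encode("utf-8")) % num_partitions


def route(objects, key_field, num_partitions=NUM_PARTITIONS):
    """Group objects by the partition that their routing field hashes to."""
    partitions = defaultdict(list)
    for obj in objects:
        index = partition_for(str(obj[key_field]), num_partitions)
        partitions[index].append(obj)
    return dict(partitions)


trades = [{"id": f"T{i}", "book": f"B{i % 3}"} for i in range(12)]
by_partition = route(trades, "id")
```

Because the partition depends only on the object's content, any client can compute where a Trade lives without asking a central directory, and an event carrying the same key is delivered to the same cluster member.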
    • Elastic Calculation Engine - Colocated Logic
      Step 1 - The client sends a calculation Task to each partition with the specific Trade IDs required.
      Step 2 - The Task reads all the Trade objects and performs the NPV calculation for each Trade. The result is sent back to the client for final aggregation.
      Step 3 - The calculation Task searches for all Trades. Any missing Trades are loaded lazily from the DB in one bulk query.
      Step 4 - Intermediate results are retrieved from each partition and reduced.
      The Data Grid and the calculation Grid scale together
      19
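The colocated pattern above is a map/reduce: a task runs next to each partition's data and the client reduces the partial results. The sketch below imitates that with threads and dicts. Everything in it is an assumption for illustration: the `Partition` class, `bulk_load`, the `DATABASE` dict, the cash flows, and the flat 5% discount rate are invented, and none of it is XAP API.

```python
from concurrent.futures import ThreadPoolExecutor

RATE = 0.05  # hypothetical flat discount rate

# Hypothetical backing store: trade id -> yearly cash flows.
DATABASE = {
    "T1": [100.0, 100.0],
    "T2": [50.0, 50.0, 50.0],
    "T3": [200.0],
    "T4": [10.0, 20.0],
}


def npv(cash_flows, rate=RATE):
    """Net present value of cash flows paid at the end of years 1..n."""
    return sum(cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cash_flows))


def bulk_load(trade_ids):
    """Stand-in for a single bulk query against the database."""
    return {tid: DATABASE[tid] for tid in trade_ids}


class Partition:
    """One data-grid partition holding some Trades in memory."""

    def __init__(self, trades):
        self.trades = dict(trades)

    def execute(self, trade_ids):
        # Load any Trades this partition does not hold, in one bulk call.
        missing = [tid for tid in trade_ids if tid not in self.trades]
        if missing:
            self.trades.update(bulk_load(missing))
        # Map: compute the NPV of each requested Trade next to its data.
        return sum(npv(self.trades[tid]) for tid in trade_ids)


partitions = [
    Partition({"T1": DATABASE["T1"]}),
    Partition({"T3": DATABASE["T3"]}),
]
requests_per_partition = [["T1", "T2"], ["T3", "T4"]]

# The client sends one task to each partition and collects the partials.
with ThreadPoolExecutor() as pool:
    partials = list(
        pool.map(
            lambda pair: pair[0].execute(pair[1]),
            zip(partitions, requests_per_partition),
        )
    )

# Reduce: the client aggregates the intermediate results.
total_npv = sum(partials)
```

Only one number per partition crosses the network here, which is why the data grid and the calculation logic can scale together.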
    • Elastic Calculation Engine - Remote Logic
      Step 1 - The client sends calculation Requests to the space cluster.
      Step 2 - Each calculation engine consumes a different Request, processes it and writes the Result back into the space, using a local cache for reference data.
      Step 3 - The calculation logic searches for all Trades. Any missing Trades are loaded lazily from the DB in one bulk query and written into the space to be reused later.
      Step 4 - The client consumes all the calculation results and performs final aggregation.
      Scales on demand separately from the Data Grid
      The Data Grid and the calculation Grid scale independently
      20
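The remote pattern above is a master-worker arrangement: the space acts as a shared queue of Requests and Results, and calculation engines run outside it. A rough sketch, using two `queue.Queue` objects in place of the space and threads in place of calculation engines. The names `request_space`, `result_space`, `trade_cache`, `run` and the sample data are hypothetical, not XAP API.

```python
import queue
import threading

RATE = 0.05  # hypothetical flat discount rate
DATABASE = {  # hypothetical backing store: trade id -> yearly cash flows
    "T1": [100.0, 100.0],
    "T2": [50.0, 50.0, 50.0],
    "T3": [200.0],
    "T4": [10.0, 20.0],
}

request_space = queue.Queue()  # stands in for Requests written to the space
result_space = queue.Queue()   # stands in for Results written to the space
trade_cache = {}               # Trades kept for reuse by later Requests
cache_lock = threading.Lock()


def npv(cash_flows, rate=RATE):
    """Net present value of cash flows paid at the end of years 1..n."""
    return sum(cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cash_flows))


def load_trades(trade_ids):
    """Return Trades, bulk-loading any missing ones and caching them."""
    with cache_lock:
        missing = [tid for tid in trade_ids if tid not in trade_cache]
        if missing:
            trade_cache.update({tid: DATABASE[tid] for tid in missing})
        return [trade_cache[tid] for tid in trade_ids]


def worker():
    """A calculation engine: take a Request, compute, write a Result."""
    while True:
        request = request_space.get()
        if request is None:  # sentinel: no more work for this engine
            break
        request_id, trade_ids = request
        value = sum(npv(cf) for cf in load_trades(trade_ids))
        result_space.put((request_id, value))


def run(batches, num_workers=2):
    """The client: submit Requests, then consume and aggregate Results."""
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for request_id, batch in enumerate(batches):
        request_space.put((request_id, batch))
    for _ in workers:
        request_space.put(None)
    for w in workers:
        w.join()
    total = 0.0
    for _ in batches:
        _, value = result_space.get()
        total += value
    return total


total_npv = run([["T1", "T2"], ["T3"], ["T4"]])
```

Changing `num_workers` changes the calculation capacity without touching the stored data, which mirrors the independent scaling noted on the slide.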
    • Demos
      Simple IMDG Operations
      IMDG write, read, execute…
      IMDG and NoSQL DB Integration
      Cassandra
      MongoDB
      Calculation Engine
      Small scale Demo
      Large scale Demo – on the Cloud
      21