NoSQL meetup July 2011

Real-Time processing with In-Memory-Data-Grid and NoSQL Database

Presentation Transcript

  • NoSQL meetup
    July 2011
    Real-Time processing with In-Memory-Data-Grid and NoSQL
    Shay Hassidim
    Deputy CTO
    GigaSpaces Inc.
    shay@gigaspaces.com
  • Agenda
    Slides – 30 min
    Live Demos – 45 min
    Q&A – 15 min
  • Real-Time Processing Use Cases
    Risk – Calculation engines
    Call Center Management
    E-commerce – auction monitoring, inventory
    Gaming – multi-user, on-line gaming
    On-line marketing – Improve conversion rate
    Weather reporting
    Traffic analysis
    Supply-Chain optimization
    Manufacturing - Quality management in
    Shipment & Delivery Monitoring
    Fraud Detection
  • Note the Time dimension
  • Data resolution & processing models
  • Traditional Processing - RDBMS
    Scale-up Database
    Use traditional RDBMS
    Stored procedure
    Flash memory to reduce I/O
    Read-only replica
    Limitations
    Doesn’t scale on write
    Extremely expensive (HW + SW)
  • Traditional Processing - CEP
    Process the data as it comes
    Maintain a small fraction of the data in-memory
    Pros:
    Low-latency
    Relatively low-cost
    Cons
    Hard to scale (Mostly limited to scale-up)
    Not agile - Queries must be pre-generated
    Fairly complex
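The CEP approach above (process each event as it arrives and keep only a small fraction of the data in memory) can be illustrated with a time-based sliding window. This is a minimal sketch in plain Python and is not tied to any CEP product; the `SlidingWindowAverage` class and its window length are assumptions made for illustration.

```python
from collections import deque


class SlidingWindowAverage:
    """Keeps only the events from the last `window_seconds` in memory."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def on_event(self, timestamp, value):
        """Process one event as it arrives and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)
```

Because the query (a running average here) is fixed in code before any event arrives, the sketch also shows why such queries have to be defined up front.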
  • In-Memory Database
    Scale up
    Pros
    Scales on both write and read
    Fits the event-driven model (CEP style) and the ad-hoc query model
    SQL
    Cons
    Cost of memory vs. disk
    Memory capacity is limited
    SQL
  • NoSQL DB
    Distributed database
    HBase, Cassandra, MongoDB
    Pros
    Scale on write/read
    Elastic
    Cons
    High latency on Read (tunable)
    Consistency tradeoffs are hard
    Non-Transactional
  • Hadoop Map/Reduce
    Distributed batch processing
    Pros
    Designed to process massive amounts of data
    Mature
    Low cost
    Cons
    Not real-time
    New Programming Model
    HDFS must be carefully tuned to improve data locality
  • So what’s the bottom line?
    A one-size-fits-all model doesn’t cut it.
    The solution has to be a combination of several technologies and patterns.
  • About GigaSpaces XAP…
    MW
    • Application Platform
    • Java, .Net, C++
    • Real-Time processing
    Free Edition
    • All Functionality
    • Limited Capacity
    Open
    • Entire client side source code provided
  • GigaSpaces
    GigaSpaces delivers middleware software that gives enterprises and ISVs end-to-end application scalability and cloud enablement for mission-critical applications, serving hundreds of tier-1 organizations worldwide.
  • GigaSpaces XAP Components
    Java, .Net, C++
    Ruby, Groovy, Jython, Spring, JPA, JMS, JDBC
    Schema-Free
    Customize Application Management Rules & Workflows
    One clustering model for all components
    Run the entire application in-memory… transaction-safe
    In-Memory Data Grid
    Real-Time Automated Deployment
    Monitoring
    Management
    Virtualize All Middleware Components
  • Other Solutions…
    App Server
    WebLogic, WebSphere, JBoss AS, Tomcat…
    Orchestration
    Chef, Puppet, RightScale, Nolio…
    JMS
    AQ, MQ, ActiveMQ…
    CEP
    Esper, Aleri, StreamBase…
    Caching
    • Alachisoft
    • IBM eXtreme Scale
    • Microsoft Velocity
    • Oracle Coherence
    • JBoss Infinispan
    • ScaleOut Software
    • Terracotta Ehcache
    • TIBCO ActiveSpaces
    • VMware GemFire
    • GridGain
    • Hazelcast
  • RT Processing with IMDG and NoSQL DB
    - In-Memory Data Grid
    - RT Processing Grid
    • Light Event Processing
    • Map-reduce
    • Event-driven
    • Execute code with data
    • Transactional
    • Secured
    • Elastic
    Event Sources
    Write behind
    NoSQL DB
    • Low-cost storage
    • Write/Read scalability
    • Dynamic scaling
    • Raw Data and aggregated Data
    Analytics Application
    Generate Patterns
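The write-behind link between the data grid and the NoSQL DB can be sketched as an in-memory store that acknowledges writes immediately and propagates them to the backing store from a background thread. This is a minimal sketch under stated assumptions, not the GigaSpaces implementation: `WriteBehindCache` is a hypothetical name, and the backing store is modeled as any dict-like object rather than a real Cassandra or MongoDB client.

```python
import queue
import threading


class WriteBehindCache:
    """In-memory store that flushes writes to a backing store asynchronously."""

    def __init__(self, store, batch_size=100):
        self.data = {}              # the in-memory copy served to readers
        self.store = store          # hypothetical backing store (dict-like)
        self.batch_size = batch_size
        self.pending = queue.Queue()
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def write(self, key, value):
        # The caller returns as soon as memory is updated.
        self.data[key] = value
        self.pending.put((key, value))

    def read(self, key):
        return self.data.get(key)

    def _flush_loop(self):
        while True:
            # Block for the first item, then drain up to batch_size more.
            batch = [self.pending.get()]
            while len(batch) < self.batch_size:
                try:
                    batch.append(self.pending.get_nowait())
                except queue.Empty:
                    break
            for key, value in batch:
                self.store[key] = value
            for _ in batch:
                self.pending.task_done()

    def flush(self):
        """Wait until every queued write has reached the backing store."""
        self.pending.join()
```

Writes are queued in order, so the backing store sees the same sequence of updates as the in-memory copy, only later.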
  • Use Case
    Calculation Engine Design Patterns
    With XAP
  • Main Features Used
    Data Partitioning: Transparent content-based data partitioning to evenly and intelligently distribute data across your data-grid cluster
    Querying: Sophisticated query engine with support for SQL and example based queries
    Indexing: Predefined and ad-hoc property indexing for blazing fast data access
    Write Behind: Asynchronous and reliable propagation of data to any external data source
    Locking Support: Locking and transaction isolation for robust and hassle-free data access
    Master-Worker Support: Intuitive and highly scalable master-worker implementation for distributing computation-intensive tasks
    Distributed Code Execution: Dynamic code shipment and map/reduce execution across the grid for optimized processing and data access
    Content Based Routing: Routing of events to relevant cluster members based on their content
    Workflow Support: Implement complex workflows using event propagation and sophisticated event filtering
    Admin API: Comprehensive and intuitive API for monitoring and controlling every aspect of your cluster and application
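Content-based partitioning and routing, as listed above, come down to deriving a partition from a value inside the object itself, so that reads and writes for the same value always land on the same cluster member. The following is a minimal sketch of that idea only; `partition_for` and `PartitionedGrid` are hypothetical names, and the CRC32-modulo scheme is an assumption, not the algorithm XAP uses.

```python
import zlib


def partition_for(routing_value, partition_count):
    """Map a routing value to a partition index in a stable way."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % partition_count


class PartitionedGrid:
    """Toy data grid: each partition is a plain dict."""

    def __init__(self, partition_count):
        self.partitions = [{} for _ in range(partition_count)]

    def write(self, routing_value, obj):
        index = partition_for(routing_value, len(self.partitions))
        self.partitions[index][routing_value] = obj

    def read(self, routing_value):
        # Only the owning partition is consulted; no broadcast is needed.
        index = partition_for(routing_value, len(self.partitions))
        return self.partitions[index].get(routing_value)
```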
  • Elastic Calculation Engine - Colocated Logic
    Step 1 - The client sends a calculation Task to each partition with the specific Trade IDs required.
    Step 2 - The Task reads all the Trade objects and performs the NPV calculation for each Trade. The result is sent back to the client for final aggregation.
    Step 3 - The Calculation Task searches for all Trades. Any missing Trades are loaded lazily from the DB in one bulk query.
    Step 4 - Intermediate results are retrieved from each partition and reduced.
    The Data Grid and the calculation grid scale together.
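The colocated pattern above can be sketched as a small map/reduce: the client groups Trade IDs by partition, each partition runs the task next to its own data (lazily loading any missing Trades in one bulk lookup), and the client reduces the partial results. This is a minimal sketch in plain Python, not the XAP task API; `Partition`, `calculate`, the modulo routing on integer Trade IDs, and the dict standing in for the database are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def npv(cash_flows, rate):
    """Net present value: cash flow at period t is discounted by (1 + rate)**t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


class Partition:
    def __init__(self, database):
        self.trades = {}          # trade_id -> list of cash flows held in memory
        self.database = database  # hypothetical backing DB (dict-like)

    def run_task(self, trade_ids, rate):
        # Load any Trades not yet in memory with a single bulk lookup.
        missing = [tid for tid in trade_ids if tid not in self.trades]
        if missing:
            self.trades.update({tid: self.database[tid] for tid in missing})
        # Map step: compute NPV next to the data, return one partial result.
        return sum(npv(self.trades[tid], rate) for tid in trade_ids)


def calculate(partitions, trade_ids, rate):
    # Route each Trade ID to its partition (assumed: integer IDs, modulo routing).
    buckets = [[] for _ in partitions]
    for tid in trade_ids:
        buckets[tid % len(partitions)].append(tid)
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(
            lambda pair: pair[0].run_task(pair[1], rate),
            zip(partitions, buckets),
        )
        # Reduce step: aggregate the partial results on the client.
        return sum(partials)
```

Adding a partition adds both storage and compute capacity, which is why the two grids scale together in this pattern.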
  • Elastic Calculation Engine - Remote Logic
    Step 1 - The client sends calculation Requests to the space cluster.
    Step 2 - Each calculation engine consumes a different Request, processes it, and writes the Result back into the space, using a local cache for reference data.
    Step 3 - The calculation logic searches for all Trades. Any missing Trades are loaded lazily from the DB in one bulk query and written into the space for later reuse.
    Step 4 - The client consumes all the calculation results and performs the final aggregation.
    The Data Grid and the calculation grid scale independently: calculation engines are added on demand, separately from the Data Grid.
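The remote-logic pattern above is a master-worker arrangement: the client writes Requests into a shared space, independent engines each take a different Request, and the client collects the Results. This is a minimal sketch using standard-library queues and threads in place of the space and the engines; `engine`, `calculate`, the `None` stop marker, and the dict standing in for the database are assumptions, not the XAP API.

```python
import queue
import threading


def npv(cash_flows, rate):
    """Net present value: cash flow at period t is discounted by (1 + rate)**t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def engine(requests, results, space_trades, lock, database, rate):
    """Calculation engine: consume Requests until a None stop marker arrives."""
    while True:
        request = requests.get()
        if request is None:
            return
        request_id, trade_ids = request
        with lock:
            # Load missing Trades in one bulk lookup and keep them in the
            # shared space so later Requests can reuse them.
            missing = [tid for tid in trade_ids if tid not in space_trades]
            space_trades.update({tid: database[tid] for tid in missing})
            flows = [space_trades[tid] for tid in trade_ids]
        results.put((request_id, sum(npv(f, rate) for f in flows)))


def calculate(batches, database, rate, engine_count=2):
    requests, results = queue.Queue(), queue.Queue()
    space_trades, lock = {}, threading.Lock()
    engines = [
        threading.Thread(
            target=engine,
            args=(requests, results, space_trades, lock, database, rate),
        )
        for _ in range(engine_count)
    ]
    for e in engines:
        e.start()
    # The client writes one Request per batch of Trade IDs.
    for request_id, trade_ids in enumerate(batches):
        requests.put((request_id, trade_ids))
    for _ in engines:
        requests.put(None)  # one stop marker per engine
    # The client consumes every Result and performs the final aggregation.
    total = sum(results.get()[1] for _ in batches)
    for e in engines:
        e.join()
    return total
```

Here `engine_count` can change without touching where the Trades are stored, which mirrors the independent scaling of the two grids.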
  • Demos
    Simple IMDG Operations
    IMDG write, read, execute…
    IMDG and NoSQL DB Integration
    Cassandra
    MongoDB
    Calculation Engine
    Small scale Demo
    Large scale Demo – on the Cloud