NoSQL meetup July 2011

Real-Time processing with In-Memory-Data-Grid and NoSQL Database

Published in: Technology

  1. NoSQL meetup, July 2011
     Real-Time processing with In-Memory-Data-Grid and NoSQL
     Shay Hassidim, Deputy CTO, GigaSpaces Inc.
     shay@gigaspaces.com
  2. Agenda
     • Slides: 30 min
     • Live demos: 45 min
     • Q&A: 15 min
  3. Real-Time Processing Use Cases
     • Risk: calculation engines
     • Call center management
     • E-commerce: auction monitoring, inventory
     • Gaming: multi-user, on-line gaming
     • On-line marketing: improve conversion rate
     • Weather reporting
     • Traffic analysis
     • Supply-chain optimization
     • Manufacturing - Quality management in
     • Shipment & delivery monitoring
     • Fraud detection
  4. Note the Time dimension
  5. Data resolution & processing models
  6. Traditional Processing - RDBMS
     • Scale-up database
         • Use traditional RDBMS
         • Stored procedures
         • Flash memory to reduce I/O
         • Read-only replica
     • Limitations
         • Doesn’t scale on write
         • Extremely expensive (HW + SW)
  7. Traditional Processing - CEP
     • Process the data as it comes
     • Maintain a small fraction of the data in-memory
     • Pros
         • Low latency
         • Relatively low cost
     • Cons
         • Hard to scale (mostly limited to scale-up)
         • Not agile: queries must be pre-generated
         • Fairly complex
  8. In-Memory Database
     • Scale up
     • Pros
         • Scale both on write & read
         • Fits the event-driven model (CEP style) and the ad-hoc query model
         • SQL
     • Cons
         • Cost of memory vs. disk
         • Memory capacity is limited
         • SQL
  9. NoSQL DB
     • Distributed database
     • HBase, Cassandra, MongoDB
     • Pros
         • Scale on write/read
         • Elastic
     • Cons
         • High latency on read (tunable)
         • Consistency tradeoffs are hard
         • Non-transactional
  10. Hadoop Map/Reduce
     • Distributed batch processing
     • Pros
         • Designed to process massive amounts of data
         • Mature
         • Low cost
     • Cons
         • Not real-time
         • New programming model
         • HDFS must be carefully tuned to improve data locality
  11. So what’s the bottom line?
     • A one-size-fits-all model doesn’t cut it.
     • The solution has to be a combination of several technologies and patterns.
  12. About GigaSpaces XAP…
     • Middleware (MW)
         • Application platform
         • Java, .Net, C++
         • Real-time processing
     • Free Edition
         • All functionality
         • Limited capacity
     • Open
         • Entire client-side source code provided
  13. GigaSpaces
     GigaSpaces delivers software middleware that provides enterprises and ISVs with end-to-end scalability and cloud enablement for mission-critical applications, serving hundreds of tier-1 organizations worldwide.
  14. GigaSpaces XAP Components
     • Java-.Net-C++
     • Ruby-Groovy-Jython-Spring JPA-JMS JDBC
     • Schema-Free
     • Customize Application Management Rules & Workflows
     • 1 Clustering Model for all components
     • Run entire application in-memory… transaction-safe
     • In-Memory Data Grid
     • Real-Time Automated Deployment
     • Monitoring
     • Management
     • Virtualize All Middleware Components
  15. Other Solutions…
     • App Server: WebLogic, WebSphere, JBoss AS, Tomcat …
     • Orchestration: Chef, Puppet, RightScale, Nolio …
     • JMS: AQ, MQ, ActiveMQ …
     • CEP: Esper, Aleri, StreamBase …
     • Caching
         • Alachisoft
         • IBM eXtreme Scale
         • Microsoft Velocity
         • Oracle Coherence
         • JBoss Infinispan
         • ScaleOut Software
         • Terracotta Ehcache
         • Tibco ActiveSpaces
         • VMware GemFire
         • GridGain
         • Hazelcast
  16. RT Processing with IMDG and NoSQL DB
     • Event Sources
     • In-Memory Data Grid
     • RT Processing Grid
         • Light event processing
         • Map-reduce
         • Event-driven
         • Execute code with data
         • Transactional
         • Secured
         • Elastic
     • Write behind
     • NoSQL DB
         • Low-cost storage
         • Write/read scalability
         • Dynamic scaling
         • Raw data and aggregated data
     • Analytics Application
     • Generate Patterns
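The write-behind link between the data grid and the NoSQL DB on this slide can be sketched roughly as below. This is a minimal single-process illustration, not GigaSpaces XAP code: `WriteBehindCache`, `backing_store`, and `flush` are hypothetical names, a plain dict stands in for the NoSQL DB, and the flush runs synchronously once a batch fills up, whereas a real grid would typically propagate changes asynchronously.

```python
from collections import OrderedDict


class WriteBehindCache:
    """In-memory store that defers persistence to a slower backing store.

    Hypothetical illustration: a dict stands in for the NoSQL DB.
    """

    def __init__(self, backing_store, batch_size=3):
        self.memory = {}               # serves reads and writes at memory speed
        self.pending = OrderedDict()   # dirty entries not yet persisted
        self.backing_store = backing_store
        self.batch_size = batch_size

    def write(self, key, value):
        self.memory[key] = value
        self.pending[key] = value      # repeated writes to one key coalesce
        self.pending.move_to_end(key)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def read(self, key):
        if key in self.memory:
            return self.memory[key]
        value = self.backing_store[key]  # fall back to the store on a miss
        self.memory[key] = value
        return value

    def flush(self):
        # One bulk write instead of one store round-trip per update.
        self.backing_store.update(self.pending)
        self.pending.clear()
```

With `batch_size=3`, two writes to the same key leave the backing store untouched and occupy a single pending slot. The store only sees the latest value once three distinct keys are dirty.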
  17. Use Case
     Calculation Engine Design Patterns with XAP
  18. Main Features Used
     • Data Partitioning: Transparent content-based data partitioning to evenly and intelligently distribute data across your data-grid cluster
     • Querying: Sophisticated query engine with support for SQL and example-based queries
     • Indexing: Predefined and ad-hoc property indexing for blazing fast data access
     • Write Behind: Asynchronous and reliable propagation of data to any external data source
     • Locking Support: Locking and transaction isolation for robust and hassle-free data access
     • Master-Worker Support: Intuitive and highly scalable master-worker implementation for distributing computation-intensive tasks
     • Distributed Code Execution: Dynamic code shipment and map/reduce execution across the grid for optimized processing and data access
     • Content Based Routing: Routing of events to relevant cluster members based on their content
     • Workflow Support: Implement complex workflows using event propagation and sophisticated event filtering
     • Admin API: Comprehensive and intuitive API for monitoring and controlling every aspect of your cluster and application
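Content-based data partitioning and routing, as listed above, can be illustrated with a small sketch. It assumes the common approach of hashing a routing property and taking the result modulo the partition count; the slides do not say how XAP does this internally. `route`, `PartitionedGrid`, and the `book` field are hypothetical names chosen for the example.

```python
import zlib


def route(routing_value, num_partitions):
    """Map a routing value to a partition index using a stable hash."""
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_partitions


class PartitionedGrid:
    """Toy data grid: each partition is a list held in local memory."""

    def __init__(self, num_partitions, routing_field):
        self.routing_field = routing_field
        self.partitions = [[] for _ in range(num_partitions)]

    def write(self, obj):
        # The object's own content decides which partition stores it.
        index = route(obj[self.routing_field], len(self.partitions))
        self.partitions[index].append(obj)
        return index

    def read(self, routing_value):
        # Only the owning partition is scanned, never the whole cluster.
        index = route(routing_value, len(self.partitions))
        return [o for o in self.partitions[index]
                if o[self.routing_field] == routing_value]
```

Objects that share a routing value always land on the same partition, so a query or event keyed on that value touches one cluster member only.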
  19. Elastic Calculation Engine - Colocated Logic
     • Step 1 - The client sends a calculation Task to each partition with the specific Trade IDs required.
     • Step 2 - The Task reads all the Trade objects and performs the NPV calculation for each Trade. The result is sent back to the client for final aggregation.
     • Step 3 - The calculation Task searches for all Trades. Any missing Trades are loaded in a lazy manner from the DB in one bulk query.
     • Step 4 - Intermediate results are retrieved from each partition and reduced.
     • The Data Grid and the calculation grid scale together.
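The colocated pattern can be sketched as a small map/reduce over in-memory partitions. This is an illustration under stated assumptions, not XAP code: `Partition`, `run_task`, and `calculate` are hypothetical names, a dict stands in for the DB, a trade is reduced to a list of yearly cash flows, and the tasks run in a sequential loop where a real grid would execute them in parallel.

```python
def npv(cash_flows, rate):
    """Net present value; cash_flows[t] is paid t years from today."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


class Partition:
    def __init__(self, trades, database):
        self.trades = dict(trades)   # in-memory: trade id -> cash flows
        self.database = database     # stand-in for the backing DB

    def run_task(self, trade_ids, rate):
        # Step 3: load any missing trades from the DB in one bulk lookup.
        missing = [t for t in trade_ids if t not in self.trades]
        if missing:
            self.trades.update({t: self.database[t] for t in missing})
        # Step 2: compute NPV next to the data and return a partial result.
        return sum(npv(self.trades[t], rate) for t in trade_ids)


def calculate(partitions, ids_by_partition, rate):
    # Step 1: one task per partition, carrying only the trade IDs it needs.
    partials = [p.run_task(ids, rate)
                for p, ids in zip(partitions, ids_by_partition)]
    # Step 4: reduce the intermediate results on the client.
    return sum(partials)
```

Only trade IDs travel to the partitions and only partial sums travel back, which is the point of running the logic where the data lives.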
  20. Elastic Calculation Engine - Remote Logic
     • Step 1 - The client sends calculation Requests to the space cluster.
     • Step 2 - Each calculation engine consumes a different Request, processes it, and writes the Result back into the space, using a local cache for reference data.
     • Step 3 - The calculation logic searches for all Trades. Any missing Trades are loaded in a lazy manner from the DB in one bulk query and written into the space to be reused later.
     • Step 4 - The client consumes all the calculation results and performs the final aggregation.
     • Scales on demand separately from the Data Grid
     • The Data Grid and the calculation grid scale independently.
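The remote pattern is a master-worker arrangement, which can be sketched with two queues standing in for the space. Everything here is an assumption made for illustration: `worker`, `run_master`, and the `price` callback are hypothetical names, threads play the role of calculation engines, and the lazy DB load and local reference-data cache from steps 2 and 3 are left out.

```python
import queue
import threading


def worker(requests, results, price):
    """Take requests until a None sentinel arrives; write one result each."""
    while True:
        request = requests.get()
        if request is None:
            break
        trade_id, payload = request
        # Step 2: process the request and write the result back.
        results.put((trade_id, price(payload)))


def run_master(payloads, price, num_workers=2):
    requests, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(requests, results, price))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    # Step 1: the client writes calculation requests into the shared space.
    for trade_id, payload in payloads.items():
        requests.put((trade_id, payload))
    for _ in threads:
        requests.put(None)   # one sentinel per worker
    for t in threads:
        t.join()
    # Step 4: the client consumes all results and aggregates them.
    collected = {}
    while not results.empty():
        trade_id, value = results.get()
        collected[trade_id] = value
    return sum(collected.values()), collected
```

Because the workers only share the two queues, `num_workers` can change without touching the data side, which mirrors the slide's claim that the two grids scale independently.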
  21. Demos
     • Simple IMDG operations
         • IMDG write, read, execute…
     • IMDG and NoSQL DB integration
         • Cassandra
         • MongoDB
     • Calculation engine
         • Small-scale demo
         • Large-scale demo, on the cloud