
Complex Analytics with NoSQL Data Store in Real Time



NoSQL data stores are often limited in the types of queries they can support because of the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines.
We will demonstrate specifically a combination of key/value, SQL-like, document-model, and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through SQL queries.
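To make the "many APIs, same data" idea concrete, here is a minimal sketch in Python. It is not the product API discussed in the session: the `MultiApiStore` class, its method names, and the `users` table are hypothetical, and the standard-library `sqlite3` module stands in for the data store. It only illustrates the pattern of writing through a key/value call, applying a partial update, and then running SQL with projection and aggregation over the same records.

```python
import sqlite3


class MultiApiStore:
    """Hypothetical store: one data set exposed through two access styles."""

    def __init__(self):
        # In-memory SQLite is an assumption used only for illustration.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE users ("
            "id TEXT PRIMARY KEY, name TEXT, city TEXT, visits INTEGER)"
        )

    # Key/value API: write or read a whole record by its key.
    def put(self, key, name, city, visits):
        self._db.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?)",
            (key, name, city, visits),
        )

    def get(self, key):
        row = self._db.execute(
            "SELECT name, city, visits FROM users WHERE id = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        return dict(zip(("name", "city", "visits"), row))

    # Partial update: change one field without rewriting the whole record.
    def add_visits(self, key, delta):
        self._db.execute(
            "UPDATE users SET visits = visits + ? WHERE id = ?", (delta, key)
        )

    # SQL API: run a read query over the same records.
    def query(self, sql, params=()):
        return self._db.execute(sql, params).fetchall()


store = MultiApiStore()
store.put("u1", "Ann", "Paris", 3)
store.put("u2", "Bob", "Paris", 5)
store.put("u3", "Cid", "Oslo", 1)
store.add_visits("u3", 2)

# Projection: fetch only the name column for matching records.
names = store.query(
    "SELECT name FROM users WHERE city = ? ORDER BY name", ("Paris",)
)
# Aggregation over records that were written through put().
totals = store.query(
    "SELECT city, SUM(visits) FROM users GROUP BY city ORDER BY city"
)
print(names)   # [('Ann',), ('Bob',)]
print(totals)  # [('Oslo', 3), ('Paris', 8)]
```

The point of the sketch is that `put`, `add_visits`, and `query` all operate on one copy of the data, so nothing has to be synchronized between a key/value store and a separate SQL store.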


Published in: Technology


  1. Complex Analytics with NoSQL Data Store in Real Time: Nested Queries, Projection, Transactions and More. Nati Shalom, @natishalom
  2. What are we here to discuss?
     • Making Sense of the Exploding Data World
     • How That World Could Look if Disk Is No Longer the Bottleneck
     • Live Demo
  3. Making Sense of the Exploding Data World
  4. Capacity and Performance Drive New Data Management Technologies
     [Chart: data volume (GB, TB, PB) versus data velocity (Yr, Mo, Day, Hr, Min, Sec, MS, μS), positioning Data Mining, Machine Learning, Business Intelligence, Data Warehouse, High Throughput OLTP, Operational Intelligence, Exploratory Analytics, OLTP, and Streaming]
  5. Let’s Look at Tradeoffs of Some Selected Solutions
  6. SQL Queries
     • Query: SQL
     • Semantics: CRUD; Aggregation; Projection; Partial update
     • Performance: 100s/sec
     • Consistency: Transactional
     • Scaling: Mostly scale-up
     • Availability: Disk based
  7. NoSQL
     • Query: Proprietary but rich
     • Semantics: CRUD; Limited aggregation (Map/Reduce); No projection; No partial update
     • Performance: 1000s/sec
     • Consistency: Eventual
     • Scaling: Mostly scale-out
     • Availability: Based on replication
  8. IMDG
     • Query: Proprietary but rich
     • Semantics: CRUD; Aggregation API + Map/Reduce; Projection (GigaSpaces); Partial update (GigaSpaces)
     • Performance: 100k/sec
     • Consistency: Transactional
     • Scaling: Mostly scale-out
     • Availability: Replication
  9. Key/Value
     • Query: Key, Value
     • Semantics: Mostly read; No aggregation; No projection; No partial update
     • Performance: Millions/sec
     • Consistency: Atomic
     • Scaling: Mostly scale-out
     • Availability: Limited (varies quite substantially between implementations)
  10. Stream Processing (Storm)
     • Semantics: Event-driven data processing
     • Used for continuous updates: no need for a costly “SELECT FOR UPDATE”
     • Performance: Tens of millions of updates/sec
     [Diagram labels: Spouts, Bolt]
  11. Common Assumption: Disk Is the Bottleneck
     [Chart: performance over time (2000, 2010, 2020) with 100X and 10,000X markers; HDD latency (seek and rotate) shows little improvement. Source: GigaOM Research]
  12. Capacity and Performance Drive New Data Management Technologies
     [Chart labels: Big Data (Hadoop); NoSQL; In Memory, Stream Processing; RDBMS. Source: IDC, 2013]
  13. There’s No One Size Fits All
  14. A Typical App Looks Like This: The Data Flow Complexity
     [Diagram labels: Front End, Analytics, RT (Storm), Batch]
  15. What if Disk Was No Longer the Bottleneck? Flash Closes the CPU-to-Storage Gap
  16. Our Application Could Look Like This
     • Common data store serving multiple semantics/APIs
     • Disk becomes the new tape
     [Diagram labels: Front End; High Speed Data Store (using Flash/NVM); Key/Value, SQL, Document, Graph, Map/Reduce, Transactional; StreamBase]
  17. We're not there yet... but...
  18. We Can Use a High Speed Data Bus for Integrating All of Our Data Sources
     [Diagram labels: Front End, Analytics, RT (Storm), Batch; High Speed Data Bus (built-in caching); RT Transactional Data Access, Direct Access, RT Streaming; Hadoop Synch, MySQL Synch, Mongo Synch]
  19. High Speed Data Bus (Zoom In)
  20. Designed for Transactional and Analytics Scenarios: Homeland Security, Real Time Search, Social, eCommerce, User Tracking & Engagement, Financial Services
  21. Many APIs, Same Data: Key/Value, SQL, Document, Graph, Map/Reduce, Transactional
  22. Let’s take a closer look..
  23. Nested Queries & Projections
  24. Aggregations
  25. Fast Update: remains with strong consistency!
  26. Transaction support
  27. The Performance of RAM at a Cost/Capacity Closer to Disk
     • Provides 2x – 3.6x better TPS/$
     • 1:50 more capacity
     • 242k reads/sec
     • Test setup: 1KB object size and uniform distribution; 2 sockets, 2.8GHz CPU with 24 cores in total; CentOS 5.8; 2 FusionIO SLC PCIe cards in RAID; YCSB measurements performed by SanDisk
     • Assumptions: 1TB Flash = $2K; 1TB RAM = $20K
     [Bar chart: ZetaScale-GigaSpaces on SSDs and FDF-GigaSpaces on SSDs versus stock GigaSpaces in DRAM, for "No Read / 100% Write" and "100% Read / No Write" workloads; bar values as extracted: 62, 121, 17, 56]
     [Capacity chart, ZetaScale™ – XAP MemoryXtend: XAP 20 versus XAP Extend 1000 (1:50)]
  28. Data Is Moving to the Cloud
     (Source: Managing Storage: Trends, Challenges, and Options (2013-2014). EMC, 2013)
  29. Orchestration needs to be integrated into the database solution to make it cloud ready
  30. Demo References: click on the relevant box to get the demo
     • Many APIs, Same Data
     • Data Bus (Integration with Storm)
     • Built-In Orchestration
  31. Summary
  32. Nati Shalom Check out the slide on