Complex Analytics with NoSQL Data Store in Real Time



NoSQL databases are often limited in the types of queries they can support because of the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines.
We will specifically demonstrate a combination of key/value, SQL-like, document-model and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through SQL.
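
To make the mashup idea concrete, here is a minimal sketch. It does not use the GigaSpaces API; it uses Python's built-in `sqlite3` module as an in-memory stand-in. Writes go through a small key/value wrapper, and the same rows are then read back with SQL, including a projection, a partial update and an aggregation. The `KeyValueStore` class, the `items` table and all field names are hypothetical and exist only for illustration.

```python
import sqlite3


class KeyValueStore:
    """Hypothetical key/value facade over a SQL table (illustration only)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE items ("
            "key TEXT PRIMARY KEY, category TEXT, price REAL, stock INTEGER)"
        )

    def put(self, key, category, price, stock):
        # Key/value style write: a single upsert addressed by key.
        self.db.execute(
            "INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)",
            (key, category, price, stock),
        )

    def get(self, key):
        # Key/value style read: fetch the whole record by key.
        return self.db.execute(
            "SELECT category, price, stock FROM items WHERE key = ?", (key,)
        ).fetchone()

    def update_stock(self, key, stock):
        # Partial update: change one field without rewriting the record.
        self.db.execute("UPDATE items SET stock = ? WHERE key = ?", (stock, key))

    def query(self, sql, params=()):
        # SQL style access to the same data that was written through put().
        return self.db.execute(sql, params).fetchall()


store = KeyValueStore()
store.put("sku-1", "books", 12.5, 10)
store.put("sku-2", "books", 30.0, 4)
store.put("sku-3", "games", 59.0, 7)

store.update_stock("sku-2", 3)

# Projection: return only the columns the caller asks for.
cheap_books = store.query(
    "SELECT key, price FROM items WHERE category = ? AND price < ? ORDER BY key",
    ("books", 20.0),
)

# Aggregation over the same rows.
stock_by_category = store.query(
    "SELECT category, SUM(stock) FROM items GROUP BY category ORDER BY category"
)
```

A single-process SQLite database sidesteps the distribution problem that the talk is about; the sketch only shows what "many APIs, same data" looks like from the caller's side.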


Published in: Technology

  • Some of the emerging NewSQL and NoSQL disk-based databases might have had the ability to deal with the more demanding data volume and variety, but…
    Disk-based databases have always been I/O bound; in other words, keeping up with the new velocity demands on data is much harder for them. Disks have always gotten in the way of database velocity, or throughput. The closer to real time that transaction throughput or analytics must be, the harder it is for disk-based approaches to keep up.
  • Storm constructs a processing graph that feeds data from an input source through processing nodes.
    The processing graph is called a "topology".
    The input data sources are called "spouts", and the processing nodes are called "bolts".
    The data model consists of tuples.
    Tuples flow from the spouts to the bolts, which execute user code.
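
The spout/bolt/topology vocabulary in the note above can be sketched in a few lines of plain Python. This is not the Storm API: generators stand in for spouts and bolts, and function composition stands in for the topology. All function names here are hypothetical.

```python
from collections import Counter


def sentence_spout():
    """Spout: an input source that emits tuples (here, one-field tuples)."""
    for line in ["storm feeds tuples", "bolts process tuples"]:
        yield (line,)


def split_bolt(stream):
    """Bolt: user code that turns each sentence tuple into word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keeps a running count per word and emits (word, count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology():
    """Topology: the graph wiring spout -> split bolt -> count bolt."""
    final = {}
    for word, count in count_bolt(split_bolt(sentence_spout())):
        final[word] = count  # the latest emitted count wins
    return final


word_counts = run_topology()
```

Each tuple updates the running count as it arrives, which is the event-driven style of continuous update that the slides contrast with a costly "SELECT FOR UPDATE". A real Storm topology would additionally distribute spouts and bolts across workers.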

    1. Complex Analytics with NoSQL Data Store in Real Time: Nested Queries, Projection, Transactions and more. Nati Shalom, @natishalom
    2. What are we here to discuss? Making Sense of the Exploding Data World; What that World Could Look Like if Disk is No Longer the Bottleneck; Live Demo
    3. Making Sense of The Exploding Data World
    4. Capacity and Performance Drive New Data Management Technologies [Chart: data volume (GB, TB, PB) against data velocity (Yr, Mo, Day, Hr, Min, Sec, MS, μS); labeled regions: Data Mining, Machine Learning, Data Warehouse, Business Intelligence, High Throughput OLTP, Operational Intelligence, Exploratory Analytics, OLTP, Streaming]
    5. Let’s Look at Tradeoffs of Some Selected Solutions
    6. SQL Queries
       • Query: SQL
       • Semantics: CRUD, aggregation, projection, partial update
       • Performance: 100s/sec
       • Consistency: transactional
       • Scaling: mostly scale-up
       • Availability: disk based
    7. NoSQL
       • Query: proprietary but rich
       • Semantics: CRUD, limited aggregation (Map/Reduce), no projection, no partial update
       • Performance: 1000s/sec
       • Consistency: eventual
       • Scaling: mostly scale-out
       • Availability: based on replication
    8. IMDG
       • Query: proprietary but rich
       • Semantics: CRUD, aggregation API + Map/Reduce, projection (GigaSpaces), partial update (GigaSpaces)
       • Performance: 100k/sec
       • Consistency: transactional
       • Scaling: mostly scale-out
       • Availability: replication
    9. Key/Value
       • Query: key, value
       • Semantics: mostly read, no aggregation, no projection, no partial update
       • Performance: millions/sec
       • Consistency: atomic
       • Scaling: mostly scale-out
       • Availability: limited (varies quite substantially between implementations)
    10. Stream Processing (Storm)
       • Semantics: event-driven data processing
       • Used for continuous updates; no need for a costly “SELECT FOR UPDATE”
       • Performance: tens of millions of updates/sec
       • [Diagram labels: Spouts, Bolt]
    11. Common Assumption: Disk is the bottleneck [Chart: performance for 2000, 2010 and 2020, with 100X and 10,000X markers; HDD latency (seek & rotate) shows little improvement. Source: GigaOM Research]
    12. Capacity and Performance Drive New Data Management Technologies (Source: IDC, 2013) [Chart labels: Big Data (Hadoop); NoSQL; In Memory, Stream Processing; RDBMS]
    13. There’s No One Size Fits All
    14. A Typical App Looks Like This: The Data Flow Complexity [Diagram labels: Front End, Analytics, RT, STORM, Batch]
    15. What if Disk Was no Longer the Bottleneck? FLASH Closes the CPU to Storage Gap
    16. Our Application Could Look Like This: Common Data Store Serving Multiple Semantics/APIs; Disk Becomes the New Tape [Diagram labels: Front End; High Speed Data Store (Using Flash/NVM); Key/Value, SQL, Document, Graph, Map/Reduce, Transactional; StreamBase]
    17. We're not there yet .. But..
    18. We Can Use a High Speed Data Bus for Integrating All of Our Data Sources [Diagram labels: Front End; Analytics; RT; STORM; Batch; High Speed Data Bus (Built-In Caching); RT Transactional Data Access; Direct Access; RT Streaming; Hadoop Synch; MySQL Synch; Mongo Synch]
    19. High Speed Data Bus (Zoom In)
    20. Designed for Transactional and Analytics Scenarios: Homeland Security; Real Time Search; Social; eCommerce; User Tracking & Engagement; Financial Services
    21. Many APIs, Same Data: Key/Value, SQL, Document, Graph, Map/Reduce, Transactional
    22. Let’s take a closer look..
    23. Nested Queries & Projections
    24. Aggregations
    25. Fast Update … Remains with strong consistency!
    26. Transaction support
    27. The Performance of RAM at a Cost/Capacity Closer to Disk
       • Provides 2x to 3.6x better TPS/$
       • 1:50 more capacity
       • 242k reads/sec
       • Test setup: 1KB object size and uniform distribution; 2 sockets, 2.8GHz CPU with a total of 24 cores; CentOS 5.8; 2 FusionIO SLC PCIe cards in RAID; YCSB measurements performed by SanDisk
       • Assumptions: 1TB Flash = $2K; 1TB RAM = $20K
       • [Chart: ZetaScale-GigaSpaces (also labeled FDF-GigaSpaces) on SSDs vs. stock GigaSpaces in DRAM, for "No Read / 100% Write" and "100% Read / No Write"; plotted values 62, 121, 17, 56]
       • [Chart: ZetaScale™ – XAP MemoryXtend capacity, XAP vs. XAP Extend; plotted values 20 and 1000 (1:50)]
    28. Data is Moving to Cloud. Source: Managing Storage: Trends, Challenges, and Options (2013-2014) (EMC, 2013)
    29. Orchestration needs to be integrated into the database solution to make it cloud ready
    30. Demo References (click on the relevant box to get the demo): Many APIs, Same Data; Data Bus (Integration with Storm); Built-In Orchestration
    31. Summary
    32. Nati Shalom. Check out the slide on