Complex Analytics with NoSQL Data Store in Real Time
Nested Queries, Projection, Transactions and more
Nati Shalom 
@natishalom 
slideshare.net/giganati
What are we here to discuss?
• Making Sense of the Exploding Data World
• What That World Could Look Like if Disk Is No Longer the Bottleneck
• Live Demo
Making Sense of The Exploding Data World
Capacity and Performance Drive New Data Management Technologies
[Figure: Data Volume (GB, TB, PB) plotted against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, ms, μs). Regions shown: Data Mining, Machine Learning, Data Warehouse, Business Intelligence, Exploratory Analytics, Operational Intelligence, High Throughput OLTP, OLTP, Streaming]
Let’s Look at Tradeoffs of Some Selected Solutions
SQL Queries
• Query: SQL
• Semantics:
  – CRUD
  – Aggregation
  – Projection
  – Partial update
• Performance: 100s/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Up
• Availability: Disk Based
NoSQL
• Query: Proprietary but rich
• Semantics:
  – CRUD
  – Limited Aggregation (Map/Reduce)
  – No Projection
  – No Partial update
• Performance: 1000s/sec
• Consistency: Eventual
• Scaling: Mostly Scale-Out
• Availability: Based on replication
IMDG
• Query: Proprietary but rich
• Semantics:
  – CRUD
  – Aggregation API + Map/Reduce
  – Projection (GigaSpaces)
  – Partial Update (GigaSpaces)
• Performance: 100k/sec
• Consistency: Transactional
• Scaling: Mostly Scale-Out
• Availability: Replication
Key/Value
• Query: Key, Value
• Semantics:
  – Mostly Read
  – No Aggregation
  – No Projection
  – No Partial update
• Performance: millions/sec
• Consistency: Atomic
• Scaling: Mostly Scale-Out
• Availability: Limited (varies quite substantially between implementations)
Stream Processing (Storm)
• Semantics
  – Event-driven data processing
• Used for continuous updates
  – No need for a costly “SELECT FOR UPDATE”
• Performance: tens of millions of updates/sec
[Diagram labels: Spouts, Bolt]
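The spout-and-bolt model on this slide can be illustrated with a minimal sketch in plain Python. This is not Storm's API: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical, and generators stand in for the distributed processing graph. The point is that each event updates state in place as it flows through, so no read-lock-write round trip to a database is needed.

```python
from collections import Counter

def sentence_spout(lines):
    """Spout: turns an input source into a stream of one-field tuples."""
    for line in lines:
        yield (line,)

def split_bolt(stream):
    """Bolt: splits each sentence tuple into lowercase word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)

def count_bolt(stream):
    """Bolt: keeps a running count per word and emits (word, count)."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1  # continuous in-place update, one event at a time
        yield (word, counts[word])

def run_topology(lines):
    """Wires spout -> split bolt -> count bolt and returns the final counts."""
    final = {}
    for word, count in count_bolt(split_bolt(sentence_spout(lines))):
        final[word] = count
    return final

word_counts = run_topology(["Disk is slow", "Flash is fast"])
```

In a real topology the bolts would run in parallel on separate workers; the single-process chain here only shows the direction of data flow.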
Common Assumption: Disk is the bottleneck
[Figure: relative performance improvement from 2000 to 2020 on a power-of-10 scale; annotations: 100X, 10,000X; HDD Latency (Seek & Rotate) = Little Improvement. Source: GigaOM Research]
Capacity and Performance Drive New Data Management Technologies
[Figure (Source: IDC, 2013): Big Data (Hadoop); NoSQL; In Memory, Stream Processing; RDBMS]
There’s No One Size Fits All
A Typical App Looks Like This..
[Diagram components: Front End; Analytics; RT; STORM; Batch. Caption: The Data Flow Complexity]
What if Disk Was no Longer the Bottleneck?
FLASH Closes the CPU to Storage Gap
Our Application Could Look Like This..
[Diagram components: Front End; StreamBase; High Speed Data Store (Using Flash/NVM); Common Data Store serving Multiple Semantics/API: Key/Value, SQL, Document, Graph, Map/Reduce, Transactional. Callout: Disk Becomes the new Tape]
We're not there yet .. 
But..
We can use High Speed Data Bus for Integrating All of our Data Sources
[Diagram components: Front End; Analytics; RT; STORM; Batch; High Speed Data Bus (Built-In Caching). Connection labels: RT Transactional Data Access; Direct Access; RT Streaming; Hadoop Synch; MySQL Synch; Mongo Synch]
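The data bus idea on this slide, one write path with built-in caching that keeps several back-end stores in sync, can be sketched roughly as follows. Everything here is an assumption made for illustration: `DataBus`, `register_sink`, and the list-backed sinks are hypothetical names, and the fan-out is synchronous for simplicity, whereas a real bus would typically replicate asynchronously.

```python
class DataBus:
    """Toy in-memory bus: caches every write and forwards it to all sinks."""

    def __init__(self):
        self._cache = {}
        self._sinks = []

    def register_sink(self, sink):
        """Registers a callable sink(key, value), e.g. a database writer."""
        self._sinks.append(sink)

    def write(self, key, value):
        self._cache[key] = value      # served from memory on later reads
        for sink in self._sinks:      # fan out to every registered store
            sink(key, value)

    def read(self, key):
        return self._cache.get(key)

# Plain lists stand in for the Hadoop, MySQL, and Mongo sync targets.
hadoop_log, mysql_log, mongo_log = [], [], []

bus = DataBus()
for log in (hadoop_log, mysql_log, mongo_log):
    bus.register_sink(lambda key, value, log=log: log.append((key, value)))

bus.write("user:1", {"name": "Ann"})
```

Front-end reads hit the cache, while each back-end store receives the same stream of writes.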
High Speed Data Bus (Zoom In)
Designed for Transactional and Analytics Scenarios..
• Homeland Security
• Real Time Search
• Social
• eCommerce
• User Tracking & Engagement
• Financial Services
Many APIs – Same Data
Key/Value, SQL, Document, Graph, Map/Reduce, Transactional
Let’s take a closer look..
Nested Queries & Projections
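As a rough illustration of what a nested query with projection means, here is a sketch in plain Python over in-memory dictionaries. It is not the GigaSpaces API; `get_path` and `query` are hypothetical helpers. The query matches on a field inside a nested object and returns only the requested fields rather than whole documents.

```python
def get_path(doc, path):
    """Follows a dotted path such as 'address.city' into nested dicts."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def query(docs, path, expected, projection):
    """Returns only the projected fields of docs whose nested path matches."""
    return [
        {field: get_path(doc, field) for field in projection}
        for doc in docs
        if get_path(doc, path) == expected
    ]

users = [
    {"id": 1, "name": "Ann", "address": {"city": "Paris"}, "bio": "long text"},
    {"id": 2, "name": "Bob", "address": {"city": "Rome"}, "bio": "long text"},
    {"id": 3, "name": "Cy", "address": {"city": "Paris"}, "bio": "long text"},
]

# Nested predicate on address.city; projection drops the large "bio" field.
paris_users = query(users, "address.city", "Paris", ["id", "name"])
```

Projection matters for performance because only the small requested fields travel back to the caller.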
Aggregations.
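A minimal sketch of a partitioned aggregation, in the map/reduce style the earlier slides mention: each partition computes partial sums locally, and the partial results are then merged. The function name `aggregate_sum` and the sample data are assumptions for illustration only.

```python
from collections import defaultdict

def aggregate_sum(partitions, group_key, value_key):
    """Sums value_key per group_key across partitions (map, then reduce)."""
    totals = defaultdict(float)
    for partition in partitions:
        partial = defaultdict(float)          # map: local partial sums
        for row in partition:
            partial[row[group_key]] += row[value_key]
        for group, subtotal in partial.items():  # reduce: merge partials
            totals[group] += subtotal
    return dict(totals)

partitions = [
    [{"symbol": "A", "qty": 10}, {"symbol": "B", "qty": 5}],
    [{"symbol": "A", "qty": 7}],
]
totals = aggregate_sum(partitions, "symbol", "qty")
```

Only the small per-partition subtotals cross partition boundaries, not the raw rows.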
Fast Update … 
Remains with strong consistency!
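One way to picture a fast update that stays strongly consistent is a partial update applied inside the store, so callers never read the whole object, modify it, and write it back. The sketch below assumes a single-process store guarded by a lock; the `Store` class and its `increment` method are hypothetical and not taken from any product API.

```python
import threading

class Store:
    """Toy store where field updates are applied atomically in place."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def write(self, key, doc):
        with self._lock:
            self._data[key] = dict(doc)

    def read(self, key):
        with self._lock:
            return dict(self._data[key])

    def increment(self, key, field, delta):
        """Partial update: changes one field without rewriting the object."""
        with self._lock:
            self._data[key][field] += delta
            return self._data[key][field]

def worker(store, times):
    for _ in range(times):
        store.increment("user:1", "clicks", 1)

store = Store()
store.write("user:1", {"name": "Ann", "clicks": 0})

threads = [threading.Thread(target=worker, args=(store, 1000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because each increment runs under the lock, concurrent writers do not lose updates, and the untouched fields are never rewritten.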
Transactions support
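Transactional behavior, all changes commit together or none do, can be sketched with a snapshot-and-rollback context manager. This is a single-threaded illustration under stated assumptions: `transaction` and `transfer` are hypothetical names, and the sketch provides atomicity only, with no isolation between concurrent callers.

```python
import copy
from contextlib import contextmanager

@contextmanager
def transaction(store):
    """Restores the store to its prior state if the block raises."""
    snapshot = copy.deepcopy(store)
    try:
        yield store
    except Exception:
        store.clear()
        store.update(snapshot)  # roll back every change made in the block
        raise

def transfer(accounts, src, dst, amount):
    """Moves amount between two accounts atomically."""
    with transaction(accounts):
        accounts[dst]["balance"] += amount
        if accounts[src]["balance"] < amount:
            raise ValueError("insufficient funds")
        accounts[src]["balance"] -= amount

accounts = {"a": {"balance": 50}, "b": {"balance": 0}}
transfer(accounts, "a", "b", 30)
```

The credit is deliberately applied before the balance check, so a failed transfer shows the rollback undoing a change that had already been made.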
The Performance of RAM at a Cost/Capacity Closer to Disk
• Provides 2x – 3.6x Better TPS/$
• 1:50 More Capacity
• 242k Read/Sec
[Chart: ZetaScale-GigaSpaces (FDF-GigaSpaces) on SSDs vs. Stock GigaSpaces in DRAM, for the workloads "No Read / 100% Write" and "100% Read / No Write"; axis 0 to 160; bar labels: 62, 121, 17, 56]
[Chart: Capacity, XAP vs. XAP Extend (ZetaScale™ – XAP MemoryXtend); axis 0 to 1200; bar labels: 20, 1000 (1:50)]
Benchmark setup:
- 1KB object size and uniform distribution
- 2 sockets 2.8GHz CPU with total 24 cores, CentOS 5.8, 2 FusionIO SLC PCIe cards RAID
- YCSB measurements performed by SanDisk
Assumptions: 1TB Flash = $2K; 1TB RAM = $20K
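The headline figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that the chart labels 62 and 121 belong to ZetaScale-GigaSpaces on SSDs and that 17 and 56 belong to stock GigaSpaces in DRAM; the extracted slide text does not state this pairing, but it is the one that reproduces the claimed 2x to 3.6x range.

```python
# Cost assumptions stated on the slide.
flash_cost_per_tb = 2_000
ram_cost_per_tb = 20_000
cost_ratio = ram_cost_per_tb / flash_cost_per_tb  # RAM cost relative to flash

# Chart labels; pairing the values with each series is an assumption.
zetascale_on_ssd = (62, 121)
stock_in_dram = (17, 56)
tps_per_dollar_gain = [
    round(z / s, 1) for z, s in zip(zetascale_on_ssd, stock_in_dram)
]

# Capacity chart labels: XAP vs. XAP MemoryXtend.
capacity_gain = 1000 / 20

print(cost_ratio, tps_per_dollar_gain, capacity_gain)
```

Under that pairing the two ratios come out at roughly 3.6 and 2.2, in line with the "2x – 3.6x" claim, and the capacity labels give the stated 1:50.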
Data is Moving to Cloud 
Source: Managing Storage: Trends, Challenges, and Options (2013-2014). (EMC, 2013)
Orchestration needs to be integrated into DataBase solution to make it Cloud Ready
Demo References
Click on the relevant box to get the demo:
• Many APIs, Same Data
• Data Bus (Integration with Storm)
• Built-In Orchestration
Summary
Nati Shalom 
Check out the slides at http://www.slideshare.net/giganati


Editor's Notes

  • #5  Some of the emerging NewSQL and NoSQL disk-based databases may be able to handle the more demanding data volume and variety, but disk-based databases have always been I/O bound; in other words, keeping up with the new velocity demands of data is much harder. Disks have always limited database velocity, or throughput. The closer to real time that transaction throughput or analytics must be, the harder it is for disk-based approaches to keep up.
  • #11 Storm constructs a processing graph that feeds data from an input source through processing nodes. The processing graph is called a "topology". The input data sources are called "spouts", and the processing nodes are called "bolts". The data model consists of tuples. Tuples flow from the spouts to the bolts, which execute user code.
  • #13 http://www.zdnet.com/storage-in-2014-an-overview-7000024712/
  • #16 http://blogs.technet.com/b/dataplatforminsider/archive/2013/05/01/leveraging-flash-across-the-microsoft-sql-server-stack.aspx
  • #29 http://www.zdnet.com/storage-in-2014-an-overview-7000024712/