Complex Analytics with NoSQL Data Store in Real Time
NoSQL databases are often limited in the types of queries they can support because of the distributed nature of their data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines.
We will specifically demonstrate a combination of key/value, SQL-like, document-model, and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how to create a mashup between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through SQL.

- See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6335#sthash.PNSZi5TJ.dpuf

  • Some of the emerging NewSQL and NoSQL disk-based databases may be able to deal with the more demanding data volume and variety, but disk-based databases have always been I/O bound. In other words, keeping up with the new velocity demands of data is much harder. Disks have always gotten in the way of database velocity or throughput. The closer to real time that transaction throughput or analytics must be, the harder it is for disk-based approaches to keep up.
  • Storm constructs a processing graph that feeds data from an input source through processing nodes. The processing graph is called a "topology". The input data sources are called "spouts", and the processing nodes are called "bolts". The data model consists of tuples. Tuples flow from the spouts to the bolts, which execute user code.
  • http://www.zdnet.com/storage-in-2014-an-overview-7000024712/
  • http://blogs.technet.com/b/dataplatforminsider/archive/2013/05/01/leveraging-flash-across-the-microsoft-sql-server-stack.aspx
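
The topology described in the Storm note above can be illustrated with a minimal sketch in plain Python. This is not Storm's API: the names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical, and the sketch runs in a single process, whereas Storm distributes spouts and bolts across a cluster. It only shows how tuples flow from a spout through a chain of bolts that run user code.

```python
from collections import Counter

def sentence_spout():
    # Spout (hypothetical): the input source, emitting one-field tuples.
    for line in ("fast writes", "fast queries"):
        yield (line,)

def split_bolt(stream):
    # Bolt (hypothetical): splits each sentence tuple into word tuples.
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    # Bolt (hypothetical): keeps a running count per word and emits
    # (word, count) on every update, with no "SELECT FOR UPDATE" round trip.
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])

def run_topology(spout, bolts):
    # Topology (hypothetical): wire the spout through the bolts in order.
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)

result = run_topology(sentence_spout, [split_bolt, count_bolt])
print(result)
```

Because each bolt only consumes the tuples it is handed, the running counts are updated as events arrive rather than by re-reading and locking stored rows.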

Presentation Transcript

  • Complex Analytics with NoSQL Data Store in Real Time: Nested Queries, Projection, Transactions and more. Nati Shalom, @natishalom, slideshare.net/giganati
  • What are we here to discuss?
    • Making Sense of the Exploding Data World
    • What That World Could Look Like if Disk Is No Longer the Bottleneck
    • Live Demo
  • Making Sense of The Exploding Data World
  • Capacity and Performance Drives New Data Management Technologies [Chart: Data Volume (GB, TB, PB) against Data Velocity (Yr, Mo, Day, Hr, Min, Sec, MS, μS). Regions: Data Mining, Machine Learning, Business Intelligence, Data Warehouse, High Throughput OLTP, Operational Intelligence, Exploratory Analytics, OLTP, Streaming]
  • Let’s Look at Tradeoffs of Some Selected Solutions
  • SQL Queries
    • Query: SQL
    • Semantics: CRUD; Aggregation; Projection; Partial update
    • Performance: 100s/sec
    • Consistency: Transactional
    • Scaling: Mostly scale-up
    • Availability: Disk based
  • NoSQL
    • Query: Proprietary but rich
    • Semantics: CRUD; Limited aggregation (Map/Reduce); No projection; No partial update
    • Performance: 1000s/sec
    • Consistency: Eventual
    • Scaling: Mostly scale-out
    • Availability: Based on replication
  • IMDG
    • Query: Proprietary but rich
    • Semantics: CRUD; Aggregation API + Map/Reduce; Projection (GigaSpaces); Partial update (GigaSpaces)
    • Performance: 100k/sec
    • Consistency: Transactional
    • Scaling: Mostly scale-out
    • Availability: Replication
  • Key/Value
    • Query: Key, Value
    • Semantics: Mostly read; No aggregation; No projection; No partial update
    • Performance: 1Ms/sec
    • Consistency: Atomic
    • Scaling: Mostly scale-out
    • Availability: Limited (varies quite substantially between implementations)
  • Stream Processing (Storm)
    • Semantics: Event-driven data processing
    • Used for continuous updates; no need for a costly “SELECT FOR UPDATE”
    • Performance: 10s of millions of updates/sec
    • [Diagram labels: Spouts, Bolt]
  • Common Assumption: Disk is the bottleneck [Chart: performance trends from 2000 to 2020 with 100X and 10,000X markers; HDD latency (seek & rotate) shows little improvement. Source: GigaOM Research]
  • Capacity and Performance Drives New Data Management Technologies [Chart labels: Big Data (Hadoop); NoSQL; In Memory, Stream Processing; RDBMS. Source: IDC, 2013]
  • There’s No One Size Fits All
  • A Typical App Looks Like This.. The Data Flow Complexity [Diagram labels: Front End, Analytics, RT, STORM, Batch]
  • What if Disk Was no Longer the Bottleneck? FLASH Closes the CPU to Storage Gap
  • Our Application Could Look Like This.. A Common Data Store serving Multiple Semantics/APIs [Diagram labels: Front End; High Speed Data Store (Using Flash/NVM); Key/Value, SQL, Document, Graph, Map/Reduce, Transactional; StreamBase; Disk Becomes the New Tape]
  • We're not there yet .. But..
  • We can use a High Speed Data Bus for Integrating All of our Data Sources [Diagram labels: Front End, Analytics, RT, STORM, Batch; High Speed Data Bus (Built-In Caching); RT Transactional Data Access; Direct Access; RT Streaming; Hadoop Synch; MySQL Synch; Mongo Synch]
  • High Speed Data Bus (Zoom In)
  • Designed for Transactional and Analytics Scenarios: Homeland Security, Real Time Search, Social, eCommerce, User Tracking & Engagement, Financial Services
  • Many APIs, Same Data: Key/Value, SQL, Document, Graph, Map/Reduce, Transactional
  • Let’s take a closer look..
  • Nested Queries & Projections
  • Aggregations.
  • Fast Update … Remains with strong consistency!
  • Transactions support
  • The Performance of RAM at a Cost/Capacity Closer to Disk: ZetaScale™ with XAP MemoryXtend provides 2x to 3.6x better TPS/$ and 1:50 more capacity. [Charts: ZetaScale-GigaSpaces (FDF-GigaSpaces) on SSDs vs. stock GigaSpaces in DRAM, for "No Read / 100% Write" and "100% Read / No Write" workloads, with data labels 62, 121, 17, 56; capacity of XAP vs. XAP MemoryXtend at 1:50 (20 vs. 1000); 242k Read/Sec. Setup: 1KB object size and uniform distribution; 2 sockets, 2.8GHz CPU with 24 cores in total; CentOS 5.8; 2 FusionIO SLC PCIe cards in RAID; YCSB measurements performed by SanDisk. Assumptions: 1TB Flash = $2K; 1TB RAM = $20K]
  • Data is Moving to Cloud Source: Managing Storage: Trends, Challenges, and Options (2013-2014). (EMC, 2013)
  • Orchestration needs to be integrated into the database solution to make it cloud ready
  • Demo References (click on the relevant box in the original slide to get the demo): Many APIs, Same Data; Data Bus (Integration with Storm); Built-In Orchestration
  • Summary
  • Nati Shalom. Check out the slides at http://www.slideshare.net/giganati
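
The "Many APIs, Same Data" idea from the slides (write through a key/value API, then run SQL with projection, aggregation, nested queries, and partial updates over the same records) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard `sqlite3` module with an in-memory database as a stand-in for the data store, and the `KeyValueOverSql` class, the `users` table, and its fields are hypothetical, not the GigaSpaces API shown in the talk.

```python
import sqlite3

class KeyValueOverSql:
    """Hypothetical wrapper: key/value writes land in a table that SQL can query."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE users ("
            "id TEXT PRIMARY KEY, name TEXT, country TEXT, visits INTEGER)"
        )

    def put(self, key, value):
        # Key/value style write: replace the whole record stored under `key`.
        self.db.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?)",
            (key, value["name"], value["country"], value["visits"]),
        )

    def get(self, key):
        # Key/value style read: fetch one record by its key.
        row = self.db.execute(
            "SELECT name, country, visits FROM users WHERE id = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        return {"name": row[0], "country": row[1], "visits": row[2]}

    def add_visits(self, key, delta):
        # Partial update: change one field without rewriting the record.
        self.db.execute(
            "UPDATE users SET visits = visits + ? WHERE id = ?", (delta, key)
        )

    def query(self, sql, params=()):
        # SQL access to the same data that was written through put().
        return self.db.execute(sql, params).fetchall()

store = KeyValueOverSql()
store.put("u1", {"name": "Ann", "country": "US", "visits": 3})
store.put("u2", {"name": "Bo", "country": "US", "visits": 5})
store.put("u3", {"name": "Cy", "country": "UK", "visits": 1})
store.add_visits("u3", 2)

# Projection: return only the `name` field.
names = store.query(
    "SELECT name FROM users WHERE visits >= ? ORDER BY name", (3,)
)
# Aggregation: total visits per country.
totals = store.query(
    "SELECT country, SUM(visits) FROM users GROUP BY country ORDER BY country"
)
# Nested query: users whose visits exceed the average.
above_avg = store.query(
    "SELECT name FROM users WHERE visits > (SELECT AVG(visits) FROM users)"
)
print(names, totals, above_avg)
```

A single SQLite connection sidesteps the distribution problem that the talk is about; the sketch only shows the API mashup, not how a partitioned store would execute these queries.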