SnappyData
A Unified Cluster for Streaming, Transactions, & Interactive Analytics
© SnappyData Inc 2017
www.snappydata.io
Jags Ramnarayan
CTO, Co-Founder
Our Pedigree
- Team: ~30
- Founded GemFire (in-memory data grid)
- Pivotal spin-out
- 25+ VMware, Pivotal database engineers
- Investors: GTD Ventures
Mixed Workloads Are Everywhere
Stream Processing | Transactions | Interactive Analytics
- Analytics on mutating data
- Correlating and joining streams with large histories
- Maintaining state or counters while ingesting streams
Telco Use Case: Location-Based Services, Network Optimization

Revenue Generation
- Real-time location-based mobile advertising (B2B2C)
- Location-based services (B2C, B2B, B2B2C)
Revenue Protection
- Customer experience management to reduce churn
- Customer sentiment analysis
Network Efficiency
- Network bandwidth optimization
- Network signalling optimization

• Network optimization
– E.g., re-route a call to another cell tower if congestion is detected
• Location-based ads
– Match each incoming event to the subscriber profile; if 'opt-in', show a location-sensitive ad
• Challenge: streaming analytics plus interactive real-time dashboards
● Simple rules: if (CallDroppedCount > threshold) then alert
● Or complex, OLAP-like queries
● Top-K, trending, joins with reference data, correlation with history

Today: a stream processor. Need: stream analytics.
Stream pipeline
5
Perpetually
Growing
…
Maintain
summaries
Reference data
-- External systems
Time Window
Aggregated
Summary
Ref Tables
Stream
Process
Stream Sink
TableIOT
Devices
….
Cell
• IoT sensors,
• Call Data Record(CDR),
• AdImpressions..
Number of Processing Challenges

- JOIN: streams with large tables and reference data
- Windows can be large (an hour) while sliding every 2 seconds
- Fast writes and updates
- Exactly-once semantics
- HA
Number of Processing Challenges

- INTERACTIVE QUERIES (ad hoc)
- High concurrency
- Point lookups, scans/aggregations
Why Is Supporting Mixed Workloads Difficult?

Data Structures   Query Processing Paradigm   Scheduling & Provisioning
Columnar          Batch processing            Long-running
Row stores        Point lookups               Short-lived
Sketches          Delta / incremental         Bursty
Lambda Architecture

1. New data feeds both layers
2. Batch layer: master dataset (HDFS, HBase)
3. Serving layer: batch views (SQL-on-Hadoop, MPP DB)
4. Speed layer: real-time views (Storm, Spark Streaming, Samza, …)
5. Queries merge batch views and real-time views
Lambda Architecture is Complex

• Complexity
  - Learn and master multiple products, data models, disparate APIs & configs
• Wasted resources
• Slower
  - Excessive copying, serialization, shuffles
• Impossible to achieve interactive-speed analytics on large or mutating data
Spark Streaming with NoSQL for State

(Diagram: each app runs its own Spark master and executors over an immutable cache that cannot be updated and is repeated per user/app; all apps bottleneck on external SQL/NoSQL stores.)

1. Push down filters to the NoSQL partitions
2. Serialize and de-serialize into the Spark executors
3. Multiple copies of large data sets
4. Lose optimizations such as vectorization

Interactive and continuous queries are TOO SLOW
12
Can We
Simplify &
Optimize?
13
Our
Solution
SnappyData
A SingleUnifiedCluster:
OLTP+OLAP+ Streamingforreal-timeanalytics
Our Solution – FUSE Spark with an In-Memory Database

- Spark: batch design, high throughput, rich API, eco-system
- In-memory database: real-time design with low latency, HA, and concurrency; matured over 13 years
- Integrates with deep-scale, high-volume MPP DBs
- Result: a single unified HA cluster for OLTP + OLAP + Streaming, for real-time analytics
We Transform Spark from a Computational Engine …

(Diagram: today each user/app runs its own Spark master and executors with an immutable cache that cannot be updated and is repeated per user/app, bottlenecking on HDFS and external SQL/NoSQL stores.)
… Into an "Always-On" Hybrid Database!

(Diagram: long-running Spark executor JVMs host both the Spark runtime (stream processing, SQL, ML) and a mutable, transactional in-memory row + column store with in-memory indexes and shared-nothing persistence; clients access the cluster via Spark jobs and JDBC/ODBC, with connectors to HDFS, SQL/NoSQL stores, and MPP DBs for history and deep scale.)
… Into an "Always-On" Hybrid Database!

- Spark API (Streaming, ML, Graph): DataFrame, RDD, DataSets
- Full SQL, transactions, indexing, HA
- IN-MEMORY: rows, columnar, Spark cache, synopses (samples)
- Unified catalog, native store
- Unified data access (virtual tables) over HDFS/HBase, S3, JSON/CSV/XML, SQL DBs, Cassandra, MPP DBs, and stream sources
- Access via Spark jobs, Scala/Java/Python/R APIs, JDBC/ODBC, and the object API (RDD, DataSets)
Overview

- Snappy Data Server: Spark Executor + Store
- Cluster Manager & Scheduler: low-latency and high-latency paths; add/remove servers
- Parser, Query Optimizer, OLAP and TRX execution paths
- Data Synopsis Engine
- HYBRID Store: probabilistic, rows, columns, index
- Stream processing; DataFrame/RDD APIs; tables over ODBC/JDBC
- Distributed Membership Service for HA
Unified Data Model & API

(Same architecture as the overview, adding:)
• Mutability (DML + transactions)
• Indexing
• SQL-based streaming
Hybrid Store

- Unbounded stream ingestion → Row Buffer → Columns (column table)
- Real-time sampling → Probabilistic store (sample table)
- Transactional state updates and random writes (reference data) → Rows + Index (row table)
- Serves OLAP and stream analytics
Simple API and Spark Compatible

// Use the Spark Data Source API
val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
snappy.createTable("TableName", "Row | Column | Sample", schema, options)
someDataFrame.write.insertInto("TableName")

// Update, delete, put using the SnappySession
snappy.update("tableName", filterExpr, Row(<newColumnValues>), updatedColumns)
snappy.delete("tableName", filterExpr)

// Or just use Spark SQL syntax ..
snappy.sql("select x from tableName").count

// Or use JDBC/ODBC to access it like a regular database
jdbcStatement.executeUpdate("insert into tableName values …")
Extends Spark SQL

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table_name
(
  <column definition>
) USING 'ROW | COLUMN'
OPTIONS (
  COLOCATE_WITH 'table_name',
  PARTITION_BY 'PRIMARY KEY | column name', -- replicated table by default
  REDUNDANCY '1',                           -- manage HA
  PERSISTENT 'DISKSTORE_NAME ASYNCHRONOUS | SYNCHRONOUS', -- default: in-memory only
  OFFHEAP 'true | false',
  EVICTION_BY 'MEMSIZE 200 | COUNT 200 | HEAPPERCENT',
  …
)
[AS select_statement];
Stream SQL DDL and Continuous Queries (based on Spark Streaming)

Consume from a stream → transform raw data → continuous analytics → ingest into the in-memory store → overflow the table to HDFS

// Create the stream table over a Kafka source
CREATE STREAM TABLE AdImpressionLog
  (<columns>) USING directkafka_stream OPTIONS (
    <socket endpoints>,
    "topics 'adnetwork-topic'",
    "rowConverter 'AdImpressionLogAvroDecoder'")

// Register a continuous query and ingest its results
streamingContext.registerCQ(
  """select publisher, geo, avg(bid) as avg_bid, count(*) imps,
     count(distinct(cookie)) uniques from AdImpressionLog
     window (duration '2' seconds, slide '2' seconds)
     where geo != 'unknown' group by publisher, geo""")
  .foreachDataFrame(df => {
    df.write.format("column").mode(SaveMode.Append)
      .saveAsTable("adImpressions")
  })
Updates & Deletes on Column Tables

(Diagram, one partition: writes first land in a row buffer under MVCC. The buffer is periodically compacted into time-ordered column segments (t1–t2, t2–t3, …), each holding keys (K11, K12, …), column values (C11…, C21…), summary metadata, and a 0/1 delete bitmask. New segments are replicated for HA.)
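The write path sketched above can be illustrated with a toy model (all names here are hypothetical; this is not SnappyData's actual code): deletes on a column segment only flip a bit in a deletion bitmask, while new writes accumulate in a mutable row buffer that compaction seals into a fresh immutable segment.

```python
# Illustrative sketch of a delete-bitmask column store with a row buffer.

class ColumnSegment:
    """Immutable columnar data plus a per-row deletion bitmask."""
    def __init__(self, keys, columns):
        self.keys = keys                      # e.g. ["K11", "K12", ...]
        self.columns = columns                # e.g. {"bid": [...], ...}
        self.deleted = [False] * len(keys)    # the 0/1 bitmask in the diagram

    def delete(self, key):
        # Deletes never rewrite column data; they only flip a bitmask bit.
        self.deleted[self.keys.index(key)] = True

    def scan(self, column):
        return [v for v, d in zip(self.columns[column], self.deleted) if not d]

class Partition:
    """Writes land in a row buffer; compaction seals it into a new segment."""
    def __init__(self):
        self.row_buffer = {}                  # key -> row dict (mutable)
        self.segments = []                    # time-ordered immutable segments

    def write(self, key, row):
        self.row_buffer[key] = row

    def compact(self):
        keys = list(self.row_buffer)
        cols = {c: [self.row_buffer[k][c] for k in keys]
                for c in next(iter(self.row_buffer.values()))}
        self.segments.append(ColumnSegment(keys, cols))
        self.row_buffer = {}

p = Partition()
p.write("K11", {"bid": 21.5})
p.write("K12", {"bid": 18.3})
p.compact()
p.segments[0].delete("K12")
print(p.segments[0].scan("bid"))   # [21.5]
```

A real engine would version the bitmask per snapshot (MVCC) and compact segments in the background; the sketch only shows why updates and deletes stay cheap on columnar data.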
Can We Use Statistical Techniques to Shrink Data?

• Most apps are happy to trade off 1% accuracy for a 200x speedup!
• You can usually get a 99.9% accurate answer by looking at only a tiny fraction of the data!
• You can often make perfectly accurate decisions with imperfect answers!
  - A/B testing, visualization, ...
• The data itself is usually noisy
  - Processing the entire data set doesn't necessarily mean exact answers!
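The trade-off claimed above is easy to demonstrate: estimate an aggregate from a small uniform sample and compare it to the exact answer. The data and sampling rate below are synthetic, purely for illustration.

```python
# Estimate a mean from a 1% uniform sample of synthetic "bid" values.
import random

random.seed(42)
bids = [random.expovariate(1 / 20.0) for _ in range(200_000)]  # synthetic bids

exact = sum(bids) / len(bids)

sample = random.sample(bids, 2_000)        # look at only 1% of the data
estimate = sum(sample) / len(sample)

rel_err = abs(estimate - exact) / exact
print(f"exact={exact:.2f} estimate={estimate:.2f} rel_err={rel_err:.2%}")
```

The relative error of the sampled estimate is typically around 1–2% here, while the work done is 100x smaller, which is exactly the accuracy-for-speed trade the slide describes.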
Probabilistic Store: Sketches + Uniform & Stratified Samples

1. Streaming CMS (Count-Min Sketch): time-bucketed cells [t1, t2), [t2, t3), [t3, t4), [t4, now), with higher resolution (cell widths 4T, 2T, T, ≤T) for more recent time ranges
2. Top-K queries with arbitrary filters: traditional CMS vs. CMS + samples; maintain a small sample at each CMS cell
3. Fully distributed stratified samples: always include the timestamp as a stratified column for streams; samples age from the row store (in-memory) to the column store (disk)
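A minimal Count-Min Sketch conveys the core idea behind point 1 (this is an illustrative sketch, not SnappyData's engine): a small w x d array of counters where each update increments one counter per row, and a point query takes the minimum across rows, so estimates can only overshoot.

```python
# Minimal Count-Min Sketch: fixed memory, one-sided estimation error.
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent-ish hash per row, derived by salting with the row id.
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
            yield row, int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Never underestimates; hash collisions can only inflate counters.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for geo, n in [("MI", 500), ("CA", 300), ("MA", 200)]:
    cms.add(geo, n)
print(cms.estimate("MI"))   # at least 500 (a CMS never under-counts)
```

Attaching a small reservoir sample to each cell, as the slide proposes, is what lets the store answer top-K queries with arbitrary filters rather than only raw counts.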
Overview

(Architecture diagram repeated, highlighting real-time support and HA.)
Supporting Real-Time & HA

• Spark executors are long-running; a driver failure doesn't shut down the executors
• Driver HA: drivers are "managed" by SnappyData with a standby secondary
• Data HA: consensus-based clustering integrated for eager replication

(Diagram: locators, a lead node running the managed Spark driver with a REST endpoint for Spark jobs, and peer-to-peer executor JVMs (servers) that combine Spark block memory with the Snappy store for tables, streams, and DataFrames; shared-nothing persistence; clients connect via JDBC/ODBC through the catalog service.)
Overview

(Architecture diagram repeated, highlighting transactions.)
Transactions

• Support for Read Committed & Repeatable Read
• W-W and R-W conflict detection at write time
• MVCC for non-blocking reads and snapshot isolation
• Distributed failure detection integrated with the commit protocol
  - Evict unresponsive replicas
  - Ensure consistency when replicas recover
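The MVCC bullet above can be sketched in a few lines (a toy model with hypothetical names, not SnappyData's implementation): each committed write creates a new version, and a reader pins the timestamp at which it started, so it sees a consistent snapshot without blocking concurrent writers.

```python
# Toy MVCC store: versioned values plus snapshot reads pinned to a timestamp.
import itertools

class MVCCStore:
    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = {}            # key -> [(commit_ts, value), ...]

    def write(self, key, value):
        ts = next(self._clock)         # commit timestamp
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        # A read transaction sees only versions committed before it began.
        return Snapshot(self, next(self._clock))

class Snapshot:
    def __init__(self, store, start_ts):
        self._store, self._start_ts = store, start_ts

    def read(self, key):
        for ts, value in reversed(self._store._versions.get(key, [])):
            if ts < self._start_ts:
                return value
        return None

store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()
store.write("balance", 42)       # concurrent writer; does not block the reader
print(snap.read("balance"))      # 100 - the earlier snapshot is unaffected
print(store.snapshot().read("balance"))  # 42
```

Write-write and read-write conflicts, as the slide notes, are still detected at write time; MVCC only removes the need for readers to take locks.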
Overview

(Architecture diagram repeated, highlighting the query optimizer.)
Query Optimization

• Bypass the scheduler for transactions and low-latency jobs
• Minimize shuffles aggressively
  - Dynamic replication for reference data
  - Retain 'join indexes' whenever possible
  - Colocate related data sets
• Optimized hash join, scan, and group-by compared to stock Spark
  - Uses more variables in code generation and vectorized structures
• Column segment pruning through statistics
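Why colocation minimizes shuffles can be shown with a small sketch (the partitioner and table names are hypothetical, not SnappyData's): when two tables hash-partition on the same key with the same partition count, a join can run entirely partition-locally.

```python
# Co-partitioned join: matching partitions join locally, with no shuffle.

N_PARTITIONS = 4

def partition_of(key):
    return hash(key) % N_PARTITIONS

def partitioned(rows, key_fn):
    parts = [[] for _ in range(N_PARTITIONS)]
    for row in rows:
        parts[partition_of(key_fn(row))].append(row)
    return parts

# Both tables are partitioned on the same key (subscriber id).
calls = partitioned([("subA", 3), ("subB", 1), ("subC", 7)], lambda r: r[0])
profiles = partitioned([("subA", "opt-in"), ("subC", "opt-out")], lambda r: r[0])

# Co-located join: pair partition i with partition i -- no data movement.
joined = []
for calls_part, prof_part in zip(calls, profiles):
    lookup = dict(prof_part)
    joined += [(sub, n, lookup[sub]) for sub, n in calls_part if sub in lookup]

print(sorted(joined))   # [('subA', 3, 'opt-in'), ('subC', 7, 'opt-out')]
```

This is the same idea as the Kafka/executor layout on the next slide: subscriber ranges A–M and N–Z each live on one executor, with reference data replicated alongside them.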
Co-partitioning & Co-location

(Diagram: Kafka queues are partitioned by subscriber range (A–M, N–Z); each Spark executor hosts the matching subscriber partition together with replicated reference data, so the system scales linearly with partition pruning.)
Overview

(Architecture diagram repeated, highlighting the Data Synopsis Engine.)
Approximate Query Processing (AQP): Academia vs. Industry

25+ years of successful research in academia (AQUA, Online Aggregation, MapReduce Online, STRAT, ABS, BlinkDB / G-OLA, ...), yet user-facing AQP is almost non-existent in the commercial world! Only some approximate features exist in Infobright, Yahoo's Druid, Facebook's Presto, Oracle 12c, ...

WHY? Consider:

select geo, avg(bid)
from adImpressions
group by geo
having avg(bid) > 10
with error 0.05 at confidence 95

geo   avg(bid)   error   prob_existence
MI    21.5       ±0.4    0.99
CA    18.3       ±5.1    0.80
MA    15.6       ±2.4    0.81
...   ...        ...     ...

1. Incompatible with BI tools
2. Complex semantics
3. Bad sales pitch!
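For intuition, the "± error at 95% confidence" column in a result like the one above can come from a normal-approximation confidence interval over the sampled rows. The numbers below are synthetic, not the slide's actual data.

```python
# 95% confidence interval for an average estimated from a sample.
import math
import random

random.seed(7)
sample = [random.gauss(21.5, 4.0) for _ in range(400)]  # sampled bids for one geo

n = len(sample)
mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / (n - 1)    # sample variance
half_width = 1.96 * math.sqrt(var / n)                  # 95% two-sided z interval

print(f"avg(bid) = {mean:.1f} ± {half_width:.1f} at 95% confidence")
```

The interval shrinks as 1/sqrt(n), which is why a modest sample already yields the tight ±0.4-style error bars shown for well-represented groups, while sparse groups (like CA above) get wide ones.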
A First Industrial-Grade AQP Engine

1. High-Level Accuracy Contract (HAC)
   • Concurrency: 10's of queries in shared clusters
   • Resource usage: everyone hates their AWS bill
   • Network shuffles
   • Immediate results while waiting for final results
2. Fully compatible with BI tools
   • Set HAC behavior at the JDBC/ODBC connection level
3. Better marketing!
   • The user picks a single number p, where 0 ≤ p ≤ 1 (by default p = 0.95)
   • Snappy guarantees that they only see things that are at least p% accurate
   • Snappy handles (and hides) everything else!

geo   avg(bid)
MI    21.5
WI    42.3
NY    65.6
...   ...

iSight (Immediate inSight)
Conclusion

Unified OLAP/OLTP/streaming with Spark
● Far fewer resources: a TB problem becomes a GB problem
  ○ CPU contention drops
● Far less complex
  ○ a single cluster for stream ingestion, continuous queries, interactive queries, and machine learning
● Much faster
  ○ compressed data managed in distributed memory in columnar form reduces volume and is much more responsive
Lessons Learned

1. A unique experience marrying two different breeds of distributed systems: lineage-based for high throughput vs. (consensus-)replication-based for low latency
   - Stream processing ≠ stream analytics
   - Top-k w/ almost arbitrary predicates + 1-pass stratified sampling over streams
2. A unified cluster is simpler, cheaper, and faster
   - By sharing state across apps, we decouple apps from data servers and provide HA
   - Save memory, data copying, serialization, and shuffles
   - Co-partitioning and co-location for faster joins and stream analytics
3. Advantages over HTAP engines: deep stream integration + AQP
4. Commercializing academic work is lots of work but also lots of fun
THANK YOU!
Try our iSight cloud for free: http://snappydata.io/iSight

iSight: Immediate inSight
- iSight's immediate answer to the query: 1.7 secs
- Final answer to the query: 42.7 secs
- 25x speedup!

Our Solution: High-Level Accuracy Contract (HAC)
• A single number p, 0 ≤ p ≤ 1 (by default p = 0.95)
• We guarantee that you only see things that are at least p% accurate
• We handle (and hide) everything else
  – Choose a behavior: REPLACE WITH SPECIAL SYMBOL (default), DO NOTHING, DROP THE ROW
Editor's Notes

  • #5 Many subscribers, lots of 2G/3G/4G voice/data Network events: location events, CDRs, network issues
  • #6 You have stream that represents the current window, a large table that is perpetually growing, maybe the app is reducing data through continuous aggregations(apriori known summaries) and many smaller reference data tables. The ref data is sourced from other systems like say CRM and are likely to be changing in real time as well.
  • #7 Challenges: - Stream analytics: Joins between streams, history, reference data for correlations, trends, pattern matching. - Very fast writes, including updates. Potentially transactional semantics. Of course, you need exactly-once guarantees. - Interactive queries: these tend to be ad-hoc in nature. Could be point lookups or aggregation/scan oriented queries. Only some can be satisfied using pre-fabricated aggregation tables. OLAP cube maintenance in streaming world is very difficult. - Many users could be interacting. So, concurrency is important.
  • #8 Challenges: - Stream analytics: Joins between streams, history, reference data for correlations, trends, pattern matching. - Very fast writes, including updates. Potentially transactional semantics. Of course, you need exactly-once guarantees. - Interactive queries: these tend to be ad-hoc in nature. Could be point lookups or aggregation/scan oriented queries. Only some can be satisfied using pre-fabricated aggregation tables. OLAP cube maintenance in streaming world is very difficult. - Many users could be interacting. So, concurrency is important.
  • #9 when mentioning sketches say “infinite streams”
  • #10 Many purpose built products To derive meaningful insight Apps have to stitch products
  • #15 According to the latest report, 75% of enterprises believe real time ranges from important to critical for business success Why they need realtime: 52% said for delivering real-time dashboards and improving customer experience Define real-time: 81% of enterprises define real time as either second or sub-second speeds
  • #17 Snappy store and Spark Executor share the same process space and JVM memory Reference based access – zero copy
  • #18 Snappy store and Spark Executor share the same process space and JVM memory Reference based access – zero copy
  • #20 Contributions: sharing state across many clients and applications to minimize (de-)serialization §5.4; providing high availability through low-latency failure detection and decoupling applications from data servers §6.1; bypassing the scheduler to interleave fine-grained operations with long-running jobs §6.3; ensuring transactional consistency §6.4. Share the same process space and memory pool between Spark and Geode §5.3; co-partition tables and streams to minimize data shuffling §5.5. For the unified API, say: Spark's DataFrame API allows for ML, graph, batch & streaming, and SQL (selects); SnappyData adds full SQL support and extends the DataFrame and DataSource APIs for mutability semantics (DML & transactions), indexing, and SQL-based streaming
  • #37 No changes to the BI tool; no need to be a statistician; and there's still an API to ask for all sorts of detailed error information if you're an advanced user