SnappyData
A Unified Cluster for Streaming, Transactions, & Interactive Analytics
© SnappyData Inc 2017
www.snappydata.io
Jags Ramnarayan
CTO, Co-Founder
Our Pedigree
- Team: ~30
- Founded GemFire (in-memory data grid)
- Pivotal spin-out
- 25+ VMware, Pivotal database engineers
- Investors: GTD Ventures
Mixed Workloads Are Everywhere
Stream Processing | Transactions | Interactive Analytics
- Analytics on mutating data
- Correlating and joining streams with large histories
- Maintaining state or counters while ingesting streams
Telco Use Case: Location-Based Services, Network Optimization

Revenue Generation
- Real-time location-based mobile advertising (B2B2C)
- Location-based services (B2C, B2B, B2B2C)
Revenue Protection
- Customer experience management to reduce churn
- Customer sentiment analysis
Network Efficiency
- Network bandwidth optimization
- Network signalling optimization

• Network optimization
– E.g., re-route a call to another cell tower if congestion is detected
• Location-based ads
– Match each incoming event to the subscriber profile; if 'opt-in', show a location-sensitive ad
• Challenge: streaming analytics plus interactive real-time dashboards
● Simple rules: if (CallDroppedCount > threshold) then alert
● Or complex, OLAP-like queries
● Top-K, trending, joins with reference data, correlation with history

Today: a stream processor. Need: stream analytics.
Stream pipeline
5
Perpetually
Growing
…
Maintain
summaries
Reference data
-- External systems
Time Window
Aggregated
Summary
Ref Tables
Stream
Process
Stream Sink
TableIOT
Devices
….
Cell
• IoT sensors,
• Call Data Record(CDR),
• AdImpressions..
Number of Processing Challenges

- JOIN: streams with large tables and reference data
- Windows can be large (an hour) while sliding every 2 seconds
- Fast writes and updates
- Exactly-once semantics
- HA
Number of Processing Challenges

- INTERACTIVE QUERIES (ad hoc)
- High concurrency
- Point lookups, scans/aggregations
Why Is Supporting Mixed Workloads Difficult?

Data Structures   Query Processing Paradigm   Scheduling & Provisioning
Columnar          Batch processing            Long-running
Row stores        Point lookups               Short-lived
Sketches          Delta / incremental         Bursty
Lambda Architecture

1. New data feeds both layers
2. Batch layer: master dataset (HDFS, HBase)
3. Serving layer: batch views (SQL-on-Hadoop, MPP DB)
4. Speed layer: real-time views (Storm, Spark Streaming, Samza, …)
5. Queries merge batch views and real-time views
Lambda Architecture is Complex

• Complexity
  - Learn and master multiple products, data models, disparate APIs & configs
• Wasted resources
• Slower
  - Excessive copying, serialization, shuffles
• Impossible to achieve interactive-speed analytics on large or mutating data
Spark Streaming with NoSQL for State

(Diagram: each app runs its own Spark master and executors over an immutable cache that cannot be updated and is repeated per user/app; all apps bottleneck on external SQL/NoSQL stores.)

1. Push down filters to the NoSQL partitions
2. Serialize and de-serialize into the Spark executors
3. Multiple copies of large data sets
4. Lose optimizations such as vectorization

Interactive and continuous queries are TOO SLOW
12
Can We
Simplify &
Optimize?
13
Our
Solution
SnappyData
A SingleUnifiedCluster:
OLTP+OLAP+ Streamingforreal-timeanalytics
Our Solution – FUSE Spark with an In-Memory Database

- Spark: batch design, high throughput, rich API, eco-system
- In-memory database: real-time design with low latency, HA, and concurrency; matured over 13 years
- Integrates with deep-scale, high-volume MPP DBs
- Result: a single unified HA cluster for OLTP + OLAP + Streaming, for real-time analytics
We Transform Spark from a Computational Engine …

(Diagram: today each user/app runs its own Spark master and executors with an immutable cache that cannot be updated and is repeated per user/app, bottlenecking on HDFS and external SQL/NoSQL stores.)
… Into an "Always-On" Hybrid Database!

(Diagram: long-running Spark executor JVMs host both the Spark runtime (stream processing, SQL, ML) and a mutable, transactional in-memory row + column store with in-memory indexes and shared-nothing persistence; clients access the cluster via Spark jobs and JDBC/ODBC, with connectors to HDFS, SQL/NoSQL stores, and MPP DBs for history and deep scale.)
… Into an "Always-On" Hybrid Database!

- Spark API (Streaming, ML, Graph): DataFrame, RDD, DataSets
- Full SQL, transactions, indexing, HA
- IN-MEMORY: rows, columnar, Spark cache, synopses (samples)
- Unified catalog, native store
- Unified data access (virtual tables) over HDFS/HBase, S3, JSON/CSV/XML, SQL DBs, Cassandra, MPP DBs, and stream sources
- Access via Spark jobs, Scala/Java/Python/R APIs, JDBC/ODBC, and the object API (RDD, DataSets)
Overview

- Snappy Data Server: Spark Executor + Store
- Cluster Manager & Scheduler: low-latency and high-latency paths; add/remove servers
- Parser, Query Optimizer, OLAP and TRX execution paths
- Data Synopsis Engine
- HYBRID Store: probabilistic, rows, columns, index
- Stream processing; DataFrame/RDD APIs; tables over ODBC/JDBC
- Distributed Membership Service for HA
Unified Data Model & API

(Same architecture as the overview, adding:)
• Mutability (DML + transactions)
• Indexing
• SQL-based streaming
Hybrid Store

- Unbounded stream ingestion → Row Buffer → Columns (column table)
- Real-time sampling → Probabilistic store (sample table)
- Transactional state updates and random writes (reference data) → Rows + Index (row table)
- Serves OLAP and stream analytics
Simple API and Spark Compatible

// Use the Spark Data Source API
val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
snappy.createTable("TableName", "Row | Column | Sample", schema, options)
someDataFrame.write.insertInto("TableName")

// Update, delete, put using the SnappySession
snappy.update("tableName", filterExpr, Row(<newColumnValues>), updatedColumns)
snappy.delete("tableName", filterExpr)

// Or just use Spark SQL syntax ..
snappy.sql("select x from tableName").count

// Or use JDBC/ODBC to access it like a regular database
jdbcStatement.executeUpdate("insert into tableName values …")
Extends Spark SQL

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table_name
(
  <column definition>
) USING 'ROW | COLUMN'
OPTIONS (
  COLOCATE_WITH 'table_name',
  PARTITION_BY 'PRIMARY KEY | column name', -- replicated table by default
  REDUNDANCY '1',                           -- manage HA
  PERSISTENT 'DISKSTORE_NAME ASYNCHRONOUS | SYNCHRONOUS', -- default: in-memory only
  OFFHEAP 'true | false',
  EVICTION_BY 'MEMSIZE 200 | COUNT 200 | HEAPPERCENT',
  …
)
[AS select_statement];
Stream SQL DDL and Continuous Queries (based on Spark Streaming)

Consume from a stream → transform raw data → continuous analytics → ingest into the in-memory store → overflow the table to HDFS

// Create the stream table over a Kafka source
CREATE STREAM TABLE AdImpressionLog
  (<columns>) USING directkafka_stream OPTIONS (
    <socket endpoints>,
    "topics 'adnetwork-topic'",
    "rowConverter 'AdImpressionLogAvroDecoder'")

// Register a continuous query and ingest its results
streamingContext.registerCQ(
  """select publisher, geo, avg(bid) as avg_bid, count(*) imps,
     count(distinct(cookie)) uniques from AdImpressionLog
     window (duration '2' seconds, slide '2' seconds)
     where geo != 'unknown' group by publisher, geo""")
  .foreachDataFrame(df => {
    df.write.format("column").mode(SaveMode.Append)
      .saveAsTable("adImpressions")
  })
Updates & Deletes on Column Tables

(Diagram, one partition: writes first land in a row buffer under MVCC. The buffer is periodically compacted into time-ordered column segments (t1–t2, t2–t3, …), each holding keys (K11, K12, …), column values (C11…, C21…), summary metadata, and a 0/1 delete bitmask. New segments are replicated for HA.)
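The write path sketched above can be illustrated with a toy model (all names here are hypothetical; this is not SnappyData's actual code): deletes on a column segment only flip a bit in a deletion bitmask, while new writes accumulate in a mutable row buffer that compaction seals into a fresh immutable segment.

```python
# Illustrative sketch of a delete-bitmask column store with a row buffer.

class ColumnSegment:
    """Immutable columnar data plus a per-row deletion bitmask."""
    def __init__(self, keys, columns):
        self.keys = keys                      # e.g. ["K11", "K12", ...]
        self.columns = columns                # e.g. {"bid": [...], ...}
        self.deleted = [False] * len(keys)    # the 0/1 bitmask in the diagram

    def delete(self, key):
        # Deletes never rewrite column data; they only flip a bitmask bit.
        self.deleted[self.keys.index(key)] = True

    def scan(self, column):
        return [v for v, d in zip(self.columns[column], self.deleted) if not d]

class Partition:
    """Writes land in a row buffer; compaction seals it into a new segment."""
    def __init__(self):
        self.row_buffer = {}                  # key -> row dict (mutable)
        self.segments = []                    # time-ordered immutable segments

    def write(self, key, row):
        self.row_buffer[key] = row

    def compact(self):
        keys = list(self.row_buffer)
        cols = {c: [self.row_buffer[k][c] for k in keys]
                for c in next(iter(self.row_buffer.values()))}
        self.segments.append(ColumnSegment(keys, cols))
        self.row_buffer = {}

p = Partition()
p.write("K11", {"bid": 21.5})
p.write("K12", {"bid": 18.3})
p.compact()
p.segments[0].delete("K12")
print(p.segments[0].scan("bid"))   # [21.5]
```

A real engine would version the bitmask per snapshot (MVCC) and compact segments in the background; the sketch only shows why updates and deletes stay cheap on columnar data.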
Can We Use Statistical Techniques to Shrink Data?

• Most apps are happy to trade off 1% accuracy for a 200x speedup!
• You can usually get a 99.9% accurate answer by looking at only a tiny fraction of the data!
• You can often make perfectly accurate decisions with imperfect answers!
  - A/B testing, visualization, ...
• The data itself is usually noisy
  - Processing the entire data set doesn't necessarily mean exact answers!
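The trade-off claimed above is easy to demonstrate: estimate an aggregate from a small uniform sample and compare it to the exact answer. The data and sampling rate below are synthetic, purely for illustration.

```python
# Estimate a mean from a 1% uniform sample of synthetic "bid" values.
import random

random.seed(42)
bids = [random.expovariate(1 / 20.0) for _ in range(200_000)]  # synthetic bids

exact = sum(bids) / len(bids)

sample = random.sample(bids, 2_000)        # look at only 1% of the data
estimate = sum(sample) / len(sample)

rel_err = abs(estimate - exact) / exact
print(f"exact={exact:.2f} estimate={estimate:.2f} rel_err={rel_err:.2%}")
```

The relative error of the sampled estimate is typically around 1–2% here, while the work done is 100x smaller, which is exactly the accuracy-for-speed trade the slide describes.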
Probabilistic Store: Sketches + Uniform & Stratified Samples

1. Streaming CMS (Count-Min Sketch): time-bucketed cells [t1, t2), [t2, t3), [t3, t4), [t4, now), with higher resolution (cell widths 4T, 2T, T, ≤T) for more recent time ranges
2. Top-K queries with arbitrary filters: traditional CMS vs. CMS + samples; maintain a small sample at each CMS cell
3. Fully distributed stratified samples: always include the timestamp as a stratified column for streams; samples age from the row store (in-memory) to the column store (disk)
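A minimal Count-Min Sketch conveys the core idea behind point 1 (this is an illustrative sketch, not SnappyData's engine): a small w x d array of counters where each update increments one counter per row, and a point query takes the minimum across rows, so estimates can only overshoot.

```python
# Minimal Count-Min Sketch: fixed memory, one-sided estimation error.
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent-ish hash per row, derived by salting with the row id.
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
            yield row, int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Never underestimates; hash collisions can only inflate counters.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for geo, n in [("MI", 500), ("CA", 300), ("MA", 200)]:
    cms.add(geo, n)
print(cms.estimate("MI"))   # at least 500 (a CMS never under-counts)
```

Attaching a small reservoir sample to each cell, as the slide proposes, is what lets the store answer top-K queries with arbitrary filters rather than only raw counts.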
Overview

(Architecture diagram repeated, highlighting real-time support and HA.)
Supporting Real-Time & HA

• Spark executors are long-running; a driver failure doesn't shut down the executors
• Driver HA: drivers are "managed" by SnappyData with a standby secondary
• Data HA: consensus-based clustering integrated for eager replication

(Diagram: locators, a lead node running the managed Spark driver with a REST endpoint for Spark jobs, and peer-to-peer executor JVMs (servers) that combine Spark block memory with the Snappy store for tables, streams, and DataFrames; shared-nothing persistence; clients connect via JDBC/ODBC through the catalog service.)
Overview

(Architecture diagram repeated, highlighting transactions.)
Transactions

• Support for Read Committed & Repeatable Read
• W-W and R-W conflict detection at write time
• MVCC for non-blocking reads and snapshot isolation
• Distributed failure detection integrated with the commit protocol
  - Evict unresponsive replicas
  - Ensure consistency when replicas recover
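The MVCC bullet above can be sketched in a few lines (a toy model with hypothetical names, not SnappyData's implementation): each committed write creates a new version, and a reader pins the timestamp at which it started, so it sees a consistent snapshot without blocking concurrent writers.

```python
# Toy MVCC store: versioned values plus snapshot reads pinned to a timestamp.
import itertools

class MVCCStore:
    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = {}            # key -> [(commit_ts, value), ...]

    def write(self, key, value):
        ts = next(self._clock)         # commit timestamp
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        # A read transaction sees only versions committed before it began.
        return Snapshot(self, next(self._clock))

class Snapshot:
    def __init__(self, store, start_ts):
        self._store, self._start_ts = store, start_ts

    def read(self, key):
        for ts, value in reversed(self._store._versions.get(key, [])):
            if ts < self._start_ts:
                return value
        return None

store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()
store.write("balance", 42)       # concurrent writer; does not block the reader
print(snap.read("balance"))      # 100 - the earlier snapshot is unaffected
print(store.snapshot().read("balance"))  # 42
```

Write-write and read-write conflicts, as the slide notes, are still detected at write time; MVCC only removes the need for readers to take locks.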
Overview

(Architecture diagram repeated, highlighting the query optimizer.)
Query Optimization

• Bypass the scheduler for transactions and low-latency jobs
• Minimize shuffles aggressively
  - Dynamic replication for reference data
  - Retain 'join indexes' whenever possible
  - Colocate related data sets
• Optimized hash join, scan, and group-by compared to stock Spark
  - Uses more variables in code generation and vectorized structures
• Column segment pruning through statistics
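Why colocation minimizes shuffles can be shown with a small sketch (the partitioner and table names are hypothetical, not SnappyData's): when two tables hash-partition on the same key with the same partition count, a join can run entirely partition-locally.

```python
# Co-partitioned join: matching partitions join locally, with no shuffle.

N_PARTITIONS = 4

def partition_of(key):
    return hash(key) % N_PARTITIONS

def partitioned(rows, key_fn):
    parts = [[] for _ in range(N_PARTITIONS)]
    for row in rows:
        parts[partition_of(key_fn(row))].append(row)
    return parts

# Both tables are partitioned on the same key (subscriber id).
calls = partitioned([("subA", 3), ("subB", 1), ("subC", 7)], lambda r: r[0])
profiles = partitioned([("subA", "opt-in"), ("subC", "opt-out")], lambda r: r[0])

# Co-located join: pair partition i with partition i -- no data movement.
joined = []
for calls_part, prof_part in zip(calls, profiles):
    lookup = dict(prof_part)
    joined += [(sub, n, lookup[sub]) for sub, n in calls_part if sub in lookup]

print(sorted(joined))   # [('subA', 3, 'opt-in'), ('subC', 7, 'opt-out')]
```

This is the same idea as the Kafka/executor layout on the next slide: subscriber ranges A–M and N–Z each live on one executor, with reference data replicated alongside them.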
Co-partitioning & Co-location

(Diagram: Kafka queues are partitioned by subscriber range (A–M, N–Z); each Spark executor hosts the matching subscriber partition together with replicated reference data, so the system scales linearly with partition pruning.)
Overview

(Architecture diagram repeated, highlighting the Data Synopsis Engine.)
Approximate Query Processing (AQP): Academia vs. Industry

25+ years of successful research in academia (AQUA, Online Aggregation, MapReduce Online, STRAT, ABS, BlinkDB / G-OLA, ...), yet user-facing AQP is almost non-existent in the commercial world! Only some approximate features exist in Infobright, Yahoo's Druid, Facebook's Presto, Oracle 12c, ...

WHY? Consider:

select geo, avg(bid)
from adImpressions
group by geo
having avg(bid) > 10
with error 0.05 at confidence 95

geo   avg(bid)   error   prob_existence
MI    21.5       ±0.4    0.99
CA    18.3       ±5.1    0.80
MA    15.6       ±2.4    0.81
...   ...        ...     ...

1. Incompatible with BI tools
2. Complex semantics
3. Bad sales pitch!
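For intuition, the "± error at 95% confidence" column in a result like the one above can come from a normal-approximation confidence interval over the sampled rows. The numbers below are synthetic, not the slide's actual data.

```python
# 95% confidence interval for an average estimated from a sample.
import math
import random

random.seed(7)
sample = [random.gauss(21.5, 4.0) for _ in range(400)]  # sampled bids for one geo

n = len(sample)
mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / (n - 1)    # sample variance
half_width = 1.96 * math.sqrt(var / n)                  # 95% two-sided z interval

print(f"avg(bid) = {mean:.1f} ± {half_width:.1f} at 95% confidence")
```

The interval shrinks as 1/sqrt(n), which is why a modest sample already yields the tight ±0.4-style error bars shown for well-represented groups, while sparse groups (like CA above) get wide ones.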
A First Industrial-Grade AQP Engine

1. High-Level Accuracy Contract (HAC)
   • Concurrency: 10's of queries in shared clusters
   • Resource usage: everyone hates their AWS bill
   • Network shuffles
   • Immediate results while waiting for final results
2. Fully compatible with BI tools
   • Set HAC behavior at the JDBC/ODBC connection level
3. Better marketing!
   • The user picks a single number p, where 0 ≤ p ≤ 1 (by default p = 0.95)
   • Snappy guarantees that they only see things that are at least p% accurate
   • Snappy handles (and hides) everything else!

geo   avg(bid)
MI    21.5
WI    42.3
NY    65.6
...   ...

iSight (Immediate inSight)
Conclusion

Unified OLAP/OLTP/streaming with Spark
● Far fewer resources: a TB problem becomes a GB problem
  ○ CPU contention drops
● Far less complex
  ○ a single cluster for stream ingestion, continuous queries, interactive queries, and machine learning
● Much faster
  ○ compressed data managed in distributed memory in columnar form reduces volume and is much more responsive
Lessons Learned

1. A unique experience marrying two different breeds of distributed systems: lineage-based for high throughput vs. (consensus-)replication-based for low latency
   - Stream processing ≠ stream analytics
   - Top-k w/ almost arbitrary predicates + 1-pass stratified sampling over streams
2. A unified cluster is simpler, cheaper, and faster
   - By sharing state across apps, we decouple apps from data servers and provide HA
   - Save memory, data copying, serialization, and shuffles
   - Co-partitioning and co-location for faster joins and stream analytics
3. Advantages over HTAP engines: deep stream integration + AQP
4. Commercializing academic work is lots of work but also lots of fun
THANK YOU!
Try our iSight cloud for free: http://snappydata.io/iSight

iSight: Immediate inSight
- iSight's immediate answer to the query: 1.7 secs
- Final answer to the query: 42.7 secs
- 25x speedup!

Our Solution: High-Level Accuracy Contract (HAC)
• A single number p, 0 ≤ p ≤ 1 (by default p = 0.95)
• We guarantee that you only see things that are at least p% accurate
• We handle (and hide) everything else
  – Choose a behavior: REPLACE WITH SPECIAL SYMBOL (default), DO NOTHING, DROP THE ROW
Editor's Notes

  • #5 Many subscribers, lots of 2G/3G/4G voice/data Network events: location events, CDRs, network issues
  • #6 You have stream that represents the current window, a large table that is perpetually growing, maybe the app is reducing data through continuous aggregations(apriori known summaries) and many smaller reference data tables. The ref data is sourced from other systems like say CRM and are likely to be changing in real time as well.
  • #7 Challenges: - Stream analytics: Joins between streams, history, reference data for correlations, trends, pattern matching. - Very fast writes, including updates. Potentially transactional semantics. Of course, you need exactly-once guarantees. - Interactive queries: these tend to be ad-hoc in nature. Could be point lookups or aggregation/scan oriented queries. Only some can be satisfied using pre-fabricated aggregation tables. OLAP cube maintenance in streaming world is very difficult. - Many users could be interacting. So, concurrency is important.
  • #8 Challenges: - Stream analytics: Joins between streams, history, reference data for correlations, trends, pattern matching. - Very fast writes, including updates. Potentially transactional semantics. Of course, you need exactly-once guarantees. - Interactive queries: these tend to be ad-hoc in nature. Could be point lookups or aggregation/scan oriented queries. Only some can be satisfied using pre-fabricated aggregation tables. OLAP cube maintenance in streaming world is very difficult. - Many users could be interacting. So, concurrency is important.
  • #9 when mentioning sketches say “infinite streams”
  • #10 Many purpose built products To derive meaningful insight Apps have to stitch products
  • #15 According to the latest report, 75% of enterprises believe real time ranges from important to critical for business success Why they need realtime: 52% said for delivering real-time dashboards and improving customer experience Define real-time: 81% of enterprises define real time as either second or sub-second speeds
  • #17 Snappy store and Spark Executor share the same process space and JVM memory Reference based access – zero copy
  • #18 Snappy store and Spark Executor share the same process space and JVM memory Reference based access – zero copy
  • #20 Contributions: sharing state across many clients and applications to minimize (de-)serialization §5.4; providing high availability through low-latency failure detection and decoupling applications from data servers §6.1; bypassing the scheduler to interleave fine-grained operations with long-running jobs §6.3; ensuring transactional consistency §6.4. Share the same process space and memory pool between Spark and Geode §5.3; co-partition tables and streams to minimize data shuffling §5.5. For the unified API, say: Spark's DataFrame API allows for ML, graph, batch & streaming, and SQL (selects); SnappyData adds full SQL support and extends the DataFrame and DataSource APIs for mutability semantics (DML & transactions), indexing, and SQL-based streaming
  • #37 No changes to the BI tool; no need to be a statistician; and there's still an API to ask for all sorts of detailed error information if you're an advanced user