SnappyData
Building Continuous Applications Driven By Real Time Insights
Version 0.1 | © SnappyData Inc 2017
www.snappydata.io
Sudhir Menon, Founder, COO
Who Are We?
• New Spark-based open source project started by Pivotal GemFire founders + engineers
• Decades of in-memory data management experience
• Focus on real-time, operational analytics: Spark-based OLTP + OLAP database
SnappyData spinout, funded by Pivotal, GE, GTD Capital
The Big Data Market Is Facing Disruption (Again)
• Higher data volumes
• Growth in streaming workloads
• Analytics on live data
• Growth in unstructured data
• Machine learning/AI as first-class workloads
Need to reduce complexity and cloud costs
Spark: Key to the Emerging Battleground in Analytics and BI
- Multi-model data
- Integrate streaming data
Phase 1, Phase 2, Phase 3 ?
Source: Gartner
Mixed Workloads Are Everywhere
Stream processing | Transaction (point lookups, small updates) | Interactive analytics
• Analytics on mutating data
• Correlating and joining streams with large histories
• Maintaining state or counters while ingesting streams
Time Value of Information: Why Does It Matter
• Elapsed time from event occurrence to event analytics matters
• Latency in using information for learning matters
• Concurrency matters
• Recovery time matters
• User-kernel crossings matter
In short, liveness of data matters when it comes to making decisions based on current information.
What Is Happening With Applications
• Applications that are intelligent, proactive, learn from past interactions, and are context-aware in their decision making
• Fast and reliable ingestion capabilities
• Support high memory density
• Utilize memory to reduce response time
• Support high concurrency
• Work on live data
• Support data mutability
Let's Discuss Some Use Cases
• Market surveillance systems (trading exchanges, market makers)
• Real-time scoring systems (product recommendations, real-time offers)
• Telco analytics (location-based services, predictive analytics)
• Sensor analytics (real-time alerting for parking management, lighting, etc.)
• Ad analytics + ad placement systems
• Credit card fraud
• Detecting and stopping malware
 	
  	
  
Mixed Workloads in Industrial IoT
Event stream from IoT devices:
• Anomaly detection: score against models
  - Map sensors to tags
  - Monitor temperature
  - Send alerts
• Correlate current temperature trend with history…
• Interact using dynamic queries…
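The anomaly-detection flow above (map sensors to tags, score each reading, correlate with history, alert) can be sketched in a few lines. This is a minimal illustration only, assuming a simple z-score "model" built from each tag's historical readings; the function names, the 3.0 threshold, and the data are hypothetical and are not SnappyData's API.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple

def build_baselines(history: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Compute (mean, sample standard deviation) of past readings per sensor tag."""
    return {tag: (mean(vals), stdev(vals)) for tag, vals in history.items()}

def score_reading(baselines, tag: str, temperature: float, threshold: float = 3.0) -> Optional[str]:
    """Return an alert if the reading is more than `threshold` standard deviations
    away from the tag's historical mean, otherwise None."""
    mu, sigma = baselines[tag]
    if sigma == 0:
        return None  # no variation in history; z-score undefined
    z = (temperature - mu) / sigma
    if abs(z) > threshold:
        return f"ALERT {tag}: {temperature:.1f} (z={z:.1f})"
    return None

# Hypothetical history, already mapped from raw sensor ids to tags
history = {"boiler-1": [70.0, 71.5, 69.8, 70.4, 70.9, 71.1]}
baselines = build_baselines(history)
for temp in (70.7, 95.0):
    alert = score_reading(baselines, "boiler-1", temp)
    print(alert or f"ok boiler-1: {temp:.1f}")
```

In a real deployment the baselines would be refreshed from stored history rather than computed once; the point is that scoring needs fast access to both the live reading and the historical state.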
Pain Points We Come Across Today
• Is Spark ready for real time? For the enterprise?
• Lacks mutable state management
• Not designed for high concurrency and mixed workloads
• Inadequate SQL support; limited ODBC/JDBC access
• Fault tolerant, but not highly available (HA)
• Near impossible to do live analytics on NoSQL stores
• Pattern today: periodic copy of state into some analytic DB/Hadoop
• Stale insight, not continuous or real time
• Interactive dynamic aggregations not possible
• Data models make support for BI tools like Tableau difficult
• Most stream processors are not capable of true analytics
• Deep stream analytics augmenting stream processing
Pattern 1: Native Data Store for Spark … Fast, Simple
Before: scalable NoSQL and Hadoop data copied into an immutable cache for Spark analytics (compute)
Problem: too much copying … too slow for real-time, interactive analytics
After: Spark analytics (compute) on SnappyData, a multi-model, distributed in-memory data store natively designed for Spark, over scalable NoSQL and other data sources
- 20X faster than Spark for analytic queries
- 1000X faster than Spark+NoSQL
- Mutable DataFrames, transactions, indexing
- Rich, complete SQL + all Spark APIs
- Highly available data, Spark driver
- Enterprise-grade security
- Native support for ML/DL
Pattern 2: Interactive Analytics on Live Data in NoSQL Stores
Problem: interactive analytics at scale and concurrency for LIVE data sets, e.g. sensor data, customer interactions
Live data with continuous updates in a scalable, operational NoSQL store
Approaches today:
- MPP ETL into Hadoop or an MPP DB (cubes, aggregations)
- MongoDB BI connector
- Custom BI like SlamData
- Tableau extracts
- Tableau
Drawbacks: expensive, complex, batch; difficult to deal with semi-structured data; read-only, stale insight; inflexible, slow
Pattern 2: Live Analytics on Polyglot NotOnlySQL Stores
Sources: CDC streams from Hadoop, NoSQL, and micro services 1, 2 and 3; sensor stream
Spark transform (data prep) with rich Spark APIs and windows
SnappyData: in-memory row-column tables, virtual tables, NoSQL connectors, SQL
- Live updates propagated to the in-memory analytics cluster in SnappyData
- Visualize on any tool
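Propagating live updates from CDC streams into in-memory tables comes down to applying each change event, in order, to a table keyed by primary key, so that analytic queries always see current state. A minimal sketch, assuming a hypothetical event format (`op`, `key`, `row`) and using a Python dict as a stand-in for an in-memory row table; it does not reproduce any SnappyData connector API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class ChangeEvent:
    op: str                                # "insert", "update" or "delete"
    key: Any                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # new column values (None for delete)

@dataclass
class InMemoryTable:
    rows: Dict[Any, Dict[str, Any]] = field(default_factory=dict)

    def apply(self, event: ChangeEvent) -> None:
        """Apply one CDC event; updates merge new column values into the existing row."""
        if event.op == "insert":
            self.rows[event.key] = dict(event.row)
        elif event.op == "update":
            self.rows.setdefault(event.key, {}).update(event.row)
        elif event.op == "delete":
            self.rows.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op}")

table = InMemoryTable()
for ev in [
    ChangeEvent("insert", 1, {"customer": "a", "balance": 100}),
    ChangeEvent("insert", 2, {"customer": "b", "balance": 50}),
    ChangeEvent("update", 1, {"balance": 120}),
    ChangeEvent("delete", 2),
]:
    table.apply(ev)

# Analytic query over the live, mutated state
total_balance = sum(r["balance"] for r in table.rows.values())
print(table.rows, total_balance)
```

Applying events in source order matters: replaying the same stream out of order could resurrect a deleted row or keep a stale balance.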
  
Pattern 3: Streaming Analytics, Not Just Simple Processing
Streaming in Flink, Dataflow, Apex …
Diagram: unbounded streams, stream app (window), state update, KV store, index, OLAP column table, Hadoop, NoSQL
- KV stores offer little to no analytic operators
- Joins, aggregations across multiple DBs not possible
- Too slow
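Maintaining state while ingesting an unbounded stream usually means bucketing events into windows and updating per-key aggregates. The sketch below assumes tumbling windows of a hypothetical 60 seconds and uses an in-process dict as the state store, standing in for the "stream app, state update, KV store" path in the diagram; it is not any specific engine's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60  # hypothetical tumbling-window size

def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Align an epoch-seconds timestamp to the start of its tumbling window."""
    return ts - (ts % size)

def ingest(events: Iterable[Tuple[int, str, float]]) -> Dict[Tuple[int, str], Dict[str, float]]:
    """Update a per-(window, key) count and sum for each (ts, key, value) event."""
    state: Dict[Tuple[int, str], Dict[str, float]] = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, key, value in events:
        agg = state[(window_start(ts), key)]
        agg["count"] += 1
        agg["sum"] += value
    return dict(state)

events = [(0, "s1", 1.0), (30, "s1", 2.0), (59, "s2", 5.0), (61, "s1", 4.0)]
state = ingest(events)
for (win, key), agg in sorted(state.items()):
    print(win, key, agg)
```

Once such state lives in a plain KV store, it can be read back only by exact (window, key) lookups; joining it with history or aggregating across keys requires another system, which is the limitation the slide lists.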
  
Pattern 3: Deep Integration of Stream Processing with OLAP
Streams: sensor streams, CDC streams, transaction streams…
Streaming deeply integrated with the analytics DB: rich Spark APIs, in-memory row-column tables, NoSQL connectors, SQL
Pull history on demand; continuously summarize
Front ends: Tableau, Zeppelin
- Continuous queries on stream + history + enterprise data
- Simple: build stream + analytics apps using a single model
- Much faster than stitching
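A continuous query over "stream + history" can be pictured as follows: for each micro-batch, join the new events against stored history and fold the result into a continuously maintained summary. The sketch below uses Python's sqlite3 in-memory database purely as a stand-in for in-memory tables; the table names, columns, and batch shape are hypothetical, and it does not imitate SnappyData's streaming APIs.

```python
import sqlite3
from typing import Iterable, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device_history (device_id TEXT PRIMARY KEY, region TEXT, avg_temp REAL);
    CREATE TABLE region_summary (region TEXT PRIMARY KEY, readings INTEGER, above_avg INTEGER);
    CREATE TABLE incoming (device_id TEXT, temp REAL);
    INSERT INTO device_history VALUES ('d1', 'east', 70.0), ('d2', 'west', 65.0);
""")

def process_batch(batch: Iterable[Tuple[str, float]]) -> None:
    """Join one micro-batch against stored history and fold it into a running per-region summary."""
    with conn:  # one transaction per micro-batch
        conn.execute("DELETE FROM incoming")
        conn.executemany("INSERT INTO incoming VALUES (?, ?)", batch)
        per_region = conn.execute("""
            SELECT h.region,
                   COUNT(*),
                   SUM(CASE WHEN i.temp > h.avg_temp THEN 1 ELSE 0 END)
            FROM incoming i JOIN device_history h ON i.device_id = h.device_id
            GROUP BY h.region
        """).fetchall()
        for region, n, above in per_region:
            conn.execute("INSERT OR IGNORE INTO region_summary VALUES (?, 0, 0)", (region,))
            conn.execute(
                "UPDATE region_summary SET readings = readings + ?, above_avg = above_avg + ? "
                "WHERE region = ?",
                (n, above, region),
            )

process_batch([("d1", 72.0), ("d2", 60.0)])
process_batch([("d1", 69.0), ("d2", 66.0), ("d2", 67.0)])
summary = conn.execute("SELECT * FROM region_summary ORDER BY region").fetchall()
print(summary)
```

Because the stream, the history, and the summary live in one store, the join and the summary update happen in a single transaction instead of being stitched across separate systems.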
  
 	
  	
  
How Mixed Workloads Are Supported Today
Lambda architecture:
1. New data
2. Batch layer: master dataset
3. Serving layer: batch views
4. Speed layer: real-time views
5. Query
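In the Lambda architecture above, a query (step 5) merges a precomputed batch view with a real-time view that covers only the data arriving since the last batch run. A minimal sketch of that merge for a per-key count, with hypothetical function names and in-memory `Counter` objects standing in for the serving and speed layers:

```python
from collections import Counter
from typing import Iterable

def build_batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute counts from the full, immutable master dataset."""
    return Counter(master_dataset)

def update_realtime_view(view: Counter, new_events: Iterable[str]) -> None:
    """Speed layer: incrementally count events not yet covered by the batch view."""
    view.update(new_events)

def query(key: str, batch_view: Counter, realtime_view: Counter) -> int:
    """Query time: batch result plus the real-time delta (missing keys count as 0)."""
    return batch_view[key] + realtime_view[key]

master = ["click", "view", "click"]
batch_view = build_batch_view(master)
realtime_view = Counter()
update_realtime_view(realtime_view, ["click", "buy"])
print(query("click", batch_view, realtime_view))
```

The same metric is computed by two separate code paths that must stay consistent, which is one source of the complexity called out on the next slide.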
 	
  	
  
Lambda Architecture Is Complex
Example stack: source apps, Kafka, Storm, Cassandra, .....
• Complexity: learn and master multiple products, data models, disparate APIs, configs
• Slower
• Wasted resources
Can We Simplify & Optimize?
How about a single clustered DB that can manage stream state, transactional data & run OLAP queries?
Diagram: apps; stream processing (framework for stream processing, etc.); RDB (tables, txn); MPP DB; HDFS
Requirements: scalable writes, point reads, OLAP queries
Our Solution: SnappyData
A Single Unified Cluster: OLTP + OLAP + Streaming for real-time analytics
Our Solution
• Real-time design: low latency, HA, concurrency; replication-based, consensus-driven. Matured over 13 years.
• Batch design: high throughput, lineage-based system. Rapidly maturing.
• Deep-scale, high-volume MPP DB
Single Unified HA Cluster: OLTP + OLAP + Streaming for real-time analytics
A Spark-Based Big Data Analytics Platform
Access: Spark jobs, Scala/Java/Python/R API, JDBC/ODBC, Object API (RDD, Datasets)
SNAPPYDATA
- Spark API (Streaming, ML, Graph); DataFrame, RDD, Datasets
- Transactions, indexing, full SQL, HA
- In-memory rows and columnar; Spark cache; synopses (samples)
- Unified data access (virtual tables); unified catalog; native store
Sources: HDFS/HBase, S3, JSON, CSV, XML, SQL DB, Cassandra, MPP DB, stream sources
We transform Spark from this…
For each user/app (USER 1 / APP 1, USER 2 / APP 2): a separate Spark master and Spark execution (worker) running a framework for streaming SQL, ML…, each with its own immutable cache loaded from HDFS, SQL, NoSQL, and a deep-scale, high-volume MPP DB
• Cannot update
• Repeated for each user/app
Bottleneck
… Into "an always-on hybrid database"!
Spark cluster: long-running Spark execution (worker) JVMs running a framework for streaming SQL, ML…, plus the Spark driver
In-memory row + column store with indexing:
- Mutable
- Transactional
Access: Spark jobs, JDBC, ODBC
Shared-nothing persistence; history in HDFS, SQL, NoSQL, and a deep-scale, high-volume MPP DB
Architecture
• Cluster: cluster manager & scheduler; SnappyData server (Spark executor + store); distributed membership service; HA; add/remove server
• Access: stream processing, DataFrame, RDD, tables, ODBC/JDBC (low-latency and high-latency workloads)
• Query processing: parser, query optimizer, OLAP, TXN, Synopsis Data Engine
• Hybrid store: probabilistic, rows, columns, index
Unified API
Spark's DataFrame API allows for:
• ML, graph, batch & streaming, SQL (selects)
SnappyData adds full SQL support and extends the DataFrame and DataSource APIs for:
• Mutability semantics (DML & transactions)
• Indexing
• SQL-based streaming
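To make "mutability semantics" concrete: DML means rows can be updated and deleted in place, transactions mean a group of changes commits or rolls back as a unit, and an index lets point lookups avoid full scans. The sketch uses Python's built-in sqlite3 module only as a stand-in to show these semantics; it does not use or imitate SnappyData's API, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, qty INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")  # secondary index
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 5), (2, "acme", 3), (3, "globex", 7)])
conn.commit()

# DML: in-place update and delete
conn.execute("UPDATE orders SET qty = qty + 1 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 3")
conn.commit()

# Transaction: the context manager commits on success and rolls back on an exception
try:
    with conn:
        conn.execute("UPDATE orders SET qty = 0 WHERE customer = 'acme'")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass  # the update above was rolled back

print(conn.execute("SELECT id, qty FROM orders ORDER BY id").fetchall())

# The planner uses the index for a point lookup by customer
plan = conn.execute("EXPLAIN QUERY PLAN SELECT qty FROM orders WHERE customer = 'acme'").fetchall()
print(plan)
```

Spark DataFrames, by contrast, are immutable: "updating" one means producing a new DataFrame, which is the gap the slide says SnappyData's extensions address.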
  
 	
  	
  
Can We Use Statistical Techniques to Shrink Data?
• Most apps are happy to trade off 1% accuracy for a 200x speedup!
• You can usually get a 99.9% accurate answer by looking at only a tiny fraction of the data!
• You can often make perfectly accurate decisions with imperfect answers!
• A/B testing, visualization, ...
• The data itself is usually noisy
• Processing the entire data set doesn't necessarily mean exact answers!
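The claim that a tiny fraction of the data can give a close answer rests on standard sampling theory: the error of a sample mean shrinks with the square root of the sample size, not with the size of the full data set. A minimal sketch with synthetic data, a uniform random sample, and a normal-approximation 95% interval; the data, the 1% sample size, and the seed are arbitrary illustrative choices, and the finite-population correction is omitted for simplicity.

```python
import math
import random

def estimate_mean(population, sample_size, seed=42):
    """Estimate the mean from a uniform random sample with an approximate 95% confidence interval."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    n = len(sample)
    m = sum(sample) / n
    var = sum((x - m) ** 2 for x in sample) / (n - 1)   # sample variance
    half_width = 1.96 * math.sqrt(var / n)              # normal approximation
    return m, (m - half_width, m + half_width)

population = list(range(1_000_000))                  # synthetic data
est, (lo, hi) = estimate_mean(population, 10_000)    # looks at 1% of the rows
true_mean = sum(population) / len(population)
print(f"estimate={est:.1f}  95% CI=({lo:.1f}, {hi:.1f})  true={true_mean:.1f}")
```

The interval width depends on the sample size and the data's spread, not on the million-row total, which is why a fixed-size sample can answer queries over much larger data in a fraction of the time.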
  
Probabilistic Store: Sketches + Uniform & Stratified Samples
1. Streaming CMS (Count-Min Sketch)
   - Higher resolution for more recent time ranges: [t1, t2), [t2, t3), [t3, t4), [t4, now) cover spans of 4T, 2T, T, ≤T ....
2. Top-K queries with arbitrary filters
   - Maintain a small sample at each CMS cell (traditional CMS vs. CMS + samples)
3. Fully distributed stratified samples
   - Always include timestamp as a stratified column for streams
   - Streams → aging row store (in-memory) → column store (disk)
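For reference, a plain Count-Min Sketch is a small fixed-size table of counters, with one hash function per row. Adding an item increments one counter per row; estimating returns the minimum of those counters. Hash collisions can only inflate counters, so the estimate never undercounts. This is a minimal sketch of the textbook structure only, not SnappyData's time-decayed variant or its CMS + samples extension; the width and depth (roughly 1% error with about 99% confidence) and the data are illustrative choices.

```python
import hashlib
from typing import List

class CountMinSketch:
    """Fixed-size frequency sketch: estimates are upper bounds on true counts."""

    def __init__(self, width: int = 272, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Derive a separate hash per row by salting the item with the row number
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions only add to counters, so the smallest counter is the tightest bound
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

cms = CountMinSketch()
stream = ["AAPL"] * 50 + ["MSFT"] * 30 + [f"tick{i}" for i in range(1000)]
for symbol in stream:
    cms.add(symbol)
print(cms.estimate("AAPL"), cms.estimate("MSFT"))
```

Memory stays at width × depth counters no matter how many distinct items the stream contains, which is what makes sketches practical for high-rate streams.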
 	
  	
  
High-Level Accuracy Guarantees
Query engine with HAC: quality-certified approximate answers, with bias and variance estimates
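Bias and variance estimates for an approximate answer are commonly obtained by bootstrapping: recompute the aggregate on many resamples of the sample (drawn with replacement) and measure how much the results vary. A minimal sketch for a mean over synthetic data; the resample count and seeds are arbitrary, and SnappyData's pipelined bootstrap operator is not reproduced here.

```python
import random
import statistics

def bootstrap(sample, stat=statistics.fmean, resamples=1000, seed=1):
    """Return (point estimate, bias estimate, standard error) of `stat` via the bootstrap."""
    rng = random.Random(seed)
    point = stat(sample)
    replicates = [stat(rng.choices(sample, k=len(sample))) for _ in range(resamples)]
    bias = statistics.fmean(replicates) - point     # average drift of resampled estimates
    std_err = statistics.stdev(replicates)          # spread of resampled estimates
    return point, bias, std_err

rng = random.Random(0)
sample = [rng.gauss(100.0, 15.0) for _ in range(500)]
est, bias, se = bootstrap(sample)
print(f"mean={est:.2f} bias={bias:.3f} stderr={se:.3f}")
```

For a mean, the bootstrap standard error should land near the analytic value of 15 / √500 ≈ 0.67. The bootstrap's value is that the same procedure also works for aggregates that have no simple closed-form error.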
Streams → aging → SnappyData store: stratified samples in a row store (memory) and a column store (disk)
Interactive and continuous queries over the samples use a pipelined bootstrapped operator
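Stratified sampling guarantees that every stratum, including rare groups and the most recent time ranges, stays represented in the sample. Following the slide's advice, the timestamp is included as a stratification column. A minimal reservoir-per-stratum sketch: the stratum key (region plus minute bucket), the per-stratum capacity of 20, and the data are hypothetical, and the estimator simply scales each stratum's sample sum by rows seen / rows sampled.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

class StratifiedReservoir:
    """Keep up to `capacity` rows per stratum using reservoir sampling (Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 7):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.samples: Dict[Hashable, List[float]] = defaultdict(list)
        self.seen: Dict[Hashable, int] = defaultdict(int)

    def add(self, stratum: Hashable, value: float) -> None:
        self.seen[stratum] += 1
        reservoir = self.samples[stratum]
        if len(reservoir) < self.capacity:
            reservoir.append(value)
        else:
            j = self.rng.randrange(self.seen[stratum])  # uniform over rows seen so far
            if j < self.capacity:
                reservoir[j] = value

    def estimate_sum(self) -> float:
        """Scale each stratum's sample sum by (rows seen / rows sampled)."""
        return sum(self.seen[s] / len(r) * sum(r) for s, r in self.samples.items())

def stratum_of(region: str, ts: int) -> Tuple[str, int]:
    return (region, ts // 60)  # region plus minute bucket of the timestamp

sampler = StratifiedReservoir(capacity=20)
for ts in range(600):                      # 10 minutes, one "east" event per second
    sampler.add(stratum_of("east", ts), 1.0)
    if ts % 100 == 0:                      # a rare region still gets its own strata
        sampler.add(stratum_of("west", ts), 10.0)
print(len(sampler.samples), sampler.estimate_sum())
```

A uniform sample of the same total size could easily miss the six rare "west" rows entirely; giving each stratum its own reservoir keeps them, at the cost of storing more rows when there are many strata.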
 	
  	
  
What Is Unique
Deep Fusion with Spark
Elastic, highly available in-memory store for OLTP fused with Spark's memory manager and the Catalyst/Tungsten engine. The store itself is exposed as native Spark DataFrames.
Extreme Speed through CPU code gen, vectorization
Extend Spark's Tungsten engine with better code generation, colocation schemes, ...
Synopsis Data Engine
Use statistical techniques to reduce data by 100-1000x. Answer queries in a fraction of the time and resources.
Cloud Ready
Dealing with Credit Card Fraud
SnappyData cluster: streaming application over the credit card transaction stream, using user history, a prediction model, blacklisted cards, ……
Data lake
Outputs: notification to owner; notification to merchant
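The fraud flow above checks each transaction from the stream against blacklisted cards, the user's history, and a prediction model, then notifies the owner and merchant. It can be sketched as a per-event function. This is a toy stand-in: the logistic "model" weights, the 0.5 threshold, and the card data are hypothetical placeholders, not a trained model or SnappyData code.

```python
import math
from dataclasses import dataclass
from typing import Dict, Set

@dataclass
class Txn:
    card: str
    amount: float
    country: str

def fraud_score(txn: Txn, history_avg: float, home_country: str) -> float:
    """Hypothetical logistic score from two features: amount vs. history, and foreign use."""
    ratio = txn.amount / max(history_avg, 1.0)
    foreign = 1.0 if txn.country != home_country else 0.0
    z = -4.0 + 0.8 * ratio + 2.0 * foreign   # placeholder weights, not a trained model
    return 1.0 / (1.0 + math.exp(-z))

def decide(txn: Txn, blacklist: Set[str], history: Dict[str, dict], threshold: float = 0.5) -> str:
    if txn.card in blacklist:                 # cheap reference-data check first
        return "decline: blacklisted card"
    profile = history.get(txn.card, {"avg": txn.amount, "home": txn.country})
    score = fraud_score(txn, profile["avg"], profile["home"])
    return "notify owner and merchant" if score >= threshold else "approve"

blacklist = {"4000-0000"}
history = {"4111-1111": {"avg": 40.0, "home": "CA"}}
print(decide(Txn("4111-1111", 35.0, "CA"), blacklist, history))
print(decide(Txn("4111-1111", 400.0, "US"), blacklist, history))
print(decide(Txn("4000-0000", 5.0, "CA"), blacklist, history))
```

Each decision needs the live event, reference data, and per-user history at once, so keeping all three in one cluster avoids a remote lookup per transaction.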
  
Preventing Bill Shock, Real Time Upgrades
SnappyData cluster: streaming application over the CDR stream, using plan info to find customers approaching their limit
Data lake
Actions: immediate SMS to customer; schedule callback through call center
The system detects approaching usage limits, notifies users, and gives them a chance to buy a one-time upgrade or a new plan, increasing loyalty & revenue.
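Detecting customers who are approaching their usage limit from a CDR stream amounts to keeping a running usage total per subscriber and comparing it with the plan's allowance. A minimal sketch; the 90% alert threshold, the plan data, and the alert-once-per-period behavior are hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class UsageMonitor:
    plan_limits_mb: Dict[str, float]          # subscriber -> data allowance for the period
    alert_fraction: float = 0.9               # hypothetical: alert at 90% of the limit
    usage_mb: Dict[str, float] = field(default_factory=dict)
    alerted: Set[str] = field(default_factory=set)

    def on_cdr(self, subscriber: str, mb_used: float) -> List[str]:
        """Process one usage record; return any notifications to send."""
        total = self.usage_mb.get(subscriber, 0.0) + mb_used
        self.usage_mb[subscriber] = total
        limit = self.plan_limits_mb[subscriber]
        if total >= self.alert_fraction * limit and subscriber not in self.alerted:
            self.alerted.add(subscriber)      # alert only once per billing period
            return [f"SMS to {subscriber}: {total:.0f}/{limit:.0f} MB used, upgrade available"]
        return []

monitor = UsageMonitor(plan_limits_mb={"alice": 1000.0})
notifications = []
for mb in [400.0, 450.0, 60.0, 50.0]:
    notifications += monitor.on_cdr("alice", mb)
print(notifications)
```

The running totals are mutable per-subscriber state that is updated on every record, exactly the kind of stream state the earlier slides argue should live alongside the plan data.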
Market Surveillance For Market Makers
Stream ingestion + reference data into Spark Streaming: partitioned stream ingestion, continuous queries, SQL querying, machine learning
• Stream analytics
• Insider detection
• Apply rules
• Detect market manipulation
Outputs: summaries & alerts; messaging; alert & notify downstream systems; trigger investigations
Connected Car Real Time Data Flow
Kafka receiver feeding a streaming application in the SnappyData cluster, using vehicle time series data, vehicle history, driver history, asset metadata, system KPIs, ……
Raw data store: HDFS, HBase (offline analysis)
Outputs: custom summary dashboard; notification to owner
Real Time Marketing Campaigns
Real-time matching engine: ingests a stream of real-time offers from merchants and matches them against customer history, historical customer profiles, and users by geo location
Notification subsystem: personalized campaigns to users
A stream matching engine that uses customer history, current location, and relevant offers to target users effectively creates differentiation and generates revenue.
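The matching engine described above combines a user's current location with their history and the live stream of merchant offers. A minimal sketch, assuming (hypothetically) that each offer carries a location, category, and radius, and that a customer profile reduces to a set of categories bought before; distance uses the haversine formula.

```python
import math
from dataclasses import dataclass
from typing import List, Set

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

@dataclass
class Offer:
    merchant: str
    category: str
    lat: float
    lon: float
    radius_km: float

def match_offers(user_lat: float, user_lon: float, preferred: Set[str], offers: List[Offer]) -> List[str]:
    """Return merchants whose offer is within range and in a category the user has bought before."""
    return [
        o.merchant for o in offers
        if o.category in preferred and haversine_km(user_lat, user_lon, o.lat, o.lon) <= o.radius_km
    ]

offers = [
    Offer("cafe-a", "coffee", 43.6532, -79.3832, 1.0),    # downtown Toronto
    Offer("shoes-b", "apparel", 43.6532, -79.3832, 1.0),
    Offer("cafe-c", "coffee", 43.7735, -79.2576, 1.0),    # more than 10 km away
]
print(match_offers(43.6540, -79.3800, {"coffee"}, offers))
```

At scale the linear scan over offers would be replaced by a spatial index or geo-partitioning; the filtering logic stays the same.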
Sensor Analytics
000's data points/sec
• Continuous real-time analysis
• Monitor & control
• Tuning & optimization
• Emergency shutdown
• Maintenance
• Billing
RFQ Analytics
RFQs/trades/quotes streams arrive on a message bus; stream ingestion into SnappyData, with reference data loaded via ETL
• OLAP and low-latency querying in SQL
• Machine learning in Spark
Analytic dashboards
Ad Analytics
1.5-2x faster ingestion, faster transactions
7-142× faster analytics (at 300M records)
Data Synopsis Engine
TPC-H, 100 GB: Average Latency
System        Avg latency
SnappyData    5.7s
MemSQL        12.0s
Spark         66.9s
THANK YOU!
Try it out: http://snappydata.io/download
Resources: http://www.snappydata.io/resources

Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

SnappyData Toronto Meetup Nov 2017

  • 1. SnappyData: Building Continuous Applications Driven By Real Time Insights. Version 0.1 | © SnappyData Inc 2017 | www.snappydata.io | Sudhir Menon, Founder, COO
  • 2. Who Are We? • New Spark-based open source project started by Pivotal GemFire founders + engineers • Decades of in-memory data management experience • Focus on real-time, operational analytics: Spark-based OLTP+OLAP database. SnappyData: spinout, funded by Pivotal, GE, GTD Capital
  • 3. The Big Data Market Is Facing Disruption (Again) • Higher data volumes • Growth in streaming workloads • Analytics on live data • Growth in unstructured data • Machine learning/AI as first-class workloads. Need to reduce complexity and cloud costs.
  • 4. Spark: Key to the Emerging Battleground in Analytics and BI. Phase 1, Phase 2, Phase 3? - Multi-model data - Integrate streaming data (Source: Gartner)
  • 5. Mixed Workloads Are Everywhere: stream processing, transactions (point lookups, small updates) and interactive analytics. Examples: analytics on mutating data; correlating and joining streams with large histories; maintaining state or counters while ingesting streams.
  • 6. Time Value of Information: Why Does It Matter? • Elapsed time from event occurrence to event analytics matters • Latency in using information for learning matters • Concurrency matters • Recovery time matters • User-kernel crossings matter. In short, liveness of data matters when it comes to making decisions based on current information.
  • 7. What Is Happening With Applications • Applications are intelligent, proactive, learn from past interactions, and are context-aware in their decision making • Fast and reliable ingestion capabilities • Support high memory density • Utilize memory to reduce response time • Support high concurrency • Work on live data • Support data mutability
  • 8. Let's Discuss Some Use Cases • Market surveillance systems (trading exchanges, market makers) • Real time scoring systems (product recommendations, real time offers) • Telco analytics (location-based services, predictive analytics) • Sensor analytics (real time alerting for parking management, lighting etc.) • Ad analytics + ad placement systems • Credit card fraud • Detecting and stopping malware
  • 9. Mixed Workloads in Industrial IoT. IoT devices emit an event stream: • Anomaly detection: score against models - Map sensors to tags - Monitor temperature - Send alerts • Correlate the current temperature trend with history… • Interact using dynamic queries…
  • 10. Pain Points We Come Across Today • Is Spark ready for real time? For the enterprise? • Lacks mutable state management • Not designed for high concurrency and mixed workloads • Inadequate SQL support; limited ODBC/JDBC access • Fault tolerant, but not highly available (HA) • Nearly impossible to do live analytics on NoSQL stores • Pattern today: periodic copy of state into some analytic DB/Hadoop • Stale insight, not continuous or real time • Interactive dynamic aggregations not possible • Data models make support for BI tools like Tableau difficult • Most stream processors are not capable of true analytics • Deep stream analytics augmenting stream processing
  • 11. Pattern 1: Native Data Store for Spark… Fast, Simple. Today: Spark analytics (compute) over an immutable cache, reading from scalable NoSQL and Hadoop; too much copying… too slow for real-time, interactive analytics. With SnappyData: Spark analytics (compute) over a multi-model, distributed in-memory data store natively designed for Spark, in front of the data sources: - 20X faster than Spark for analytic queries - 1000X faster than Spark+NoSQL - Mutable DataFrames, transactions, indexing - Rich, complete SQL + all Spark APIs - Highly available data and Spark driver - Enterprise-grade security - Native support for ML/DL
  • 12. Pattern 2: Interactive Analytics on Live Data in NoSQL Stores. Problem: interactive analytics at scale and concurrency for LIVE data sets, e.g. sensor data and customer interactions, held in scalable operational NoSQL stores that receive continuous updates. Today's options: MPP ETL into Hadoop or an MPP DB; cubes and aggregations; the MongoDB BI connector or custom BI like SlamData; Tableau extracts into Tableau. Their drawbacks: expensive, complex, batch; difficult to deal with semi-structured data; read only, stale insight; inflexible, slow.
  • 13. Pattern 2: Live Analytics on Polyglot NotOnlySQL Stores (SnappyData). Sources: CDC streams, a sensor stream, Hadoop, NoSQL and microservices 1, 2, 3. Rich Spark APIs (window, Spark transform for data prep) feed SnappyData's in-memory row-column tables and virtual tables (NoSQL connectors), queried with SQL. - Live updates propagated to the in-memory analytics cluster in SnappyData. Visualize on any tool.
  • 14. Pattern 3: Streaming Analytics, Not Just Simple Processing. Streaming in Flink, Dataflow, Apex…: a stream app windows unbounded streams, with state updates going to a KV store, an index, an OLAP column table, Hadoop and NoSQL. - KV stores offer little to no analytic operators - Joins and aggregations across multiple DBs not possible - Too slow
  • 15. Pattern 3: Deep Integration of Stream Processing with OLAP. Streaming deeply integrated with the analytics DB: sensor streams, CDC streams and transaction streams… flow through rich Spark APIs (stream) into in-memory row-column tables that continuously summarize the streams and pull history on demand through NoSQL connectors; query with SQL from Tableau or Zeppelin. - Continuous queries on stream + history + enterprise data - Simple: build stream+analytics apps using a single model - Much faster than stitching
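The "continuously summarize, pull history on demand" idea can be illustrated with a small plain-Python sketch: summarize one time window of a stream, then join that summary with historical baselines to flag deviations. This is not SnappyData's API; the `Event` type, the window bounds, the deviation threshold and the in-memory `history` dict (standing in for a historical table) are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int        # event time in seconds (illustrative)
    sensor: str    # hypothetical sensor/tag id
    value: float   # e.g. a temperature reading


def summarize_window(events, window_start, window_len):
    """Average value per sensor over events in [window_start, window_start + window_len)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e in events:
        if window_start <= e.ts < window_start + window_len:
            sums[e.sensor] += e.value
            counts[e.sensor] += 1
    return {s: sums[s] / counts[s] for s in sums}


def continuous_query(events, history, window_start, window_len, threshold):
    """Join one window's summary with historical baselines; flag large deviations."""
    alerts = []
    summary = summarize_window(events, window_start, window_len)
    for sensor, avg in sorted(summary.items()):
        baseline = history.get(sensor)  # stand-in for "pull history on demand"
        if baseline is not None and abs(avg - baseline) > threshold:
            alerts.append((sensor, avg, baseline))
    return alerts


events = [Event(0, "t1", 70.0), Event(5, "t1", 74.0), Event(3, "t2", 50.0), Event(12, "t1", 90.0)]
history = {"t1": 60.0, "t2": 51.0}
print(continuous_query(events, history, window_start=0, window_len=10, threshold=5.0))
# → [('t1', 72.0, 60.0)]
```

In a real deployment the window would be re-evaluated as events arrive and the history lookup would be a query against stored tables; keeping both in one engine is what the slide means by avoiding "stitching".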
  • 16. How Mixed Workloads Are Supported Today (Lambda architecture): 1. New data is sent to both layers. 2. Batch layer: holds the master dataset. 3. Serving layer: batch views. 4. Speed layer: real-time views. 5. Queries merge batch and real-time views.
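To show where the complexity comes from, here is a toy, single-process sketch of the Lambda pattern for per-key counts. The class and method names are hypothetical, and real systems would use separate products (e.g. a message queue, a stream processor and a batch store) for each layer.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda architecture for per-key event counts (illustrative only)."""

    def __init__(self):
        self.master = []                 # batch layer: append-only master dataset
        self.batch_view = Counter()      # serving layer: view precomputed from the master dataset
        self.realtime_view = Counter()   # speed layer: counts not yet covered by the batch view

    def ingest(self, key):
        # New data goes to both the batch layer and the speed layer.
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Recompute the batch view over the whole master dataset, then drop the
        # speed-layer state it now covers (safe here only because nothing is
        # ingested while the batch runs in this single-threaded toy).
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key):
        # Queries merge the batch view with the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


lc = LambdaCounter()
for k in ["a", "a"]:
    lc.ingest(k)
lc.run_batch()
for k in ["a", "b"]:
    lc.ingest(k)
print(lc.query("a"), lc.query("b"))  # → 3 1
```

Even in this toy, the counting logic lives in two places (batch recomputation and incremental speed-layer updates) and every query has to merge them.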
  • 17. Lambda Architecture Is Complex. Kafka, Storm, Cassandra, ..., source apps. • Complexity: learn and master multiple products, data models, disparate APIs, configs • Slower • Wasted resources
  • 18. Can We Simplify & Optimize?
  • 19. How about a single clustered DB that can manage stream state and transactional data, and run OLAP queries? Instead of apps on a framework for stream processing, etc., stitched to an RDB, an MPP DB and HDFS: one DB with tables and transactions that supports stream processing, scalable writes, point reads and OLAP queries.
  • 20. Our Solution: SnappyData. A single unified cluster: OLTP + OLAP + Streaming for real-time analytics
  • 21. Our Solution. Deep scale, high volume MPP DB. Real-time design: low latency, HA, concurrency, replication-based, consensus-driven; matured over 13 years. Batch design: high throughput, lineage-based system; rapidly maturing. Combined into a single unified HA cluster: OLTP + OLAP + Streaming for real-time analytics.
  • 22. A Spark-Based Big Data Analytics Platform. Access through Spark jobs, Scala/Java/Python/R APIs, JDBC/ODBC and the object API (RDD, DataSets). Capabilities: Spark API (streaming, ML, graph); transactions, indexing; full SQL; HA; DataFrame, RDD, DataSets. SnappyData native store: in-memory row and columnar storage, Spark cache, synopses (samples), unified data access (virtual tables) and a unified catalog. Sources: HDFS/HBase, S3, JSON, CSV, XML, SQL DB, Cassandra, MPP DB, stream sources.
  • 23. We transform Spark from this… Each user/app (User 1/App 1, User 2/App 2) runs its own Spark master, Spark execution (worker), framework for streaming SQL, ML… and immutable cache over a deep scale, high volume MPP DB, HDFS, SQL and NoSQL. • Cannot update • Repeated for each user/app. Bottleneck.
  • 24. … into "an always-on hybrid database"! A long-running JVM hosts Spark execution (worker), the framework for streaming SQL, ML…, and the Spark driver, together with an in-memory row + column store with indexing that is mutable and transactional, backed by shared-nothing persistence. Accessed via JDBC, ODBC or Spark jobs; history lives in a deep scale, high volume MPP DB, HDFS, SQL and NoSQL.
  • 25. Architecture. Cluster manager & scheduler. SnappyData server (Spark executor + store): parser; OLAP and TXN; synopsis data engine; stream processing; DataFrame, RDD; query optimizer; hybrid store (probabilistic, rows, columns, index) serving low-latency and high-latency workloads; tables; ODBC/JDBC. Distributed membership service for HA and adding/removing servers.
  • 26. Unified API. Spark's DataFrame API allows for: • ML, graph, batch & streaming, SQL (selects). SnappyData adds full SQL support and extends the DataFrame and DataSource APIs for: • Mutability semantics (DML & transactions) • Indexing • SQL-based streaming
  • 27. Can We Use Statistical Techniques to Shrink Data? • Most apps are happy to trade off 1% accuracy for a 200x speedup! • Can usually get a 99.9% accurate answer by only looking at a tiny fraction of data! • Often can make perfectly accurate decisions with imperfect answers! • A/B testing, visualization, ... • The data itself is usually noisy • Processing the entire data doesn't necessarily mean exact answers!
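A minimal sketch of the "tiny fraction of data" idea: estimate a mean from a 1% uniform sample and report a normal-approximation 95% confidence half-width. The function name, the 95% level and the synthetic Gaussian data are illustrative assumptions; SnappyData's own error estimation may work differently.

```python
import math
import random
import statistics


def approx_mean(data, fraction, seed=0):
    """Estimate mean(data) from a uniform random sample without replacement.

    Returns (estimate, half_width), where half_width is a ~95% confidence
    half-width from the normal approximation (z = 1.96). Needs len(data) >= 2.
    """
    rng = random.Random(seed)
    n = max(2, int(len(data) * fraction))
    sample = rng.sample(data, n)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean, with the finite-population correction.
    fpc = math.sqrt((len(data) - n) / (len(data) - 1))
    std_err = statistics.stdev(sample) / math.sqrt(n) * fpc
    return estimate, 1.96 * std_err


rng = random.Random(42)
data = [rng.gauss(100, 15) for _ in range(100_000)]
est, half_width = approx_mean(data, fraction=0.01)
print(f"sample estimate {est:.2f} ± {half_width:.2f}; exact {statistics.fmean(data):.2f}")
```

The error bound shrinks with the square root of the sample size and is largely independent of the total data size, which is why a small sample can answer aggregate queries on very large tables.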
  • 28. Probabilistic Store: Sketches + Uniform & Stratified Samples. 1. Streaming CMS (Count-Min Sketch): higher resolution for more recent time ranges; [t1, t2) spans 4T, [t2, t3) spans 2T, [t3, t4) spans T and [t4, now) spans ≤T. 2. Top-K queries with arbitrary filters: traditional CMS vs. CMS + samples, maintaining a small sample at each CMS cell. 3. Fully distributed stratified samples: always include timestamp as a stratified column for streams. Streams flow into an aging row store (in-memory) and then a column store (disk).
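For readers unfamiliar with the sketch named above, here is a minimal textbook Count-Min Sketch in Python. It shows only the basic structure, not the slide's time-bucketed variant (coarser sketches for older ranges) or the per-cell samples used for Top-K; the default width/depth and the salted `blake2b` row hashes are illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory (depth x width counters).

    Estimates never undercount. Assuming the row hashes behave independently,
    an estimate exceeds the true count by more than (e / width) * N with
    probability at most exp(-depth), where N is the total of all added counts.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # A different hash per row, derived by salting blake2b with the row number.
        h = hashlib.blake2b(str(item).encode(), digest_size=8, salt=row.to_bytes(16, "little"))
        return int.from_bytes(h.digest(), "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Every row overcounts only through collisions, so take the minimum.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))


cms = CountMinSketch(width=1024, depth=4)
for i in range(1000):
    cms.add(f"user{i}")
cms.add("hot", 500)
print(cms.estimate("hot"))  # at least 500; collisions can only push it higher
```

Memory stays at `width * depth` counters regardless of how many distinct keys the stream contains, which is what makes the structure suitable for unbounded streams.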
  • 29. High-Level Accuracy Guarantees. Streams age into the SnappyData store, which keeps stratified samples in a row store (memory) and a column store (disk). Interactive and continuous queries run through the query engine with HAC (bias estimate, variance estimate, pipelined bootstrapped operator) to return quality-certified approximate answers.
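The following sketch shows, in plain Python, why stratified samples and bootstrap variance estimates go together: sample a fixed number of rows per stratum, scale each stratum's sample sum by its true size to estimate a SUM, and resample within strata to estimate the standard error. The helper names are hypothetical, and this offline bootstrap is only a stand-in for the pipelined bootstrapped operator on the slide.

```python
import random
import statistics
from collections import defaultdict


def stratified_sample(rows, key, per_stratum, seed=0):
    """Map each stratum to (true stratum size, up to `per_stratum` sampled rows)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    return {k: (len(g), rng.sample(g, min(per_stratum, len(g)))) for k, g in groups.items()}


def estimate_sum(strata, value):
    """Expansion estimator: scale each stratum's sample sum by N_h / n_h."""
    return sum(size / len(s) * sum(value(r) for r in s) for size, s in strata.values())


def bootstrap_se(strata, value, rounds=200, seed=1):
    """Standard error of estimate_sum, resampling with replacement within each stratum."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(rounds):
        resampled = {k: (size, rng.choices(s, k=len(s))) for k, (size, s) in strata.items()}
        estimates.append(estimate_sum(resampled, value))
    return statistics.stdev(estimates)


rows = [("common", 1.0)] * 10_000 + [("rare", 100.0)] * 10
strata = stratified_sample(rows, key=lambda r: r[0], per_stratum=50)
print(estimate_sum(strata, value=lambda r: r[1]))  # → 11000.0
print(bootstrap_se(strata, value=lambda r: r[1]))  # → 0.0 (values are constant per stratum)
```

A uniform sample of about 60 rows out of these 10,010 would likely miss the 10 rare rows entirely; stratifying guarantees they are represented. For streams, the stratum key would include a timestamp bucket, as the previous slide recommends.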
  • 30. What Is Unique. Deep Fusion with Spark: an elastic, highly available in-memory store for OLTP fused with Spark's memory manager and the Catalyst/Tungsten engine; the store itself is exposed as native Spark DataFrames. Extreme Speed through CPU code generation and vectorization: extends Spark's Tungsten engine with better code generation, colocation schemes, ... Synopses Data Engine: uses statistical techniques to reduce data by 100-1000x and answer queries in a fraction of the time and resources.
  • 31. Cloud Ready
  • 32. Dealing with Credit Card Fraud. In the SnappyData cluster, a streaming application consumes the credit card transaction stream and combines it with user history, a prediction model and black-listed cards, backed by a data lake; it sends notifications to the owner and to the merchant.
  • 33. Preventing Bill Shock, Real Time Upgrades. In the SnappyData cluster, a streaming application consumes the CDR stream together with plan info to find customers approaching their limit, then sends an immediate SMS to the customer or schedules a callback through the call center; data is kept in a data lake. The system detects approaching usage limits, notifies users and gives them a chance to buy a one-time upgrade or a new plan, increasing loyalty & revenue.
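The "detect approaching usage limits" step is an instance of maintaining state while ingesting a stream (slide 5). A minimal sketch, assuming a hypothetical `Plan` record with a data allowance, CDRs reduced to `(customer, megabytes)` pairs, and a 90% warning ratio; none of these come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    limit_mb: float  # hypothetical monthly data allowance


def approaching_limit(usage_mb, plans, cdrs, warn_ratio=0.9):
    """Apply CDRs to running usage; return customers who just crossed the warning level.

    `usage_mb` is mutated in place, acting as the stream's running state.
    Each customer is reported once per crossing, not on every later CDR.
    """
    notify = []
    for customer, mb in cdrs:
        before = usage_mb.get(customer, 0.0)
        after = before + mb
        usage_mb[customer] = after
        threshold = warn_ratio * plans[customer].limit_mb
        if before < threshold <= after:
            notify.append(customer)  # e.g. send an SMS offering a one-time upgrade
    return notify


plans = {"alice": Plan(1000), "bob": Plan(500)}
usage = {}
cdrs = [("alice", 850), ("bob", 100), ("alice", 60), ("alice", 50)]
print(approaching_limit(usage, plans, cdrs))  # → ['alice']
```

Checking `before < threshold <= after` rather than `after >= threshold` is what keeps the system from re-sending the same alert on every subsequent record.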
  • 34. Market Surveillance for Market Makers. Stream ingestion and reference data feed: • Stream analytics • Insider detection • Apply rules • Detect market manipulation. Outputs: alert & notify downstream systems; trigger investigations. Building blocks: Spark Streaming, SQL querying, continuous queries, partitioned stream ingestion, summaries & alerts, messaging, machine learning.
  • 35. Connected Car Real Time Data Flow. In the SnappyData cluster, a Kafka receiver feeds vehicle time series data to a streaming application that uses vehicle history, driver history, system KPIs and asset metadata; raw data is stored in HDFS/HBase. Outputs: a custom summary dashboard and notifications to the owner.
  • 36. Real Time Marketing Campaigns. A real-time matching engine ingests a stream of users by geolocation and matches it against customer history, historical customer profiles (from offline analysis) and real-time offers from merchants; a notification subsystem delivers personalized campaigns to users. A stream matching engine that uses customer history, customers' current location and relevant offers to effectively target users creates differentiation and generates revenue.
  • 37. Sensor Analytics. Thousands of data points/sec feed continuous real-time analysis for emergency shutdown, tuning & optimization, monitoring & control, maintenance and billing.
  • 38. RFQ Analytics. RFQ, trade and quote streams arrive over a message bus; stream ingestion plus reference data (ETL) feed SnappyData for • OLAP and low latency querying in SQL • Machine learning in Spark, driving analytic dashboards.
  • 39. Ad Analytics. 1.5-2x faster ingestion, faster transactions; 7-142x faster analytics (at 300M records)
  • 40. Data Synopsis Engine
  • 41. TPC-H, average latency at 100 GB: SnappyData 5.7 s, MemSQL 12.0 s, Spark 66.9 s
  • 42. Thank You! Try it out: http://snappydata.io/download | Resources: http://www.snappydata.io/resources